CN106024010A - Speech signal dynamic characteristic extraction method based on formant curves - Google Patents
- Publication number
- CN106024010A (application CN201610340935.3A; granted as CN106024010B)
- Authority
- CN
- China
- Prior art keywords
- formant
- curve
- voice signal
- frame
- formant curve
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/15—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being formant information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
Abstract
The invention provides a method for extracting dynamic characteristics of speech signals based on formant curves, belonging to the technical field of Chinese speech signal dynamic characteristic extraction. The method comprises the following steps: acquiring speech signals; preprocessing the speech signals; extracting the formant frequency characteristics of the speech signals; combining, in order from the first frame to the last frame, the first formant frequency values of all preprocessed frames to obtain a first formant curve, and obtaining a second, third and fourth formant curve in the same manner; applying a fast Fourier transform to each formant curve to obtain its linear spectrum; obtaining an energy spectrum from the linear spectrum; obtaining the log energy from the energy spectrum; and applying a discrete cosine transform to the log energy. Compared with existing methods, the dynamic characteristics extracted by this method possess temporal correlation, revealing the close association between preceding, following and adjacent parts of the speech signal, and thereby improving speech recognition performance.
Description
Technical field
The invention belongs to the technical field of Chinese speech signal dynamic feature extraction, and specifically relates to a method for extracting dynamic features of speech signals based on formant curves.
Background art
Research on speech recognition in China began in the 1950s, but did not start to develop rapidly until the 1970s. The Chinese Academy of Sciences, Tsinghua University, Peking University and many other research units are engaged in developing Chinese speech recognition systems, and research on large-vocabulary continuous speech recognition is already close to the top international level. In China's Eighth Five-Year Plan for national economic and social development and in the "863" program, research on Chinese speech recognition received strong support, and the National 863 "Intelligent Computer" expert group established projects specifically for speech recognition research. At the same time, owing to China's growing international status and its critical position in the economy and the market, Chinese speech recognition has received ever more attention from foreign research institutions and companies: IBM, Microsoft, Apple, Motorola, Intel, L&H and other companies have successively set up research institutions in China and invested in the development of Chinese speech recognition systems, strongly promoting research on Mandarin speech recognition.
Nevertheless, truly free human-machine communication remains far off. Existing commercial systems still have problems, such as unsatisfactory recognition rates and robustness in noisy environments.
The most fundamental and most important step in speech recognition is the extraction of speech signal feature parameters. As early as the 1940s, R. K. Potter et al. proposed the concept of "Visible Speech", pointing out that the spectrogram has strong descriptive power for speech signals, and attempted speech recognition using spectrographic information, which formed the earliest speech features. By the 1950s it had been recognized that, to identify a speech signal, parameters reflecting the characteristics of speech must be extracted from the waveform; this not only reduces the number of templates, the computation and the storage, but also filters out useless redundancy in the speech signal. Features such as amplitude, short-time frame energy, short-time frame zero-crossing rate and short-time autocorrelation coefficients then appeared. As recognition technology developed, people found that the stability and discriminative power of time-domain feature parameters were not good, and began to use frequency-domain parameters as speech features, such as pitch period, formant frequency, linear prediction coefficients (LPC), line spectrum pairs (LSP) and cepstral coefficients. At present the most widely used feature parameters are the Mel-frequency cepstral coefficients (MFCC), which are based on a model of human hearing. However, once these parameters are applied in a noisy environment, their performance declines sharply.
Moreover, all the feature parameters mentioned above reflect the static characteristics of speech. The dynamic characteristics of a speech signal are feature parameters extracted from several adjacent frames of speech; they can be obtained, for example, as the differential and acceleration parameters of the static features. But differential and acceleration parameters cannot mine the dynamic information sufficiently, so they still cannot reflect the dynamic characteristics of speech signals well.
Summary of the invention
In view of the deficiencies of the prior art, the present invention proposes a method for extracting dynamic features of speech signals based on formant curves, with the aims of broadening applications, improving speech recognition performance, quickly and effectively grasping the dynamic characteristics of the signal, and making existing speech recognition technology applicable in strong-noise environments.
A method for extracting dynamic features of speech signals based on formant curves comprises the following steps:
Step 1: acquiring a speech signal;
Step 2: preprocessing the speech signal, including pre-emphasis, framing with windowing, and endpoint detection;
Step 3: using a method based on the Hilbert-Huang transform, estimating the formant frequency features of the preprocessed speech signal to obtain the first formant feature value, second formant feature value, third formant feature value and fourth formant feature value of every frame;
Step 4: forming the formant curves, specifically:
in order from the first frame to the last frame, combining the first formant feature values of every preprocessed frame of the speech signal to obtain the first formant curve;
in order from the first frame to the last frame, combining the second formant feature values of every preprocessed frame to obtain the second formant curve;
in order from the first frame to the last frame, combining the third formant feature values of every preprocessed frame to obtain the third formant curve;
in order from the first frame to the last frame, combining the fourth formant feature values of every preprocessed frame to obtain the fourth formant curve;
Step 5: applying the fast Fourier transform to the obtained first, second, third and fourth formant curves to obtain the linear spectrum of each formant curve;
Step 6: obtaining the energy spectrum of each formant curve from its linear spectrum;
Step 7: obtaining the log energy of each formant curve from its energy spectrum;
Step 8: applying the discrete cosine transform to the log energy to reach the cepstral domain, i.e. obtaining the dynamic feature coefficients of the speech signal.
The preprocessing of the speech signal in step 2 includes pre-emphasis, framing with windowing, and endpoint detection, wherein:
the pre-emphasis is realized by a first-order digital pre-emphasis filter whose coefficient ranges from 0.93 to 0.97;
the framing and windowing divides the signal into frames of length 256 samples and applies a Hamming window to every frame;
the endpoint detection uses the short-time energy-zero product method.
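As a rough, hypothetical sketch of the short-time energy-zero product test described above (the function names and threshold handling are illustrative additions, not taken from the patent):

```python
import numpy as np

def energy_zero_product(frames):
    """Per-frame product of short-time energy and zero-crossing count.

    frames: 2-D array, one windowed frame per row. Speech frames yield a
    large product; silence yields a value near zero.
    """
    energy = np.sum(frames ** 2, axis=1)
    # Count sign changes between consecutive samples in each frame
    zcr = np.sum(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return energy * zcr

def detect_endpoints(frames, threshold):
    """Indices of frames whose energy-zero product exceeds the threshold."""
    return np.where(energy_zero_product(frames) > threshold)[0]
```

Multiplying energy by the zero-crossing count raises the contrast between speech and silence compared with either measure alone, which is the rationale behind the energy-zero product method.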
In step 5, the fast Fourier transform is applied to the obtained first, second, third and fourth formant curves to obtain the linear spectrum of each formant curve.
The specific formula is as follows:

Xi(k) = Σ_{n=0}^{N-1} xi(n)e^(-j2πnk/N)

where Xi(k) denotes the linear spectrum of the i-th formant curve after the fast Fourier transform; i = 1, 2, 3, 4; k = 0, 1, 2, ..., N-1, where N is the number of frames of the speech signal; xi(n) denotes the i-th formant curve; j is the imaginary unit and e is the base of the natural logarithm.
In step 8, the discrete cosine transform is applied to the above log energy to reach the cepstral domain, i.e. to obtain the dynamic feature parameters of the speech signal.
The specific formula is as follows:

Ci(t) = Σ_{k=0}^{N-1} Li(k)cos(πt(2k+1)/(2N)), t = 1, 2, ..., T

where Ci(t) denotes the dynamic feature coefficients of the i-th formant curve; i = 1, 2, 3, 4; T denotes the set number of cepstral coefficients, ranging from 12 to 16; Li(k) denotes the log energy of the i-th formant curve; k = 0, 1, 2, ..., N-1, where N is the number of frames of the speech signal.
The invention has the following advantages:
1. The speech signal dynamic feature coefficients obtained by the invention are mainly applicable to computer dictation machines and to speech information query and service systems combined with the telephone network or the Internet; they can also be applied in miniaturized, portable speech products, such as voice dialing on wireless telephones, voice control of automotive equipment, intelligent toys, and household remote controls.
2. The invention extracts dynamic features of the speech signal; these possess temporal correlation and reveal the close association between preceding, following and adjacent parts of the speech signal, greatly improving speech recognition performance compared with the traditional MFCC method.
3. The invention estimates the formant frequency features of the preprocessed speech with a method based on the Hilbert-Huang transform, in which empirical mode decomposition (EMD) decomposes the signal into a set of intrinsic mode function (IMF) components of different scales; each IMF component obtained by the decomposition represents one frequency component, and these components effectively highlight the local characteristics and detailed variations of the signal, which helps to grasp its dynamic characteristics quickly and effectively.
4. The formant curves constructed by the invention possess temporal correlation and reveal the close association between preceding, following and adjacent parts of the speech signal; this property makes it possible to apply speech recognition technology in strong-noise environments.
Brief description of the drawings
Fig. 1 is a flowchart of the formant-curve-based method for extracting dynamic features of speech signals in an embodiment of the present invention;
Fig. 2 compares the recognition performance curves of the parameters under white noise in an embodiment of the present invention;
Fig. 3 compares the recognition performance curves of the parameters under pink noise in an embodiment of the present invention;
Fig. 4 compares the recognition performance curves of the parameters under street noise in an embodiment of the present invention;
Fig. 5 compares the recognition performance curves of the parameters under tank noise in an embodiment of the present invention.
Detailed description of the invention
An embodiment of the present invention is described further below with reference to the accompanying drawings.
A method for extracting dynamic features of speech signals based on formant curves, whose flow is shown in Fig. 1, comprises the following steps:
Step 1: acquire the speech signal.
In this embodiment, speech data are input through a microphone and sampled and quantized by a processing unit such as a computer, single-chip microcomputer or DSP chip, at a sampling frequency of 11.025 kHz with 16-bit quantization precision, to obtain the corresponding speech signal; this embodiment uses a computer as the processing unit.
Step 2: preprocess the speech signal, including pre-emphasis, framing with windowing, and endpoint detection.
In this embodiment, the pre-emphasis is realized by a first-order digital pre-emphasis filter whose coefficient may range from 0.93 to 0.97 and is set to 0.9375 here; the framing and windowing divides the signal into frames of 256 samples and applies a Hamming window to every frame; and the endpoint detection uses the short-time energy-zero product method.
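The pre-emphasis, framing and Hamming windowing of step 2 can be sketched in a few lines. This is an illustrative sketch rather than the patent's implementation; in particular the hop size of 128 samples is an assumption, since the patent specifies only the frame length of 256:

```python
import numpy as np

def preprocess(signal, alpha=0.9375, frame_len=256, hop=128):
    """Pre-emphasis with H(z) = 1 - alpha*z^{-1}, then framing and windowing.

    alpha = 0.9375 matches the embodiment; hop (frame shift) is an assumed
    value not given in the patent.
    """
    # First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    window = np.hamming(frame_len)
    num_frames = 1 + (len(emphasized) - frame_len) // hop
    # Split into overlapping frames and apply the Hamming window to each
    frames = np.stack([emphasized[i * hop : i * hop + frame_len] * window
                       for i in range(num_frames)])
    return frames
```

Endpoint detection (the short-time energy-zero product) would then discard non-speech frames before formant estimation.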
Step 3: using a method based on the Hilbert-Huang transform, estimate the formant frequency features of the preprocessed speech signal, obtaining the first formant feature value F1, second formant feature value F2, third formant feature value F3 and fourth formant feature value F4 of every frame.
In this embodiment, each formant frequency of the speech signal is first roughly estimated with the fast Fourier transform (FFT) and used to determine the parameters of a corresponding band-pass filter; the speech signal is filtered with those parameters; empirical mode decomposition (EMD) of the filtered signal yields a family of intrinsic mode functions (IMFs); the IMF containing the formant frequency is selected by the energy-maximum principle; and the instantaneous frequency and Hilbert spectrum of that IMF are computed to obtain the formant frequency parameters of the speech signal.
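The last part of step 3, computing the instantaneous frequency of the selected IMF, can be sketched with the Hilbert transform. This is a minimal illustration assuming the IMF has already been obtained from an EMD implementation (the patent does not name one); it works on any narrow-band signal:

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_frequency(imf, fs):
    """Instantaneous frequency (Hz) of a narrow-band component.

    The unwrapped phase of the analytic signal is differentiated; for an
    IMF that contains one formant, its typical value estimates that
    formant's frequency.
    """
    analytic = hilbert(imf)                # analytic signal via Hilbert transform
    phase = np.unwrap(np.angle(analytic))  # continuous instantaneous phase
    return np.diff(phase) * fs / (2.0 * np.pi)
```

Taking a robust statistic of the result (e.g. the median over the frame) suppresses the edge effects of the Hilbert transform.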
Step 4: form the formant curves, specifically:
In this embodiment, in order from the first frame to the last frame, the first formant frequency values F1 of every preprocessed frame of the speech signal are combined into the first formant curve x1(n), n = 0, 1, 2, ..., N-1, where N is the number of frames of the speech signal; likewise, the second formant frequency values F2 are combined into the second formant curve x2(n), the third formant frequency values F3 into the third formant curve x3(n), and the fourth formant frequency values F4 into the fourth formant curve x4(n).
Step 5: apply the fast Fourier transform to the obtained first, second, third and fourth formant curves to obtain the linear spectrum of each formant curve.
In this embodiment, the specific formula is as follows:

Xi(k) = Σ_{n=0}^{N-1} xi(n)e^(-j2πnk/N) (2)

where Xi(k) denotes the linear spectrum of the i-th formant curve after the fast Fourier transform; i = 1, 2, 3, 4; k = 0, 1, 2, ..., N-1, where N is the number of frames of the speech signal; xi(n) denotes the i-th formant curve; j is the imaginary unit and e is the base of the natural logarithm (approximately 2.718).
Step 6: obtain the energy spectrum of each formant curve from its linear spectrum.
In this embodiment, the squared modulus of the linear spectrum Xi(k) gives the corresponding energy spectrum Si(k):

Si(k) = |Xi(k)|² (3)

where Si(k) denotes the energy spectrum of the i-th formant curve.
Step 7: obtain the log energy of each formant curve from its energy spectrum.
In this embodiment, to make the result more robust to noise, the logarithm of the energy spectrum Si(k) is taken to obtain the log energy Li(k):

Li(k) = log(Si(k)) (4)

where Li(k) is the log energy of the i-th formant curve.
Step 8: apply the discrete cosine transform to the log energy to reach the cepstral domain, i.e. obtain the dynamic feature coefficients of the speech signal. The specific formula is as follows:

Ci(t) = Σ_{k=0}^{N-1} Li(k)cos(πt(2k+1)/(2N)), t = 1, 2, ..., T (5)

where Ci(t) denotes the dynamic feature coefficients of the i-th formant curve; i = 1, 2, 3, 4; T denotes the set number of cepstral coefficients, ranging from 12 to 16; this embodiment takes T = 12.
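Steps 5 through 8 amount to a short spectral pipeline applied to each formant curve. A sketch follows, under two stated assumptions: a DCT-II with orthonormal scaling stands in for the patent's cosine transform, and a small epsilon guards the logarithm against zero energies (neither detail is specified in the patent):

```python
import numpy as np
from scipy.fft import dct

def formant_dynamic_features(formant_track, num_coeffs=12):
    """FFT -> energy spectrum -> log -> DCT for one formant curve.

    formant_track: one formant frequency value per frame (steps 3-4).
    Returns num_coeffs cepstral-domain dynamic feature coefficients.
    """
    X = np.fft.fft(formant_track)      # step 5: linear spectrum Xi(k)
    S = np.abs(X) ** 2                 # step 6: energy spectrum Si(k)
    L = np.log(S + 1e-12)              # step 7: log energy Li(k)
    C = dct(L, type=2, norm='ortho')   # step 8: transform to the cepstral domain
    return C[1 : num_coeffs + 1]       # keep T = 12 coefficients, dropping C(0)
```

Dropping the zeroth coefficient mirrors common cepstral practice (it tracks overall log energy) and matches the patent's indexing of t from 1.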
In this embodiment, 50 typical Chinese words are used for testing. Since a recognition system is easily affected by factors such as environmental noise, channel variation and speaker variation, the training set uses speech recorded in a quiet environment while the test set uses noisy data.
To verify the robustness of the feature parameters to speaker variation, the training data were recorded in two sessions by 50 speakers in total, each pronouncing every word once, giving 5000 utterances; the test data were likewise recorded in two sessions by 30 speakers, each pronouncing every word once, giving 3000 utterances. To verify robustness to channel variation, a different microphone was used for each recording. To verify robustness to environmental noise, four kinds of noise (white noise, pink noise, street noise and tank noise) were artificially added to every test utterance, forming noisy speech signals at signal-to-noise ratios of 15 dB, 10 dB, 5 dB, 0 dB and -5 dB.
In this embodiment, a wavelet neural network improved by a genetic algorithm is used as the classifier; the network input layer has 48 neurons, the output layer has 50 neurons, and the number of hidden-layer nodes is determined by the genetic algorithm.
In this embodiment, Figs. 2, 3, 4 and 5 compare the recognition performance of the MFCC method (run under the same conditions as this embodiment) with that of the embodiment's method under white noise, pink noise, street noise and tank noise interference, respectively. It can be seen that when the signal-to-noise ratio is low, the recognition rate of the embodiment's method is much higher than that of the MFCC method.
Claims (4)
1. A method for extracting dynamic features of speech signals based on formant curves, characterized by comprising the following steps:
Step 1: acquiring a speech signal;
Step 2: preprocessing the speech signal, including pre-emphasis, framing with windowing, and endpoint detection;
Step 3: using a method based on the Hilbert-Huang transform, estimating the formant frequency features of the preprocessed speech signal to obtain the first formant feature value, second formant feature value, third formant feature value and fourth formant feature value of every frame;
Step 4: forming the formant curves, specifically:
in order from the first frame to the last frame, combining the first formant feature values of every preprocessed frame of the speech signal to obtain the first formant curve;
in order from the first frame to the last frame, combining the second formant feature values of every preprocessed frame to obtain the second formant curve;
in order from the first frame to the last frame, combining the third formant feature values of every preprocessed frame to obtain the third formant curve;
in order from the first frame to the last frame, combining the fourth formant feature values of every preprocessed frame to obtain the fourth formant curve;
Step 5: applying the fast Fourier transform to the obtained first, second, third and fourth formant curves to obtain the linear spectrum of each formant curve;
Step 6: obtaining the energy spectrum of each formant curve from its linear spectrum;
Step 7: obtaining the log energy of each formant curve from its energy spectrum;
Step 8: applying the discrete cosine transform to the log energy to reach the cepstral domain, i.e. obtaining the dynamic feature parameters of the speech signal.
2. The method for extracting dynamic features of speech signals based on formant curves according to claim 1, characterized in that the preprocessing of the speech signal in step 2 includes pre-emphasis, framing with windowing, and endpoint detection, wherein:
the pre-emphasis is realized by a first-order digital pre-emphasis filter whose coefficient ranges from 0.93 to 0.97;
the framing and windowing divides the signal into frames of length 256 samples and applies a Hamming window to every frame;
the endpoint detection uses the short-time energy-zero product method.
3. The method for extracting dynamic features of speech signals based on formant curves according to claim 1, characterized in that in step 5 the fast Fourier transform is applied to the obtained first, second, third and fourth formant curves to obtain the linear spectrum of each formant curve, according to the formula:

Xi(k) = Σ_{n=0}^{N-1} xi(n)e^(-j2πnk/N)

where Xi(k) denotes the linear spectrum of the i-th formant curve after the fast Fourier transform; i = 1, 2, 3, 4; k = 0, 1, 2, ..., N-1, where N is the number of frames of the speech signal; xi(n) denotes the i-th formant curve, n = 0, 1, 2, ..., N-1; j is the imaginary unit and e is the base of the natural logarithm.
4. The method for extracting dynamic features of speech signals based on formant curves according to claim 1, characterized in that in step 8 the discrete cosine transform is applied to the above log energy to reach the cepstral domain, i.e. to obtain the dynamic feature parameters of the speech signal, according to the formula:

Ci(t) = Σ_{k=0}^{N-1} Li(k)cos(πt(2k+1)/(2N)), t = 1, 2, ..., T

where Ci(t) denotes the dynamic feature coefficients of the i-th formant curve; i = 1, 2, 3, 4; T denotes the set number of cepstral coefficients, ranging from 12 to 16; Li(k) denotes the log energy of the i-th formant curve; k = 0, 1, 2, ..., N-1, where N is the number of frames of the speech signal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610340935.3A CN106024010B (en) | 2016-05-19 | 2016-05-19 | A kind of voice signal dynamic feature extraction method based on formant curve |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610340935.3A CN106024010B (en) | 2016-05-19 | 2016-05-19 | A kind of voice signal dynamic feature extraction method based on formant curve |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106024010A true CN106024010A (en) | 2016-10-12 |
CN106024010B CN106024010B (en) | 2019-08-20 |
Family
ID=57095695
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610340935.3A Expired - Fee Related CN106024010B (en) | 2016-05-19 | 2016-05-19 | A kind of voice signal dynamic feature extraction method based on formant curve |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106024010B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106596002A (en) * | 2016-12-14 | 2017-04-26 | 东南大学 | High-speed railway steel truss arch bridge vehicle-bridge resonance curve measuring method |
CN108053842A (en) * | 2017-12-13 | 2018-05-18 | 电子科技大学 | Shortwave sound end detecting method based on image identification |
CN109410971A (en) * | 2018-11-13 | 2019-03-01 | 无锡冰河计算机科技发展有限公司 | A kind of method and apparatus for beautifying sound |
CN110135291A (en) * | 2019-04-29 | 2019-08-16 | 西北工业大学 | A kind of method for parameter estimation of Low SNR signal |
CN110663080A (en) * | 2017-02-13 | 2020-01-07 | 法国国家科研中心 | Method and apparatus for dynamically modifying the timbre of speech by frequency shifting of spectral envelope formants |
CN111726728A (en) * | 2020-06-30 | 2020-09-29 | 联想(北京)有限公司 | Resonance suppression method and device |
CN111899724A (en) * | 2020-08-06 | 2020-11-06 | 中国人民解放军空军预警学院 | Voice feature coefficient extraction method based on Hilbert-Huang transform and related equipment |
CN112966528A (en) * | 2021-03-01 | 2021-06-15 | 郑州铁路职业技术学院 | English voice translation fuzzy matching system |
CN114598565A (en) * | 2022-05-10 | 2022-06-07 | 深圳市发掘科技有限公司 | Kitchen electrical equipment remote control system and method and computer equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101067929A (en) * | 2007-06-05 | 2007-11-07 | 南京大学 | Method for enhancing and extracting phonetic resonance hump trace utilizing formant |
CN102231281A (en) * | 2011-07-18 | 2011-11-02 | 渤海大学 | Voice visualization method based on integration characteristic and neural network |
CN102820037A (en) * | 2012-07-21 | 2012-12-12 | 渤海大学 | Chinese initial and final visualization method based on combination feature |
CN102855408A (en) * | 2012-09-18 | 2013-01-02 | 福州大学 | ICA (independent component analysis)-based EMD (empirical mode decomposition) improvement process IMF (intrinsic mode function) judgment method |
CN103021405A (en) * | 2012-12-05 | 2013-04-03 | 渤海大学 | Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter |
CN104835507A (en) * | 2015-03-30 | 2015-08-12 | 渤海大学 | Serial-parallel combined multi-mode emotion information fusion and identification method |
- 2016-05-19: application CN201610340935.3A filed; granted as CN106024010B; status: Expired - Fee Related
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101067929A (en) * | 2007-06-05 | 2007-11-07 | 南京大学 | Method for enhancing and extracting phonetic resonance hump trace utilizing formant |
CN102231281A (en) * | 2011-07-18 | 2011-11-02 | 渤海大学 | Voice visualization method based on integration characteristic and neural network |
CN102820037A (en) * | 2012-07-21 | 2012-12-12 | 渤海大学 | Chinese initial and final visualization method based on combination feature |
CN102855408A (en) * | 2012-09-18 | 2013-01-02 | 福州大学 | ICA (independent component analysis)-based EMD (empirical mode decomposition) improvement process IMF (intrinsic mode function) judgment method |
CN103021405A (en) * | 2012-12-05 | 2013-04-03 | 渤海大学 | Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter |
CN104835507A (en) * | 2015-03-30 | 2015-08-12 | 渤海大学 | Serial-parallel combined multi-mode emotion information fusion and identification method |
Non-Patent Citations (4)
Title |
---|
乐莎莎: "Research on cough sound recognition based on HHT", China Master's Theses Full-text Database, Information Science and Technology Series * |
王洪海: "Research on automatic language identification based on acoustic features", China Master's Theses Full-text Database, Information Science and Technology Series * |
莫家玲: "Research on speech feature parameter extraction based on invariant-set multiwavelets", China Master's Theses Full-text Database, Information Science and Technology Series * |
顾亚强: "Research on key technologies of speaker-independent speech recognition", China Master's Theses Full-text Database, Information Science and Technology Series * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106596002A (en) * | 2016-12-14 | 2017-04-26 | 东南大学 | High-speed railway steel truss arch bridge vehicle-bridge resonance curve measuring method |
CN110663080A (en) * | 2017-02-13 | 2020-01-07 | 法国国家科研中心 | Method and apparatus for dynamically modifying the timbre of speech by frequency shifting of spectral envelope formants |
CN108053842A (en) * | 2017-12-13 | 2018-05-18 | 电子科技大学 | Short-wave voice endpoint detection method based on image recognition |
CN108053842B (en) * | 2017-12-13 | 2021-09-14 | 电子科技大学 | Short-wave voice endpoint detection method based on image recognition |
CN109410971A (en) * | 2018-11-13 | 2019-03-01 | 无锡冰河计算机科技发展有限公司 | Method and device for beautifying sound |
CN109410971B (en) * | 2018-11-13 | 2021-08-31 | 无锡冰河计算机科技发展有限公司 | Method and device for beautifying sound |
CN110135291A (en) * | 2019-04-29 | 2019-08-16 | 西北工业大学 | Parameter estimation method for low signal-to-noise ratio signals |
CN110135291B (en) * | 2019-04-29 | 2023-03-24 | 西北工业大学 | Parameter estimation method for low signal-to-noise ratio signals |
CN111726728A (en) * | 2020-06-30 | 2020-09-29 | 联想(北京)有限公司 | Resonance suppression method and device |
CN111899724A (en) * | 2020-08-06 | 2020-11-06 | 中国人民解放军空军预警学院 | Speech feature coefficient extraction method based on the Hilbert-Huang transform, and related device |
CN112966528A (en) * | 2021-03-01 | 2021-06-15 | 郑州铁路职业技术学院 | English speech translation fuzzy matching system |
CN112966528B (en) * | 2021-03-01 | 2023-09-19 | 郑州铁路职业技术学院 | English speech translation fuzzy matching system |
CN114598565A (en) * | 2022-05-10 | 2022-06-07 | 深圳市发掘科技有限公司 | Remote control system and method for kitchen electrical equipment, and computer device |
Also Published As
Publication number | Publication date |
---|---|
CN106024010B (en) | 2019-08-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106024010B (en) | Speech signal dynamic feature extraction method based on formant curves | |
CN103236260B (en) | Speech recognition system | |
CN102509547B (en) | Method and system for voiceprint recognition based on vector quantization | |
CN103345923B (en) | Short-utterance speaker recognition method based on sparse representation | |
CN103310789B (en) | Sound event recognition method based on improved parallel model combination | |
CN104900229A (en) | Method for extracting mixed feature parameters of speech signals | |
CN102968990B (en) | Speaker identification method and system | |
CN102568476B (en) | Voice conversion method based on self-organizing feature map network clustering and radial basis function network | |
CN103065629A (en) | Speech recognition system for a humanoid robot | |
CN101226743A (en) | Speaker recognition method based on neutral-to-emotional voiceprint model conversion | |
CN104123933A (en) | Voice conversion method based on adaptive non-parallel training | |
CN104183245A (en) | Method and device for recommending music stars with timbres similar to those of singers | |
CN113012720B (en) | Depression detection method based on multi-feature speech fusion with spectral-subtraction noise reduction | |
CN108597505A (en) | Speech recognition method, device and terminal device | |
CN110136709A (en) | Speech recognition method and speech-recognition-based video conferencing system | |
CN111192598A (en) | Speech enhancement method based on a skip-connection deep neural network | |
CN109036458A (en) | Multilingual scene analysis method based on audio feature parameters | |
CN103021405A (en) | Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter | |
CN106531174A (en) | Animal sound recognition method based on wavelet packet decomposition and spectrogram features | |
CN105679312A (en) | Voiceprint feature processing method for speaker identification in noisy environments | |
CN102237083A (en) | Portable interpretation system based on the WinCE platform and speech recognition method thereof | |
CN100543840C (en) | Speaker recognition method based on emotion transfer rules and speech correction | |
CN109192196A (en) | Noise-robust audio feature selection method for SVM classifiers | |
CN106373559A (en) | Robust feature extraction method based on logarithmic-spectrum noise-to-signal weighting | |
CN110728991A (en) | Improved recording equipment identification algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 2019-08-20; Termination date: 2020-05-19 |