KR101673221B1 - Apparatus for feature extraction in glottal flow signals for speaker recognition - Google Patents
Apparatus for feature extraction in glottal flow signals for speaker recognition
- Publication number
- KR101673221B1 (application number KR1020150183988A)
- Authority
- KR
- South Korea
- Prior art keywords
- frequency
- input signal
- frequency response
- response spectrum
- unit
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
Abstract
Description
The present invention relates to an apparatus for extracting characteristic parameters of an input signal and an apparatus for recognizing a speaker using the apparatus.
Speech processing technology, in which a computer processes and understands human speech, is a promising technology that can be used in various fields. In particular, a speaker recognition technique that identifies a speaker from the input voice may be used for identity verification in a security system or for user identification in an intelligent robot.
In general, speech recognition, including speaker recognition, extracts a feature vector from a speech input signal and compares it with previously stored data to recognize the information. However, the recognition rate of current technology is too limited for commercialization in various fields. The recognition rate of speaker recognition in particular is not high, so continued research and development is needed.
It is an object of the present invention to provide a feature parameter extraction apparatus capable of improving the speaker recognition rate and a speaker recognition apparatus using the same.
An apparatus for extracting feature parameters according to an embodiment of the present invention includes: a preprocessor for preprocessing an input signal; a spectrum transformer for transforming the frequency response spectrum of the input signal by removing a flat high frequency band above a boundary frequency; and a feature parameter calculator for calculating a feature parameter that characterizes the input signal based on the modified frequency response spectrum.
The input signal may comprise a glottal signal obtained from a voice signal.
The preprocessing unit may include a pre-emphasis unit for compensating for a high frequency component lost in the process of generating the input signal.
The pre-processing unit may include a window function applying unit for applying a predetermined window function to the input signal.
The pre-processor may include: a frequency domain transformer for transforming the input signal in the time domain to the frequency response spectrum in the frequency domain.
The spectrum transformer may include: a boundary frequency estimator for estimating the boundary frequency with respect to the input signal; and a high frequency band elimination unit for removing a frequency band higher than the estimated boundary frequency from the frequency response spectrum.
The boundary frequency estimator may model a log value of the frequency response spectrum as an exponential function and determine a frequency corresponding to a predetermined threshold logarithm in the modeled exponential function as the boundary frequency.
The boundary frequency estimator may calculate the coefficient and the exponent that minimize a cost based on the difference between the log value of the frequency response spectrum and an exponential function model defined by a coefficient and an exponent, and may determine as the boundary frequency the value obtained by dividing the threshold log value by the calculated coefficient, taking the logarithm of the result, and dividing it by the calculated exponent.
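The spectrum transformer may further include a spectral resolution increasing unit for increasing the frequency domain resolution of the frequency response spectrum, and the high frequency band elimination unit may remove the frequency band higher than the estimated boundary frequency from the frequency response spectrum whose frequency domain resolution has been increased.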
The spectral resolution increasing unit may up-sample the frequency response spectrum by a predetermined multiple in the frequency domain.
The spectrum transforming unit may further include a spectrum extending unit that extends the frequency band of the frequency response spectrum in which the frequency band higher than the estimated boundary frequency is removed.
The spectrum extension unit may perform interpolation based on a plurality of sample values included in a frequency response spectrum in which a frequency band higher than the estimated boundary frequency is removed.
The feature parameter calculator may include: a mel-frequency response acquiring unit for acquiring a mel-frequency response by applying a mel-frequency filter to the modified frequency response spectrum; and a cepstral coefficient acquiring unit for performing an inverse discrete cosine transform of the mel-frequency response to obtain cepstral coefficients.
A speaker recognition apparatus according to an embodiment of the present invention includes: a voice collection unit for collecting a voice of a speaker; a voice processing unit for processing the collected voice and determining whether it matches the voice of a previously registered user; and a storage unit for storing information on the voice of the user, wherein the voice processing unit includes: a preprocessor for preprocessing an input signal; a spectrum transformer for transforming the frequency response spectrum of the input signal by removing a flat high frequency band above a boundary frequency; and a feature parameter calculator for calculating a feature parameter that characterizes the input signal based on the modified frequency response spectrum.
According to the embodiment of the present invention, the speech recognition rate can be increased in the speech processing, and in particular, the speaker recognition rate can be improved.
FIG. 1 is an exemplary block diagram of a speaker recognition apparatus according to an embodiment of the present invention.
FIG. 2 is an exemplary block diagram of a feature parameter extraction unit according to an embodiment of the present invention.
FIG. 3 is an exemplary graph of a frequency response spectrum of an input signal from which a high frequency band is removed based on a boundary frequency, together with its exponential function model, according to an embodiment of the present invention.
FIG. 4 is an exemplary graph for explaining a process of calculating the coefficient and the exponent of an exponential function model according to an embodiment of the present invention.
FIG. 5 is an exemplary graph illustrating a process of increasing the frequency domain resolution of a frequency response spectrum according to an embodiment of the present invention.
FIG. 6 is an exemplary graph for explaining a process of extending a frequency band of a frequency response spectrum according to an embodiment of the present invention.
FIG. 7 is an exemplary flowchart of a feature parameter extraction method according to an embodiment of the present invention.
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings attached hereto.
1 is an exemplary block diagram of a
1, the
The
The
The storage unit 130 stores information on the user's voice. According to an embodiment of the present invention, the storage unit 130 is a storage device capable of storing data or various programs. For example, the storage unit 130 may include various types of memory such as RAM, ROM, cache, and the like.
According to an embodiment of the present invention, the
2 is an exemplary block diagram of a feature
2, the feature
The pre-processor 1211 preprocesses the input signal. The
According to the embodiment of the present invention, the input signal processed by the feature
2, the
The
The window
Here, $N_w$ is the frame size of the short-term input signal.
According to an embodiment of the present invention, the window
The
Here, N is the size of the discrete Fourier transform.
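The preprocessing chain described above (pre-emphasis, windowing, and transformation to the frequency domain) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the pre-emphasis coefficient 0.97, the Hamming window, and the function names `pre_emphasis`, `hamming`, `dft_magnitude`, and `preprocess` are all hypothetical choices that do not appear in the source text.

```python
import cmath
import math


def pre_emphasis(x, a=0.97):
    """First-order pre-emphasis y[n] = x[n] - a*x[n-1] (a = 0.97 is an assumed value)."""
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]


def hamming(n_w):
    """Hamming window of frame size n_w (the window type is an assumption)."""
    if n_w == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_w - 1)) for n in range(n_w)]


def dft_magnitude(frame, n_fft):
    """Magnitude |G(k)| for k = 0..n_fft/2 of a zero-padded frame (direct O(N^2) DFT)."""
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    mags = []
    for k in range(n_fft // 2 + 1):
        acc = sum(padded[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n in range(n_fft))
        mags.append(abs(acc))
    return mags


def preprocess(frame, n_fft, a=0.97):
    """Pre-emphasize, window, and transform one short-term frame g(n)."""
    emphasized = pre_emphasis(frame, a)
    window = hamming(len(frame))
    windowed = [s * w for s, w in zip(emphasized, window)]
    return dft_magnitude(windowed, n_fft)
```

A direct DFT is used only to keep the sketch dependency-free; a practical system would use an FFT routine.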
As described above, when the
Referring to FIG. 2, the
The boundary
FIG. 3 is an exemplary graph of a frequency response spectrum of an input signal from which a high frequency band is removed based on a boundary frequency $k_c$, together with its exponential function model, according to an embodiment of the present invention.
First, the
The boundary
For example, referring to FIG. 3, the
According to an embodiment of the present invention, the boundary
FIG. 4 is an exemplary graph for explaining the process of calculating the coefficient $A$ and the exponent $\alpha$ of the exponential function model $A e^{-\alpha k}$ according to an embodiment of the present invention.
According to an embodiment of the present invention, the boundary
For example, referring to FIG. 4, the boundary
Then, the boundary frequency estimator 12121 may determine, as the boundary frequency $k_c$, the frequency corresponding to the preset threshold log value $L_{TH}$ in the exponential function $A_{opt} e^{-\alpha_{opt} k}$ modeled with the coefficient $A_{opt}$ and the exponent $\alpha_{opt}$.
According to an embodiment of the present invention, the boundary frequency estimator 12121 may calculate the boundary frequency $k_c$ by dividing the threshold log value $L_{TH}$ by the calculated coefficient $A_{opt}$, taking the logarithm of the result, and dividing it by the calculated exponent $\alpha_{opt}$. In other words, the boundary frequency $k_c$ can be calculated by the following equation.
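The relation can be worked out from the model stated above. As a sketch, assuming the threshold is applied directly to the fitted model, that is $A_{opt} e^{-\alpha_{opt} k_c} = L_{TH}$ with $\alpha_{opt} > 0$, solving for $k_c$ gives:

```latex
k_c = -\frac{1}{\alpha_{opt}} \ln\!\left(\frac{L_{TH}}{A_{opt}}\right)
    = \frac{1}{\alpha_{opt}} \ln\!\left(\frac{A_{opt}}{L_{TH}}\right)
```

The leading minus sign comes from the negative exponent of the model. The prose does not state the sign explicitly, so the sign convention here is an assumption.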
3, the high
The present inventor has found that the somewhat flat portion in the high frequency band of the frequency response of the short-term speech signal $g(n)$ does not greatly contribute to improving discrimination in speaker recognition. Thus, embodiments of the present invention estimate, from the log value $L(k)$ of the frequency response spectrum, a boundary frequency $k_c$ that separates the sloped low frequency portion from the flat high frequency portion, remove the band above the boundary frequency $k_c$, and extract the feature parameters of the input signal from the remaining lower band. When speaker recognition is performed using feature parameters extracted in this way, the speaker recognition rate can be greatly improved.
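The boundary frequency estimation and high band removal can be sketched as below. The text does not spell out the cost function or the optimization procedure, so this example assumes a sum-of-squares cost and a grid search over candidate exponents with a closed-form coefficient for each. The function names `fit_exponential`, `boundary_frequency`, and `remove_high_band` are hypothetical.

```python
import math


def fit_exponential(log_spec, alphas):
    """Fit L(k) ~ A*exp(-alpha*k) by least squares.

    For each candidate alpha the best A has a closed form; the pair with
    the smallest squared-error cost is returned as (A_opt, alpha_opt).
    """
    best = None
    for alpha in alphas:
        basis = [math.exp(-alpha * k) for k in range(len(log_spec))]
        a = sum(l * b for l, b in zip(log_spec, basis)) / sum(b * b for b in basis)
        cost = sum((l - a * b) ** 2 for l, b in zip(log_spec, basis))
        if best is None or cost < best[0]:
            best = (cost, a, alpha)
    return best[1], best[2]


def boundary_frequency(a_opt, alpha_opt, l_th):
    """Solve A_opt*exp(-alpha_opt*k_c) = L_TH for k_c (requires A_opt > 0 and L_TH > 0)."""
    return math.log(a_opt / l_th) / alpha_opt


def remove_high_band(log_spec, k_c):
    """Keep only the bins with frequency index k <= k_c."""
    return log_spec[: int(math.floor(k_c)) + 1]
```

For a spectrum that follows $5 e^{-0.05 k}$ and a threshold of 1, the estimated boundary falls near $k \approx 32.2$, so bins 0 through 32 are kept.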
2, the
As described above, when the feature parameter is extracted using only the low frequency band below the boundary frequency $k_c$, it may be difficult to extract a feature parameter that adequately characterizes the input signal, because the frequency domain resolution of the remaining spectrum is low.
Therefore, this embodiment can further increase the frequency domain resolution by a predetermined multiple before removing the high frequency band in the frequency response spectrum by further including the spectral
5 is an exemplary graph illustrating a process of increasing the frequency domain resolution of a frequency response spectrum according to an embodiment of the present invention.
According to an embodiment of the present invention, the spectral
For example, referring to FIG. 5, the spectral
When the
2, the
As described above, when the spectrum is transformed by removing the high frequency band based on the boundary frequency $k_c$ after increasing the frequency domain resolution of the frequency response spectrum, the modified spectrum has frequency response values at non-integer frequency indices $k$.
Accordingly, this embodiment may further include the
FIG. 6 is an exemplary graph for explaining a process of extending a frequency band of a frequency response spectrum according to an embodiment of the present invention.
According to an embodiment of the present invention, the
For example, as shown in FIG. 6, the
$k_0$ and $k_1$ used in the above equation can be defined by the following equation.
Here, $k_{c,2} = 2k_c$.
As described above, if the
2, the
The mel-frequency
The mel-frequency
Here, $G'_{MEL}(m)$ is the m-th mel-frequency response, $FB^{(m)}(k)$ is the k-th response of the m-th mel-frequency filter bank, and $k_1^{(m)}$ and $k_2^{(m)}$ are the start frequency index and end frequency index of the m-th mel-frequency filter bank, respectively.
Then, the cepstral
Thereafter, the cepstral
Where M is the number of mel-filter banks and D is the order of the cepstrum.
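The mel-frequency filtering and cepstral coefficient computation can be sketched as follows. The equations did not survive in this text, so the example rests on assumptions: triangular filters equally spaced on the common 2595·log10(1 + f/700) mel scale, and the DCT-II-style cosine transform commonly used for MFCCs standing in for the transform the text calls an inverse discrete cosine transform. All function names are hypothetical.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(num_filters, num_bins, sample_rate):
    """Triangular filters FB[m][k] over num_bins spectrum bins spanning 0..sample_rate/2."""
    nyquist = sample_rate / 2.0
    max_mel = hz_to_mel(nyquist)
    # num_filters + 2 edge points, equally spaced in mel, mapped to bin positions
    edges = [mel_to_hz(max_mel * i / (num_filters + 1)) / nyquist * (num_bins - 1)
             for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(num_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mel_response(spec, bank):
    """G'_MEL(m) = sum over k of FB[m][k] * L'(k)."""
    return [sum(w * s for w, s in zip(row, spec)) for row in bank]


def cepstral_coefficients(mel_resp, order):
    """Cosine transform of the mel-frequency response for tau = 1..order."""
    m_count = len(mel_resp)
    return [sum(g * math.cos(math.pi * tau * (m + 0.5) / m_count)
                for m, g in enumerate(mel_resp))
            for tau in range(1, order + 1)]
```

Here `len(bank)` plays the role of M and `order` the role of D.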
The thus calculated cepstral coefficients c SMFCC and LP (tau) can be used as feature parameters of the input signal in the
According to an embodiment of the present invention, the
In addition, according to an embodiment of the present invention, the
7 is an exemplary flow diagram of a feature
The feature
7, the characteristic
The step S210 of preprocessing the input signal $g(n)$ may include a pre-emphasis step of compensating for a high frequency component lost in the process of generating the input signal $g(n)$.
The step S210 of preprocessing the input signal $g(n)$ may include applying a predetermined window function $w(n)$ to the input signal $g(n)$.
The step S210 of preprocessing the input signal $g(n)$ may include converting the time domain input signal $g(n)$ into a frequency response spectrum $G(k)$ in the frequency domain.
The step S220 of modifying the frequency response spectrum may include estimating a boundary frequency $k_c$ for the input signal and removing a frequency band higher than the estimated boundary frequency $k_c$ from the frequency response spectrum.
According to an embodiment of the present invention, the step of estimating the boundary frequency $k_c$ includes modeling the log value $L(k)$ of the frequency response spectrum as an exponential function $A e^{-\alpha k}$, and determining the frequency corresponding to the predetermined threshold log value $L_{TH}$ in the modeled exponential function $A_{opt} e^{-\alpha_{opt} k}$ as the boundary frequency $k_c$.
For example, estimating the boundary frequency $k_c$ may include calculating the coefficient $A_{opt}$ and the exponent $\alpha_{opt}$ that minimize a cost based on the difference between the log value $L(k)$ of the frequency response spectrum and the exponential function model $A e^{-\alpha k}$ defined by a coefficient $A$ and an exponent $\alpha$, and determining as the boundary frequency $k_c$ the value obtained by dividing the threshold log value $L_{TH}$ by the calculated coefficient $A_{opt}$, taking the logarithm of the result, and dividing it by the calculated exponent $\alpha_{opt}$.
Further, the step S220 of modifying the frequency response spectrum may further include increasing the frequency domain resolution of the frequency response spectrum before removing the frequency band higher than the boundary frequency $k_c$. In this case, removing the frequency band higher than the boundary frequency $k_c$ may include removing that band from the frequency response spectrum $L_2(k)$ whose frequency domain resolution has been increased.
Here, increasing the frequency domain resolution of the frequency response spectrum may include upsampling the frequency response spectrum by a predetermined multiple in the frequency domain.
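The upsampling step can be sketched as below. The text does not say how the new samples are computed, so linear interpolation between neighboring bins is assumed here. The function names are hypothetical. On the upsampled grid the boundary sits at `factor * k_c`, which for a factor of 2 matches the relation $k_{c,2} = 2k_c$.

```python
def upsample_spectrum(spec, factor=2):
    """Upsample a spectrum in the frequency domain by an integer factor.

    New samples are placed between the original bins by linear
    interpolation (an assumed method); original values land on indices
    that are multiples of `factor`.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for k in range(len(spec) - 1):
        for j in range(factor):
            t = j / factor
            out.append((1 - t) * spec[k] + t * spec[k + 1])
    out.append(spec[-1])
    return out


def remove_high_band_upsampled(spec_up, k_c, factor=2):
    """Drop bins above the boundary, which sits at factor*k_c on the upsampled grid."""
    return spec_up[: int(factor * k_c) + 1]
```

With a factor of 2, a boundary such as $k_c = 1.5$ falls exactly on upsampled index 3, so the cut can be placed between the original integer bins.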
Further, the step S220 of modifying the frequency response spectrum may further include extending the frequency band of the frequency response spectrum from which the frequency band higher than the boundary frequency $k_c$ has been removed.
Here, the step of extending the frequency band may include performing interpolation based on a plurality of sample values included in the frequency response spectrum from which the frequency band higher than the boundary frequency $k_c$ has been removed.
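The band extension by interpolation can be sketched as follows. The interpolation formula itself is not reproduced in this text, so the example assumes that the truncated spectrum is stretched to a target number of bins and that each output bin is linearly interpolated from its two neighboring samples, here called `k0` and `k1`. The function name `extend_spectrum` and the parameter `target_len` are hypothetical.

```python
import math


def extend_spectrum(truncated, target_len):
    """Stretch a truncated spectrum to target_len bins by linear interpolation.

    Each output bin maps to a fractional position in the truncated
    spectrum and is interpolated from its two neighbours k0 and k1
    (an assumed scheme).
    """
    if len(truncated) < 2 or target_len < 2:
        raise ValueError("need at least two samples on both grids")
    scale = (len(truncated) - 1) / (target_len - 1)
    out = []
    for k in range(target_len):
        pos = k * scale                                   # fractional source index
        k0 = min(int(math.floor(pos)), len(truncated) - 2)
        k1 = k0 + 1
        frac = pos - k0
        out.append((1 - frac) * truncated[k0] + frac * truncated[k1])
    return out
```

The first and last samples of the truncated spectrum are preserved, so the extended spectrum covers the full index range while keeping only the information from below the boundary frequency.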
The step S230 of calculating the feature parameter may include applying a mel-frequency filter to the modified frequency response spectrum $L'(k)$ to obtain a mel-frequency response $G'_{MEL}(m)$, and performing an inverse discrete cosine transform of the mel-frequency response $G'_{MEL}(m)$ to obtain the cepstral coefficients $c_{SMFCC,LP}(\tau)$.
The feature
While the present invention has been described with reference to the exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. Those skilled in the art will appreciate that various modifications may be made to the embodiments described above. The scope of the present invention is defined only by the interpretation of the appended claims.
10: Speaker recognition apparatus
110:
120:
121: Feature parameter extraction unit
130: Storage unit
1211: Pre-processor
1212: Spectrum transformer
1213: Feature parameter calculator
12111:
12112: Window function applying unit
12113: Frequency domain transformer
12121: Boundary frequency estimator
12122: Spectral resolution increasing unit
12123: High frequency band elimination unit
12124: Spectrum extension unit
12131: Mel-frequency response acquiring unit
12132: Cepstral coefficient acquiring unit
Claims (14)
A spectrum transformer for transforming the frequency response spectrum by removing a high frequency flat band higher than a boundary frequency determined based on the input signal in a frequency response spectrum of the input signal; And
A feature parameter calculating unit for calculating a feature parameter that characterizes the input signal based on the modified frequency response spectrum;
Wherein the spectrum transformer comprises:
A boundary frequency estimator for estimating the boundary frequency with respect to the input signal; And
A high frequency band eliminating unit for removing a frequency band higher than the estimated boundary frequency in the frequency response spectrum;
Wherein the boundary frequency estimator comprises:
Modeling the log value of the frequency response spectrum as an exponential function,
And determines a frequency corresponding to a predetermined threshold logarithm in the modeled exponential function as the boundary frequency.
Wherein the input signal includes a glottal signal obtained from a voice signal.
Wherein the pre-processing unit comprises:
A pre-emphasis unit for compensating for a high frequency component lost in the process of generating the input signal.
Wherein the pre-processing unit comprises:
A window function applying unit for applying a predetermined window function to the input signal.
Wherein the pre-processing unit comprises:
A frequency domain transformer for transforming the input signal in the time domain into the frequency response spectrum in the frequency domain.
Wherein the boundary frequency estimator comprises:
Estimating the coefficient and the exponent that minimize a cost based on a difference between a log value of the frequency response spectrum and an exponential function model defined by a coefficient and an exponent,
And determining, as the boundary frequency, a value obtained by dividing the threshold log value by the estimated coefficient, taking the logarithm of the result, and dividing it by the estimated exponent.
A spectrum transformer for transforming the frequency response spectrum by removing a high frequency flat band higher than a boundary frequency in a frequency response spectrum of the input signal; And
And a feature parameter calculation unit for calculating a feature parameter that characterizes the input signal based on the modified frequency response spectrum,
Wherein the spectrum transformer comprises:
A boundary frequency estimator for estimating the boundary frequency with respect to the input signal; And
A high frequency band eliminating unit for removing a frequency band higher than the estimated boundary frequency in the frequency response spectrum;
Wherein the spectrum transformer comprises:
And a spectral resolution increasing unit for increasing a frequency domain resolution of the frequency response spectrum,
Wherein the high frequency band elimination unit comprises:
And removes a frequency band higher than the estimated boundary frequency in the frequency response spectrum in which the frequency domain resolution is increased.
The spectral resolution increasing unit includes:
And upsamples the frequency response spectrum by a predetermined multiple in the frequency domain.
A spectrum transformer for transforming the frequency response spectrum by removing a high frequency flat band higher than a boundary frequency in a frequency response spectrum of the input signal; And
And a feature parameter calculation unit for calculating a feature parameter that characterizes the input signal based on the modified frequency response spectrum,
Wherein the spectrum transformer comprises:
A boundary frequency estimator for estimating the boundary frequency with respect to the input signal; And
A high frequency band eliminating unit for removing a frequency band higher than the estimated boundary frequency in the frequency response spectrum;
Wherein the spectrum transformer comprises:
And a spectrum expanding unit for expanding a frequency band of a frequency response spectrum in which a frequency band higher than the estimated boundary frequency is removed.
Wherein the spectrum extension comprises:
Wherein the interpolation is performed based on a plurality of sample values included in a frequency response spectrum in which a frequency band higher than the estimated boundary frequency is removed.
Wherein the feature parameter calculator comprises:
A mel-frequency response acquiring unit for acquiring a mel-frequency response by applying a Mel-Frequency filter to the modified frequency response spectrum; And
A cepstral coefficient acquiring unit for performing inverse discrete cosine transform of the Mel-frequency response to obtain a cepstrum coefficient;
And a characteristic parameter extracting unit.
A voice processing unit for processing the collected voice and discriminating whether or not it matches the voice of a previously registered user; And
And a storage unit for storing information on the voice of the user,
Wherein the voice processing unit comprises:
A preprocessor for preprocessing an input signal;
A spectrum transformer for transforming the frequency response spectrum by removing a high frequency flat band higher than a boundary frequency determined based on the input signal in a frequency response spectrum of the input signal; And
A feature parameter calculating unit for calculating a feature parameter that characterizes the input signal based on the modified frequency response spectrum;
Wherein the spectrum transformer comprises:
A boundary frequency estimator for estimating the boundary frequency with respect to the input signal; And
A high frequency band eliminating unit for removing a frequency band higher than the estimated boundary frequency in the frequency response spectrum;
Wherein the boundary frequency estimator comprises:
Modeling the log value of the frequency response spectrum as an exponential function,
And determines a frequency corresponding to a preset threshold logarithm in the modeled exponential function as the boundary frequency.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150183988A KR101673221B1 (en) | 2015-12-22 | 2015-12-22 | Apparatus for feature extraction in glottal flow signals for speaker recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020150183988A KR101673221B1 (en) | 2015-12-22 | 2015-12-22 | Apparatus for feature extraction in glottal flow signals for speaker recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
KR101673221B1 true KR101673221B1 (en) | 2016-11-07 |
Family
ID=57529852
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020150183988A KR101673221B1 (en) | 2015-12-22 | 2015-12-22 | Apparatus for feature extraction in glottal flow signals for speaker recognition |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR101673221B1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR970012285A (en) * | 1995-08-26 | 1997-03-29 | 김광호 | Pitch detection method of voice signal |
JP2006189799A (en) * | 2004-12-31 | 2006-07-20 | Taida Electronic Ind Co Ltd | Voice inputting method and device for selectable voice pattern |
-
2015
- 2015-12-22 KR KR1020150183988A patent/KR101673221B1/en active IP Right Grant
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR970012285A (en) * | 1995-08-26 | 1997-03-29 | 김광호 | Pitch detection method of voice signal |
JP2006189799A (en) * | 2004-12-31 | 2006-07-20 | Taida Electronic Ind Co Ltd | Voice inputting method and device for selectable voice pattern |
Non-Patent Citations (1)
Title |
---|
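Kang Ji-hoon (강지훈), Jeong Sang-bae (정상배), 'Improving speaker recognition performance using voiced/unvoiced classification and combination of dual feature parameters', Journal of the Korea Institute of Information and Communication Engineering (한국정보통신학회논문지), Vol.18, No.6, pp.1294-1301, June 2014.* * |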
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106971741B (en) | Method and system for voice noise reduction for separating voice in real time | |
CN106935248B (en) | Voice similarity detection method and device | |
CN108281146B (en) | Short voice speaker identification method and device | |
US9224392B2 (en) | Audio signal processing apparatus and audio signal processing method | |
CN112053695A (en) | Voiceprint recognition method and device, electronic equipment and storage medium | |
CN109767756B (en) | Sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficient | |
Kumar et al. | Analysis of MFCC and BFCC in a speaker identification system | |
CN113327626B (en) | Voice noise reduction method, device, equipment and storage medium | |
CN108922543B (en) | Model base establishing method, voice recognition method, device, equipment and medium | |
Nasr et al. | Speaker identification based on normalized pitch frequency and Mel Frequency Cepstral Coefficients | |
WO2019232826A1 (en) | I-vector extraction method, speaker recognition method and apparatus, device, and medium | |
CN109147798B (en) | Speech recognition method, device, electronic equipment and readable storage medium | |
KR20160102815A (en) | Robust audio signal processing apparatus and method for noise | |
CN110942766A (en) | Audio event detection method, system, mobile terminal and storage medium | |
KR100893123B1 (en) | Method and apparatus for generating audio fingerprint data and comparing audio data using the same | |
KR100897555B1 (en) | Apparatus and method of extracting speech feature vectors and speech recognition system and method employing the same | |
US7966179B2 (en) | Method and apparatus for detecting voice region | |
KR101671305B1 (en) | Apparatus for extracting feature parameter of input signal and apparatus for recognizing speaker using the same | |
CN112466276A (en) | Speech synthesis system training method and device and readable storage medium | |
Mu et al. | MFCC as features for speaker classification using machine learning | |
KR101673221B1 (en) | Apparatus for feature extraction in glottal flow signals for speaker recognition | |
CN110197657A (en) | A kind of dynamic speech feature extracting method based on cosine similarity | |
CN111402898B (en) | Audio signal processing method, device, equipment and storage medium | |
Roy et al. | A hybrid VQ-GMM approach for identifying Indian languages | |
Ghezaiel et al. | Nonlinear multi-scale decomposition by EMD for Co-Channel speaker identification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
A201 | Request for examination | ||
GRNT | Written decision to grant | ||
FPAY | Annual fee payment |
Payment date: 20191029 Year of fee payment: 4 |