KR101673221B1 - Apparatus for feature extraction in glottal flow signals for speaker recognition - Google Patents

Apparatus for feature extraction in glottal flow signals for speaker recognition

Publication number
KR101673221B1
Authority
KR
South Korea
Prior art keywords
frequency
input signal
frequency response
response spectrum
unit
Prior art date
Application number
KR1020150183988A
Other languages
Korean (ko)
Inventor
정상배
김영일
강지훈
Original Assignee
경상대학교 산학협력단
Priority date
Filing date
Publication date
Application filed by 경상대학교 산학협력단
Priority to KR1020150183988A
Application granted
Publication of KR101673221B1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification
    • G10L17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction

Abstract

The present invention relates to an apparatus to extract a feature parameter of an input signal, and a speaker recognition apparatus using the same. According to an embodiment of the present invention, a feature parameter extracting apparatus may comprise: a preprocessor to preprocess an input signal; a spectrum transforming part to transform a frequency response spectrum by removing a high frequency flat band higher than a boundary frequency in the frequency response spectrum of the input signal; and a feature parameter calculating part to calculate a feature parameter which characterizes the input signal based on the transformed frequency response spectrum. As such, a speaker recognition rate is able to be improved.

Description

APPARATUS FOR FEATURE EXTRACTION IN GLOTTAL FLOW SIGNALS FOR SPEAKER RECOGNITION

The present invention relates to an apparatus for extracting characteristic parameters of an input signal and an apparatus for recognizing a speaker using the apparatus.

Speech processing technology, in which a computer processes and understands human speech, is a promising technology that can be applied in various fields. In particular, speaker recognition, which identifies a speaker from an input voice, can be used for identity verification in security systems or for user identification in intelligent robots.

In general, speech recognition, including speaker recognition, extracts a feature vector from a speech input signal and compares it with previously stored data. However, the recognition rate of current technology is too low for commercialization in many fields. The recognition rate of speaker recognition in particular is not high, so continued research and development is needed.

It is an object of the present invention to provide a feature parameter extraction apparatus capable of improving the speaker recognition rate and a speaker recognition apparatus using the same.

A feature parameter extraction apparatus according to an embodiment of the present invention includes: a preprocessing unit for preprocessing an input signal; a spectrum transforming unit for modifying the frequency response spectrum of the input signal by removing a high frequency flat band higher than a boundary frequency; and a feature parameter calculating unit for calculating a feature parameter that characterizes the input signal based on the modified frequency response spectrum.

The input signal may comprise a glottal signal obtained from a voice signal.

The preprocessing unit may include a pre-emphasis unit for compensating for a high frequency component lost in the process of generating the input signal.

The preprocessing unit may include a window function application unit for applying a predetermined window function to the input signal.

The preprocessing unit may include a frequency domain transforming unit for transforming the input signal in the time domain into the frequency response spectrum in the frequency domain.

The spectrum transforming unit may include: a boundary frequency estimating unit for estimating the boundary frequency for the input signal; and a high frequency band removing unit for removing the frequency band higher than the estimated boundary frequency from the frequency response spectrum.

The boundary frequency estimating unit may model the log value of the frequency response spectrum as an exponential function and determine, as the boundary frequency, the frequency corresponding to a predetermined threshold log value in the modeled exponential function.

The boundary frequency estimating unit may calculate the coefficient and the exponent that minimize a cost based on the difference between the log value of the frequency response spectrum and an exponential function model defined by a coefficient and an exponent, and may determine, as the boundary frequency, the value obtained by taking the logarithm of the threshold log value divided by the calculated coefficient and dividing the result by the calculated exponent.

The spectrum transforming unit may further include a spectral resolution increasing unit for increasing the frequency domain resolution of the frequency response spectrum, in which case the high frequency band removing unit may remove the frequency band higher than the boundary frequency from the frequency response spectrum whose frequency domain resolution has been increased.

The spectral resolution increasing unit may upsample the frequency response spectrum by a predetermined multiple in the frequency domain.

The spectrum transforming unit may further include a spectrum extending unit that extends the frequency band of the frequency response spectrum from which the frequency band higher than the estimated boundary frequency has been removed.

The spectrum extending unit may perform interpolation based on a plurality of sample values included in the frequency response spectrum from which the frequency band higher than the estimated boundary frequency has been removed.

The feature parameter calculating unit may include: a mel-frequency response obtaining unit for obtaining a mel-frequency response by applying a mel-frequency filter to the modified frequency response spectrum; and a cepstral coefficient obtaining unit for obtaining cepstral coefficients by performing an inverse discrete cosine transform on the mel-frequency response.

A speaker recognition apparatus according to an embodiment of the present invention includes: a voice collection unit for collecting the voice of a speaker; a voice processing unit for processing the collected voice and determining whether it matches the voice of a previously registered user; and a storage unit for storing information on the voice of the user. The voice processing unit includes: a preprocessing unit for preprocessing an input signal; a spectrum transforming unit for modifying the frequency response spectrum of the input signal by removing a high frequency flat band higher than a boundary frequency; and a feature parameter calculating unit for calculating a feature parameter that characterizes the input signal based on the modified frequency response spectrum.

According to the embodiment of the present invention, the speech recognition rate can be increased in the speech processing, and in particular, the speaker recognition rate can be improved.

FIG. 1 is an exemplary block diagram of a speaker recognition apparatus according to an embodiment of the present invention.
FIG. 2 is an exemplary block diagram of a feature parameter extraction unit according to an embodiment of the present invention.
FIG. 3 is an exemplary graph of the frequency response spectrum of an input signal from which the high frequency band is removed based on a boundary frequency, together with its exponential function model, according to an embodiment of the present invention.
FIG. 4 is an exemplary graph for explaining a process of calculating the coefficient and the exponent of an exponential function model according to an embodiment of the present invention.
FIG. 5 is an exemplary graph illustrating a process of increasing the frequency domain resolution of a frequency response spectrum according to an embodiment of the present invention.
FIG. 6 is an exemplary graph for explaining a process of extending a frequency band of a frequency response spectrum according to an embodiment of the present invention.
FIG. 7 is an exemplary flowchart of a feature parameter extraction method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings attached hereto.

FIG. 1 is an exemplary block diagram of a speaker recognition apparatus 10 according to an embodiment of the present invention.

Referring to FIG. 1, the speaker recognition apparatus 10 includes a voice collection unit 110, a voice processing unit 120, and a storage unit 130.

The voice collection unit 110 collects the voice of the speaker. According to an embodiment of the present invention, the voice collection unit 110 may include a microphone that converts a voice uttered by the speaker into an electrical signal. However, the voice collection unit 110 is not limited to a microphone that collects the voice directly from the speaker; it may be any device that acquires a signal related to the speaker's voice in some other way (for example, through data communication over a network).

The voice processing unit 120 processes the collected voice and determines whether it matches the voice of a previously registered user. According to an embodiment of the present invention, the voice processing unit 120 is a processor that processes an electrical signal related to the voice (hereinafter referred to as a voice signal) according to a predetermined algorithm, and may include, for example, a CPU, but is not limited thereto. In this case, the voice processing unit 120 can execute a program stored in the storage unit 130 to process the voice signal, and the data obtained through the processing can be stored in the storage unit 130.

The storage unit 130 stores information on the user's voice. According to an embodiment of the present invention, the storage unit 130 is a storage device capable of storing data and various programs. For example, the storage unit 130 may include various types of memory such as RAM, ROM, cache, and the like.

According to an embodiment of the present invention, the voice processing unit 120 may include a feature parameter extraction unit 121 that extracts, from the speaker's voice, a feature parameter characterizing that voice in order to determine whether the speaker's voice matches the voice of the previously registered user.

FIG. 2 is an exemplary block diagram of a feature parameter extraction unit 121 according to an embodiment of the present invention.

Referring to FIG. 2, the feature parameter extraction unit 121 may include a preprocessing unit 1211, a spectrum transforming unit 1212, and a feature parameter calculating unit 1213.

The preprocessing unit 1211 preprocesses the input signal. The spectrum transforming unit 1212 modifies the frequency response spectrum of the input signal by removing a high frequency flat band higher than the boundary frequency. The feature parameter calculating unit 1213 calculates a feature parameter that characterizes the input signal based on the modified frequency response spectrum.

According to the embodiment of the present invention, the input signal processed by the feature parameter extraction unit 121 to obtain the feature parameter may include a glottal signal obtained from the voice signal. When the feature parameter is obtained by processing the glottal signal as in this embodiment, the morphological characteristics of the speaker can be reflected in the feature parameter, which contributes to a higher recognition rate.

Referring to FIG. 2, the preprocessing unit 1211 may include a pre-emphasis unit 12111, a window function application unit 12112, and a frequency domain transforming unit 12113.

The pre-emphasis unit 12111 compensates for high frequency components lost in the process of generating the input signal (e.g., the glottal signal g(n) in FIG. 2). High frequency components are lost when air radiates from the lips into free space as the speaker utters a voice. The pre-emphasis unit 12111 applies a pre-emphasis filter to the input signal to compensate for the lost high frequency components. For example, the pre-emphasis filter may be implemented to have a transfer function as shown in the following equation.

Figure 112016085516826-pat00018
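
As a rough illustration of the emphasis filter that compensates for lost high frequency components, the sketch below applies a first-order high-pass filter of the common form H(z) = 1 - a·z^(-1). The transfer function in the figure above cannot be recovered from this text, so the filter form, the coefficient value 0.97, and the function name `pre_emphasis` are assumptions made purely for illustration.

```python
def pre_emphasis(signal, coeff=0.97):
    """Return y[n] = x[n] - coeff * x[n-1] (assumed filter H(z) = 1 - coeff * z^-1).

    `coeff` is a hypothetical value; the patent text does not state one.
    """
    if not signal:
        return []
    out = [signal[0]]  # the first sample has no predecessor and is passed through
    for n in range(1, len(signal)):
        out.append(signal[n] - coeff * signal[n - 1])
    return out
```

A constant input is strongly attenuated after the first sample, while rapid sample-to-sample changes pass through, which is the high frequency boost the paragraph above describes.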

The window function application unit 12112 applies a predetermined window function to the input signal. Doing so minimizes the distortion caused by discontinuities between frames of the input signal and mitigates the loss of frequency response resolution that results from using short-term data. For example, the window function application unit 12112 can generate the output signal g_w(n) by applying a window function w(n) to the input signal g_p(n), as shown in the following equation.

Figure 112016085516826-pat00019

Here, N_w is the frame size of the short-term input signal.

According to an embodiment of the present invention, the window function application unit 12112 may apply a Hamming window function to the input signal g_p(n). The Hamming window function can be expressed by the following equation.

Figure 112016085516826-pat00020
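
The windowing step can be sketched as follows, assuming the textbook Hamming definition w(n) = 0.54 - 0.46·cos(2πn / (N_w - 1)) and a sample-by-sample product g_w(n) = g_p(n)·w(n). The exact constants and denominator used in the figures above are not recoverable from this text, and the function names are hypothetical.

```python
import math


def hamming_window(n_w):
    """Textbook Hamming window of length n_w (assumed form, see lead-in)."""
    if n_w == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_w - 1))
            for n in range(n_w)]


def apply_window(frame):
    """Multiply one frame g_p(n) by the window to obtain g_w(n)."""
    window = hamming_window(len(frame))
    return [x * w for x, w in zip(frame, window)]
```

The window tapers both ends of the frame toward a small value, which is what reduces the discontinuity between consecutive frames.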

The frequency domain transforming unit 12113 transforms the time domain input signal into a frequency response spectrum in the frequency domain. For example, the frequency domain transforming unit 12113 can perform a discrete Fourier transform (DFT) on the time domain input signal g_w(n), as shown in the following equation, to obtain the frequency response spectrum G(k).

Figure 112016085516826-pat00021

Here, N is the size of the discrete Fourier transform.
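
A direct, unoptimized sketch of this transform is shown below, assuming the standard DFT definition G(k) = Σ g_w(n)·exp(-j2πkn/N) with zero padding up to N. It also includes the log magnitude log|G(k)| that the spectrum transforming step works on. The function names and the small floor that guards against log(0) are illustrative assumptions; a real implementation would normally use an FFT.

```python
import cmath
import math


def dft(frame, n_fft=None):
    """Naive O(N^2) DFT of a windowed frame, zero padded to n_fft points."""
    n_fft = len(frame) if n_fft is None else n_fft
    padded = list(frame[:n_fft]) + [0.0] * max(0, n_fft - len(frame))
    return [sum(x * cmath.exp(-2j * cmath.pi * k * n / n_fft)
                for n, x in enumerate(padded))
            for k in range(n_fft)]


def log_magnitude(spectrum, floor=1e-12):
    """Return log|G(k)|; `floor` is a hypothetical guard against log(0)."""
    return [math.log(max(abs(g), floor)) for g in spectrum]
```

For example, a unit impulse yields a flat spectrum, so its log magnitude is zero at every frequency index.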

As described above, when the preprocessing unit 1211 preprocesses the input signal g(n) and outputs the frequency response spectrum G(k), the spectrum transforming unit 1212 modifies G(k) by removing the high frequency flat band higher than the boundary frequency.

Referring to FIG. 2, the spectrum transforming unit 1212 may include a boundary frequency estimating unit 12121 and a high frequency band removing unit 12123.

The boundary frequency estimating unit 12121 estimates a boundary frequency for the input signal. The high frequency band removing unit 12123 removes the frequency band higher than the estimated boundary frequency from the frequency response spectrum.

FIG. 3 is an exemplary graph of the frequency response spectrum of an input signal from which the high frequency band is removed based on a boundary frequency k_c, together with its exponential function model, according to an embodiment of the present invention.

First, the spectrum transforming unit 1212 can take the log value L(k) = log|G(k)| of the frequency response spectrum G(k) before modifying it.

The boundary frequency estimating unit 12121 can model the log value L(k) of the frequency response spectrum as an exponential function and determine, as the boundary frequency k_c, the frequency corresponding to a predetermined threshold log value L_TH in the modeled exponential function.

For example, referring to FIG. 3, the boundary frequency estimating unit 12121 may model the log value L(k) of the frequency response spectrum, which oscillates over the frequency index k, as an exponential function A·e^(-αk).

According to an embodiment of the present invention, the boundary frequency estimating unit 12121 can calculate the coefficient A_opt and the exponent α_opt that minimize a cost based on the difference between the log value L(k) of the frequency response spectrum and the exponential function model A·e^(-αk) defined by the coefficient A and the exponent α. For example, the cost function that calculates this cost is given by the following equation.

Figure 112016085516826-pat00022

FIG. 4 is an exemplary graph for explaining the process of calculating the coefficient A and the exponent α of the exponential function model A·e^(-αk) according to an embodiment of the present invention.

According to an embodiment of the present invention, the boundary frequency estimating unit 12121 can calculate the coefficient A_opt and the exponent α_opt that minimize the cost function J by using the steepest descent method.

For example, referring to FIG. 4, the boundary frequency estimating unit 12121 repeatedly updates the coefficient A and the exponent α, the variables of the cost function J, in the direction in which J decreases most steeply (the negative gradient direction), and thereby obtains the coefficient A_opt and the exponent α_opt with the minimum cost.
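
The fitting procedure can be sketched as below. The cost function in the patent's figure is not recoverable from this text, so this sketch assumes a sum of squared errors, J(A, α) = Σ_k (L(k) - A·e^(-αk))², and adds a simple backtracking step size plus a constraint α ≥ 0 for numerical safety. Those choices, the starting values, and all function names are assumptions, not the patented procedure.

```python
import math


def cost(log_spectrum, a, alpha):
    """Assumed cost: sum of squared errors between L(k) and a * exp(-alpha * k)."""
    return sum((lk - a * math.exp(-alpha * k)) ** 2
               for k, lk in enumerate(log_spectrum))


def gradient(log_spectrum, a, alpha):
    """Partial derivatives of the assumed cost with respect to a and alpha."""
    d_a = 0.0
    d_alpha = 0.0
    for k, lk in enumerate(log_spectrum):
        decay = math.exp(-alpha * k)
        err = lk - a * decay
        d_a += -2.0 * err * decay
        d_alpha += 2.0 * err * a * k * decay
    return d_a, d_alpha


def fit_exponential(log_spectrum, a=1.0, alpha=0.01, iters=10000, step=1e-3):
    """Steepest descent on (a, alpha); the step is halved until the cost drops."""
    j = cost(log_spectrum, a, alpha)
    t = step
    for _ in range(iters):
        d_a, d_alpha = gradient(log_spectrum, a, alpha)
        accepted = False
        while t > 1e-15:
            a_new, alpha_new = a - t * d_a, alpha - t * d_alpha
            if alpha_new >= 0.0:  # keep the model a decaying exponential
                j_new = cost(log_spectrum, a_new, alpha_new)
                if j_new < j:
                    a, alpha, j = a_new, alpha_new, j_new
                    accepted = True
                    break
            t *= 0.5
        if not accepted:
            break  # no descent step lowers the cost any further
        t *= 2.0  # try a slightly larger step next time
    return a, alpha
```

On noiseless data generated from a known exponential, the returned pair should approach the generating coefficient and exponent; on a real, oscillating L(k) it gives the smooth envelope from which the boundary frequency is read off.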

Then, the boundary frequency estimating unit 12121 can determine, as the boundary frequency k_c, the frequency corresponding to the predetermined threshold log value L_TH in the exponential function A_opt·e^(-α_opt·k) modeled with the coefficient A_opt and the exponent α_opt.

According to an embodiment of the invention, the boundary frequency estimating unit 12121 can calculate the boundary frequency k_c by dividing the threshold log value L_TH by the calculated coefficient A_opt, taking the logarithm of the result, and dividing it by the calculated exponent α_opt. In other words, the boundary frequency k_c can be calculated by the following equation.

Figure 112016085516826-pat00023
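
The relation can be derived directly from the exponential model. The following is a worked sketch that assumes a natural logarithm, A_opt > 0, and α_opt > 0; the sign convention in the figure above is not recoverable from this text, so only the derivation itself is asserted here.

```latex
\begin{aligned}
A_{\mathrm{opt}}\, e^{-\alpha_{\mathrm{opt}} k_c} &= L_{TH} \\
e^{-\alpha_{\mathrm{opt}} k_c} &= \frac{L_{TH}}{A_{\mathrm{opt}}} \\
k_c &= -\frac{1}{\alpha_{\mathrm{opt}}} \ln\!\left(\frac{L_{TH}}{A_{\mathrm{opt}}}\right)
     = \frac{1}{\alpha_{\mathrm{opt}}} \ln\!\left(\frac{A_{\mathrm{opt}}}{L_{TH}}\right)
\end{aligned}
```

For instance, with the hypothetical values A_opt = 5, α_opt = 0.2, and L_TH = 0.5, this gives k_c = ln(10) / 0.2 ≈ 11.5.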

Referring to FIG. 3, the high frequency band removing unit 12123 removes the frequency band higher than the boundary frequency k_c from the log value L(k) of the frequency response spectrum and keeps the frequency band lower than the boundary frequency k_c.
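
A minimal sketch of the band removal is given below. It assumes that the spectrum is stored as a list indexed by the frequency index k and that samples at indices up to and including floor(k_c) are kept; the text does not say whether the boundary sample itself is retained, and the function name is hypothetical.

```python
import math


def remove_high_band(log_spectrum, k_c):
    """Keep only samples whose frequency index does not exceed k_c."""
    cutoff = int(math.floor(k_c)) + 1          # number of samples to keep
    cutoff = max(0, min(cutoff, len(log_spectrum)))
    return log_spectrum[:cutoff]
```

With a boundary of 2.7, for example, the samples at indices 0, 1, and 2 survive and everything above is discarded.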

The present inventors have found that the relatively flat portion of the high frequency band in the frequency response of the short-term input signal g(n) contributes little to discrimination in speaker recognition. Thus, embodiments of the present invention estimate, from the log value L(k) of the frequency response spectrum, a boundary frequency k_c that separates the sloped low frequency portion from the flat high frequency portion, and extract the feature parameters of the input signal from the frequency band lower than k_c. When speaker recognition is performed using feature parameters extracted in this way, the speaker recognition rate can be greatly improved.

Referring again to FIG. 2, the spectrum transforming unit 1212 may further include a spectral resolution increasing unit 12122, which increases the frequency domain resolution of the frequency response spectrum.

As described above, when the feature parameter is extracted using only the frequency band lower than the boundary frequency k_c, the frequency domain resolution of the remaining spectrum is low, so it may be difficult to extract a feature parameter that adequately characterizes the input signal.

Therefore, by further including the spectral resolution increasing unit 12122, this embodiment can increase the frequency domain resolution by a predetermined multiple before removing the high frequency band from the frequency response spectrum.

FIG. 5 is an exemplary graph illustrating a process of increasing the frequency domain resolution of a frequency response spectrum according to an embodiment of the present invention.

According to an embodiment of the present invention, the spectral resolution increasing unit 12122 may upsample the frequency response spectrum by a predetermined multiple in the frequency domain.

For example, referring to FIG. 5, the spectral resolution increasing unit 12122 can insert zeros between adjacent samples of the log frequency response spectrum L(k) and then apply a low-pass filter to obtain a frequency response spectrum L_2(k) whose frequency domain resolution is increased by a factor of 2.
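
The zero insertion and low-pass filtering can be sketched as follows. The patent does not specify the filter, so this example assumes the short triangular kernel [0.5, 1, 0.5], which acts as a simple low-pass interpolator and reproduces linear interpolation between neighbors. The kernel and the function name are illustrative assumptions.

```python
def upsample_by_two(spectrum):
    """Double the frequency resolution: zero insertion, then low-pass filtering."""
    stuffed = []
    for value in spectrum:
        stuffed.extend([value, 0.0])   # one zero after every original sample
    if stuffed:
        stuffed.pop()                  # no zero is needed after the last sample
    kernel = [0.5, 1.0, 0.5]           # assumed low-pass (interpolation) kernel
    out = []
    for i in range(len(stuffed)):
        acc = 0.0
        for j, h in enumerate(kernel):
            idx = i + j - 1            # kernel is centered on sample i
            if 0 <= idx < len(stuffed):
                acc += h * stuffed[idx]
        out.append(acc)
    return out
```

Sample 2k of the result coincides with original sample k, which is why a boundary at index k_c in L(k) corresponds to index 2·k_c in L_2(k).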

When the spectrum transforming unit 1212 further includes the spectral resolution increasing unit 12122, the high frequency band removing unit 12123 can remove the frequency band higher than the boundary frequency k_c from the frequency response spectrum L_2(k) whose frequency domain resolution has been increased.

Referring again to FIG. 2, the spectrum transforming unit 1212 may further include a spectrum extending unit 12124. The spectrum extending unit 12124 extends the frequency band of the frequency response spectrum from which the frequency band higher than the boundary frequency k_c has been removed.

As described above, when the high frequency band is removed based on the boundary frequency k_c after the frequency domain resolution of the frequency response spectrum has been increased, the modified spectrum has frequency response values at non-integer frequency indices k.

Accordingly, this embodiment may further include the spectrum extending unit 12124 so that the frequency response spectrum modified by the spectrum transforming unit 1212 finally has frequency response values at integer frequency indices k.

FIG. 6 is an exemplary graph for explaining a process of extending a frequency band of a frequency response spectrum according to an embodiment of the present invention.

According to an embodiment of the present invention, the spectrum extending unit 12124 can perform interpolation based on a plurality of sample values included in the frequency response spectrum from which the high frequency band has been removed based on the boundary frequency k_c.

For example, as shown in FIG. 6, the spectrum extending unit 12124 can perform linear interpolation between two points according to the following equation.

Figure 112016085516826-pat00024

The indices k_0 and k_1 used in the above equation can be defined by the following equation.

Figure 112016085516826-pat00025

Here, k_c,2 = 2k_c.
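
Since the interpolation equations in the figures above are not recoverable from this text, the following is only a generic sketch of two-point linear interpolation. It assumes the truncated spectrum is stretched onto a hypothetical output grid of `n_out` integer indices, and that k_0 and k_1 are the two neighboring samples that surround each mapped position.

```python
import math


def stretch_spectrum(truncated, n_out):
    """Resample `truncated` onto n_out points by two-point linear interpolation."""
    if n_out <= 0 or not truncated:
        return []
    if len(truncated) == 1 or n_out == 1:
        return [truncated[0]] * n_out
    scale = (len(truncated) - 1) / (n_out - 1)   # input step per output index
    out = []
    for k in range(n_out):
        pos = k * scale                           # generally a non-integer index
        k0 = min(int(math.floor(pos)), len(truncated) - 2)
        k1 = k0 + 1
        frac = pos - k0
        out.append((1.0 - frac) * truncated[k0] + frac * truncated[k1])
    return out
```

The first and last samples of the truncated spectrum are preserved, and every output index in between receives a weighted average of its two nearest input samples.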

As described above, when the spectrum transforming unit 1212 modifies the frequency response spectrum by removing the high frequency flat band higher than the boundary frequency k_c from the frequency response spectrum G(k) of the input signal, the feature parameter calculating unit 1213 calculates a feature parameter that characterizes the input signal g(n) based on the modified frequency response spectrum L'(k).

Referring to FIG. 2, the feature parameter calculating unit 1213 may include a mel-frequency response obtaining unit 12131 and a cepstral coefficient obtaining unit 12132.

The mel-frequency response obtaining unit 12131 obtains a mel-frequency response by applying a mel-frequency filter to the modified frequency response spectrum. According to the embodiment of the present invention, before applying the mel-frequency filter, the mel-frequency response obtaining unit 12131 may take the exponential G'(k) = exp(L'(k)) of the frequency response spectrum L'(k) modified by the spectrum transforming unit 1212.

The mel-frequency response obtaining unit 12131 may then apply a preset mel-scale filter bank to the absolute value |G'(k)| of that exponential, as shown in the following equation.

Figure 112016085516826-pat00026

Here, G'_MEL(m) is the m-th mel-frequency response, FB^(m)(k) is the k-th response of the m-th mel-frequency filter bank, and k_1(m) and k_2(m) are the start frequency index and the end frequency index of the m-th mel-frequency filter bank, respectively.
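
Based on the symbol definitions above, the filter bank step plausibly amounts to a weighted sum of |G'(k)| over each filter's index range. The sketch below assumes that form, with each filter given as a hypothetical tuple (k1, k2, weights) where `weights` holds the filter response for indices k1 through k2. How the filter bank itself is designed is outside this sketch.

```python
import math


def mel_response(log_spectrum_mod, filter_bank):
    """Return one weighted sum of |exp(L'(k))| per filter in `filter_bank`."""
    magnitude = [abs(math.exp(value)) for value in log_spectrum_mod]
    responses = []
    for k1, k2, weights in filter_bank:
        total = sum(w * magnitude[k]
                    for k, w in zip(range(k1, k2 + 1), weights))
        responses.append(total)
    return responses
```

Each entry of the result corresponds to one mel band, so a bank of M filters produces the M values that the cepstral step consumes.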

Then, the cepstral coefficient obtaining unit 12132 obtains cepstral coefficients by performing an inverse discrete cosine transform (IDCT) on the mel-frequency response G'_MEL(m). According to the embodiment of the present invention, the cepstral coefficient obtaining unit 12132 may take the log value log(G'_MEL(m)) of the mel-frequency response G'_MEL(m) before the inverse discrete cosine transform.

Thereafter, the cepstral coefficient obtaining unit 12132 can calculate the cepstral coefficients c_SMFCC,LP(τ) by performing the inverse discrete cosine transform on the log value log(G'_MEL(m)) of the mel-frequency response, as shown in the following equation.

Figure 112016085516826-pat00027

Where M is the number of mel-filter banks and D is the order of the cepstrum.
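
The exact transform in the figure above is not recoverable from this text. As an illustration only, the sketch below uses the cosine transform commonly found in MFCC computation, c(τ) = Σ_m log(G'_MEL(m))·cos(πτ(m - 0.5)/M) for τ = 1..D, with m counted from 1. The formula, the small floor that guards against log(0), and the function name are assumptions.

```python
import math


def cepstral_coefficients(mel_resp, order):
    """Cosine transform of the log mel response for tau = 1..order."""
    m_count = len(mel_resp)
    logs = [math.log(max(value, 1e-12)) for value in mel_resp]
    coeffs = []
    for tau in range(1, order + 1):
        # enumerate starts at 0, so (m + 0.5) equals (m_one_based - 0.5)
        coeffs.append(sum(lm * math.cos(math.pi * tau * (m + 0.5) / m_count)
                          for m, lm in enumerate(logs)))
    return coeffs
```

A perfectly flat mel response produces coefficients of zero for every τ ≥ 1, so nonzero coefficients encode the shape of the spectral envelope rather than its overall level.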

The cepstral coefficients c_SMFCC,LP(τ) calculated in this way can be used as feature parameters of the input signal in the speaker recognition apparatus 10.

According to an embodiment of the present invention, the speaker recognition apparatus 10 can recognize a speaker from the feature parameters c_SMFCC,LP(τ) of the input signal through a machine learning algorithm based on a predetermined probability distribution model. For example, the probability distribution model used for speaker recognition may be, but is not limited to, a Gaussian Mixture Model (GMM).
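
The patent does not describe the GMM in detail. As a minimal sketch, assuming a diagonal-covariance mixture whose weights, means, and variances were already trained for a registered speaker, the average log-likelihood of the feature vectors could be computed as follows. Every name here is hypothetical, and the training step and the decision threshold are left out.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at point x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v
                      for xi, mu, v in zip(x, mean, var))


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one feature vector, computed with log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, mu, var)
             for w, mu, var in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def average_score(frames, weights, means, variances):
    """Mean log-likelihood over all frames of an utterance."""
    return sum(gmm_log_likelihood(f, weights, means, variances)
               for f in frames) / len(frames)
```

An utterance would then be attributed to the registered speaker whose model yields the highest average score, or accepted when the score exceeds a tuned threshold.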

In addition, according to an embodiment of the present invention, the speaker recognition apparatus 10 can perform speaker recognition by combining the feature parameters c_SMFCC,LP(τ) with various other feature parameters used for speaker recognition.

FIG. 7 is an exemplary flowchart of a feature parameter extraction method 20 according to an embodiment of the present invention.

The feature parameter extraction method 20 can be executed by the feature parameter extraction unit 121 according to the embodiment of the present invention described above.

Referring to FIG. 7, the feature parameter extraction method 20 includes a step S210 of preprocessing the input signal g(n), a step S220 of modifying the frequency response spectrum by removing the high frequency flat band higher than the boundary frequency k_c from the frequency response spectrum G(k) of the input signal, and a step S230 of calculating a feature parameter c_SMFCC,LP(τ) that characterizes the input signal g(n) based on the modified frequency response spectrum.

The step S210 of preprocessing the input signal g(n) may include a pre-emphasis step of compensating for a high frequency component lost in the process of generating the input signal g(n).

Step S210 of preprocessing the input signal g (n) may include applying a predetermined window function w (n) to the input signal g (n).

Pre-processing the input signal g (n) (S210) may include converting the time domain input signal g (n) into a frequency domain frequency response spectrum G (k).

The step S220 of modifying the frequency response spectrum may include estimating a boundary frequency k_c for the input signal and removing the frequency band higher than the estimated boundary frequency k_c from the frequency response spectrum.

According to an embodiment of the present invention, estimating the boundary frequency k_c may include modeling the log value L(k) of the frequency response spectrum as an exponential function A·e^(-αk), and determining, as the boundary frequency k_c, the frequency corresponding to the predetermined threshold log value L_TH in the modeled exponential function A_opt·e^(-α_opt·k).

For example, estimating the boundary frequency k_c may include calculating the coefficient A_opt and the exponent α_opt that minimize a cost based on the difference between the log value L(k) of the frequency response spectrum and the exponential function model A·e^(-αk) defined by the coefficient A and the exponent α, and determining, as the boundary frequency k_c, the value obtained by taking the logarithm of the threshold log value L_TH divided by the calculated coefficient A_opt and dividing the result by the calculated exponent α_opt.

Further, the step S220 of modifying the frequency response spectrum may further include increasing the frequency domain resolution of the frequency response spectrum before removing the frequency band higher than the boundary frequency k_c. In this case, removing the frequency band higher than the boundary frequency k_c may include removing that band from the frequency response spectrum L_2(k) whose frequency domain resolution has been increased.

Here, increasing the frequency domain resolution of the frequency response spectrum may include upsampling the frequency response spectrum by a predetermined multiple in the frequency domain.

In addition, the step S220 of modifying the frequency response spectrum may further include extending the frequency band of the frequency response spectrum from which the frequency band higher than the boundary frequency k_c has been removed.

Here, extending the frequency band may include performing interpolation based on a plurality of sample values included in the frequency response spectrum from which the frequency band higher than the boundary frequency k_c has been removed.

The step S230 of calculating the feature parameter may include applying a mel-frequency filter to the modified frequency response spectrum L'(k) to obtain a mel-frequency response G'_MEL(m), and performing an inverse discrete cosine transform on the mel-frequency response G'_MEL(m) to obtain the cepstral coefficients c_SMFCC,LP(τ).

The feature parameter extraction method 20 can be implemented as a program to be executed on a computer and stored in a computer-readable recording medium. The computer-readable recording medium includes all kinds of storage devices that store data readable by a computer system. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage, and the like. In addition, the feature parameter extraction method 20 may be implemented as a computer program stored on a medium for execution in combination with a computer.

While the present invention has been described with reference to the exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. Those skilled in the art will appreciate that various modifications may be made to the embodiments described above. The scope of the present invention is defined only by the interpretation of the appended claims.

10: Speaker recognition apparatus
110: Voice collection unit
120: Voice processing unit
121: Feature parameter extraction unit
130: Storage unit
1211: Preprocessing unit
1212: Spectrum transforming unit
1213: Feature parameter calculating unit
12111: Pre-emphasis unit
12112: Window function application unit
12113: Frequency domain transforming unit
12121: Boundary frequency estimating unit
12122: Spectral resolution increasing unit
12123: High frequency band removing unit
12124: Spectrum extending unit
12131: Mel-frequency response obtaining unit
12132: Cepstral coefficient obtaining unit

Claims (14)

1. A feature parameter extraction apparatus comprising:
a preprocessor for preprocessing an input signal;
a spectrum transformer for transforming a frequency response spectrum of the input signal by removing a high-frequency flat band above a boundary frequency determined based on the input signal; and
a feature parameter calculating unit for calculating a feature parameter that characterizes the input signal based on the transformed frequency response spectrum,
wherein the spectrum transformer comprises:
a boundary frequency estimator for estimating the boundary frequency for the input signal; and
a high frequency band eliminating unit for removing the frequency band above the estimated boundary frequency from the frequency response spectrum,
wherein the boundary frequency estimator models the log value of the frequency response spectrum as an exponential function,
and determines, as the boundary frequency, the frequency corresponding to a predetermined threshold log value in the modeled exponential function.
2. The apparatus according to claim 1,
wherein the input signal includes a glottal signal obtained from a voice signal.
3. The apparatus according to claim 1,
wherein the preprocessor comprises
a pre-emphasis unit for compensating for high-frequency components lost in the process of generating the input signal.
4. The apparatus according to claim 1,
wherein the preprocessor comprises
a window function applying unit for applying a predetermined window function to the input signal.
5. The apparatus according to claim 1,
wherein the preprocessor comprises
a frequency domain transformer for transforming the input signal in the time domain into the frequency response spectrum in the frequency domain.
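The three preprocessing elements described above (emphasis of high-frequency components, windowing, and transformation to the frequency domain) can be sketched for a single frame as follows. The function name, the first-order pre-emphasis filter with coefficient 0.97, the Hamming window, and the FFT size are illustrative assumptions; the source only states that such units exist.

```python
import numpy as np

def preprocess_frame(frame, pre_emphasis=0.97, n_fft=512):
    """Hypothetical helper: pre-emphasis, windowing, and magnitude spectrum."""
    frame = np.asarray(frame, dtype=float)
    # Pre-emphasis y[n] = x[n] - a * x[n-1]; the first sample is passed through.
    emphasized = np.append(frame[0], frame[1:] - pre_emphasis * frame[:-1])
    # Window function (Hamming is an assumed choice).
    windowed = emphasized * np.hamming(len(emphasized))
    # Magnitude of the one-sided spectrum; rfft zero-pads the frame to n_fft.
    return np.abs(np.fft.rfft(windowed, n=n_fft))
```

The returned array has `n_fft // 2 + 1` bins covering 0 Hz up to half the sampling rate.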
6. (Deleted)
7. (Deleted)
8. The apparatus according to claim 1,
wherein the boundary frequency estimator
estimates the coefficient and the exponent that minimize a cost based on the difference between the log value of the frequency response spectrum and an exponential function model defined by a coefficient and an exponent,
and determines, as the boundary frequency, the value obtained by taking the logarithm of the threshold log value divided by the estimated coefficient and dividing that logarithm by the estimated exponent.
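The boundary frequency estimation described above can be sketched as follows. If the log spectrum is modeled as a·exp(b·k) with coefficient a and exponent b, the bin at which the model reaches a threshold θ is k_b = ln(θ/a)/b. The source does not name an optimizer for the cost, so this sketch substitutes a log-linear least-squares fit (a line fitted to the logarithm of the positive log-spectrum values), which is an assumption and not necessarily the patented procedure. The function and variable names are hypothetical.

```python
import numpy as np

def estimate_boundary_bin(log_spectrum, threshold):
    """Fit log_spectrum[k] ~ a * exp(b * k) and return (a, b, boundary bin)."""
    log_spectrum = np.asarray(log_spectrum, dtype=float)
    k = np.arange(len(log_spectrum))
    # The log-linear fit needs positive values, so other bins are skipped.
    valid = log_spectrum > 0
    # polyfit returns [slope, intercept] of ln(L[k]) = ln(a) + b * k.
    b, ln_a = np.polyfit(k[valid], np.log(log_spectrum[valid]), 1)
    a = np.exp(ln_a)
    # Bin at which the fitted model equals the threshold.
    boundary = np.log(threshold / a) / b
    return a, b, boundary
```

For a decaying spectrum (b < 0) and a threshold below a, the returned boundary is positive; it is a fractional bin index and would still need to be rounded or converted to hertz by the caller.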
9. A feature parameter extraction apparatus comprising:
a preprocessor for preprocessing an input signal;
a spectrum transformer for transforming a frequency response spectrum of the input signal by removing a high-frequency flat band above a boundary frequency; and
a feature parameter calculating unit for calculating a feature parameter that characterizes the input signal based on the transformed frequency response spectrum,
wherein the spectrum transformer comprises:
a boundary frequency estimator for estimating the boundary frequency for the input signal;
a high frequency band eliminating unit for removing the frequency band above the estimated boundary frequency from the frequency response spectrum; and
a spectral resolution increasing unit for increasing the frequency-domain resolution of the frequency response spectrum,
wherein the high frequency band eliminating unit
removes the frequency band above the estimated boundary frequency from the frequency response spectrum whose frequency-domain resolution has been increased.
10. The apparatus according to claim 9,
wherein the spectral resolution increasing unit
upsamples the frequency response spectrum by a predetermined multiple in the frequency domain.
11. A feature parameter extraction apparatus comprising:
a preprocessor for preprocessing an input signal;
a spectrum transformer for transforming a frequency response spectrum of the input signal by removing a high-frequency flat band above a boundary frequency; and
a feature parameter calculating unit for calculating a feature parameter that characterizes the input signal based on the transformed frequency response spectrum,
wherein the spectrum transformer comprises:
a boundary frequency estimator for estimating the boundary frequency for the input signal;
a high frequency band eliminating unit for removing the frequency band above the estimated boundary frequency from the frequency response spectrum; and
a spectrum expanding unit for expanding the frequency band of the frequency response spectrum from which the frequency band above the estimated boundary frequency has been removed.
12. The apparatus according to claim 11,
wherein the spectrum expanding unit
performs interpolation based on a plurality of sample values included in the frequency response spectrum from which the frequency band above the estimated boundary frequency has been removed.
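The resolution increase, high-band removal, and expansion steps described above can be combined in one short sketch. Linear interpolation for both the upsampling and the expansion, the upsampling factor of 4, and stretching the retained band back to the original number of bins are assumptions for illustration; the source specifies only a predetermined multiple and interpolation from retained sample values. The function name is hypothetical.

```python
import numpy as np

def remove_and_expand(spectrum, boundary_bin, upsample=4):
    """Drop bins above boundary_bin and stretch the rest to the original length."""
    spectrum = np.asarray(spectrum, dtype=float)
    n = len(spectrum)
    # Increase frequency-domain resolution by interpolating onto a finer grid.
    fine_grid = np.linspace(0, n - 1, upsample * (n - 1) + 1)
    fine = np.interp(fine_grid, np.arange(n), spectrum)
    # Remove the band above the (possibly fractional) boundary bin.
    keep = fine_grid <= boundary_bin
    kept_grid, kept = fine_grid[keep], fine[keep]
    # Expand the retained band to n samples by interpolating between kept values.
    target = np.linspace(kept_grid[0], kept_grid[-1], n)
    return np.interp(target, kept_grid, kept)
```

Working on the finer grid means a fractional boundary can be honored to within 1/upsample of a bin before the band is cut.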
13. The apparatus according to claim 1,
wherein the feature parameter calculating unit comprises:
a mel-frequency response acquiring unit for acquiring a mel-frequency response by applying a mel-frequency filter to the transformed frequency response spectrum; and
a cepstral coefficient acquiring unit for obtaining cepstral coefficients by performing an inverse discrete cosine transform on the mel-frequency response.
14. A speaker recognition apparatus comprising:
a voice collecting unit for collecting the voice of a speaker;
a voice processing unit for processing the collected voice and determining whether it matches the voice of a previously registered user; and
a storage unit for storing information on the voice of the user,
wherein the voice processing unit comprises:
a preprocessor for preprocessing an input signal;
a spectrum transformer for transforming a frequency response spectrum of the input signal by removing a high-frequency flat band above a boundary frequency determined based on the input signal; and
a feature parameter calculating unit for calculating a feature parameter that characterizes the input signal based on the transformed frequency response spectrum,
wherein the spectrum transformer comprises:
a boundary frequency estimator for estimating the boundary frequency for the input signal; and
a high frequency band eliminating unit for removing the frequency band above the estimated boundary frequency from the frequency response spectrum,
wherein the boundary frequency estimator models the log value of the frequency response spectrum as an exponential function,
and determines, as the boundary frequency, the frequency corresponding to a preset threshold log value in the modeled exponential function.
KR1020150183988A 2015-12-22 2015-12-22 Apparatus for feature extraction in glottal flow signals for speaker recognition KR101673221B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150183988A KR101673221B1 (en) 2015-12-22 2015-12-22 Apparatus for feature extraction in glottal flow signals for speaker recognition

Publications (1)

Publication Number Publication Date
KR101673221B1 true KR101673221B1 (en) 2016-11-07

Family

ID=57529852

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150183988A KR101673221B1 (en) 2015-12-22 2015-12-22 Apparatus for feature extraction in glottal flow signals for speaker recognition

Country Status (1)

Country Link
KR (1) KR101673221B1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR970012285A (en) * 1995-08-26 1997-03-29 김광호 Pitch detection method of voice signal
JP2006189799A (en) * 2004-12-31 2006-07-20 Taida Electronic Ind Co Ltd Voice inputting method and device for selectable voice pattern

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
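강지훈, 정상배, 'Improving speaker recognition performance using voiced/unvoiced classification and dual feature parameter combination', Journal of the Korea Institute of Information and Communication Engineering (한국정보통신학회논문지), Vol. 18, No. 6, pp. 1294-1301, June 2014. *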

Similar Documents

Publication Publication Date Title
CN106971741B (en) Method and system for voice noise reduction for separating voice in real time
CN106935248B (en) Voice similarity detection method and device
CN108281146B (en) Short voice speaker identification method and device
US9224392B2 (en) Audio signal processing apparatus and audio signal processing method
CN112053695A (en) Voiceprint recognition method and device, electronic equipment and storage medium
CN109767756B (en) Sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficient
Kumar et al. Analysis of MFCC and BFCC in a speaker identification system
CN113327626B (en) Voice noise reduction method, device, equipment and storage medium
CN108922543B (en) Model base establishing method, voice recognition method, device, equipment and medium
Nasr et al. Speaker identification based on normalized pitch frequency and Mel Frequency Cepstral Coefficients
WO2019232826A1 (en) I-vector extraction method, speaker recognition method and apparatus, device, and medium
CN109147798B (en) Speech recognition method, device, electronic equipment and readable storage medium
KR20160102815A (en) Robust audio signal processing apparatus and method for noise
CN110942766A (en) Audio event detection method, system, mobile terminal and storage medium
KR100893123B1 (en) Method and apparatus for generating audio fingerprint data and comparing audio data using the same
KR100897555B1 (en) Apparatus and method of extracting speech feature vectors and speech recognition system and method employing the same
US7966179B2 (en) Method and apparatus for detecting voice region
KR101671305B1 (en) Apparatus for extracting feature parameter of input signal and apparatus for recognizing speaker using the same
CN112466276A (en) Speech synthesis system training method and device and readable storage medium
Mu et al. MFCC as features for speaker classification using machine learning
KR101673221B1 (en) Apparatus for feature extraction in glottal flow signals for speaker recognition
CN110197657A (en) A kind of dynamic speech feature extracting method based on cosine similarity
CN111402898B (en) Audio signal processing method, device, equipment and storage medium
Roy et al. A hybrid VQ-GMM approach for identifying Indian languages
Ghezaiel et al. Nonlinear multi-scale decomposition by EMD for Co-Channel speaker identification

Legal Events

Date Code Title Description
A201 Request for examination
GRNT Written decision to grant
FPAY Annual fee payment

Payment date: 20191029

Year of fee payment: 4