KR101671305B1 - Apparatus for extracting feature parameter of input signal and apparatus for recognizing speaker using the same - Google Patents

Apparatus for extracting feature parameter of input signal and apparatus for recognizing speaker using the same Download PDF

Info

Publication number
KR101671305B1
Authority
KR
South Korea
Prior art keywords
signal
input signal
feature parameter
pitch
unit
Prior art date
Application number
KR1020150183897A
Other languages
Korean (ko)
Inventor
정상배
강지훈
김영일
Original Assignee
경상대학교 산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 경상대학교 산학협력단
Priority to KR1020150183897A
Application granted
Publication of KR101671305B1
Priority to PCT/KR2016/014673 (WO2017111386A1)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 - Complex mathematical operations
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Optimization (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The present invention relates to a feature parameter extracting device and a speaker recognizing device capable of raising the speaker recognition rate by extracting an excitation signal from the periodic section of an input signal and then using a feature parameter extracted from that signal. According to an embodiment of the present invention, the feature parameter extracting device includes: a periodic signal detecting part detecting a periodic section of an input signal; an excitation signal extracting part extracting an excitation signal from the periodic section of the input signal; and a feature parameter calculating part calculating a feature parameter characterizing the input signal based on a frequency response spectrum of the excitation signal.

Description

TECHNICAL FIELD [0001] The present invention relates to an apparatus for extracting feature parameters of an input signal and an apparatus for recognizing a speaker using the same.

Speech processing technology, in which a computer processes and understands human speech, is a promising technology that can be used in many fields. In particular, speaker recognition, which identifies a speaker from input voice, may be used for identity verification in security systems or for user identification in intelligent robots.

In general, speech recognition, including speaker recognition, extracts a feature vector from a speech input signal and compares it with previously stored data to recognize the information. However, current recognition rates limit commercialization in many fields; the recognition rate of speaker recognition in particular is not high, and continued research and development is needed.

An object of the present invention is to provide a feature parameter extracting apparatus capable of improving the speaker recognition rate and a speaker recognizing apparatus using the feature parameter extracting apparatus.

An apparatus for extracting feature parameters according to an exemplary embodiment of the present invention includes: a periodic signal detector for detecting a periodic interval of an input signal; an excitation signal extractor for extracting an excitation signal in the periodic interval of the input signal; and a feature parameter calculation unit for calculating a feature parameter that characterizes the input signal based on the frequency response spectrum of the excitation signal.

The periodic signal detector may detect a periodic interval based on a result of an auto-correlation function for the input signal.

When the input signal is x(n), the periodic signal detector may determine that the input signal is a periodic signal with period T when R(T)/R(0) is greater than or equal to a predetermined threshold, where R(T) = Σ_n x(n)·x(n+T).

The feature parameter calculating unit may include: a representative pitch detector for determining a representative pitch value among the pitch values of the input signal; and a proper pitch detector for determining an appropriate pitch value, based on the representative pitch value, among the pitch values in the periodic interval of the input signal. The feature parameter may then be calculated based on the appropriate pitch value.

The representative pitch detection unit may determine the median value as the representative pitch value by arranging the pitch values of the input signals in order of magnitude.

The appropriate pitch detector may determine, as the appropriate pitch value, the value closest to the representative pitch value among the pitch values selected based on the result of the autocorrelation function for the input signal of the periodic interval.

The excitation signal extraction unit may include a pre-emphasis unit that pre-processes the input signal of the periodic section and outputs a pre-processed signal in which high-frequency components lost in the process of generating the input signal are compensated.

The excitation signal extraction unit may further include: an autocorrelation function estimator for outputting the result of an autocorrelation function for the pre-processed signal; a prediction coefficient calculator for receiving the output of the autocorrelation function estimator and outputting prediction coefficients based on a Levinson-Durbin algorithm; and an inverse filtering unit for performing inverse filtering based on the pre-processed signal and the prediction coefficients to output the excitation signal.

The feature parameter calculator may include a frequency domain transformer that performs a discrete Fourier transform on the excitation signal based on the appropriate pitch value to convert the excitation signal into a discrete Fourier spectrum in the frequency domain.

The feature parameter calculator may calculate the logarithm of the magnitude of the discrete Fourier spectrum as the feature parameter.

The feature parameter calculating unit may include: a mel-frequency response acquiring unit that obtains a mel-frequency response by applying a mel-frequency filter to the frequency response spectrum of the input signal over its periodic and aperiodic intervals; and a cepstral coefficient acquiring unit that obtains cepstrum coefficients by performing an inverse discrete cosine transform of the mel-frequency response.

The feature parameter calculating unit may calculate, as a feature parameter, a value obtained by multiplying the DFS-based output and the output of the cepstrum coefficient acquiring unit by predetermined weights, respectively.

A speaker recognition apparatus according to an embodiment of the present invention includes: a voice collection unit for collecting the voice of a speaker; a voice processing unit for processing the collected voice and determining whether it matches the voice of a previously registered user; and a storage unit for storing information on the voice of the user.

The voice processing unit may include: a periodic signal detector for detecting a periodic interval of the input signal; an excitation signal extractor for extracting an excitation signal in the periodic interval of the input signal; and a feature parameter calculation unit for calculating a feature parameter that characterizes the input signal based on the frequency response spectrum of the excitation signal.

According to an embodiment of the present invention, a feature parameter extracting apparatus that increases the recognition rate in speech signal processing can be obtained, as can a speaker recognition apparatus with an improved speaker recognition rate.

FIG. 1 is an exemplary block diagram of a speaker recognition apparatus according to an embodiment of the present invention.
FIG. 2 is an exemplary block diagram of a feature parameter extraction unit according to an embodiment of the present invention.
FIG. 3 is an exemplary flowchart of a method of detecting a representative pitch in the feature parameter extraction unit of FIG. 2.
FIGS. 4A and 4B are graphs for explaining a method of detecting a representative pitch in an input signal according to an embodiment of the present invention.
FIG. 5 is an analysis graph of an autocorrelation function for the periodic signal detected by the periodic signal detection unit of the feature parameter extraction unit of FIG. 2.
FIG. 6 is an exemplary flowchart of a method for detecting an appropriate pitch in the feature parameter extraction unit of FIG. 2.
FIG. 7 is a graph for explaining a method of detecting a proper pitch in an input signal according to an embodiment of the present invention.
FIG. 8 is an exemplary flowchart of a method of extracting an excitation signal in the excitation signal extraction unit of FIG. 2.
FIG. 9 is a table showing improved speaker recognition rates according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings attached hereto.

FIG. 1 is an exemplary block diagram of a speaker recognition apparatus 10 according to an embodiment of the present invention.

Referring to FIG. 1, the speaker recognition apparatus 10 according to an embodiment of the present invention includes a voice collection unit 110, a voice processing unit 120, and a storage unit 130.

The voice collecting unit 110 collects the voice of the speaker. According to an embodiment of the present invention, the voice collection unit 110 may include a microphone that converts a voice uttered by the speaker into an electrical signal. However, the voice collection unit 110 is not limited to a microphone that collects the voice directly from the speaker; it includes any device that acquires a signal related to the speaker's voice in various ways (for example, through data communication over a network).

The voice processing unit 120 processes the collected voice and determines whether it matches the voice of a registered user. According to an embodiment of the present invention, the voice processing unit 120 includes a processor that processes an electrical signal related to voice (hereinafter referred to as a voice signal) according to a predetermined algorithm; for example, it may include a CPU, but is not limited thereto. The voice processing unit 120 can execute a program stored in the storage unit 130 to process the voice signal, and the data obtained in the process can be stored in the storage unit 130.

The storage unit 130 stores information on the user's voice. According to an embodiment of the present invention, the storage unit 130 is any device capable of storing data or programs; it may include not only large-capacity storage devices such as an HDD or an SSD but also memory such as a cache.

According to an embodiment of the present invention, the voice processing unit 120 may include a feature parameter extraction unit 121 that extracts a feature parameter characterizing the voice of the speaker, in order to determine whether the speaker's voice matches the voice of the previously registered user.

FIG. 2 is an exemplary block diagram of a feature parameter extraction unit according to an embodiment of the present invention.

Referring to FIG. 2, the feature parameter extraction unit 121 may include a periodic signal detection unit 1211, an excitation signal extraction unit 1212, and a feature parameter calculation unit 1213.

The feature parameter calculating section 1213 may include a representative pitch detecting section 12131 for determining a representative pitch value required for excitation signal extraction and feature parameter calculation.

FIG. 3 is an exemplary flowchart of a method of detecting a representative pitch in the representative pitch detecting unit 12131 included in the feature parameter extraction unit 121 of FIG. 2.

Referring to FIG. 3, the representative pitch detector 12131 detects a pitch by analyzing an autocorrelation function. The autocorrelation function indicates the correlation between the values a signal takes at two arbitrary time points; for example, it can show the correlation between the value at time t and the value at time t + τ, delayed from t by a predetermined interval. The pitch can be detected at the position where the autocorrelation function of the speech signal is maximized. Specifically, it can be detected by the following equation.

r_x(τ) = Σ_n x(n)·x(n+τ), with r_x(nT_0) = r_x(0) at τ = nT_0

where n is an integer and T_0 is the fundamental period of the periodic signal.

Referring back to FIG. 3, the representative pitch detector 12131 may sort the pitch values of the input signal in order of magnitude and determine the median value as the representative pitch value.
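For illustration, the following Python sketch (not part of the patent disclosure; the function names, frame handling, and the 50-400 Hz pitch search range are assumptions) detects a per-frame pitch at the autocorrelation peak and takes the median of the sorted pitch values as the representative pitch:

```python
import numpy as np

def detect_pitch_autocorr(frame, fs, f0_min=50.0, f0_max=400.0):
    """Detect the pitch period of one frame at the peak of its
    autocorrelation function r_x(tau) = sum_n x(n) x(n + tau)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f0_max)                    # shortest admissible period
    lag_max = min(int(fs / f0_min), len(r) - 1)   # longest admissible period
    return lag_min + int(np.argmax(r[lag_min:lag_max + 1]))  # period in samples

def representative_pitch(frames, fs):
    """Median of the per-frame pitch values: sort the pitch values in
    order of magnitude and take the middle one (unit 12131's rule)."""
    pitches = sorted(detect_pitch_autocorr(f, fs) for f in frames)
    return pitches[len(pitches) // 2]
```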

FIGS. 4A and 4B are graphs for explaining a method of detecting a representative pitch in an input signal according to an embodiment of the present invention.

FIG. 4A shows the distribution of pitch values detected from an exemplary input signal. As shown in FIG. 4A, the detected pitch value may suddenly double (pitch doubling) or suddenly halve (pitch halving). If such pitch values are used to perform recognition, the speech recognition rate and the speaker recognition rate can be lowered. Accordingly, in order to correct such pitch extraction errors, the representative pitch detector 12131 can sort the pitch values in order of magnitude and determine the median value as the representative pitch value, as shown in FIG. 4B.

The representative pitch detection unit 12131 can determine a representative pitch for an arbitrary speaker and correct, based on the representative pitch value, the pitch values used in subsequent speech recognition and speaker recognition. That is, pitch extraction errors in the speech frame under analysis can be corrected using the representative pitch value. The determined representative pitch value is also used to determine the appropriate pitch value in the analysis of the periodic signal, as described in detail with reference to FIGS. 6 and 7.

Referring back to FIG. 2, the periodic signal detector 1211 distinguishes between the periodic and aperiodic intervals of the input signal.

A voice signal can be given as the input signal. Speech is divided into voiced sounds, which involve vibration of the vocal cords, and unvoiced sounds, which do not. Periodic excitation signals can be detected in voiced intervals. The apparatus for extracting feature parameters according to an embodiment of the present invention detects feature parameters from excitation signals and uses them as auxiliary parameters for improving the recognition rate. Therefore, it is necessary to identify the voiced part of the input signal, that is, its periodic interval.

According to an embodiment of the present invention, the periodic signal detector 1211 detects a periodic interval based on the result of an auto-correlation function applied to the input signal. The periodic signal detection unit 1211 can determine that the input signal is periodic when the resulting value of the autocorrelation function is equal to or greater than a predetermined threshold value.

FIG. 5 is an analysis graph of an autocorrelation function for the periodic signal detected by the periodic signal detection unit of the feature parameter extraction unit of FIG. 2. The periodic signal detector 1211 may analyze the outline of the autocorrelation function graph for the input signal to determine the periodicity of the signal. Referring to FIG. 5, the input signal can be determined to be a periodic signal in an interval in which the value of R(T)/R(0) over T is equal to or greater than the threshold value R_Th. That is, when the input signal is x(n) and R(T)/R(0) for the input signal is equal to or greater than a predetermined threshold value, the periodic signal detector 1211 can determine that the input signal is a periodic signal with period T, where R(T) = Σ_n x(n)·x(n+T).
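A minimal sketch of this decision rule in Python (the 0.5 threshold and the function name are assumptions; the patent only requires R(T)/R(0) to meet a predetermined threshold R_Th):

```python
import numpy as np

def is_periodic(frame, period, r_th=0.5):
    """Return True if the frame is periodic with candidate period T:
    the normalized autocorrelation R(T)/R(0) must reach the threshold."""
    n = len(frame) - period
    r0 = float(np.dot(frame, frame))                         # R(0)
    rT = float(np.dot(frame[:n], frame[period:period + n]))  # R(T)
    return r0 > 0.0 and rT / r0 >= r_th
```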

Referring again to FIG. 2, the feature parameter calculator 1213 includes a proper pitch detector 12132 that determines an appropriate pitch value among the pitch values in the periodic interval of the input signal. FIG. 6 is an exemplary flowchart of a method of detecting an appropriate pitch in the proper pitch detector 12132.

Referring to FIG. 6, the proper pitch detector 12132 may estimate an autocorrelation function for the input signal and analyze the outline of the autocorrelation graph to determine the appropriate pitch value. FIG. 7 is a graph for explaining a method of detecting a proper pitch in an input signal according to an embodiment of the present invention. The proper pitch detector 12132 can select candidate pitch values based on the autocorrelation function. As shown in FIG. 7, the proper pitch detector 12132 can then determine the appropriate pitch value based on the representative pitch value among the candidate pitch values: the candidate closest to the representative pitch value is chosen. By using the value closest to the representative pitch value as the appropriate pitch value, the possibility of the pitch doubling and pitch halving phenomena described above can be reduced.
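A possible sketch of the candidate selection (the number of candidates kept and the peak-picking rule are assumptions; the patent only says the candidate closest to the representative pitch is chosen):

```python
import numpy as np

def proper_pitch(frame, representative, n_candidates=5):
    """Pick, among autocorrelation-peak candidate periods, the one
    closest to the representative pitch value."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Local maxima of the autocorrelation are the candidate pitch periods.
    peaks = [t for t in range(1, len(r) - 1) if r[t - 1] < r[t] >= r[t + 1]]
    candidates = sorted(peaks, key=lambda t: r[t], reverse=True)[:n_candidates]
    if not candidates:
        return representative          # fall back to the representative pitch
    return min(candidates, key=lambda t: abs(t - representative))
```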

FIG. 8 is an exemplary flowchart of a method of extracting an excitation signal in the excitation signal extraction unit 1212 of FIG. 2.

Referring to FIG. 8, the excitation signal extraction unit 1212 may include a pre-emphasis unit. The pre-emphasis unit may pre-process the input signal s(n) of the periodic section to output a pre-processed signal s_pre(n) in which the high-frequency components lost in the process of generating the input signal are compensated. High-frequency components are lost as the sound radiates from the lips into free space when the speaker utters voice. The pre-emphasis unit applies a pre-emphasis filter to the input signal to compensate for the lost high-frequency components. For example, the pre-emphasis filter may be implemented to have a transfer function as shown in the following equation.

H(z) = 1 - α·z^(-1), (0.9 ≤ α ≤ 1)

Typically, α = 0.97.
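In code, this filter is a one-line difference equation; a minimal sketch (keeping the first sample unchanged is an assumption, since the patent does not define the boundary condition):

```python
import numpy as np

def pre_emphasis(s, alpha=0.97):
    """Apply H(z) = 1 - alpha*z^-1: s_pre(n) = s(n) - alpha * s(n - 1)."""
    s = np.asarray(s, dtype=float)
    return np.concatenate(([s[0]], s[1:] - alpha * s[:-1]))
```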

Referring to FIG. 8, the excitation signal extraction unit 1212 may further include an autocorrelation function estimator that estimates the autocorrelation function of the pre-processed signal and outputs the result, a prediction coefficient calculator that receives the output of the autocorrelation function estimator and outputs prediction coefficients a_k based on the Levinson-Durbin algorithm, and an inverse filtering unit that performs inverse filtering based on the pre-processed signal and the prediction coefficients to output the excitation signal e(n).

The inverse filtering unit may be implemented to have a transfer function expressed by the following equation.

A(z) = 1 - Σ_{k=1}^{P} a_k·z^(-k)

Here, P may be the appropriate pitch value detected by the proper pitch detector 12132. If the detected appropriate pitch value is not an integer, the integer value of P can be obtained through a process such as rounding.
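A sketch of this chain in Python, assuming (per the text) that the prediction order equals the rounded appropriate pitch value P; the Levinson-Durbin recursion is written out since it is short, and all function names are mine:

```python
import numpy as np
from scipy.signal import lfilter

def levinson_durbin(r, order):
    """Levinson-Durbin recursion: from autocorrelation values r[0..order],
    compute the prediction-error filter A(z) = 1 - sum_k a_k z^-k,
    returned here as its coefficient vector [1, -a_1, ..., -a_P]."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                        # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k                    # updated prediction error
    return a, err

def extract_excitation(s_pre, pitch_period):
    """Inverse-filter the pre-emphasized signal with the LPC error filter
    to obtain the excitation signal e(n)."""
    order = int(round(pitch_period))          # P: rounded appropriate pitch
    r = np.correlate(s_pre, s_pre, mode="full")[len(s_pre) - 1:]
    a, _ = levinson_durbin(r, order)
    return lfilter(a, [1.0], s_pre)           # e(n) = A(z) applied to s_pre(n)
```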

According to an embodiment of the present invention, the feature parameter calculator 1213 performs a discrete Fourier transform on the excitation signal e(n) based on the appropriate pitch value P to obtain its discrete Fourier spectrum (DFS).

The feature parameter calculating unit 1213 can extract the discrete Fourier spectrum (DFS) coefficients of the excitation signal and use them as feature parameters that characterize the input signal. The excitation signal e(n) extracted by the excitation signal extraction unit 1212 is analyzed over one pitch period, n = 0, 1, ..., p-1, where p may be the pitch value detected by the proper pitch detector; if the detected pitch value is not an integer, its rounded value may be used as p.

The excitation signal e(n) is subjected to a discrete Fourier transform to obtain the discrete Fourier spectrum of the excitation signal. The excitation signal e(n) can be transformed as follows:

A_k = Σ_{n=0}^{p-1} e(n)·cos(2πkn/p)

B_k = Σ_{n=0}^{p-1} e(n)·sin(2πkn/p)

Here, the feature parameters that characterize the speech signal can be calculated using the DFS coefficients A_k and B_k.

In one embodiment, the feature parameter calculator 1213 can obtain the DFS magnitude from the DFS coefficients as follows:

E_k = √(A_k² + B_k²)

According to one embodiment, the feature parameter calculating unit 1213 can calculate a feature vector in which each harmonic of the frequency corresponding to the pitch value is paired with the logarithm of the magnitude of the corresponding DFS coefficient:

v = {(k·f_0, log E_k)}, k = 1, ..., K

where f_0 is the frequency corresponding to the pitch value, K is the number of harmonics, and E_k is the magnitude of the k-th DFS coefficient extracted from the excitation signal, normalized by the energy of the input signal.
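A sketch of this pitch-synchronous DFS feature computation (the normalization detail and the epsilon guard are assumptions; the patent states only that E_k is normalized by the energy of the input signal):

```python
import numpy as np

def dfs_log_magnitudes(e, p, num_harmonics, frame_energy):
    """Compute A_k, B_k over one pitch period n = 0..p-1 of the excitation
    e(n), then log of the magnitude E_k = sqrt(A_k^2 + B_k^2) normalized
    by the frame energy.  Each log E_k is paired with the harmonic
    frequency k*f0 to form the feature vector."""
    n = np.arange(p)
    feats = []
    for k in range(1, num_harmonics + 1):
        a_k = np.sum(e[:p] * np.cos(2.0 * np.pi * k * n / p))
        b_k = np.sum(e[:p] * np.sin(2.0 * np.pi * k * n / p))
        e_k = np.sqrt(a_k ** 2 + b_k ** 2) / np.sqrt(frame_energy)
        feats.append(np.log(e_k + 1e-12))     # log magnitude at harmonic k
    return np.array(feats)
```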

Referring again to FIG. 2, the feature parameter calculation unit 1213 according to an embodiment of the present invention may also extract Mel-Frequency Cepstral Coefficients (MFCC) from the input signal.

According to one embodiment, the feature parameter calculation unit 1213 may include a mel-frequency response acquiring unit that obtains a mel-frequency response by applying a mel-frequency filter to the frequency response spectrum of the input signal over its periodic and aperiodic intervals, and a cepstral coefficient acquiring unit that obtains cepstrum coefficients by inverse discrete cosine transform of the mel-frequency response. The feature parameter calculation unit 1213 can thus extract the MFCC of the input signal and use it as a feature parameter.

The feature parameter calculation unit 1213 according to an embodiment of the present invention may extract not only the MFCC but also the DFS coefficients of the excitation signal as feature parameters. Using the DFS coefficients as auxiliary feature parameters makes it possible to improve both the speech recognition rate and the speaker recognition rate.
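For the MFCC path, a standard library call suffices as a sketch (assuming librosa is available; the file path, sample rate, and coefficient count are placeholders, not values from the patent):

```python
import librosa

# Placeholder input; the patent does not specify file format or parameters.
y, sr = librosa.load("speech.wav", sr=16000)
# librosa applies a mel filterbank to the power spectrum and then a DCT of
# the log-mel energies (the step the text describes as an inverse DCT of
# the mel-frequency response), yielding cepstral coefficients.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape: (13, n_frames)
```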

Referring again to FIG. 1, in the speaker recognition apparatus 10 according to an embodiment of the present invention, the voice processing unit 120 processes the collected voice to determine whether or not it matches the voice of a previously registered user.

According to one embodiment, the voice processing unit 120 can calculate a score for the speaker based on the feature parameters extracted by the feature parameter extraction unit 121 and recognize the speaker accordingly. Specifically, the speaker can be recognized by scoring the degree of similarity between the Gaussian mixture model generated for each specific speaker and the feature parameters extracted from the input speech signal. The Gaussian mixture models may be stored in the storage unit 130.

In one embodiment, the score may be calculated as:

Score(i) = Σ_t log P(x_t | S_i)

Here, x_t is the feature vector in the analyzed speech frame, and S_i is the set of Gaussian mixture model parameters for a particular speaker. P(x_t | S_i) represents the probability that the feature vector x_t occurs in that speaker's speech.

According to an embodiment of the present invention, the score may be calculated by applying the various parameters extracted by the feature parameter extraction unit 121. In particular, an excitation signal is extracted from the periodic signal to generate an auxiliary parameter, thereby improving the accuracy of the score. When a score is calculated using a plurality of parameters, it can be computed by giving a weight to each parameter. The weights can be determined by an exhaustive search.
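A sketch of such a weighted combination (per-stream GMMs via scikit-learn are my substitution for the patent's unspecified implementation; averaging the frame log-likelihoods is also an assumption):

```python
from sklearn.mixture import GaussianMixture

def speaker_score(gmm_per, gmm_aper, gmm_dfs, x_per, x_aper, x_dfs,
                  alpha, beta, gamma):
    """Weighted sum of per-stream GMM scores: MFCC of the periodic signal,
    MFCC of the aperiodic signal, and DFS features of the excitation,
    mirroring the weights alpha, beta, gamma of FIG. 9.  Each gmm_* is a
    GaussianMixture fitted to one feature stream of a registered speaker;
    .score() returns the mean log-likelihood per frame."""
    return (alpha * gmm_per.score(x_per)
            + beta * gmm_aper.score(x_aper)
            + gamma * gmm_dfs.score(x_dfs))
```

The weights themselves could then be chosen by the exhaustive search the text mentions, for example a grid over (α, β, γ) evaluated on held-out recordings.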

FIG. 9 is a table showing improved speaker recognition rates according to an embodiment of the present invention. Referring to FIG. 9, speaker recognition rates are shown for speaker recognition performed with scores calculated according to an embodiment of the present invention and according to a comparative example. In the embodiment, both the MFCC parameters and the DFS coefficient parameters of the excitation signal are used; in the comparative example, only the MFCC parameters are used. The score weights α, β, and γ represent the weights assigned to the scores calculated from the MFCC extracted from the periodic signal, the MFCC extracted from the aperiodic signal, and the DFS coefficient parameters extracted from the excitation signal, respectively.

As shown in the table of FIG. 9, according to an embodiment of the present invention, the speaker recognition rate is further improved when additional parameters extracted from the excitation signal are used.

While the present invention has been described with reference to the exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. Those skilled in the art will appreciate that various modifications may be made to the embodiments described above. The scope of the present invention is defined only by the interpretation of the appended claims.

10: Speaker recognition device
110: Voice collection unit
120: Voice processing unit
121: Feature parameter extracting unit
130: Storage unit
1211: Periodic signal detection unit
1212: Excitation signal extraction unit
1213: Feature parameter calculating section

Claims (13)

A periodic signal detector for receiving an input signal and detecting a periodic interval of the signal;
An excitation signal extractor for extracting an excitation signal in a periodic interval of the input signal;
A feature parameter calculating unit for calculating a feature parameter that characterizes the input signal based on a frequency response spectrum of the excitation signal;
A representative pitch detector for determining a representative pitch value among pitch values of the input signal; and
An appropriate pitch detector for determining an appropriate pitch value, based on the representative pitch value, among the pitch values in the periodic interval of the input signal,
Wherein the feature parameter extraction unit calculates the feature parameter based on the appropriate pitch value.
The apparatus according to claim 1,
Wherein the periodic signal detection unit comprises:
detects a periodic interval based on the result of an auto-correlation function applied to the input signal.
The apparatus according to claim 2,
Wherein the periodic signal detection unit comprises:
determines that the input signal is a periodic signal with period T when R(T)/R(0) for the input signal x(n) is equal to or greater than a predetermined threshold value,
wherein R(T) = Σ_n x(n)·x(n+T).
(Deleted)
The apparatus according to claim 1,
Wherein the representative pitch detector arranges the pitch values of the input signal in order of magnitude and determines the median value as the representative pitch value.
The apparatus according to claim 1,
Wherein the proper pitch detector determines, as the appropriate pitch value, the value closest to the representative pitch value among the pitch values selected based on the result of the autocorrelation function for the input signal of the periodic interval.
The apparatus according to claim 1,
Wherein the excitation signal extracting unit comprises:
a pre-emphasis unit for pre-processing the input signal of the periodic section and outputting a pre-processed signal in which high-frequency components lost in the process of generating the input signal are compensated.
The apparatus according to claim 7,
Wherein the excitation signal extracting unit comprises:
an autocorrelation function estimator for outputting the result of the autocorrelation function for the pre-processed signal;
a prediction coefficient calculator for receiving the output of the autocorrelation function estimator and outputting prediction coefficients based on a Levinson-Durbin algorithm; and
an inverse filtering unit for performing inverse filtering based on the pre-processed signal and the prediction coefficients to output the excitation signal.
The apparatus according to claim 1,
Wherein the feature parameter calculating unit comprises:
a frequency domain transformer for transforming the excitation signal into a discrete Fourier spectrum in the frequency domain by performing a discrete Fourier transform based on the appropriate pitch value.
The apparatus according to claim 9,
Wherein the feature parameter calculating unit comprises:
calculates the logarithm of the magnitude of the discrete Fourier spectrum as the feature parameter.
The apparatus according to claim 9,
Wherein the feature parameter calculating unit comprises:
calculates, as a feature parameter, a value obtained by applying predetermined weights to the energies of the orders of the discrete Fourier spectrum and summing them.
The apparatus according to claim 1,
Wherein the feature parameter extracting unit comprises:
a mel-frequency response acquiring unit for acquiring a mel-frequency response by applying a mel-frequency filter to a frequency response spectrum of the input signal with respect to a periodic interval and an aperiodic interval of the input signal; and
a cepstrum coefficient acquiring unit for obtaining a cepstrum coefficient by inverse discrete cosine transform of the mel-frequency response.
A voice collecting unit for collecting the voice of the speaker;
A voice processing unit for processing the collected voice and determining whether it matches the voice of a previously registered user; and
A storage unit for storing information on the voice of the user,
Wherein the voice processing unit comprises:
A periodic signal detector for detecting a periodic interval of an input signal;
An excitation signal extractor for extracting an excitation signal in a periodic interval of the input signal;
A feature parameter calculating unit for calculating a feature parameter that characterizes the input signal based on a frequency response spectrum of the excitation signal;
A representative pitch detector for determining a representative pitch value among the pitch values of the input signal; and
An appropriate pitch detector for determining an appropriate pitch value, based on the representative pitch value, among the pitch values in the periodic interval of the input signal,
Wherein the feature parameter extraction unit calculates the feature parameter based on the appropriate pitch value.
KR1020150183897A 2015-12-22 2015-12-22 Apparatus for extracting feature parameter of input signal and apparatus for recognizing speaker using the same KR101671305B1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020150183897A KR101671305B1 (en) 2015-12-22 2015-12-22 Apparatus for extracting feature parameter of input signal and apparatus for recognizing speaker using the same
PCT/KR2016/014673 WO2017111386A1 (en) 2015-12-22 2016-12-14 Apparatus for extracting feature parameters of input signal, and speaker recognition apparatus using same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150183897A KR101671305B1 (en) 2015-12-22 2015-12-22 Apparatus for extracting feature parameter of input signal and apparatus for recognizing speaker using the same

Publications (1)

Publication Number Publication Date
KR101671305B1 true KR101671305B1 (en) 2016-11-02

Family

ID=57518247

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150183897A KR101671305B1 (en) 2015-12-22 2015-12-22 Apparatus for extracting feature parameter of input signal and apparatus for recognizing speaker using the same

Country Status (2)

Country Link
KR (1) KR101671305B1 (en)
WO (1) WO2017111386A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108053837A (en) * 2017-12-28 2018-05-18 深圳市保千里电子有限公司 A kind of method and system of turn signal voice signal identification
KR20220065343A (en) 2020-11-13 2022-05-20 서울시립대학교 산학협력단 Apparatus for simultaneously performing spoofing attack detection and speaker recognition based on deep neural network and method therefor

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113012716B (en) * 2021-02-26 2023-08-04 武汉星巡智能科技有限公司 Infant crying type identification method, device and equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20050048214A * 2003-11-19 2005-05-24 학교법인연세대학교 Method and system for pitch synchronous feature generation of speaker recognition system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7684988B2 (en) * 2004-10-15 2010-03-23 Microsoft Corporation Testing and tuning of automatic speech recognition systems using synthetic inputs generated from its acoustic models
KR100933946B1 (en) * 2007-10-29 2009-12-28 연세대학교 산학협력단 Feature vector extraction method using adaptive selection of frame shift and speaker recognition system thereof
KR20100036893A (en) * 2008-09-30 2010-04-08 삼성전자주식회사 Speaker cognition device using voice signal analysis and method thereof

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20050048214A * 2003-11-19 2005-05-24 학교법인연세대학교 Method and system for pitch synchronous feature generation of speaker recognition system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
강지훈, 정상배, "Performance Improvement of Speaker Recognition Using Voiced/Unvoiced Classification and Dual Feature Parameter Combination," Journal of the Korea Institute of Information and Communication Engineering, Vol. 18, No. 6, pp. 1294-1301, June 2014. *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108053837A (en) * 2017-12-28 2018-05-18 深圳市保千里电子有限公司 A kind of method and system of turn signal voice signal identification
KR20220065343A (en) 2020-11-13 2022-05-20 서울시립대학교 산학협력단 Apparatus for simultaneously performing spoofing attack detection and speaker recognition based on deep neural network and method therefor

Also Published As

Publication number Publication date
WO2017111386A1 (en) 2017-06-29

Similar Documents

Publication Publication Date Title
CN106935248B (en) Voice similarity detection method and device
Tiwari MFCC and its applications in speaker recognition
Sahidullah et al. A comparison of features for synthetic speech detection
US9536547B2 (en) Speaker change detection device and speaker change detection method
CN108281146B (en) Short voice speaker identification method and device
Rakesh et al. Gender Recognition using speech processing techniques in LABVIEW
US20130035933A1 (en) Audio signal processing apparatus and audio signal processing method
CN108305639B (en) Speech emotion recognition method, computer-readable storage medium and terminal
Vyas A Gaussian mixture model based speech recognition system using Matlab
CN108682432B (en) Speech emotion recognition device
Chaudhary et al. Gender identification based on voice signal characteristics
WO2018095167A1 (en) Voiceprint identification method and voiceprint identification system
KR101671305B1 (en) Apparatus for extracting feature parameter of input signal and apparatus for recognizing speaker using the same
AboElenein et al. Improved text-independent speaker identification system for real time applications
Srinivas et al. Relative phase shift features for replay spoof detection system
EP0474496B1 (en) Speech recognition apparatus
Singh et al. Novel feature extraction algorithm using DWT and temporal statistical techniques for word dependent speaker’s recognition
Maazouzi et al. MFCC and similarity measurements for speaker identification systems
Sadjadi et al. Robust front-end processing for speaker identification over extremely degraded communication channels
Kaminski et al. Automatic speaker recognition using a unique personal feature vector and Gaussian Mixture Models
Kumar et al. Effective preprocessing of speech and acoustic features extraction for spoken language identification
Natarajan et al. Segmentation of continuous Tamil speech into syllable like units
US20090063149A1 (en) Speech retrieval apparatus
JP4537821B2 (en) Audio signal analysis method, audio signal recognition method using the method, audio signal section detection method, apparatus, program and recording medium thereof
Sharma et al. Speech recognition of Punjabi numerals using synergic HMM and DTW approach

Legal Events

Date Code Title Description
GRNT Written decision to grant
FPAY Annual fee payment

Payment date: 20190930

Year of fee payment: 4