KR101005858B1 - Apparatus and method for adapting model parameters of speech recognizer by utilizing histogram equalization

Info

Publication number: KR101005858B1
Application number: KR1020090018909A
Authority: KR
Inventors: 서영주; 김회린
Original assignee: 한국과학기술원
Priority date: 2009-02-13
Filing date: 2009-03-05
Publication date: 2011-01-05
Also published as: KR20100092846A

Abstract

The present invention relates to an apparatus and a method for adapting the acoustic model parameters of a speech recognizer. More particularly, it provides an acoustic model parameter adaptation apparatus and method that use histogram equalization to adapt the mean parameters of an acoustic model from the training environment to the test environment, thereby removing the acoustic mismatch and ensuring speech recognition performance that is robust in noisy environments.

To this end, the apparatus for adapting the acoustic model parameters of a speech recognizer comprises: a test cumulative distribution function estimator that derives a test cumulative distribution function estimate for each test speech feature parameter from the test cumulative distribution function obtained for the test speech feature parameters extracted from externally input speech; a training cumulative distribution function estimator that derives a training cumulative distribution function estimate for an acoustic model mean parameter from the training cumulative distribution function obtained for the speech recognition feature parameters extracted from previously stored training speech data; a linear interpolation factor calculator that computes a linear interpolation factor from the training cumulative distribution function estimate and the test cumulative distribution function estimate; and an acoustic model parameter adaptor that, based on the linear interpolation factor, uses histogram equalization to linearly interpolate the acoustic model mean parameter with respect to the test speech feature parameters.

Speech Recognition, Histogram Equalization, Training Environment, Test Environment, Noise Environment, Acoustic Model, Mean Parameter, Adaptation, Acoustic Mismatch Elimination

Description

Apparatus and method for adapting model parameters of speech recognizer by utilizing histogram equalization

The present invention relates to an apparatus and a method for adapting the acoustic model parameters of a speech recognizer. More particularly, it relates to an acoustic model parameter adaptation apparatus and method that use histogram equalization to adapt the mean parameters of an acoustic model from the training environment to the test environment, thereby removing the acoustic mismatch and ensuring speech recognition performance that is robust in noisy environments.

The present invention was derived from research carried out as part of the IT Growth Engine Technology Development Program of the Ministry of Knowledge Economy and the Institute for Information Technology Advancement [project management number: 2008-S-001-01, project title: Development of core technologies for large-capacity/interactive distributed/embedded speech interfaces].

Most speech recognizers developed to date perform satisfactorily in quiet acoustic environments, but their recognition performance degrades severely in the presence of ambient noise.

For example, recognition performance degrades because of the acoustic mismatch between the training environment (the acoustic environment in which the training speech data used to develop the recognizer was collected) and the test environment (the acoustic environment in which the recognizer is actually used), which is subject to additive noise and channel distortion.

As a prior-art approach to reducing this acoustic mismatch and preventing degraded recognition performance in the test environment, "feature compensation techniques", which compensate noisy speech toward clean speech in the speech feature domain, have been studied actively.

In the prior art, however, the addition of noise and the speech feature extraction performed for recognition inevitably cause a loss of acoustic information, so it is difficult to restore clean speech perfectly from noisy speech.

By contrast, an acoustic model trained on clean speech can be adapted perfectly into an acoustic model matched to the test environment, provided that enough adaptation data is available. For example, the addition of noise causes some loss of acoustic information in both the speech features and the acoustic model, but it does not cause an acoustic mismatch during recognition.

Accordingly, there is a strong need for a technique that removes the acoustic mismatch on the basis of acoustic model adaptation, which is relatively superior to feature compensation, and thereby ensures speech recognition performance that is robust in noisy environments.

Conventional acoustic model adaptation techniques, however, require very complex computation, which makes real-time operation difficult and therefore makes them hard to apply to practical speech recognizers.

Accordingly, there is a strong need for a technique that ensures noise-robust speech recognition performance while allowing the recognizer to run in real time with modest resources and simple computation.

The present invention has been proposed to solve the above problems and to meet the above needs. Its object is to provide an acoustic model parameter adaptation apparatus and method that use histogram equalization to adapt the mean parameters of an acoustic model from the training environment to the test environment, thereby removing the acoustic mismatch and ensuring speech recognition performance that is robust in noisy environments.

The objects of the present invention are not limited to those mentioned above. Other objects and advantages of the invention that are not mentioned can be understood from the following description and will become clearer from the embodiments of the invention. It will also be readily appreciated that the objects and advantages of the invention can be realized by the means indicated in the claims and combinations thereof.

To achieve the above object, the apparatus of the present invention, which adapts the acoustic model parameters of a speech recognizer, comprises: a test cumulative distribution function estimator that derives a test cumulative distribution function estimate for each test speech feature parameter from the test cumulative distribution function obtained for the test speech feature parameters extracted from externally input speech; a training cumulative distribution function estimator that derives a training cumulative distribution function estimate for an acoustic model mean parameter from the training cumulative distribution function obtained for the speech recognition feature parameters extracted from previously stored training speech data; a linear interpolation factor calculator that computes a linear interpolation factor from the training cumulative distribution function estimate produced by the training estimator and the test cumulative distribution function estimate produced by the test estimator; and an acoustic model parameter adaptor that, based on the linear interpolation factor, uses histogram equalization to linearly interpolate the acoustic model mean parameter with respect to the test speech feature parameters.

The method of the present invention, which adapts the acoustic model parameters of a speech recognizer, comprises: on receiving a test utterance from outside, obtaining a test cumulative distribution function for the test speech feature parameters extracted from the test utterance; deriving a test cumulative distribution function estimate for each test speech feature parameter from the obtained test cumulative distribution function; obtaining a training cumulative distribution function for the speech recognition feature parameters extracted from previously stored training speech data; deriving a training cumulative distribution function estimate for an acoustic model mean parameter from the training cumulative distribution function; computing a linear interpolation factor from the training cumulative distribution function estimate and the test cumulative distribution function estimate; and, based on the linear interpolation factor, using histogram equalization to linearly interpolate the acoustic model mean parameter with respect to the test speech feature parameters.

The method of the present invention further comprises: obtaining a sample covariance matrix from the test utterance; and adapting the covariance matrix of the acoustic model by linearly interpolating with the obtained sample covariance matrix according to the signal-to-noise ratio.

The present invention as described above removes the acoustic mismatch between the test environment and the training environment, and thereby ensures speech recognition performance that is robust in noisy environments (that is, it reduces the degradation of the recognizer caused by ambient noise).

The present invention also ensures noise-robust speech recognition performance while allowing the recognizer to run in real time with modest resources and simple computation.

The present invention further provides a substantial performance improvement not only for small vocabularies such as digits but also for medium-to-large vocabulary speech recognition.

The above objects, features, and advantages will become clearer from the detailed description given below with reference to the accompanying drawings, so that those of ordinary skill in the art to which the invention pertains can readily practice its technical idea. In describing the invention, detailed descriptions of related known art are omitted where they would unnecessarily obscure the gist of the invention.

To ensure speech recognition performance that is robust in noisy environments, the present invention proposes an algorithm that uses histogram equalization to adapt the mean parameters of an acoustic model from the training environment to the test environment (referred to as the histogram equalization-based acoustic model parameter adaptation algorithm, or the histogram equalization-based environment model adaptation algorithm).

The present invention also proposes an algorithm that adapts the covariance matrix to the environment at the input-utterance level, namely an algorithm that adapts the covariance matrix of the acoustic model by linearly interpolating with a sample covariance matrix obtained from the test utterance according to the signal-to-noise ratio.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of one embodiment of the acoustic model parameter adaptation apparatus using histogram equalization according to the present invention, and FIG. 2 is a flowchart of one embodiment of the acoustic model parameter adaptation method using histogram equalization according to the present invention. To aid understanding, the algorithm proposed by the invention is described below in detail with reference to both FIG. 1 and FIG. 2.

As shown in FIG. 1, the acoustic model parameter adaptation apparatus using histogram equalization according to the present invention (hereinafter the "histogram equalization-based acoustic model parameter adaptation apparatus") includes a test cumulative distribution function estimator (11), a training cumulative distribution function estimator (12), a linear interpolation factor calculator (13), an acoustic model parameter adaptor (14), a training speech DB (15), and an acoustic model DB (16).

The histogram equalization-based acoustic model parameter adaptation apparatus is preferably implemented within a speech recognizer. Detailed descriptions of well-known recognizer components, such as the microphone, the signal processing unit, and the speech feature parameter extractor, are omitted.

The test cumulative distribution function estimator (11) obtains the test cumulative distribution function for the test speech feature parameters extracted from externally input speech (a test utterance spoken by the user), and from this function derives a test cumulative distribution function estimate for each individual test speech feature parameter ('201' in FIG. 2).

That is, the test cumulative distribution function estimator (11) uses [Equation 1] to derive, from the test cumulative distribution function, the test cumulative distribution function estimate for each individual test speech feature parameter.

Here, N is the number of speech frames in the test utterance, and R(y_n) is the rank of y_n, the speech feature parameter of the n-th frame, among the speech feature parameters extracted from the frames of the test utterance. The rank is obtained by sorting the speech feature parameters of the test utterance in ascending order, as with r in [Equation 2].

Here, T(r) is the original time-frame index of the speech feature parameter y_{T(r)} whose rank is r.
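The rank-based estimate described above can be sketched in Python as follows. Because [Equation 1] itself is not reproduced in this text, the sketch assumes the order-statistic form (R(y_n) − 0.5) / N, which is a common choice for histogram equalization but is not confirmed by this document. The function name `order_statistic_cdf` is hypothetical, and the sketch handles a single feature coefficient.

```python
from typing import List, Sequence


def order_statistic_cdf(features: Sequence[float]) -> List[float]:
    """Rank-based CDF estimate for each frame of one feature coefficient.

    ASSUMPTION: uses (rank - 0.5) / N; the patent's Equation 1 is not
    available in this text, so the exact offset is a guess.
    """
    n = len(features)
    # order[r - 1] plays the role of T(r): the original frame index of the
    # feature value whose ascending rank is r.
    order = sorted(range(n), key=lambda i: features[i])
    cdf = [0.0] * n
    for rank, frame in enumerate(order, start=1):
        cdf[frame] = (rank - 0.5) / n
    return cdf
```

Each returned value lies strictly between 0 and 1, and frames keep their original time order, so the estimate for frame n can be looked up directly by index.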

The training cumulative distribution function estimator (12) obtains the training cumulative distribution function for the speech recognition feature parameters extracted from the training speech data stored in the training speech DB (15), for example the training speech data used to develop the recognizer. From this function it derives, coefficient by coefficient, the training cumulative distribution function estimate for each mean parameter in the acoustic model DB (16) ('202' in FIG. 2).

That is, the training cumulative distribution function estimator (12) uses [Equation 3] to derive, coefficient by coefficient, the training cumulative distribution function estimate for the acoustic model mean parameter from the training cumulative distribution function.

Here, the left-hand side of [Equation 3] is the training cumulative distribution function estimate for the acoustic model mean parameter μ, and B_X(μ) is the index of the bin of the training histogram to which μ belongs. P_X(b) is the probability density function giving the probability of the b-th bin of the training histogram, and is obtained using [Equation 4].

Here, U is the number of utterances in the training data, and L_X(u) is the number of speech frames in the u-th training utterance. Q(cond) is a function that equals 1 when the condition cond is true and 0 when it is false. The argument tested in Q is the speech feature parameter of the l-th speech frame of the u-th training utterance, and H_X(b) is the b-th bin of the training histogram.
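A minimal Python sketch of the histogram-based training estimate follows. It is written from the definitions above (U utterances, L_X(u) frames each, the indicator Q, bins H_X(b)), not from the equations themselves, which are not reproduced here. Three choices are assumptions: the bin edges are supplied by the caller, values outside the edges are clamped to the first or last bin, and the cumulative sum includes the whole bin that contains μ. The function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def bin_index(value: float, edges: Sequence[float]) -> int:
    """0-based index of the histogram bin containing value (clamped)."""
    last_bin = len(edges) - 2
    return min(max(bisect_right(edges, value) - 1, 0), last_bin)


def bin_probabilities(utterances: Sequence[Sequence[float]],
                      edges: Sequence[float]) -> List[float]:
    """P_X(b): fraction of all training frames that fall in bin b."""
    counts = [0] * (len(edges) - 1)
    total = 0
    for utterance in utterances:      # u = 1 .. U
        for x in utterance:           # l = 1 .. L_X(u)
            counts[bin_index(x, edges)] += 1   # Q(x in H_X(b)) == 1
            total += 1
    return [c / total for c in counts]


def training_cdf_at(mu: float, probs: Sequence[float],
                    edges: Sequence[float]) -> float:
    """Sum P_X(b) over bins up to and including B_X(mu) (ASSUMED form)."""
    return sum(probs[: bin_index(mu, edges) + 1])
```

In the experiments reported later in this document the training histogram has 64 bins, so `edges` would hold 65 values per coefficient under this layout.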

The linear interpolation factor calculator (13) receives the training cumulative distribution function estimate for the acoustic model mean parameter from the training cumulative distribution function estimator (12) and the test cumulative distribution function estimates for the individual test speech feature parameters from the test cumulative distribution function estimator (11), and computes the linear interpolation factor to be used for acoustic model parameter adaptation ('203' in FIG. 2).

That is, adaptation by histogram equalization needs the test cumulative distribution function value that equals the training cumulative distribution function estimate of the acoustic model mean parameter being adapted. When that value is obtained by linear interpolation between the discrete cumulative distribution function values of the test speech feature parameters, the linear interpolation factor calculator (13) computes the required interpolation factor using [Equation 5].

Here, α is the linear interpolation factor. The two cumulative distribution terms in [Equation 5] are the training cumulative distribution function estimate for the acoustic model mean parameter μ and the test cumulative distribution function estimate for y_{T(m)}. y_{T(m)} is a test speech feature parameter, m is the rank of y_{T(m)} by magnitude, T(m) is the original frame index of the y_{T(m)} whose rank is m, N is the number of speech frames in the test utterance, and μ is the acoustic model mean parameter.
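The interpolation factor can be sketched as follows. Since [Equation 5] is not reproduced in this text, the sketch assumes the usual linear interpolation convention: find the rank m whose test estimate lies at or just below the training estimate, then take α as the fractional position of the training estimate between the estimates at ranks m and m+1. The function name, the 0-based rank, and the clamping at both ends are assumptions.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def interpolation_factor(train_cdf: float,
                         test_cdfs: Sequence[float]) -> Tuple[int, float]:
    """Return (m, alpha) for one acoustic model mean parameter.

    test_cdfs must be strictly increasing (test estimates in rank order)
    and contain at least two values. m is 0-based.
    ASSUMPTION: alpha = (train - c[m]) / (c[m+1] - c[m]), clamped to [0, 1].
    """
    m = bisect_right(test_cdfs, train_cdf) - 1
    m = min(max(m, 0), len(test_cdfs) - 2)
    alpha = (train_cdf - test_cdfs[m]) / (test_cdfs[m + 1] - test_cdfs[m])
    return m, min(max(alpha, 0.0), 1.0)
```

Clamping keeps the adapted value inside the range of observed test features when the training estimate falls outside the range covered by the test estimates.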

Based on the linear interpolation factor computed by the linear interpolation factor calculator (13), the acoustic model parameter adaptor (14) uses histogram equalization to linearly interpolate each acoustic model mean parameter from the acoustic model DB (16) with respect to the test speech feature parameters received from the test cumulative distribution function estimator (11), thereby adapting the parameter to the test acoustic environment ('204' in FIG. 2).

That is, following histogram equalization, the acoustic model parameter adaptor (14) uses [Equation 6] to adapt the acoustic model mean parameter μ by linear interpolation between the two test speech feature parameters whose ranks are m and m+1.

Here, C_Y^{-1} is the inverse of the test cumulative distribution function C_Y, α is the linear interpolation factor, and the argument of C_Y^{-1} is the training cumulative distribution function estimate for the acoustic model mean parameter μ. y_{T(m)} is a test speech feature parameter, m is the rank of y_{T(m)} by magnitude, and T(m) is the original frame index of the y_{T(m)} whose rank is m.
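The mean adaptation step can be sketched end to end for one coefficient as follows. [Equation 6] is not reproduced in this text, so the sketch assumes that the adapted mean is (1 − α)·y_{T(m)} + α·y_{T(m+1)}, with the test estimates taken as (rank − 0.5) / N. Both forms, and the name `adapt_mean`, are assumptions rather than the patent's stated formulas.

```python
from bisect import bisect_right
from typing import Sequence


def adapt_mean(train_cdf: float, test_features: Sequence[float]) -> float:
    """Map a mean's training CDF estimate through the inverse test CDF.

    ASSUMPTIONS: test CDF estimates are (rank - 0.5) / N and the adapted
    mean is (1 - alpha) * y[m] + alpha * y[m + 1]. Needs >= 2 test frames.
    """
    y = sorted(test_features)                  # y[r - 1] is y_{T(r)}
    n = len(y)
    cdfs = [(r - 0.5) / n for r in range(1, n + 1)]
    m = min(max(bisect_right(cdfs, train_cdf) - 1, 0), n - 2)
    alpha = (train_cdf - cdfs[m]) / (cdfs[m + 1] - cdfs[m])
    alpha = min(max(alpha, 0.0), 1.0)          # stay within observed range
    return (1.0 - alpha) * y[m] + alpha * y[m + 1]
```

For instance, with test features 4, 1, 3, 2 and a training estimate of 0.5, the sorted neighbours are 2 and 3 and the adapted mean falls halfway between them. The cost per mean is one binary search over the sorted test features, which fits the document's emphasis on simple computation.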

The present invention also proposes an algorithm that adapts the covariance matrix to the environment at the input-utterance level, as described in detail below.

As more noise is added to a speech signal, spectral whitening reduces the dynamic range of features such as the cepstral coefficients of the noisy speech. Consequently, as the noise level rises, it becomes important to adapt not only the mean parameters of the acoustic model but also its covariance matrices. The covariance matrix of the speech recognizer's acoustic model is the training covariance matrix referred to below.

Accordingly, the present invention linearly interpolates between the training covariance matrix and the sample covariance matrix computed at the input-utterance level, in proportion to the signal-to-noise ratio. This is expressed by [Equation 7].

Here, the smoothing factor is linearly proportional to the signal-to-noise ratio, and the sample covariance matrix is the one computed at the input-utterance level.
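The covariance interpolation can be sketched as follows for diagonal covariances, which is the form the experiments in this document use. [Equation 7] is not reproduced in this text, so several things are assumptions: the smoothing factor β rises linearly from 0 to 1 over a hypothetical 0 to 20 dB range, β weights the training variance (so cleaner speech stays closer to the trained model), and 1 − β weights the sample variance. All names are hypothetical.

```python
from typing import List, Sequence


def smoothing_factor(snr_db: float, low_db: float = 0.0,
                     high_db: float = 20.0) -> float:
    """Smoothing factor linear in SNR, clamped to [0, 1].

    ASSUMPTION: the 0..20 dB range is illustrative, not from the patent.
    """
    beta = (snr_db - low_db) / (high_db - low_db)
    return min(max(beta, 0.0), 1.0)


def adapt_diag_covariance(train_var: Sequence[float],
                          sample_var: Sequence[float],
                          snr_db: float) -> List[float]:
    """Interpolate diagonal covariances coefficient by coefficient.

    ASSUMPTION: beta weights the training variance and (1 - beta) the
    utterance-level sample variance.
    """
    beta = smoothing_factor(snr_db)
    return [beta * t + (1.0 - beta) * s
            for t, s in zip(train_var, sample_var)]
```

Under these assumptions a very noisy utterance relies almost entirely on its own sample variance, which reflects the reduced dynamic range described above.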

Next, the performance of the algorithm proposed by the present invention is described with reference to FIGS. 3A and 3B.

FIGS. 3A and 3B are graphs, for one embodiment, showing the performance evaluation of the algorithm proposed by the present invention.

To evaluate the performance of the algorithm of the present invention ('HEQ-MA' in the figures), speech recognition was performed in noisy environments using the AURORA2 speech DB and the Korean POW (Phonetically Optimized Word) speech DB.

The acoustic models of the two baseline speech recognizers were trained separately on the clean training speech data of the AURORA2 speech DB and of the Korean POW speech DB. The recognition evaluation used the three AURORA2 evaluation sets and two POW noisy-speech evaluation sets created by adding eight types of AURORA additive noise.

The speech recognition experiments followed the AURORA2 standard experimental framework, as described below.

Speech features were extracted as 39-dimensional MFCCs (Mel-Frequency Cepstral Coefficients), including frame log energy, with a frame length of 25 ms and a frame shift of 10 ms. The recognizer for the AURORA2 DB covered 13 words related to digits, each modeled as a whole-word model. Each digit HMM (Hidden Markov Model) had 16 states, and the silence HMM and the short-pause HMM had 3 states and 1 state, respectively. The recognizer for the POW DB consisted of 6,776 tied-state triphone HMMs. The HMMs of both recognizers used diagonal covariance matrices, and each state was modeled with 3 Gaussian mixture components (AURORA2) and 8 Gaussian mixture components (POW), respectively.
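As a rough illustration of the framing configuration above, the following sketch computes how many frames a waveform yields, which is the N used in the test cumulative distribution function estimate. The 8 kHz default sampling rate and the absence of padding at the end of the signal are assumptions; the document states only the 25 ms frame length and the 10 ms shift. The function name is hypothetical.

```python
def frame_count(num_samples: int, sample_rate_hz: int = 8000,
                frame_ms: int = 25, shift_ms: int = 10) -> int:
    """Number of full frames in a waveform, with no end padding.

    ASSUMPTION: 8000 Hz default sampling rate; only the 25 ms / 10 ms
    framing comes from the document.
    """
    frame_len = sample_rate_hz * frame_ms // 1000   # samples per frame
    shift = sample_rate_hz * shift_ms // 1000       # samples per shift
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // shift
```

Each frame then carries a 39-dimensional feature vector, so one utterance gives an N-by-39 array to which the per-coefficient processing is applied.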

For histogram equalization, the training cumulative distribution function was estimated with 64 histogram bins, and the test cumulative distribution function was estimated with the order-statistics method. Histogram equalization was applied separately to each of the 39 MFCC coefficients.

FIG. 3A compares, as a function of signal-to-noise ratio, the performance of the algorithm of the present invention ('HEQ-MA' in the figure) with that of the feature compensation algorithm ('HEQ-FC' in the figure) on AURORA2 noisy speech. Specifically, FIG. 3A shows recognition performance as average word accuracy over AURORA2 test sets A, B, and C.

As FIG. 3A shows, over most of the signal-to-noise ratio range the algorithm of the present invention clearly outperforms both the baseline speech recognizer and the known feature compensation algorithm.

FIG. 3B compares, as a function of signal-to-noise ratio, the performance of the algorithm of the present invention ('HEQ-MA' in the figure) with that of the feature compensation algorithm ('HEQ-FC' in the figure) on POW noisy speech. Specifically, FIG. 3B shows recognition performance as average word accuracy over POW noisy-speech test sets A and B. The POW DB is a speech DB of 3,848 words and was used to evaluate performance in medium-to-large vocabulary speech recognition.

As FIG. 3B shows, in medium-to-large vocabulary speech recognition as well, the algorithm of the present invention clearly outperforms both the baseline speech recognizer and the known feature compensation algorithm.

[Table 1] gives the average word error rate [%] of the algorithm of the present invention ('HEQ-MA' in the table) and of the feature compensation algorithm ('HEQ-FC' in the table) on AURORA2 noisy speech.

[Table 2] below shows the average word error rate [%] of the algorithm of the present invention ('HEQ-MA' in the table) and of the feature compensation algorithm ('HEQ-FC' in the table) on POW noisy speech.

As [Table 1] and [Table 2] confirm, on AURORA2 and POW noisy speech the algorithm of the present invention achieved error reductions of 62.83% and 43.04%, respectively, relative to the baseline speech recognizer. That is, the algorithm of the present invention yields a clear performance improvement not only for small vocabularies such as spoken digits but also for medium-to-large vocabulary speech recognition.
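The error reductions quoted above are presumably relative reductions in word error rate with respect to the baseline. A minimal sketch of that arithmetic follows; the function name and the error rates in the usage line are hypothetical and are not values taken from [Table 1] or [Table 2].

```python
def relative_error_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative word-error-rate reduction in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical rates (not from the tables): 40 % baseline, 15 % after adaptation.
print(relative_error_reduction(40.0, 15.0))
```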

Meanwhile, the method of the present invention described above can be written as a computer program. The code and code segments constituting the program can be readily inferred by a computer programmer skilled in the art. The program is stored on a computer-readable recording medium (information storage medium) and implements the method of the present invention when read and executed by a computer. The recording medium includes every type of computer-readable recording medium.

Because those of ordinary skill in the art to which the present invention pertains can make various substitutions, modifications, and changes without departing from the technical spirit of the present invention, the present invention described above is not limited by the foregoing embodiments and the accompanying drawings.

FIG. 1 is a block diagram of an embodiment of an acoustic model parameter adaptation apparatus using histogram equalization according to the present invention.

FIG. 2 is a flowchart of an embodiment of an acoustic model parameter adaptation method using histogram equalization according to the present invention.

FIGS. 3A and 3B are graphs of an embodiment showing the performance evaluation of the algorithm presented in the present invention.

* Explanation of reference numerals for the main parts of the drawings

11: test cumulative distribution function estimator

12: training cumulative distribution function estimator

13: linear interpolation factor calculator

14: acoustic model parameter adaptor

15: training speech DB

16: acoustic model DB

Claims

An apparatus for adapting acoustic model parameters of a speech recognizer using histogram equalization, the apparatus comprising:

a test cumulative distribution function estimator for estimating a test cumulative distribution function estimate for test speech feature parameters from a test cumulative distribution function obtained for the test speech feature parameters extracted from externally input speech;

a training cumulative distribution function estimator for estimating a training cumulative distribution function estimate for an acoustic model mean parameter from training cumulative distribution functions obtained from speech recognition feature parameters extracted from previously stored training speech data;

a linear interpolation factor calculator for calculating a linear interpolation factor based on the training cumulative distribution function estimate estimated by the training cumulative distribution function estimator and the test cumulative distribution function estimate estimated by the test cumulative distribution function estimator; and

an acoustic model parameter adaptor for performing linear interpolation of the acoustic model mean parameter onto the test speech feature parameters using histogram equalization, based on the linear interpolation factor obtained from the linear interpolation factor calculator.

The apparatus of claim 1,

wherein the test cumulative distribution function estimator estimates the test cumulative distribution function estimate for the test speech feature parameters from the test cumulative distribution function using [Equation 1] below.

[Equation 1]

Here, N denotes the number of speech frames constituting the test utterance (the speech received from outside), and R(y_n) denotes the rank of y_n, the speech feature parameter of the n-th speech frame among the speech feature parameters extracted from the speech frames constituting the test utterance.

The apparatus of claim 2,

wherein the rank of y_n, the speech feature parameter of the n-th speech frame in [Equation 1], is obtained by sorting the speech feature parameters contained in the test utterance in ascending order.
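The rank-based estimate described above can be sketched as follows. [Equation 1] itself is not reproduced in this text, so the sketch assumes the common order-statistic form (R(y_n) - 0.5) / N; the function name and the 0.5 offset are assumptions, not the patent's confirmed formula.

```python
from typing import List, Sequence


def estimate_test_cdf(y: Sequence[float]) -> List[float]:
    """Estimate the test CDF at every frame of one feature coefficient.

    Assumed form: (R(y_n) - 0.5) / N, where R(y_n) is the 1-based rank of
    y_n after sorting the utterance's values in ascending order.
    """
    n_frames = len(y)
    # order[r] is the frame index of the (r + 1)-th smallest value.
    order = sorted(range(n_frames), key=lambda n: y[n])
    cdf = [0.0] * n_frames
    for rank0, frame in enumerate(order):
        cdf[frame] = (rank0 + 1 - 0.5) / n_frames
    return cdf


# Four frames of a single coefficient (illustrative values only).
print(estimate_test_cdf([0.3, -1.2, 2.5, 0.9]))
```

Each frame receives a value strictly between 0 and 1, so the estimate never reaches the endpoints of the distribution.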

The apparatus of claim 1,

wherein the training cumulative distribution function estimator estimates the training cumulative distribution function estimate for the acoustic model mean parameter from the training cumulative distribution function of each coefficient using [Equation 3] below.

[Equation 3]

Here, [Equation 3] gives the training cumulative distribution function estimate for the acoustic model mean parameter μ, B_X(μ) denotes the index of the bin of the training histogram to which the acoustic model mean parameter μ belongs, and P_X(b) denotes the probability density function giving the probability value of the b-th bin of the training histogram.

The apparatus of claim 4,

wherein P_X(b) in [Equation 3] is obtained using [Equation 4] below.

[Equation 4]

Here, U is the number of utterances constituting the training data, L_X(u) is the number of speech frames in the u-th training utterance, Q(cond) is a function that equals 1 when cond is true and 0 when it is false, the speech feature parameter appearing in [Equation 4] is the one corresponding to the l-th speech frame of the u-th training utterance, and H_X(b) denotes the b-th bin of the training histogram.
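The histogram-based estimate described above can be sketched under stated assumptions. [Equation 3] and [Equation 4] are not reproduced in this text, so the code assumes that P_X(b) is the fraction of all training frames falling in bin b and that the estimate at μ is the sum of P_X(b) up to and including bin B_X(μ). Equal-width bins, the bin range, and all function names are illustrative choices.

```python
from typing import List, Sequence


def bin_index(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Return the 0-based index of the equal-width bin containing value."""
    if hi <= lo or n_bins < 1:
        raise ValueError("need hi > lo and at least one bin")
    pos = int((value - lo) / (hi - lo) * n_bins)
    return min(max(pos, 0), n_bins - 1)  # clamp values on or outside the edges


def training_bin_probs(utterances: Sequence[Sequence[float]],
                       lo: float, hi: float, n_bins: int) -> List[float]:
    """Assumed P_X(b): share of all training frames that fall in bin b."""
    counts = [0] * n_bins
    total = 0
    for frames in utterances:          # u-th training utterance
        for x in frames:               # l-th frame of that utterance
            counts[bin_index(x, lo, hi, n_bins)] += 1  # Q(x in H_X(b)) = 1
            total += 1
    if total == 0:
        raise ValueError("no training frames")
    return [c / total for c in counts]


def training_cdf_at_mean(mu: float, probs: Sequence[float],
                         lo: float, hi: float) -> float:
    """Assumed estimate: sum of P_X(b) up to the bin B_X(mu) containing mu."""
    b = bin_index(mu, lo, hi, len(probs))
    return sum(probs[: b + 1])


probs = training_bin_probs([[0.1, 0.4, 0.6], [0.9, 0.2]], 0.0, 1.0, 4)
print(probs, training_cdf_at_mean(0.5, probs, 0.0, 1.0))
```

Because the histogram is built once from the training data, it can be stored alongside the acoustic model and reused for every test utterance.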

The apparatus of claim 1,

wherein the linear interpolation factor calculator obtains the linear interpolation factor using [Equation 5] below.

[Equation 5]

Here, α denotes the linear interpolation factor; [Equation 5] uses the test cumulative distribution function estimate for y_{T(m)}, where y_{T(m)} denotes a test speech feature parameter, m denotes the rank of y_{T(m)} in order of magnitude, T(m) denotes the original frame index of y_{T(m)} having rank m, N denotes the number of speech frames constituting the test utterance, and μ denotes the acoustic model mean parameter.
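The interpolation factor described above can be illustrated as below. [Equation 5] is not reproduced in this text, so the sketch assumes that α is the fractional position of the training cumulative distribution function estimate at μ between the test estimates of the rank-m and rank-(m+1) test parameters, with the test estimates taking the assumed form (m - 0.5) / N. The function name, the clamping at the tails, and the variable names are hypothetical.

```python
import math
from typing import Tuple


def interpolation_factor(train_cdf_mu: float, n_frames: int) -> Tuple[int, float]:
    """Return (m, alpha): the 1-based lower rank and the weight on rank m + 1."""
    if n_frames < 2:
        raise ValueError("need at least two test frames to interpolate")
    # With test estimates (m - 0.5) / N, the fractional rank p solves
    # (p - 0.5) / N == train_cdf_mu.
    p = train_cdf_mu * n_frames + 0.5
    m = min(max(math.floor(p), 1), n_frames - 1)  # keep m and m + 1 valid ranks
    alpha = min(max(p - m, 0.0), 1.0)             # clamp in the distribution tails
    return m, alpha


# A training CDF value of 0.5 with N = 4 lies midway between ranks 2 and 3.
print(interpolation_factor(0.5, 4))
```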

The apparatus of claim 1,

wherein the acoustic model parameter adaptor performs linear interpolation between the two test speech feature parameters whose ranks are m and m + 1, using [Equation 6] below.

[Equation 6]

Here, [Equation 6] involves the inverse function of the test cumulative distribution function C_Y and the training cumulative distribution function estimate for the acoustic model mean parameter μ; α denotes the linear interpolation factor, y_{T(m)} denotes a test speech feature parameter, m denotes the rank of y_{T(m)} in order of magnitude, and T(m) denotes the original frame index of y_{T(m)} having rank m.
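The interpolation step described above might look like the sketch below. [Equation 6] is not reproduced in this text, so the code assumes the adapted mean is (1 - α) · y_{T(m)} + α · y_{T(m+1)}, a linear approximation of applying the inverse test cumulative distribution function to the training estimate at μ. The function name and the convention that α weights the rank-(m+1) value are assumptions.

```python
from typing import Sequence


def adapt_mean(test_features: Sequence[float], m: int, alpha: float) -> float:
    """Interpolate between the rank-m and rank-(m + 1) test values (1-based m)."""
    ordered = sorted(test_features)  # ordered[m - 1] plays the role of y_{T(m)}
    if not 1 <= m < len(ordered):
        raise ValueError("m and m + 1 must both be valid ranks")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    lower, upper = ordered[m - 1], ordered[m]
    return (1.0 - alpha) * lower + alpha * upper


# Midway between the 2nd and 3rd smallest test values (0.3 and 0.9).
print(adapt_mean([0.3, -1.2, 2.5, 0.9], m=2, alpha=0.5))
```

The adapted mean always lies between two observed test values, so it cannot leave the range of the test utterance's features.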

A method for adapting acoustic model parameters of a speech recognizer using histogram equalization, the method comprising:

obtaining, upon receiving a test utterance from outside, a test cumulative distribution function for test speech feature parameters extracted from the test utterance;

estimating a test cumulative distribution function estimate for the test speech feature parameters from the obtained test cumulative distribution function;

obtaining a training cumulative distribution function for speech recognition feature parameters extracted from previously stored training speech data;

estimating a training cumulative distribution function estimate for an acoustic model mean parameter from the training cumulative distribution function;

obtaining a linear interpolation factor based on the estimated training cumulative distribution function estimate and the estimated test cumulative distribution function estimate; and

performing linear interpolation of the acoustic model mean parameter onto the test speech feature parameters using histogram equalization, based on the obtained linear interpolation factor.

The method of claim 8,

wherein estimating the test cumulative distribution function estimate comprises estimating the test cumulative distribution function estimate for the test speech feature parameters from the test cumulative distribution function using [Equation 1] below.

[Equation 1]

The method of claim 9,

wherein the rank of y_n, the speech feature parameter of the n-th speech frame in [Equation 1], is obtained by sorting the speech feature parameters contained in the test utterance in ascending order.

The method of claim 8,

wherein estimating the training cumulative distribution function estimate comprises estimating the training cumulative distribution function estimate for the acoustic model mean parameter from the training cumulative distribution function of each coefficient using [Equation 3] below.

[Equation 3]

The method of claim 11,

wherein P_X(b) in [Equation 3] is obtained using [Equation 4] below.

[Equation 4]

The method of claim 8,

wherein obtaining the linear interpolation factor comprises obtaining the linear interpolation factor using [Equation 5] below.

[Equation 5]

Here, α denotes the linear interpolation factor.

The method of claim 8,

wherein performing the linear interpolation comprises performing linear interpolation between the two test speech feature parameters whose ranks are m and m + 1, using [Equation 6] below.

[Equation 6]

Here, [Equation 6] involves the training cumulative distribution function estimate for the acoustic model mean parameter μ; y_{T(m)} denotes a test speech feature parameter, m denotes the rank of y_{T(m)} in order of magnitude, and T(m) denotes the original frame index of y_{T(m)} having rank m.

The method of claim 8, further comprising:

obtaining a sample covariance matrix from the test utterance; and

adapting a covariance matrix of the acoustic model by performing linear interpolation with the obtained sample covariance matrix according to the signal-to-noise ratio.

The method of claim 15,

wherein adapting the covariance matrix of the acoustic model is performed according to [Equation 7] below.

[Equation 7]

Here, [Equation 7] uses a smoothing factor that is linearly proportional to the signal-to-noise ratio, together with the sample covariance matrix obtained at the level of the input utterance.
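The covariance adaptation described above can be sketched as follows. [Equation 7] is not reproduced in this text; the sketch assumes a convex combination β · Σ_sample + (1 - β) · Σ_model. The mapping from signal-to-noise ratio to β (linear between two hypothetical limits and clamped to [0, 1]), the choice of which matrix β weights, the unbiased sample covariance, and all names are assumptions made only for illustration.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def smoothing_factor(snr_db: float, snr_min_db: float = 0.0,
                     snr_max_db: float = 20.0) -> float:
    """Map the SNR linearly onto [0, 1]; both limits are hypothetical."""
    beta = (snr_db - snr_min_db) / (snr_max_db - snr_min_db)
    return min(max(beta, 0.0), 1.0)


def sample_covariance(frames: Sequence[Sequence[float]]) -> Matrix:
    """Unbiased sample covariance of the feature vectors of one utterance."""
    n = len(frames)
    if n < 2:
        raise ValueError("need at least two frames")
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in frames) / (n - 1)
             for j in range(dim)] for i in range(dim)]


def adapt_covariance(model_cov: Matrix, sample_cov: Matrix, beta: float) -> Matrix:
    """Assumed form: beta * sample_cov + (1 - beta) * model_cov, element-wise."""
    return [[beta * s + (1.0 - beta) * m for s, m in zip(s_row, m_row)]
            for s_row, m_row in zip(sample_cov, model_cov)]


frames = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]  # illustrative 2-D features
identity = [[1.0, 0.0], [0.0, 1.0]]            # illustrative model covariance
print(adapt_covariance(identity, sample_covariance(frames), smoothing_factor(10.0)))
```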