KR100322730B1 - Speaker adapting method

Info

Publication number: KR100322730B1
Application number: KR1019950043907A
Authority: KR
Inventors: 김남수; 김진
Original assignee: 윤종용; 삼성전자 주식회사
Priority date: 1995-11-27
Filing date: 1995-11-27
Publication date: 2002-06-20
Also published as: KR970029326A

Abstract

PURPOSE: A speaker adaptation method is provided that adapts the length of the speech together with the feature vectors of a reference pattern so as to improve recognition performance. CONSTITUTION: The feature vectors of each adaptation word are extracted, and the distance between them and the feature vectors of the reference pattern corresponding to that adaptation word is calculated. The differences are averaged to obtain an average vector change amount, which is then applied to the feature vectors of the reference pattern to change them. The frame length of each adaptation word is extracted, and the relative difference between it and the frame length of the corresponding reference pattern is calculated. These differences are averaged to obtain an average length change amount, which is then applied to the frame length of the reference pattern to change it.

Description

Speaker adaptation method

The present invention relates to a speaker adaptation method that changes reference patterns using words spoken by a new speaker, and more particularly to an improved speaker adaptation method that raises recognition performance by changing the length of a reference pattern together with its feature vectors.

Speech recognition currently attracts attention as the most natural man-machine interface. Of the various recognition methods in use, DTW (Dynamic Time Warping) is the most widely applied in small-scale practical speech recognition.

DTW compares two speech patterns and computes the distance between them. In DTW-based recognition, the input speech pattern is compared with the stored reference pattern of each word, and the word whose reference pattern lies at the smallest distance is taken as the recognized word.

Speech recognition systems are classified into three types according to their users. The first, called a speaker-dependent system, can be used only by one specific speaker. Because it sets the reference patterns and reference model parameters from the voice characteristics of that speaker, it achieves a high recognition rate for the registered speaker, but the rate drops markedly for any other speaker.

The second, called a speaker-independent system, can be used by speakers in general rather than by one specific speaker. Because it sets the reference patterns and reference model parameters from the voice characteristics of general speakers, it performs evenly for most speakers, but its average recognition rate is lower than that of a speaker-dependent system.

The third, called a speaker adaptation system, adaptively converts the reference patterns and model parameters set from existing speakers to the speech patterns of a new speaker. It asks the new speaker to pronounce several adaptation words and converts the reference patterns and model parameters on the basis of the spoken words.

This process is called speaker adaptation; afterwards the recognition system operates in the same way as a speaker-dependent system. A speaker adaptation system is very useful when a new speaker wants to use an existing speech recognition system, and it improves the performance of both speaker-dependent and speaker-independent systems.

In a conventional speaker adaptation method, the speaker to be adapted to first pronounces several adaptation words. Each spoken adaptation word passes through the well-known signal processing performed in speech recognition and is then represented as a sequence of feature vectors.

This feature vector sequence is aligned with the feature vector sequence of the reference pattern for the same word, so that the feature vector of each frame is matched (aligned) with the feature vector of a particular frame of the reference pattern.

Next, the conversion amount is determined by averaging the differences between corresponding vectors. On the assumption that the conversion amount differs with the position of each vector, the space in which feature vectors can lie is divided into several regions, and a separate conversion amount is computed and used for each region.

A good speaker adaptation method should be able to extract a new speaker's speaking rate, regional origin, pronunciation patterns, and other characteristics from a small number of adaptation words. Conventional speaker adaptation methods, however, attend only to the conversion of feature vectors.

Consequently, for a recognition method such as DTW, which is very sensitive to the length of the reference pattern, conventional methods that convert only the feature vectors bring little benefit.

A speech recognition system, and a DTW-based system in particular, therefore needs a speaker adaptation method that takes the conversion of length into account as well as the conversion of feature vectors.

The present invention was devised to meet this need, and its object is to provide an improved speaker adaptation method that converts the length of a reference pattern together with its feature vectors during speaker adaptation.

Another object of the present invention is to provide a method of adapting the frame length of a reference pattern that is suited to the above speaker adaptation method.

To achieve the above object, the present invention provides a speaker adaptation method that adaptively changes the reference patterns used for speech recognition on the basis of adaptation words spoken by a new speaker, the method including: extracting the feature vector sequence of each adaptation word and aligning it with the feature vector sequence of the reference pattern corresponding to that adaptation word by computing the distance between the two; computing an average vector change amount by averaging the differences between the aligned feature vector sequences of the adaptation words and of the reference patterns; and changing the feature vector sequence of the reference pattern by applying the computed average vector change amount. The method is characterized by further including:

extracting the frame length of each adaptation word and computing the relative difference between the extracted frame length and the frame length of the reference pattern corresponding to that adaptation word;

obtaining an average length change amount by averaging the computed differences between the frame lengths of the adaptation words and the frame lengths of the reference patterns; and

changing the frame length of the reference pattern by applying the computed average length change amount.

To achieve the other object, the present invention provides a method of adapting the frame length of a reference pattern, for use in a speaker adaptation method that adapts the reference patterns used for speech recognition to adaptation words spoken by a new speaker, in which the frame length of the reference pattern is adaptively changed according to the frame length of the adaptation words. The method is characterized by including:

obtaining an average length change amount by averaging the differences between the frame lengths of the adaptation words and the frame lengths of the reference patterns; and

changing the frame length of the reference pattern by applying the average length change amount.

The present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of a conventional DTW-based speech recognition system. The apparatus shown in FIG. 1 includes a speech processing unit 10, a recognition unit 12, a reference pattern storage unit 14, a speaker adaptation unit 16, and an adaptation reference pattern storage unit 18.

The speech processing unit 10 converts the input analog speech to digital form, performs several preprocessing steps, divides the signal into frames, and extracts a feature vector from each frame.
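The framing and feature extraction performed by the speech processing unit can be sketched as follows. This is only an illustration under stated assumptions: the text does not name the preprocessing steps, the frame size, or the feature type, so the pre-emphasis coefficient, the 200-sample frame with a 100-sample hop, the Hamming window, and the two toy features (log energy and zero-crossing rate) are all hypothetical choices, as are the function names.

```python
import numpy as np

def frame_signal(samples, frame_len=200, hop=100):
    """Split a 1-D digitized signal into overlapping frames (one frame per row)."""
    samples = np.asarray(samples, dtype=float)
    if len(samples) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(samples) - frame_len) // hop
    # Index matrix: row f selects samples f*hop .. f*hop + frame_len - 1.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return samples[idx]

def extract_features(samples, frame_len=200, hop=100, preemph=0.95):
    """Return one feature vector per frame (assumed features, not the patent's)."""
    samples = np.asarray(samples, dtype=float)
    # Assumed preprocessing step: first-order pre-emphasis.
    emphasized = np.append(samples[0], samples[1:] - preemph * samples[:-1])
    frames = frame_signal(emphasized, frame_len, hop) * np.hamming(frame_len)
    log_energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    # Fraction of adjacent sample pairs whose sign differs.
    zero_cross = np.mean(np.diff(np.sign(frames), axis=1) != 0, axis=1)
    return np.stack([log_energy, zero_cross], axis=1)
```

A practical recognizer would normally use spectral features in place of these two values; the point here is only the shape of the output, a sequence with one feature vector per frame.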

The recognition unit 12 computes the distance between the input feature vector sequence and each reference pattern stored in the reference pattern storage unit 14, and selects the word with the smallest distance as the recognition result. Each reference pattern stored in the reference pattern storage unit 14 is the feature vector sequence that the speech processing unit 10 produced from a word spoken by an existing speaker.

The speaker adaptation unit 16 determines the conversion amount of the reference patterns by aligning the feature vector sequences obtained from several adaptation words spoken by the new speaker with the adaptation reference pattern corresponding to each adaptation word. The adaptation reference patterns used here, held in the adaptation reference pattern storage unit 18, are likewise feature vector sequences that the speech processing unit 10 produced from words spoken by an existing speaker.

The recognition unit 12 computes the distance between the feature vector sequence of the input speech and a reference pattern as follows. First, let R = (r_1, r_2, ..., r_I) be a reference pattern, where each r_i is the feature vector of the i-th frame and I is the total number of frames, that is, the length.

In the same form, let T = (t_1, t_2, ..., t_J) be the feature vector sequence of the input speech.

If d(·,·) is a measure of the distance between two vectors, the distance between the reference pattern R and the input feature vector sequence T can be obtained with the well-known dynamic programming technique.

FIG. 2 shows an example of how the frames are aligned when the distance between two vector sequences is computed with dynamic programming.

The method of computing the distance between two feature vector sequences with dynamic programming in this way, and of aligning their frames with each other, is called the DTW method.
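A minimal sketch of DTW distance computation, frame alignment, and nearest-reference recognition is given below. It rests on assumptions the text does not fix: the local distance d(·,·) is taken to be Euclidean, and each step may come from the left, lower, or diagonal neighbor with equal weight. The function names `dtw` and `recognize` are illustrative only.

```python
import numpy as np

def dtw(ref, inp):
    """Return (distance, path) for two feature-vector sequences.

    ref, inp: arrays of shape (frames, dims).
    path: list of (i, j) pairs, i indexing ref frames and j indexing inp frames.
    """
    ref = np.asarray(ref, dtype=float)
    inp = np.asarray(inp, dtype=float)
    n_ref, n_inp = len(ref), len(inp)
    # Local distances d(r_i, t_j); Euclidean is an assumption.
    local = np.linalg.norm(ref[:, None, :] - inp[None, :, :], axis=2)
    acc = np.full((n_ref, n_inp), np.inf)
    acc[0, 0] = local[0, 0]
    for i in range(n_ref):
        for j in range(n_inp):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = local[i, j] + best_prev
    # Backtrack from the end point to recover the frame alignment.
    i, j = n_ref - 1, n_inp - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((acc[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return acc[-1, -1], path

def recognize(inp, references):
    """Pick the word whose reference pattern is closest to the input."""
    return min(references, key=lambda word: dtw(references[word], inp)[0])
```

The returned path plays the role of the alignment line in FIG. 2: each (i, j) pair says which reference frame is matched to which input frame.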

The speaker adaptation unit 16 compares the adaptation words spoken by the new speaker with the adaptation reference patterns stored in advance, and establishes the rule for converting each reference pattern. The input pattern and the adaptation reference pattern of the corresponding word are aligned using DTW as shown in FIG. 2. Once this alignment is made, M mutually corresponding feature vector pairs are determined. [The expression listing these pairs and its accompanying condition are not reproduced in this text.]

The feature vector pairs are sampled at regular time intervals along the alignment line determined in FIG. 2. That is, each black dot on the alignment line in FIG. 2 marks a pair consisting of a feature vector of the adaptation reference pattern and the corresponding feature vector of the input pattern.

Once the alignment with the adaptation reference pattern has been obtained for every adaptation word, the next step computes the average vector change, denoted here by Δ, by averaging the differences between the paired vectors. [Equation not reproduced in this text.]

Here k is the total number of adaptation reference patterns, and M(i) is the number of feature vector pairs obtained when the i-th adaptation reference pattern is aligned with its input pattern.

Once Δ has been determined, it is used to convert the reference patterns as follows. If Z is the feature vector of a particular frame of a particular reference pattern and Z' is the vector obtained by converting it, then

Z' = Z + Δ.
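The uniform conversion described above can be sketched as follows. Since the averaging equation itself is not available in this text, the sketch assumes the simplest reading: the difference (input vector minus reference vector) is averaged over all aligned pairs of all adaptation words, and the result is added to every reference vector. The function names and the data layout are hypothetical.

```python
import numpy as np

def average_vector_change(aligned_pairs):
    """Average of (input - reference) over every aligned feature-vector pair.

    aligned_pairs: one (ref_vecs, inp_vecs) tuple per adaptation word, where
    both arrays have shape (M_i, dims) and row m of each forms an aligned pair.
    """
    diffs = [
        np.asarray(inp_vecs, dtype=float) - np.asarray(ref_vecs, dtype=float)
        for ref_vecs, inp_vecs in aligned_pairs
    ]
    # Assumed normalization: divide by the total number of pairs, sum of M_i.
    return np.concatenate(diffs, axis=0).mean(axis=0)

def adapt_vectors(reference, delta):
    """Apply Z' = Z + delta to every frame of a reference pattern."""
    return np.asarray(reference, dtype=float) + delta
```

For example, with three aligned pairs whose differences are (1, 1), (1, 1), and (4, 1), the average vector change is (2, 1), and a reference vector (1, 1) becomes (3, 2).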

In general, the conversion of a feature vector depends on the position of that vector, so instead of changing every vector uniformly as above, a different average vector change is used according to position.

To this end, the adaptation reference patterns undergo vector quantization. A feature vector is not assigned to a single codeword index; rather, several codeword indices are considered at the same time so that the position of the feature vector is expressed in detail.

Given the vectors representing L codewords, the degree to which an arbitrary feature vector x corresponds to each codeword index is obtained as follows. [Equation not reproduced in this text.]

This degree of correspondence between x and each codeword is expressed as a probability value.

The average vector change is defined for each codeword and is obtained as follows. [Equation not reproduced in this text.]

Once the average vector change of every codeword has been obtained, the reference pattern vector Z is converted as follows. [Equation not reproduced in this text.]
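The position-dependent conversion can be illustrated with the sketch below. The equations for the degree of correspondence, the per-codeword average, and the final conversion are not available in this text, so every formula here is an assumption: the degree of correspondence is modeled as a softmax over negative squared distances to the codewords (with a hypothetical sharpness parameter `beta`), each codeword's vector change is the membership-weighted average of (input minus reference), and a reference vector is shifted by the membership-weighted sum of those changes.

```python
import numpy as np

def memberships(x, codebook, beta=1.0):
    """Probability-like degree to which x corresponds to each codeword (assumed form)."""
    codebook = np.asarray(codebook, dtype=float)
    d2 = np.sum((codebook - np.asarray(x, dtype=float)) ** 2, axis=1)
    w = np.exp(-beta * (d2 - d2.min()))  # subtract the minimum for numerical stability
    return w / w.sum()

def codeword_vector_changes(ref_vecs, inp_vecs, codebook, beta=1.0):
    """Membership-weighted average of (input - reference) for each codeword."""
    codebook = np.asarray(codebook, dtype=float)
    num = np.zeros_like(codebook)
    den = np.zeros(len(codebook))
    for x, y in zip(np.asarray(ref_vecs, float), np.asarray(inp_vecs, float)):
        p = memberships(x, codebook, beta)
        num += p[:, None] * (y - x)
        den += p
    return num / np.maximum(den, 1e-12)[:, None]

def adapt_vector(z, codebook, deltas, beta=1.0):
    """Shift z by the membership-weighted sum of the per-codeword changes."""
    z = np.asarray(z, dtype=float)
    return z + memberships(z, codebook, beta) @ deltas
```

Soft memberships let a vector lying between two codewords receive a blend of their changes, which matches the stated intent of considering several codeword indices at once.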

The following describes a method that adapts the length at the same time as the feature vectors. From the alignment of each adaptation reference pattern with its input pattern, the average length conversion for each codeword is obtained as follows.

Consider the correspondence on the time axis between two aligned vectors. [Expression not reproduced in this text.] One index of this correspondence denotes the frame of the adaptation reference pattern from which a feature vector was extracted, and the other denotes the frame of the input pattern from which the aligned feature vector was extracted.

The increase or decrease in length that the DTW alignment produces at a particular frame of the adaptation reference pattern is obtained as follows. [Equation not reproduced in this text.]

The resulting value expresses, in relative terms, how much the length of the given frame of the adaptation reference pattern has increased or decreased. N and P denote the numbers of alignments located before and after the frame currently under consideration when, as a result of DTW, several adaptation words are aligned to one reference pattern or, conversely, one adaptation word is aligned to several reference patterns. That is, N + P + 1 is the total number of alignments.

Once the relative increase or decrease of each frame has been computed, the average length change for each codeword is obtained, as in the conversion of feature vectors. [Equation not reproduced in this text.]

The result is the average length change defined for each codeword. The equation refers to the j-th frame of the i-th adaptation reference pattern, to the feature vector at that frame, and to the length of the i-th adaptation reference pattern.

Once the average length change of each codeword has been obtained, the length of the reference pattern is converted on that basis. If Z is a feature vector of the reference pattern, the frame length l' after conversion is obtained as follows. [Equation not reproduced in this text.]

Here l' and l are the frame lengths after and before conversion, respectively. This length conversion is illustrated in FIG. 3, which shows the frame lengths D1 to D6 of the reference pattern before adaptation becoming the frame lengths E1 to E6 after adaptation.
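The length adaptation can be sketched as below, with the caveat that the source equations are not available in this text and the formulas used here are stand-ins. The assumed relative length of a reference frame is the number of input frames aligned to it, with an input frame shared by several reference frames split equally among them. The per-codeword average and the final scaling l' = l × (membership-weighted average change) are likewise assumptions, and the `weights` matrix of codeword memberships is supplied by the caller.

```python
import numpy as np

def relative_frame_lengths(path, n_ref):
    """Assumed relative length of each reference frame after DTW alignment.

    path: list of (i, j) pairs (reference frame i aligned to input frame j).
    A frame aligned to three input frames gets 3.0; three reference frames
    sharing a single input frame get 1/3 each.
    """
    inp_counts = {}
    for _, j in path:
        inp_counts[j] = inp_counts.get(j, 0) + 1
    rel = np.zeros(n_ref)
    for i, j in path:
        rel[i] += 1.0 / inp_counts[j]
    return rel

def codeword_length_changes(rel_lengths, weights):
    """Membership-weighted average relative length for each codeword.

    rel_lengths: shape (frames,); weights: shape (frames, codewords).
    """
    rel_lengths = np.asarray(rel_lengths, dtype=float)
    weights = np.asarray(weights, dtype=float)
    totals = (weights * rel_lengths[:, None]).sum(axis=0)
    return totals / np.maximum(weights.sum(axis=0), 1e-12)

def adapt_frame_lengths(lengths, weights, changes):
    """Scale each frame length by its membership-weighted average change."""
    return np.asarray(lengths, dtype=float) * (np.asarray(weights, float) @ changes)
```

With this stand-in, the relative lengths of one reference pattern sum to the number of input frames, so a slower speaker stretches the pattern and a faster speaker compresses it.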

Once the length and the feature vectors have been converted, the reference pattern is adapted as shown in FIG. 4. In FIG. 4, the adapted reference pattern is obtained by sequentially looking up the converted feature vector that corresponds to each time instant indicated by a dash-dotted arrow. Wherever within a frame a dash-dotted arrow falls, the feature vector of that frame is output.
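The lookup described for FIG. 4 can be sketched as a resampling step. The sketch assumes that the sampling instants (the arrows) are spaced one original frame period apart starting at time 0, and that frame k covers the half-open interval from the end of frame k-1 to its own end; neither detail is stated in the text, and the function name is hypothetical.

```python
import numpy as np

def render_adapted_pattern(vectors, frame_lengths, step=1.0):
    """Emit, for each sampling instant, the converted vector of the covering frame.

    vectors: converted feature vectors, shape (frames, dims).
    frame_lengths: adapted (possibly fractional) length of each frame.
    """
    vectors = np.asarray(vectors, dtype=float)
    ends = np.cumsum(np.asarray(frame_lengths, dtype=float))
    times = np.arange(0.0, ends[-1], step)
    # Number of frame ends at or before t equals the index of the covering frame.
    idx = np.searchsorted(ends, times, side="right")
    idx = np.minimum(idx, len(vectors) - 1)  # guard against rounding at the tail
    return vectors[idx]
```

A frame whose adapted length exceeds one step is emitted more than once, and a frame shortened below one step may be skipped, which is how the adapted pattern ends up longer or shorter than the original.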

As described above, the speaker adaptation method according to the present invention adapts the length of the speech together with the feature vectors, and thereby improves recognition performance.

FIG. 1 is a block diagram showing the configuration of a conventional speech recognition system.

FIG. 2 shows the alignment of a reference pattern and an input pattern using DTW.

FIG. 3 schematically illustrates the speaker adaptation method according to the present invention.

FIG. 4 schematically illustrates the method of converting feature vectors and lengths.

Claims

A speaker adaptation method for adaptively changing a reference pattern used for speech recognition on the basis of adaptation words spoken by a new speaker, the method comprising: extracting a feature vector sequence of each adaptation word, and aligning it with the feature vector sequence of the reference pattern corresponding to that adaptation word by calculating the distance between the two; calculating an average vector change amount by averaging the differences between the aligned feature vector sequences of the adaptation words and of the reference patterns; and changing the feature vector sequence of the reference pattern by applying the calculated average vector change amount, the method further comprising:

extracting a frame length of each adaptation word and calculating a relative difference between the extracted frame length and the frame length of the reference pattern corresponding to that adaptation word;

obtaining an average length change amount by averaging the calculated differences between the frame lengths of the adaptation words and the frame lengths of the reference patterns; and

changing the frame length of the reference pattern by applying the calculated average length change amount.

A frame length adaptation method of a reference pattern, for use in a speaker adaptation method that adapts a reference pattern used for speech recognition to adaptation words spoken by a new speaker, in which the frame length of the reference pattern is adaptively changed according to the frame length of the adaptation words, the method comprising:

obtaining an average length change amount by averaging the differences between the frame lengths of the adaptation words and the frame lengths of the reference patterns; and

changing the frame length of the reference pattern by applying the average length change amount.

The method of claim 2, wherein:

in the step of calculating the relative frame length difference, the difference in frame length is calculated by the following equation,

[Equation not reproduced in this text.]

where the result expresses, in relative terms, the degree to which the length of the given frame of the adaptation reference pattern has increased or decreased, and N and P are the numbers of alignments located before and after the frame currently under consideration when, as a result of DTW, several adaptation words are aligned to one reference pattern or, conversely, one adaptation word is aligned to several reference patterns;

in the step of calculating the average length change, the average length change amount is calculated by the following equation,

[Equation not reproduced in this text.]

where the result is the average length change defined for each codeword, and the equation refers to the j-th frame of the i-th adaptation reference pattern, to the feature vector at that frame, and to the length of the i-th adaptation reference pattern; and

in the step of changing the length, the frame length of the reference pattern is changed by the following equation,

[Equation not reproduced in this text.]

where l' and l are the frame lengths after and before conversion, respectively.