KR100339525B1

KR100339525B1 - Device for guiding living information using two step keyword recognizing method

Info

Publication number: KR100339525B1
Application number: KR1019940034185A
Authority: KR
Inventors: 김락용
Original assignee: 엘지전자주식회사
Priority date: 1994-12-14
Filing date: 1994-12-14
Publication date: 2002-11-23
Also published as: KR960024882A

Abstract

PURPOSE: A device for guiding living information using a two-stage keyword recognition method is provided. Keywords are divided by class, with the keyword that represents each class serving as the class keyword and the remaining keywords of that class serving as sub-class keywords, so that keywords are recognized in two stages. CONSTITUTION: A speech feature extraction unit (100) extracts features from a speech signal that is input through a microphone (9) and then filtered and quantized. A first-stage keyword recognition unit (200) extracts a first-stage class keyword from the features. A second-stage keyword recognition unit (300) extracts a sub-class keyword belonging to the class of that class keyword. A recognition result combining unit (21) combines the words recognized by the first-stage keyword recognition unit (200) and the second-stage keyword recognition unit (300) and outputs the combined result. Within the first stage, a class keyword extraction recognition unit (14) runs a Viterbi process using a class keyword model (13) and filler models (15). For the section found, it obtains the cumulative probability value of the class keyword model (13) and the probability value of the filler model (15), and extracts a class keyword if the result is larger than a threshold. A recognition unit (16) then decides whether the class keyword extracted by the class keyword extraction recognition unit (14) is recognized or rejected.

Description

Living information guide device using a two-stage keyword recognition method

The present invention relates to speech recognition. In particular, it relates to a living information guide device using a two-stage (first- and second-stage) keyword recognition method. The device overcomes the limited vocabulary and unnaturalness that are drawbacks of existing isolated-word recognition systems, lets the user ask in natural conversational speech about everyday information such as "news", "weather", "stocks", and "professional baseball", and reports the result.

As shown in FIG. 1, a conventional keyword recognition apparatus consists of: a low-pass filter (2) and an analog/digital converter (3) that filter and quantize the speech input through a microphone (1); a feature extraction unit (4) that extracts features from the speech quantized by the analog/digital converter (3); a keyword extraction recognition unit (6) that takes the features from the feature extraction unit (4) as input, performs a Viterbi process using pre-trained keywords and a filler model (7), and outputs a cumulative probability value; and a recognition unit (8) that compares the cumulative probability value output by the keyword extraction recognition unit (6) with a threshold to decide whether to recognize or reject the corresponding keyword.

The conventional apparatus configured in this way operates as follows.

When a speech signal s(t) is input through the microphone (1), the low-pass filter (2) low-pass filters it, and the analog/digital converter (3) converts the filtered speech signal into digital values and passes them to the feature extraction unit (4).

The feature extraction unit (4) then extracts and outputs the features of the speech. The keyword extraction recognition unit (6), which receives that output, performs the Viterbi process using the pre-trained keywords stored in the data storage unit (5) and the filler model (7), and outputs the result to the recognition unit (8). The recognition unit (8) compares the cumulative probability value obtained by the keyword extraction recognition unit (6) with a threshold. A speech section whose value exceeds the threshold is taken to be recognized as the corresponding keyword; the remaining sections are not recognized and are treated as non-keyword speech.
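The decision made by the recognition unit (8) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `threshold_sections`, the tuple layout, and the use of log-probability scores are assumptions introduced here, and the scores are taken as already produced by the Viterbi process.

```python
def threshold_sections(scored_sections, threshold):
    """Split speech sections into recognized keywords and non-keyword speech.

    scored_sections: list of (keyword, start_frame, end_frame, cumulative_score)
    tuples, assumed to come from a Viterbi pass (hypothetical layout).
    """
    recognized, non_keyword = [], []
    for keyword, start, end, cumulative in scored_sections:
        if cumulative > threshold:
            # Score exceeds the threshold: treat the section as this keyword.
            recognized.append((keyword, start, end))
        else:
            # Otherwise the section is handled as non-keyword speech.
            non_keyword.append((start, end))
    return recognized, non_keyword
```

Note that every keyword model is compared against the same threshold on equal terms, which is the behavior the following paragraph identifies as the source of errors.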

However, the conventional approach compares cumulative probability values on equal terms across all keyword models to be recognized and recognizes the corresponding keyword for any speech section that exceeds a specific threshold. When such a device is put to practical use, errors occur frequently, and as the number of keywords to be recognized increases, real-time recognition becomes difficult with limited memory and computation speed.

Accordingly, an object of the present invention is to provide a living information guide device using a two-stage keyword recognition method in which the keywords used in a living information guide service are divided by class, the keyword representing each class is treated as the class keyword, the remaining keywords belonging to that class are treated as sub-class keywords, and keywords are recognized in two stages.

Another object of the present invention is to provide a living information guide device using a two-stage keyword recognition method that makes real-time processing possible. A large number of keywords cannot be recognized simultaneously using only limited resources such as memory and computation speed. The invention therefore divides the keywords into a set of keywords that represent the classes of the target service and, for each class, a set of keywords belonging to it, which reduces the number of keywords that must be recognized at one time.

To achieve the above objects, the present invention, as shown in FIG. 2, comprises: a speech feature extraction unit (100) that extracts features from a speech signal that is input through a microphone (9) and then filtered and quantized; a first-stage keyword recognition unit (200) that extracts a first-stage class keyword from the features extracted by the speech feature extraction unit (100); a second-stage keyword recognition unit (300) that extracts a sub-class keyword belonging to the class of the class keyword extracted by the first-stage keyword recognition unit (200); and a recognition result combining unit (21) that combines and outputs the words recognized by the first-stage keyword recognition unit (200) and the second-stage keyword recognition unit (300).

The first-stage keyword recognition unit (200) consists of a first-stage class keyword extraction recognition unit (14) and a recognition unit (16). The first-stage class keyword extraction recognition unit (14) performs a Viterbi process using a pre-modeled class keyword model (13) and the corresponding filler models (15), obtains for the section found the cumulative probability value for the class keyword model (13) and the probability value in the corresponding filler model (15), and extracts the section as a class keyword if the result is larger than a threshold. The recognition unit (16) decides whether to recognize or reject the class keyword extracted by the first-stage class keyword extraction recognition unit (14).

The second-stage keyword recognition unit (300) has the same configuration and performs the same operation as the first-stage keyword recognition unit (200), except that the models it uses are a sub-class keyword model (17) and a filler model (19).

The operation and effects of the present invention configured as above are described in detail below.

When a speech signal S(t) is input through the microphone (9), it is first filtered by the low-pass filter (10) and then quantized into S(N) by the analog/digital converter (11). From the quantized signal, the feature extraction unit (12) extracts feature vectors that represent the speech signal at fixed time intervals. These feature vectors are input to the first-stage keyword recognition unit (200), which first extracts one of the class keyword models.
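Extracting one feature per fixed time interval can be sketched as below. The source does not say which features the feature extraction unit (12) computes, so the log-energy feature, the frame length, the hop size, and the name `frame_features` are all assumptions made for illustration only.

```python
import math

def frame_features(samples, frame_len=4, hop=2):
    """Return one feature per frame of the quantized signal S(N).

    Assumed feature: log of the mean energy of each frame. frame_len and hop
    (in samples) are illustrative values, not taken from the source.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        # A small constant avoids log(0) on silent frames.
        features.append(math.log(energy + 1e-10))
    return features
```

A real front end would typically produce a multi-dimensional vector per frame; a scalar is used here only to keep the sketch short.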

The first-stage class keyword extraction recognition unit (14) of the first-stage keyword recognition unit (200) then performs the Viterbi process using the pre-modeled class keyword model (13) and the corresponding filler models (15) to find a speech section. For the section found, it obtains the cumulative probability value for the class keyword model (13) and the probability value in the corresponding filler model (15). The recognition unit (16) compares these values and decides whether to recognize or reject the keyword.
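The comparison of a keyword model score against a filler model score over one section can be sketched as follows. Several simplifications are assumed and are not stated in the source: the speech section is taken as already found, features are scalars, each state has a unit-variance Gaussian emission, all transitions have probability 0.5, and the decision uses the difference of the two log scores against a threshold. All names are hypothetical.

```python
import math

def log_gauss(x, mean, var=1.0):
    """Log density of a 1-D Gaussian (assumed emission model)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi_score(frames, state_means):
    """Best-path log-probability through a left-to-right HMM.

    The path must start in the first state and end in the last state.
    Stay and advance transitions both use probability 0.5 (an assumption).
    """
    neg_inf = float("-inf")
    log_half = math.log(0.5)
    n = len(state_means)
    score = [neg_inf] * n
    score[0] = log_gauss(frames[0], state_means[0])
    for x in frames[1:]:
        new = [neg_inf] * n
        for j in range(n):
            stay = score[j] + log_half
            move = score[j - 1] + log_half if j > 0 else neg_inf
            new[j] = max(stay, move) + log_gauss(x, state_means[j])
        score = new
    return score[-1]

def accept_keyword(frames, keyword_means, filler_mean, threshold):
    """Accept when the keyword model beats the filler model by more than threshold."""
    keyword_score = viterbi_score(frames, keyword_means)
    # The filler is modeled here as a single-state HMM (an assumption).
    filler_score = viterbi_score(frames, [filler_mean])
    return (keyword_score - filler_score) > threshold
```

For example, `accept_keyword([0.1, 0.0, 1.1, 0.9, 2.0, 2.1], [0.0, 1.0, 2.0], 1.0, 1.0)` accepts, because the frames follow the keyword's state sequence closely, while a flat input such as `[1.0] * 6` is better explained by the filler and is rejected.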

If the keyword is rejected, control is fed back to the first-stage class keyword extraction recognition unit (14); if it is recognized, the result is output to the second-stage keyword recognition unit (300).

The second-stage sub-class keyword extraction recognition unit (18) of the second-stage keyword recognition unit (300) then extracts a keyword using the sub-class keyword model (17) for the class of the recognized class keyword and the corresponding filler model (19). This process is the same as the first-stage class keyword recognition.

That is, the second-stage sub-class keyword extraction recognition unit (18) performs the Viterbi process using the pre-modeled sub-class keyword model (17) and the corresponding filler models (19) to find a speech section, and outputs it to the recognition unit (20). For the speech section found, the recognition unit (20) obtains the cumulative probability value for the sub-class keyword model (17) and the probability value in the corresponding filler model (19), compares these values, and decides whether to recognize the section as a sub-class keyword or reject it.

The keyword and filler models used in the first and second stages are as shown in FIG. 3 and FIG. 4, respectively. In the first stage, only one class keyword is extracted from the input sentence; in the second stage, the sub-class keywords belonging to that class are extracted.

For example, for the input sentence "내일 날씨는 어떻습니까?" ("How is the weather tomorrow?"), the first stage extracts the class keyword "weather" from the class keyword set Ⅱ = {"news", "weather", "movies", ......}, and the second stage extracts one word from the sub-class set of weather, Θ = {"today", "tomorrow", "weekend", "southern region", "Seoul", ......}.

The recognition result combining unit (21) therefore combines the recognition results "weather" and "tomorrow", interprets them as "tomorrow's weather", and outputs the corresponding weather forecast.
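The flow of this example, from class keyword to sub-class keyword to combined result, can be sketched as below. The per-word scores stand in for the outputs of the two recognition stages and are invented for illustration; the function names, the shared threshold, and the English renderings of the keywords are likewise assumptions. Only the sets listed in the example above are filled in.

```python
# Keyword sets from the example above; entries elided in the source stay elided.
CLASS_KEYWORDS = ["news", "weather", "movies"]
SUB_CLASS_KEYWORDS = {
    "weather": ["today", "tomorrow", "weekend", "southern region", "Seoul"],
}

def best_keyword(scores, vocabulary, threshold):
    """Return the best-scoring word in vocabulary, or None if it is rejected."""
    if not vocabulary:
        return None
    score, word = max((scores.get(w, float("-inf")), w) for w in vocabulary)
    return word if score > threshold else None

def recognize(scores, threshold=0.0):
    """Two-stage recognition followed by result combining (hypothetical scores)."""
    class_kw = best_keyword(scores, CLASS_KEYWORDS, threshold)       # stage 1
    if class_kw is None:
        return None                                                  # rejected
    vocabulary = SUB_CLASS_KEYWORDS.get(class_kw, [])
    sub_kw = best_keyword(scores, vocabulary, threshold)             # stage 2
    return class_kw, sub_kw

# Hypothetical scores for "How is the weather tomorrow?"
example_scores = {"weather": 2.1, "tomorrow": 1.4, "news": -0.7}
result = recognize(example_scores)
```

With these invented scores, `result` is `("weather", "tomorrow")`, which a combining step would interpret as a request for tomorrow's weather. The second stage only ever searches the vocabulary of the class chosen in the first stage.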

The keyword set used in the present invention consists of the words shown in FIG. 5. The words in the second row are the class keywords, and the words connected beneath each of them are the sub-class keywords belonging to that class.

The feature vectors obtained for the input sentence are not computed again. Only the Viterbi process is repeated for the first-stage and second-stage keyword extraction, using networks structured as in FIG. 3 and FIG. 4.
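The reuse of feature vectors across both stages can be sketched as a small driver that calls the extractor once and passes the same features to each stage. The callables `extract`, `stage1`, and `stage2` and the toy stand-ins below are hypothetical placeholders, not components described in the source.

```python
def run_two_stage(samples, extract, stage1, stage2):
    """Extract features once, then run both recognition stages on them."""
    features = extract(samples)          # computed a single time
    class_kw = stage1(features)          # first Viterbi pass (class network)
    if class_kw is None:
        return None
    sub_kw = stage2(features, class_kw)  # second pass reuses the same features
    return class_kw, sub_kw

# Toy stand-ins that only demonstrate the call pattern.
extract_calls = []

def toy_extract(samples):
    extract_calls.append(1)
    return list(samples)

def toy_stage1(features):
    return "weather" if features else None

def toy_stage2(features, class_kw):
    return "tomorrow"

outcome = run_two_stage([0.2, 0.5], toy_extract, toy_stage1, toy_stage2)
```

After this run, `extract_calls` holds a single entry, showing that the second stage costs only one additional decoding pass rather than a second feature extraction.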

As described above, an actual living information service must recognize dozens of keywords, but a large number of keywords cannot be recognized simultaneously using only limited resources such as memory and computation speed. Dividing the keywords into a set of keywords that represent the classes of the target service and sets of keywords belonging to each class reduces the number of keywords recognized at one time and makes real-time processing possible. Because the first stage extracts only the one class keyword that represents a class from the spoken signal, misrecognition errors can be reduced, and the sub-class keyword belonging to that class is then recognized in the second stage.
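The reduction in keywords handled per pass can be made concrete with a small count. The class and sub-class sizes used below are invented for illustration (the actual contents of FIG. 5 are not reproduced here), and `keywords_per_pass` is a hypothetical helper.

```python
def keywords_per_pass(sub_class_sizes):
    """Compare a flat recognizer with the two-stage split.

    sub_class_sizes: number of sub-class keywords in each class (hypothetical).
    Returns (flat, stage1, stage2_worst_case) keyword counts per decoding pass.
    """
    num_classes = len(sub_class_sizes)
    flat = num_classes + sum(sub_class_sizes)  # everything decoded at once
    stage1 = num_classes                       # class keywords only
    stage2_worst = max(sub_class_sizes)        # largest single class
    return flat, stage1, stage2_worst

# Hypothetical sizes: 4 classes with 5, 6, 4, and 5 sub-class keywords.
flat, stage1, stage2_worst = keywords_per_pass([5, 6, 4, 5])
```

For these assumed sizes, a flat recognizer would hold 24 keyword models in a single pass, while the two-stage scheme holds 4 in the first pass and at most 6 in the second.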

FIG. 1 is a block diagram of a conventional keyword recognition apparatus.

FIG. 2 is a block diagram of the living information guide device using the two-stage keyword recognition method of the present invention.

FIG. 3 is a diagram of the keyword and filler network used for first-stage class keyword extraction in FIG. 2.

FIG. 4 is a diagram of the keyword and filler network used for second-stage keyword extraction in FIG. 2.

FIG. 5 is a diagram of the keyword set used in FIG. 2.

***** Explanation of reference numerals for the main parts of the drawings *****

1, 9: Microphone
2, 10: Low-pass filter
3, 11: Analog/digital (A/D) converter
4, 12: Feature extraction unit
6: Keyword extraction recognition unit
7, 15, 19: Filler model
8, 16, 20: Recognition unit
13: Class keyword model
14: First-stage keyword extraction recognition unit
17: Sub-class keyword model
18: Second-stage keyword extraction recognition unit
21: Recognition result combining unit
100: Speech feature extraction unit
200: First-stage keyword recognition unit
300: Second-stage keyword recognition unit

Claims

A living information guide device using a two-stage keyword recognition method, comprising: speech feature extraction means for extracting feature vectors from a speech signal that is input through a microphone and then filtered and quantized; first-stage keyword recognition means for extracting a first-stage class keyword from the feature vectors extracted by the speech feature extraction means, using a pre-modeled class keyword model and a filler model corresponding thereto; and second-stage keyword recognition means for extracting, from the feature vectors extracted by the speech feature extraction means and using a pre-modeled sub-class keyword model and a filler model corresponding thereto, a sub-class keyword belonging to the class of the class keyword extracted in the first stage.

The device of claim 1, wherein the first-stage keyword recognition means comprises: first-stage class keyword extraction recognition means for finding a speech section by performing a Viterbi process using the pre-modeled class keyword model and the filler model corresponding thereto; and recognition means for determining whether to recognize the class keyword by comparing, with a threshold, the cumulative probability value for the class keyword model and the probability value in the corresponding filler model over the speech section found.