KR20010088766A

KR20010088766A - Speech recognition control system using dsp and wireless communication

Info

Publication number: KR20010088766A
Application number: KR1020000011954A
Authority: KR
Inventors: 양성일; 권영헌; 김준성
Original assignee: 양성일; 권영헌; 임중혁; 김준성; (주)도움정보기술
Priority date: 2000-03-10
Filing date: 2000-03-10
Publication date: 2001-09-28

Abstract

PURPOSE: A speech recognition control device using a DSP (Digital Signal Processing Unit) and radio communication is provided which recognizes input speech signals through a speech recognition process and transfers the recognized result to a control device by radio, thereby addressing the problem of misrecognition. CONSTITUTION: A speech input part receives speech signals. A DSP implements a speech recognition algorithm to recognize the input speech signals. A radio communication part transfers the recognized speech signals by radio. A control device, on the basis of the recognized speech signals, controls a speech output part to announce the state of a controller, or controls the controller to carry out the desired operation.

Description

SPEECH RECOGNITION CONTROL SYSTEM USING DSP AND WIRELESS COMMUNICATION

The present invention applies voice recognition to various control devices and, in particular, relates to a voice recognition control device using a DSP and wireless communication.

The device can, for example, automate the functions of various facilities in a home, increasing the homeowner's ability to control the living space and thereby improving quality of life. In home automation, one example of such a control device, methods based on a remote control are currently mainstream.

However, using such a remote control can be considerably inconvenient for people with disabilities. A remote control ultimately has to be operated by hand, so a person with impaired hand function cannot operate it and therefore cannot use home automation.

The present invention has been made in view of the above problem, and its object is to provide a voice recognition control device using a DSP and wireless communication that enables even people with disabilities to use various control devices.

Fig. 1 is an overall system configuration diagram of the voice recognition control device.

Fig. 2 is a diagram showing the recognition rate of the voice recognition control device.

To solve the above problem, the present invention configures a voice recognition control system using a DSP and RF communication; the overall system configuration is shown in Fig. 1.

The voice recognition control device consists broadly of a voice input unit, a DSP unit (Digital Signal Processing Unit), a wireless communication unit (RF module), and a control device unit (Control System).

In the voice input unit, an LPF (lowpass filter) is used as a preprocessor to remove noise, and an AD7821KN, an A/D (analog/digital) converter with 8-bit resolution, is used to convert the analog signal into a digital signal.

The signal that has passed through the A/D converter undergoes voice recognition in the DSP unit, which has a high computation speed, and the recognized word is transmitted wirelessly to the control device unit through the RF module. In the control device unit, the MICOM is an 80C51 8-bit microprocessor, with ROM (27256) and RAM (62256) as external memory. The MICOM interprets the signal received from the RF module, announces by voice the current state for the recognized word, and drives the control device according to the command recognized as 'yes' or 'no'. The chip used for voice output is the ISD2560, which makes recording and playing back voice very simple.

(Embodiment)

Hereinafter, the overall configuration of the present invention is described with reference to the drawings, taking home automation as an example.

First, the voice input unit is described.

Real-time processing is important when building a standalone voice recognition system. In particular, to find and process the interval that actually contains speech, computation must be performed each time one or a few speech samples arrive. To carry out that computation in the time between the arrival of one speech sample and the arrival of the next, the circuit and program in this system are arranged so that each incoming sample raises an external interrupt, the data is read, and control then returns to continue the computation.

The A/D converter is configured for a sampling frequency of 8 kHz and 8-bit quantization, and an LPF with a 4 kHz cut-off frequency is added.
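The figures above imply a fixed time budget per sample and a band limit for the filter. The following is a minimal sketch of that arithmetic, not the patent's firmware; the function names and the 5 V unipolar reference voltage are assumptions made only for illustration.

```python
SAMPLE_RATE_HZ = 8000   # sampling frequency stated in the text
RESOLUTION_BITS = 8     # A/D resolution stated in the text


def sample_period_us(rate_hz):
    """Time between two samples, i.e. the per-sample computation budget."""
    return 1e6 / rate_hz


def nyquist_hz(rate_hz):
    """Highest frequency representable at this sampling rate."""
    return rate_hz / 2


def quantize_unipolar(voltage, v_ref=5.0, bits=RESOLUTION_BITS):
    """Map a voltage in [0, v_ref] to an integer code (v_ref is an assumed value)."""
    levels = 2 ** bits
    code = int(voltage / v_ref * levels)
    return max(0, min(levels - 1, code))  # clamp to the valid code range
```

At 8 kHz the interrupt routine and the computation that follows it share 125 microseconds per sample, and the 4 kHz cut-off matches the Nyquist frequency of the converter.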

Next, the DSP unit and the voice recognition process are described.

The DSP used in the present invention is from the TMS320C3X series, a 32-bit floating-point device released by TI (Texas Instruments). It implements high-speed arithmetic circuits such as a multiplier in hardware, so it can process large amounts of data or complex operations in a short time. The functions and features of the DSP used are as follows.

Once voice input arrives, the interval that actually contains speech must be located within it. This is called detection of the real speech interval. In speech recognition, accurate detection of the real speech interval is, along with the recognition algorithm, one of the most important steps. To detect the real speech interval more accurately, the present invention uses a function that takes both the energy and the boundary crossing rate into account at the same time.
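The text does not give the detection function itself, so the following is only a sketch of one common way to combine the two measures. The frame length, the thresholds, the decision rule, and the reading of "boundary crossing rate" as the rate at which the signal crosses a fixed level are all assumptions.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def crossing_rate(frame, level=0.0):
    """Fraction of adjacent sample pairs that cross the given level."""
    if len(frame) < 2:
        return 0.0
    crossings = 0
    for a, b in zip(frame, frame[1:]):
        if (a - level) * (b - level) < 0:
            crossings += 1
    return crossings / (len(frame) - 1)


def find_speech_interval(samples, frame_len=80, high=0.01, low=0.001, rate_thr=0.25):
    """Return (first_frame, last_frame) judged to contain speech, or None.

    A frame counts as speech when its energy is clearly high, or when it has
    moderate energy together with a high crossing rate (assumed rule).
    """
    active = []
    for i in range(len(samples) // frame_len):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = frame_energy(frame)
        rate = crossing_rate(frame)
        if energy > high or (energy > low and rate > rate_thr):
            active.append(i)
    if not active:
        return None
    return active[0], active[-1]
```

Using the crossing rate alongside energy lets low-energy but rapidly varying sounds, such as weak fricatives at word boundaries, still count as part of the speech interval.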

Once the real speech interval is detected, feature parameters must be extracted from it. Recognition is then performed from these feature parameters through training and comparison. In extracting speech feature parameters, it is very important to describe the speech effectively with as little numerical data as possible.

The present invention uses cepstrum coefficients derived from 10th-order LPC coefficients together with the log-scale energy, for a total of 11 parameters.
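The text does not spell out the conversion, but the standard recursion from LPC coefficients to cepstrum coefficients can be sketched as below. It assumes the predictor convention s[n] ≈ a_1·s[n-1] + ... + a_p·s[n-p]; the function names and the small floor added before taking the logarithm are illustrative choices.

```python
import math


def lpc_to_cepstrum(lpc):
    """Convert LPC coefficients a_1..a_p into cepstrum coefficients c_1..c_p.

    Uses c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k}.
    """
    cep = []
    for n in range(1, len(lpc) + 1):
        acc = lpc[n - 1]
        for k in range(1, n):
            acc += (k / n) * cep[k - 1] * lpc[n - k - 1]
        cep.append(acc)
    return cep


def feature_vector(frame, lpc):
    """Cepstrum coefficients followed by log frame energy (11 values for p = 10)."""
    energy = sum(x * x for x in frame)
    return lpc_to_cepstrum(lpc) + [math.log(energy + 1e-10)]
```

For a single-pole model with a_1 = 0.5 the recursion reproduces the known series c_n = 0.5^n / n, which is a convenient sanity check.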

After the feature parameters are extracted for each command, VQ (vector quantization) is performed so that they can be used for the codebook and the training process. VQ was developed with the aim of raising the transmission rate by reducing the information rate of the speech signal.

The VQ process can be divided into two parts: generating the codebook and generating the observation sequence.

First, a codebook is generated through the VQ process, and then an observation sequence is generated from the coefficients to be trained. The observation sequence is produced by comparing the distances between the coefficients to be trained and the reference patterns and taking the index of the closest vector. When a word to be recognized arrives, an observation sequence is again generated by comparison with the codebook produced by the VQ process, just as in training.
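Generating the observation sequence amounts to a nearest-neighbour lookup in the codebook. A minimal sketch follows; the squared Euclidean distance and the function names are assumptions, since the text does not name the distance measure.

```python
def nearest_index(vector, codebook):
    """Index of the codebook entry closest to the feature vector."""
    best_index = 0
    best_distance = float("inf")
    for index, code in enumerate(codebook):
        distance = sum((v - c) ** 2 for v, c in zip(vector, code))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


def observation_sequence(feature_vectors, codebook):
    """Replace each feature vector by the index of its nearest codeword."""
    return [nearest_index(vector, codebook) for vector in feature_vectors]
```

The same function serves training and recognition, which is what keeps the symbols seen by the trained models consistent with those produced at test time.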

Voice training and recognition use a hidden Markov model (Hidden Markov Model), which reflects well the time-varying nature of speech signals.

The recognition vocabulary consists of ten words likely to be used often in the home: /TV/, /video/, /air conditioner/, /heating/, /front door/, /computer/, /washing machine/, /memo/, /cooking/, and /gas/. The ten models were trained with the Baum-Welch algorithm, after which recognition experiments were carried out. To set up each model, the VQ level was set to 64, the number of states of each model was set to [expression missing in the source], and the syllable-initial 'ㅇ' was assumed not to be a phoneme.

When a test word arrives, an observation sequence is generated, as in training, by comparing the input pattern with the reference patterns previously generated by the VQ process. This observation sequence and the trained probability values of each model are applied to the Viterbi algorithm. The Viterbi algorithm finds the optimal state sequence for a given observation sequence and model. A probability value is then computed from the optimal state sequence so determined. The input is recognized as the model with the highest of the probability values computed for the models.
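The scoring step described above can be sketched for a discrete HMM as follows. This version works in the log domain to avoid underflow and returns only the score of the best state path rather than the path itself; the function names and the dictionary of models are illustrative assumptions.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    """Natural log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi_log_prob(obs, pi, A, B):
    """Log probability of the best state path for a discrete HMM (pi, A, B)."""
    n = len(pi)
    delta = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(n)]
    for symbol in obs[1:]:
        delta = [
            max(delta[i] + _log(A[i][j]) for i in range(n)) + _log(B[j][symbol])
            for j in range(n)
        ]
    return max(delta)


def recognize(obs, models):
    """Name of the model whose best path gives the highest score.

    `models` maps a word to its (pi, A, B) tuple.
    """
    return max(models, key=lambda word: viterbi_log_prob(obs, *models[word]))
```

With one trained model per vocabulary word, `recognize` would be called once per utterance with the observation sequence produced by the VQ step.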

The topology is a left-to-right model in which a transition can advance by only one state at a time. Each model goes through the training process according to its configured number of states.

Training first sets up the model and initializes its π, A, and B values. Here π is the initial state probability distribution, A is the state transition probability, and B is the observation symbol probability in state j. The initial values of A and B were all random, and π was set to 1 for the first state and 0 for the rest so that transitions always start from the first state.
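The initialization described above, combined with the one-step left-to-right topology, can be sketched as follows. The function name, the range of the random self-transition probability, and the small offset that keeps every symbol probability positive are assumptions; the default of 64 symbols follows the VQ level given in the text.

```python
import random


def init_left_to_right_hmm(n_states, n_symbols=64, seed=None):
    """Return (pi, A, B) for a left-to-right HMM before training."""
    rng = random.Random(seed)

    # pi: always start in the first state.
    pi = [1.0] + [0.0] * (n_states - 1)

    # A: each state may only stay or advance to the next state.
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i == n_states - 1:
            A[i][i] = 1.0  # last state can only loop on itself
        else:
            stay = rng.uniform(0.1, 0.9)
            A[i][i] = stay
            A[i][i + 1] = 1.0 - stay

    # B: random symbol probabilities, normalized per state.
    B = []
    for _ in range(n_states):
        row = [rng.random() + 1e-3 for _ in range(n_symbols)]
        total = sum(row)
        B.append([value / total for value in row])

    return pi, A, B
```

Entries of A that start at zero stay at zero under Baum-Welch re-estimation, so the left-to-right structure is preserved throughout training.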

This recognition process is carried out for every model, and the model with the highest probability is the recognition result.

The recognition result is transmitted wirelessly to the control device unit through the RF module connected to the DSP unit. The control device unit announces by voice the current state for the recognized word and then acts on it automatically. The procedure is as follows. First, the control device unit announces by voice the state for the recognized word (for example: "The TV is on. Do you want to turn it off?"). When the user answers the spoken question with 'yes' or 'no', the corresponding command is carried out.
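The confirm-then-act procedure can be sketched as a small piece of controller logic. This is an illustration only, not the 80C51 firmware; the function names, the dictionary of device states, and the English prompt wording are assumptions.

```python
def build_prompt(device, is_on):
    """Spoken prompt announcing the current state and proposing the opposite."""
    state = "on" if is_on else "off"
    action = "off" if is_on else "on"
    return f"The {device} is {state}. Do you want to turn it {action}?"


def handle_answer(states, device, answer):
    """Toggle the device only when the recognized answer is 'yes'.

    Returns the device state after the answer has been handled.
    """
    if answer == "yes":
        states[device] = not states[device]
    return states[device]
```

Because nothing changes until a 'yes' is recognized, a misrecognized device name costs the user one 'no' rather than an unwanted action.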

As described above, the present invention drives the voice recognition control device by recognizing the input voice signal in the voice recognition process and sending the result to the control device unit over wireless communication. The recognition rate of the proposed voice recognition control device is shown in Fig. 2.

As the figure shows, the speaker-independent recognition rate for each word averages 90% or higher. In addition, the control device unit announces by voice the state for the recognized word (for example: "The air conditioner is off. Do you want to turn it on?"), which resolves the problem of misrecognition, an important issue in voice recognition. The practicality of the invention can therefore be recognized.

Advantages of the present invention include convenience, safety, high quality, entertainment value, accuracy, and economy. Areas to which the invention can be applied include indoor and outdoor lighting, security facilities, heating and cooling facilities, door and garage-door opening and closing facilities, audio/video facilities, and kitchen facilities.

Claims

A voice input unit for receiving a voice signal,

A DSP unit for recognizing an input voice signal by implementing a voice recognition algorithm;

A wireless communication unit for transmitting the recognized signal to the control unit by radio;

A control unit which, based on the recognized signal, controls the voice output unit to announce the status of the control device, or controls the control device to perform the desired operation.

Voice recognition control device using DSP and wireless communication.