KR19990050799A - Voice recognition device and method of voice mail device

Info

Publication number: KR19990050799A
Application number: KR1019970069983A
Authority: KR
Inventors: 박경원; 설홍수
Original assignee: 윤종용; 삼성전자 주식회사
Priority date: 1997-12-17
Filing date: 1997-12-17
Publication date: 1999-07-05
Also published as: KR100258140B1

Abstract

A voice recognition device of a voice mail system comprises: a plurality of voice feature extraction units that receive voice input through a line interface in voice recognition mode and extract its features; a control unit that reads the feature data, delivers it simultaneously to the input buffers of a plurality of word search units, and sets operation flags so that the word search units operate simultaneously; the plurality of word search units, each of which reads the feature data under the control of the control unit and accesses its assigned region of a database to find the word that matches the data; and the database, which stores phoneme files corresponding to the voice recognition words. The search area of the database is divided so that each word search unit searches only its own region, which shortens the search time.

Description

Voice recognition device and method of voice mail device

The present invention relates to a voice recognition device and method for a voice mail system (VMS; Voice Mailing System), and more particularly to a device and method that use several search processors to carry out the search in parallel.

The voice mail system is generally connected to a digital switching network such as T1 or E1, or to an analog switching network such as the public switched telephone network (PSTN), and provides voice or image mail services to many users. That is, the voice mail system stores voice data and facsimile image data received from an exchange, or sends stored voice data and image data to the exchange.

In such a voice mail system, the interaction between the user and the system when providing a voice or image service can be implemented in two ways. In the first, the user enters the code of the desired service using the DTMF (dual tone multi frequency) keys of the telephone. In the second, the user enters the request by voice through voice recognition (as applied in a voice recognition voice mail device).

In the voice recognition approach, when the user speaks a menu name or a designated word, the system receives the voice, extracts the features of the word in real time, searches a database with the extracted features to find the most similar word, and provides the corresponding service to the user.

In general, a voice recognition voice mail system builds a database in which the list of recognizable words is stored as phoneme files, so that it can determine which word was spoken. Here, a phoneme file is a file that classifies the characteristics of Korean consonants and vowels using combinations of English letters and digits. Current voice recognition devices generally use one voice processing DSP (digital signal processor) to extract the voice features in real time and one search DSP to find which word in the database is most similar to the extracted features.

Consequently, in existing voice recognition voice mail systems the search takes longer as the number of words to be searched grows. For example, when a service such as stock information is offered through voice recognition, the user has to wait too long.

Accordingly, an object of the present invention is to provide a voice recognition device that uses several search DSPs to search for voice recognition words in a voice mail system.

Another object of the present invention is to provide a method for shortening the time needed to search for a voice recognition word in a voice mail system.

To achieve these objects, a voice recognition device of a voice mail system comprises: a plurality of voice feature extraction units that receive voice input through a line interface in voice recognition mode and extract its features; a control unit that reads the feature data, delivers it simultaneously to the input buffers of a plurality of word search units, and sets operation flags so that the plurality of word search units operate simultaneously; the plurality of word search units, each of which reads the feature data under the control of the control unit and accesses its corresponding region of a database to find the word that matches the data; and the database, which stores phoneme files corresponding to the voice recognition words. The device is characterized in that the search area of the database is divided and each word search unit searches only its own region, which shortens the search time.

FIG. 1 is a block diagram of a general voice mail system according to the present invention.

FIG. 2 shows the detailed configuration of the voice recognition processing unit in the configuration of FIG. 1.

FIG. 3 shows the procedure for searching for a voice recognition word using two search processors according to an embodiment of the present invention.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

In assigning reference numerals to the components in the drawings, the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related known functions or configurations are omitted where they would unnecessarily obscure the gist of the invention.

FIG. 1 is a block diagram of a voice processing system according to an embodiment of the present invention.

The control unit 111 is the main processing unit (Master Processing Unit; MPU) that controls the overall operation of the voice processing system. The memory 112 consists of a ROM that stores the program for controlling the overall operation of the voice processing system and a RAM that stores temporary data generated while the program runs. The disk controller (SCSI controller) 113 is connected to the system bus and, under the control of the main processing unit 111, generates the control signals for writing voice data to and reading it from the message storage unit (hard disk) 114. The message storage unit 114 stores voice data and various additional information (stock information and the like) received under the control of the disk controller 113, or outputs that data under the control of the disk controller 113. The line interface unit 115 is connected between the external telephone line and the system bus and handles the connection to the telephone line; when a line is connected, it compresses the received voice data to reduce the amount of information, or decompresses compressed voice data and plays it back. The voice recognition processing unit (Voice Recognition Unit; hereinafter VRU) 116 is connected to the line interface unit 115; in voice recognition mode it analyzes the voice signal input from the line interface unit 115 to extract its features and searches for the voice recognition word based on the extracted data. The input/output unit 117 is connected to the system bus and handles communication between the voice processing system and external host terminals and the like. External devices connected to the input/output unit may include an information-providing host computer that supplies additional services such as stock information, and a workstation for remotely monitoring and controlling the operation of the voice recognition voice mail system.

In operation, when a call is connected through the line interface unit 115 of the voice processing system of FIG. 1, the line interface unit notifies the main processing unit 111, which detects the call connection and issues the corresponding command to the line interface unit 115. Under the control of the main processing unit 111, the line interface unit 115 then places voice, facsimile, or general data received over the telephone line on the system bus, or receives data from the system bus and sends it out over the telephone line. The disk controller 113, connected to the system bus, handles data transfer from the line interface unit 115 to the message storage unit 114 and from the message storage unit 114 to the line interface unit 115. In voice recognition mode, the voice recognition processing unit 116 receives the voice signal from the line interface unit 115, extracts its features in real time, searches for the voice recognition word based on the extracted data, and reports the result to the main processing unit 111.

FIG. 2 shows the detailed configuration of the voice recognition processing unit of FIG. 1 according to an embodiment of the present invention.

The control unit 211 controls the overall operation of the voice recognition processing unit. A memory (not shown) is built into the control unit 211 and stores the program for controlling the operation of the voice recognition processing unit as well as temporary data generated while the program runs. The voice feature extraction units (212a, 212b) are connected to the line interface unit 115 and extract features by analyzing the voice signal received from it. The common memory 213 is connected between the voice feature extraction units (212a, 212b), the control unit 211, and the word search units (214, 215), and handles communication among them. The word search units (214, 215) use the voice feature data written to their input buffers to search for the corresponding voice recognition word by accessing the database.

In this embodiment there are two word search units (214, 215), so during a word search each unit searches only its own region of the database. In other words, the work of searching the entire database, previously done by a single word search unit, is divided between the two word search units (214, 215), which shortens the search time.
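The division of the database into per-unit regions can be sketched as follows. This is only an illustration under assumptions: the patent does not say how the regions are chosen, so the sketch assumes contiguous regions of nearly equal size, and the function name `split_regions` and the sample word list are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_regions(entries: Sequence[T], num_units: int) -> List[Sequence[T]]:
    """Divide the database entries into num_units contiguous regions.

    Assumption: regions are contiguous and differ in size by at most one
    entry. The patent only states that each unit searches its own region.
    """
    if num_units < 1:
        raise ValueError("num_units must be at least 1")
    base, extra = divmod(len(entries), num_units)
    regions = []
    start = 0
    for i in range(num_units):
        # The first `extra` regions take one additional entry each.
        size = base + (1 if i < extra else 0)
        regions.append(entries[start:start + size])
        start += size
    return regions


# Hypothetical vocabulary; with two units, one unit would search
# regions[0] and the other regions[1].
words = ["balance", "deposit", "fax", "mailbox", "stocks"]
regions = split_regions(words, 2)
```

With equal-cost comparisons, each unit then examines roughly half of the entries, which is where the shorter search time in the two-unit embodiment would come from.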

The voice recognition processing unit of FIG. 2 operates as follows. When voice is input from the line interface unit 115, a voice feature extraction unit (212a or 212b) receives it in real time, extracts its features, and stores them in a buffer. When the extraction unit notifies the control unit that feature extraction is complete, the control unit 211 reads the feature data from the voice feature extraction unit (212a or 212b) and writes it through the common memory 213 into the input buffer of each word search unit (214, 215). Each word search unit (214, 215) then reads the data from its input buffer and searches its own region of the database for a word similar to the data. Each word search unit reports the most similar word it finds to the control unit. The control unit 211 makes the final decision among the words reported by the word search units (214, 215) and notifies the main processing unit 111 of the most similar word so that the corresponding service is performed.

The operation according to an embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 3 shows the procedure for searching for a voice recognition word according to an embodiment of the present invention.

First, when the user speaks a voice recognition word, the word is passed through the line interface unit 115 to a voice feature extraction unit (212a or 212b) (step 311). The voice feature extraction unit analyzes the incoming voice in real time and extracts its features (step 313). Once the features of the spoken word have been extracted, the voice feature extraction unit (212a or 212b) stores the feature data in a buffer and notifies the control unit 211 that extraction is complete (step 315). The control unit 211 then reads the feature data and copies it simultaneously into the input buffers of the two word search units (214, 215) (step 317), and sets the operation flag of each word search unit (step 319). Using the feature data in its input buffer, each word search unit (214, 215) searches its own region of the database, finds the most similar word, and reports it to the control unit 211 (step 321). The control unit 211 makes the final selection of the most similar word and passes it to the call processing task (step 323), and the call processing task provides the service corresponding to that word to the user (step 325).
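Steps 317 to 323 can be sketched in software as below. This is a hedged illustration, not the patented DSP implementation: threads stand in for the search DSPs, a `threading.Event` stands in for the operation flag, and squared Euclidean distance stands in for the similarity measure, which the patent does not specify. The names `WordSearchUnit`, `recognize`, and `distance`, and the sample feature vectors, are all hypothetical.

```python
import threading
from typing import Dict, List, Optional, Tuple

Feature = List[float]


def distance(a: Feature, b: Feature) -> float:
    """Squared Euclidean distance (assumed; the patent names no measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class WordSearchUnit(threading.Thread):
    """One search processor that scans only its own database region."""

    def __init__(self, region: Dict[str, Feature]) -> None:
        super().__init__(daemon=True)
        self.region = region
        self.input_buffer: Optional[Feature] = None
        self.run_flag = threading.Event()  # operation flag set by the controller
        self.done = threading.Event()      # signals that a result is available
        self.result: Optional[Tuple[float, str]] = None

    def run(self) -> None:
        self.run_flag.wait()               # idle until the flag is set
        features = self.input_buffer
        assert features is not None
        candidates = [
            (distance(features, ref), word) for word, ref in self.region.items()
        ]
        self.result = min(candidates) if candidates else None
        self.done.set()                    # report the local best match


def recognize(features: Feature, regions: List[Dict[str, Feature]]) -> str:
    """Controller role: distribute features, start units, pick the best word."""
    units = [WordSearchUnit(region) for region in regions]
    for unit in units:
        unit.start()
    for unit in units:
        unit.input_buffer = list(features)  # step 317: copy to each input buffer
    for unit in units:
        unit.run_flag.set()                 # step 319: set the operation flags
    for unit in units:
        unit.done.wait()                    # step 321: wait for each report
    results = [unit.result for unit in units if unit.result is not None]
    if not results:
        raise LookupError("the database regions are empty")
    return min(results)[1]                  # step 323: final selection


# Hypothetical two-region database keyed by word.
regions = [
    {"balance": [0.1, 0.9], "fax": [0.8, 0.2]},
    {"mailbox": [0.5, 0.5], "stocks": [0.9, 0.9]},
]
best_word = recognize([0.85, 0.25], regions)
```

Python threads illustrate the control flow (buffers filled first, flags set afterwards, results merged at the end) rather than the speedup itself; in the patent the units are separate DSPs that genuinely run at the same time.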

In the above embodiment two voice feature extraction units are used, but other embodiments may use one or several. Likewise, although two word search units are used, other embodiments may use more word search units to make the search faster. In this embodiment the voice feature extraction units and word search units are implemented with DSPs.

As described above, in a voice recognition voice mail system the present invention divides the word search, previously handled by a single processor, among several processors, and thereby shortens the search time.

Claims

A voice recognition device of a voice mail system, comprising:

a plurality of voice feature extraction units that receive voice input through a line interface in voice recognition mode and extract its features;

a control unit that reads the feature data, delivers it simultaneously to the input buffers of a plurality of word search units, and sets operation flags so that the plurality of word search units operate simultaneously;

the plurality of word search units, each of which reads the feature data under the control of the control unit and accesses its corresponding region of a database to find the word that matches the data; and

the database, which stores phoneme files corresponding to voice recognition words,

wherein the search area of the database is divided and each word search unit searches only its corresponding region, thereby shortening the word search time.

A voice recognition method for a voice mail system having a database that stores phoneme files for voice recognition words and a plurality of search processors that search the database, the method comprising:

extracting feature data from a received voice recognition word and writing it simultaneously to the input buffers of the plurality of search processors;

having each of the plurality of search processors access its own region of the database with the extracted feature data and read out the most similar word; and

finally selecting the most similar word from among the words read out by the search processors and delivering it to a call processing task.