KR100395222B1

KR100395222B1 - Voice Recognition System for Voice Mail Service (VMS)

Info

Publication number: KR100395222B1
Application number: KR10-1998-0054607A
Authority: KR
Inventors: 장육현
Original assignee: 엘지전자 주식회사
Priority date: 1998-12-12
Filing date: 1998-12-12
Publication date: 2003-10-17
Also published as: KR20000039299A

Abstract

The present invention relates to a voice recognition system that lets a user obtain a voice mail service (VMS) function by speaking the letters or digits used by that function. The system comprises: preprocessing means that extracts the start point and end point of each input unit of the incoming voice to obtain pitch information and, at the same time, extracts feature vectors of the input voice; a recognition unit that receives the user's selection-mode information, sets the recognition target words, and performs word recognition using the extracted feature vectors and the set target words; a control unit that discriminates speech from non-speech sound based on the pitch information and thereby controls the feature vector extraction of the preprocessing means, supplies the user's selection-mode information to the recognition unit, and controls delivery of the recognition result to a VMS main processing unit; and the VMS main processing unit, which provides the appropriate service to the user according to the recognition result. Accordingly, the invention rejects non-speech input by using pitch information on the user's target words and, by using information on the function the user is currently using, reduces the set of target words for each function, which improves recognition speed and recognition rate. In particular, a four-digit number used for password authentication is recognized digit by digit rather than by receiving and verifying all four spoken digits at once, which improves user convenience and recognition performance.

Description

Voice Recognition System for Voice Mail Service (VMS)

The present invention relates to a voice recognition system, and more particularly to a voice recognition system for a voice mail service (VMS) in which the commands, digits, and the like used by VMS functions are input by voice to carry out the corresponding function.

In recent years, voice recognition systems have been applied across a very wide range of fields, and recognizers are being newly built or improved to suit each new application. Among these applications, voice-activation technology, which replaces existing mode-selection functions with voice input, is developing rapidly. With the recent rapid growth of mobile telephony and personal mobile communications, VMS functions are used not only over wired telephones but through a variety of terminals, so a recognizer is needed that accurately distinguishes speech from non-speech sound (noise) before performing recognition. In an existing voice mail service (VMS), the user obtains the desired function by pressing digits in sequence as prompted by the announcement. Voice recognition is used to replace this key pressing. Conventional voice recognition devices of this kind have been implemented with the dynamic time warping (DTW) method in order to shorten recognition time.

FIG. 1 shows a conventional voice recognition device using the DTW method. A preprocessor (11) receives the user's voice and extracts the required feature vectors. A DTW recognizer (13) reads a plurality of recognition target words stored in a memory (15), compares the input voice against them using a predetermined voice recognition program, and outputs the closest word as the recognition result. A central control unit (17) provides the user with the service function corresponding to the recognition result. When the user wants to enter digits as a command, the user utters one digit each time the system sounds a beep. Once the whole voice command has been entered, the conventional device reads the result back to the user for confirmation and, if it is correct, provides the corresponding service.
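By way of illustration only, the template matching performed by a DTW recognizer such as the one in FIG. 1 can be sketched as follows. The function names, the Euclidean local distance, and the one-dimensional toy templates are assumptions made for this sketch; the document does not specify how the conventional device computes its distances.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean local distance (an assumption of this sketch)
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(features, templates):
    """Return the label of the stored template closest to the input."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))


# Hypothetical one-dimensional "feature vectors" for two vocabulary words.
templates = {
    "one": [[1.0], [2.0], [3.0]],
    "two": [[3.0], [2.0], [1.0]],
}
# A slower utterance of "one": same shape, stretched in time.
utterance = [[1.0], [1.0], [2.0], [2.0], [3.0]]
result = recognize(utterance, templates)
```

Because every stored word is compared against every input, the cost of this approach grows with the vocabulary size, which is one reason the description later restricts the target words by function.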

As described above, the conventional voice recognizer for VMS functions is a very simple device: it can recognize only a very small number of words, and its recognition performance is very low, for example because it cannot handle ambient noise and attempts to recognize it as speech. In addition, the conventional recognizer treats every command in use as a recognition target word, and for digits it recognizes each digit independently, one at a time after each beep. Users find this very inconvenient, and if even one digit in the middle is misrecognized, all the digits must be uttered again from the beginning.

Accordingly, an object of the present invention is to provide a voice recognition system for a voice mail service (VMS) that rejects non-speech input by using pitch information on the user's target words, and that performs voice recognition using information on the function the user is currently using.

Another object of the present invention is to provide a voice recognition system for a voice mail service (VMS) that, when recognizing digits, uses isolated-word recognition performed individually on each digit, instead of continuous-word recognition in which the speech for all digits of the voice command is received in succession and then verified, thereby improving user convenience and recognition performance.

FIG. 1 shows a conventional voice recognition system;

FIG. 2 shows a voice recognition system according to a preferred embodiment of the present invention;

FIG. 3 is a flowchart illustrating the process of discriminating speech from non-speech sound (noise).

※ Description of reference numerals for the main parts of the drawings

20: preprocessing means
22: echo and noise canceling unit
24: end point extraction and pitch information detection unit
26: feature vector extraction unit
30: recognition unit
40: control unit
50: VMS main processing unit

To achieve the above objects, the present invention provides a voice recognition system for a voice mail service (VMS), comprising: preprocessing means that removes echo and noise from the input voice, extracts the start point and end point of each input unit of the echo- and noise-free voice to obtain pitch information, and extracts feature vectors of the input voice based on that pitch information; a recognition unit that receives the user's selection-mode information, sets the recognition target words, and performs word recognition using the extracted feature vectors and the set target words; a control unit that discriminates speech from non-speech sound based on the pitch information and thereby controls the operation of the preprocessing means that extracts the feature vectors, supplies the user's selection-mode information to the recognition unit, and controls delivery of the recognition result of the recognition unit to a VMS main processing unit; and the VMS main processing unit, which provides the appropriate service to the user according to the recognition result from the recognition unit.

Hereinafter, a preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 2 shows the voice recognition system for a voice mail service (VMS) of the present invention. The system includes preprocessing means (20) that receives a voice command and performs preprocessing for recognition, a recognition unit (30) that recognizes the preprocessed voice, a control unit (40) that controls voice recognition, and a VMS main processing unit (50). The preprocessing means (20) includes an echo and noise canceling unit (22), which has an echo cancellation function for separating the input voice from the announcement audio and a noise cancellation function for removing noise in the input voice, and an end point extraction and pitch information detection unit (24), which extracts the start and end of the input voice and extracts pitch information. The preprocessing means (20) also includes a feature vector extraction unit (26) that computes, from the output of the end point extraction and pitch information detection unit (24), the feature vectors to be used by the recognition unit (30).

The operation of the voice recognition system of the present invention, configured as above, will now be described in detail.

In this embodiment, the input voice command is in pulse code modulation (PCM) data format, sampled at 8 kHz with 16 bits per sample.
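As a minimal sketch of what this format implies, the following decodes raw 16-bit PCM bytes into integer samples and converts a duration into a sample count at 8 kHz. The signed little-endian sample layout is an assumption, since the document states only the sampling rate and bit depth, and the function names are hypothetical.

```python
import struct

SAMPLE_RATE_HZ = 8000   # from the embodiment
BYTES_PER_SAMPLE = 2    # 16 bits per sample


def decode_pcm16(raw, little_endian=True):
    """Decode raw 16-bit signed PCM bytes into a list of ints.

    Signedness and byte order are assumptions; the source does not state them.
    """
    count = len(raw) // BYTES_PER_SAMPLE
    fmt = ("<" if little_endian else ">") + str(count) + "h"
    return list(struct.unpack(fmt, raw[:count * BYTES_PER_SAMPLE]))


def samples_for(duration_ms):
    """Number of samples covering duration_ms at 8 kHz."""
    return SAMPLE_RATE_HZ * duration_ms // 1000


# Four hand-made samples packed as raw bytes, then decoded again.
raw = struct.pack("<4h", 0, 1000, -1000, 32767)
samples = decode_pcm16(raw)
```

At this rate one millisecond of audio is eight samples, so a 30 ms stretch of input corresponds to 240 samples.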

The echo and noise canceling unit (22) receives voice commands in this data format. Its purpose is to extract only the user's voice when the announcement played while the user is using a VMS function and the user's own speech reach the system mixed together. The echo cancellation function repeatedly adjusts its filter coefficients so as to remove the announcement audio present in the input, bringing the output progressively closer to the user's speech. The noise cancellation function operates on the input with a slight delay relative to echo cancellation. It takes the first 30 to 50 ms of the input as the initial noise estimate and removes the noise present in the voice from every frame of the input using the spectral subtraction method.
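The noise cancellation step can be sketched, under stated assumptions, as magnitude spectral subtraction: the average magnitude spectrum of the leading noise-only frames is subtracted from each frame's magnitude spectrum, negative results are floored at zero, and the noisy phase is reused. The flooring rule, the reuse of the phase, the naive DFT, and all names below are choices made for this illustration and are not taken from the document.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def noise_magnitude(noise_frames):
    """Average magnitude spectrum over the leading noise-only frames."""
    spectra = [[abs(c) for c in dft(f)] for f in noise_frames]
    return [sum(col) / len(spectra) for col in zip(*spectra)]


def spectral_subtract(frame, noise_mag):
    """Subtract the noise magnitude per bin, floor at zero, keep the noisy phase."""
    cleaned = []
    for c, n_mag in zip(dft(frame), noise_mag):
        mag = max(abs(c) - n_mag, 0.0)
        cleaned.append(cmath.rect(mag, cmath.phase(c)))
    return idft(cleaned)


# Toy data: a tone in bin 1 corrupted by a weaker tone in bin 3 ("noise").
frame_len = 16
noise = [0.1 * math.sin(2 * math.pi * 3 * t / frame_len) for t in range(frame_len)]
tone = [math.sin(2 * math.pi * 1 * t / frame_len) for t in range(frame_len)]
noisy = [s + n for s, n in zip(tone, noise)]

noise_mag = noise_magnitude([noise])
denoised = spectral_subtract(noisy, noise_mag)
```

In this toy case the noise occupies a different frequency bin from the tone, so subtraction recovers the tone almost exactly; with real speech and broadband noise the result is only an approximation.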

The end point extraction and pitch information detection unit (24) performs two functions: end point extraction and pitch information extraction. It finds the 'start point' and 'end point' of the voice, frame by frame, in the echo-cancelled voice. In the interval between the start point and the end point, it extracts pitch information for each frame using three frames of data. In this embodiment, pitch is extracted by the autocorrelation method using a first-order band filter. The pitch information consists of the mean, the variance, and the difference between the maximum and minimum of the pitch over all frames of the input voice, together with the rate of change of the pitch between preceding and following frames.

The end point extraction and pitch information detection unit (24) passes the extracted pitch information to the control unit (40), which uses it to distinguish speech from non-speech sound (noise) and thereby controls the operation of the feature vector extraction unit (26). How the control unit (40) makes this distinction is described in detail below with reference to FIG. 3. The end point extraction and pitch information detection unit (24) outputs to the feature vector extraction unit (26) only the part of the input voice command lying between the start point and the end point. The feature vector extraction unit (26) uses the end point extraction result to compute 13th-order MFCCs, 13th-order delta-MFCCs, and 13th-order delta-delta-MFCCs and outputs them to the recognition unit (30). The recognition unit (30) stores the feature vectors of the voice data in an internal memory (not shown) and recognizes the target word from the feature vectors through beam search and Viterbi decoding. The recognition unit (30) outputs the recognition result for the target word to the control unit (40).

The control unit (40) compares the pitch information supplied by the end point extraction and pitch information detection unit (24) with the stored pitch information from the user's earlier utterances. If it falls within the range of the existing pitch information, the control unit (40) lets the feature vector extraction unit (26) keep operating; otherwise, it stops the feature vector extraction unit (26). At the same time, it outputs a control signal to the VMS main processing unit (50) so that the announcement is played to the user again, and a control signal that activates the echo and noise canceling unit (22) so that the system is ready to receive the user's voice.
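The pitch extraction described above can be illustrated with a simplified sketch: a per-frame pitch estimate from the autocorrelation peak, followed by the four summary statistics named in the description. This sketch omits the first-order band filter and the three-frame analysis window, and the lag search bounds, the definition of the rate of change as a mean absolute frame-to-frame difference, and all names are assumptions made for illustration.

```python
import math


def autocorr_pitch(frame, sample_rate=8000, fmin=60.0, fmax=400.0):
    """Estimate the pitch (Hz) of one frame from its autocorrelation peak.

    fmin and fmax bound the lag search; they are illustrative values.
    Returns 0.0 when no positive autocorrelation peak is found.
    """
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0


def pitch_statistics(pitches):
    """Summarise per-frame pitch values: mean, variance, max-min range,
    and mean absolute change between consecutive frames."""
    n = len(pitches)
    mean = sum(pitches) / n
    variance = sum((p - mean) ** 2 for p in pitches) / n
    spread = max(pitches) - min(pitches)
    deltas = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    change_rate = sum(deltas) / len(deltas) if deltas else 0.0
    return {"mean": mean, "variance": variance,
            "range": spread, "change_rate": change_rate}


# A 30 ms frame (240 samples at 8 kHz) of a synthetic 200 Hz tone.
frame = [math.sin(2 * math.pi * 200 * i / 8000) for i in range(240)]
pitch = autocorr_pitch(frame)

# Statistics over three hypothetical per-frame pitch values.
stats = pitch_statistics([100.0, 110.0, 120.0])
```

Non-speech sounds tend to produce pitch tracks with no clear peak or with erratic statistics, which is what makes such a summary usable for the speech and noise decision of FIG. 3.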

The control unit (40) also receives all information on the current recognition mode from the VMS main processing unit (50) and passes it to the recognition unit (30), while at the same time sending a prepare-for-recognition command signal to the echo and noise canceling unit (22). The VMS main processing unit (50) tells the control unit (40) which of the many VMS functions the user is currently using, and the control unit (40) stores this information in an internal memory (not shown). Using all the information on the currently selected function or on previously selected functions, the recognition unit (30) performs recognition with the corresponding recognition target words and the model parameters suited to them. In particular, when the currently selected function is something like password entry, the target is a four-digit number. In the present invention, the four digits are recognized by isolated-word recognition rather than continuous-digit recognition: the user utters one digit at a time and proceeds after receiving the recognition result. Each successfully recognized digit is stored temporarily in the memory (not shown) of the recognition unit (30); if a digit is wrong, the user is asked to utter only that digit again. Once all four digits have been recognized successfully, recognition as a whole succeeds and the password is authenticated.
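The digit-by-digit entry described above can be sketched as a loop that re-prompts only the position that failed. The two callbacks, the retry limit, and the scripted test data are hypothetical; the document does not say how a digit is determined to be wrong or how many retries are allowed.

```python
def enter_password(recognize_digit, confirm, length=4, max_tries=3):
    """Collect a password one spoken digit at a time.

    recognize_digit(position) -> str or None : hypothetical isolated-digit recognizer
    confirm(position, digit)  -> bool        : hypothetical check that the digit is right
    Only the failed position is prompted again; accepted digits are kept.
    Returns None if a position still fails after max_tries attempts.
    """
    accepted = []
    for position in range(length):
        for _ in range(max_tries):
            digit = recognize_digit(position)
            if digit is not None and confirm(position, digit):
                accepted.append(digit)
                break
        else:
            return None  # no attempt succeeded for this position
    return "".join(accepted)


# Scripted stand-ins: the second digit is misrecognized once, then succeeds.
intended = "1234"
scripted = iter(["1", "7", "2", "3", "4"])
prompts = []


def fake_recognizer(position):
    prompts.append(position)
    return next(scripted)


def fake_confirm(position, digit):
    return intended[position] == digit


password = enter_password(fake_recognizer, fake_confirm)
```

In the scripted run the user is prompted five times in total, at positions 0, 1, 1, 2, 3, rather than having to repeat all four digits after the single misrecognition.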

FIG. 3 is a flowchart illustrating how the control unit (40) discriminates speech from non-speech sound (noise). The control unit (40) receives pitch information from the end point extraction and pitch information detection unit (24) (step 310). It compares the pitch information of the current input voice with the pitch information of previously constructed words stored in a memory (not shown) (step 320). It then determines whether the error between the current pitch information and the closest of the stored word pitch information falls within a fixed valid limit (step 330). If it does, the input is judged to be a word (step 340), and the process moves to step 390, where feature vector extraction is allowed to continue. If it does not, the pitch information is compared with previously constructed noise pitch information (step 350). At step 360, if the error with respect to the stored noise pitch information falls within the valid limit, the input is judged to be noise (step 380), and a control signal that stops feature vector extraction is output to the feature vector extraction unit (26) (step 400). If at step 360 the error does not fall within the valid limit, the input is judged to be a similar word (step 370), and the feature vector extraction unit (26) is controlled to continue feature vector extraction. The stored pitch information for each word is updated according to the recognition result, except that no update is made for input judged to be a similar word. When input is judged to be noise, the stored noise pitch information is likewise updated.
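The decision flow of FIG. 3 can be sketched as follows. For brevity this sketch reduces the pitch information to a single scalar per utterance and uses an absolute difference as the error, whereas the description uses a set of pitch statistics and does not define the error measure; the reference values, the limit, and the names are all hypothetical.

```python
WORD, SIMILAR_WORD, NOISE = "word", "similar_word", "noise"


def nearest_error(pitch, references):
    """Smallest absolute difference between pitch and any stored reference."""
    return min(abs(pitch - r) for r in references)


def classify(pitch, word_pitches, noise_pitches, limit):
    """Decision flow of FIG. 3 applied to a single scalar pitch value."""
    if nearest_error(pitch, word_pitches) <= limit:    # steps 330 -> 340
        return WORD
    if nearest_error(pitch, noise_pitches) <= limit:   # steps 350, 360 -> 380
        return NOISE
    return SIMILAR_WORD                                # step 370


def keep_extracting(decision):
    """Feature extraction continues for words and similar words, stops for noise."""
    return decision != NOISE


# Hypothetical stored references and valid limit.
word_pitches = [120.0, 210.0]
noise_pitches = [35.0, 900.0]
limit = 15.0

decisions = [classify(p, word_pitches, noise_pitches, limit)
             for p in (125.0, 40.0, 400.0)]
```

Note that input matching neither the word references nor the noise references is still passed on to recognition as a similar word, so only input that positively resembles stored noise is rejected.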

As described above, the voice recognition system for a voice mail service (VMS) of the present invention makes the existing voice mail service more convenient to use. It rejects non-speech input by using pitch information on the user's target words and, by using information on the function the user is currently using, reduces the set of target words for each function, which improves recognition speed and recognition rate. In particular, a four-digit number used for password authentication is recognized digit by digit rather than by receiving and verifying all four spoken digits at once, which improves user convenience and recognition performance.

Claims

1. A voice recognition system for a voice mail service (VMS), comprising:

preprocessing means for removing echo and noise from an input voice, extracting a start point and an end point for each input unit of the voice from which the echo and noise have been removed to obtain pitch information, and extracting feature vectors of the input voice based on the pitch information;

a recognition unit for receiving a user's selection-mode information, setting recognition target words, and performing word recognition using the extracted feature vectors and the set recognition target words;

a control unit for discriminating speech from non-speech sound based on the pitch information to control the operation of the preprocessing means that extracts the feature vectors, providing the user's selection-mode information to the recognition unit, and controlling delivery of the recognition result of the recognition unit to a VMS main processing unit; and

the VMS main processing unit, for providing an appropriate service to the user according to the recognition result produced by the recognition unit.

2. The voice recognition system of claim 1, wherein the preprocessing means comprises:

an echo and noise canceling unit having an echo cancellation function for separating the input voice from announcement audio and a noise cancellation function for removing noise in the input voice;

an end point extraction and pitch information detection unit for extracting the 'start point' and 'end point' of the input voice and extracting pitch information using the 'start point' and 'end point'; and

a feature vector extraction unit for calculating, from the extracted end points of the input voice and the pitch information, the feature vectors to be used in the recognition unit.

3. The voice recognition system of claim 1 or 2, wherein the control unit compares the pitch information of the current input voice with the pitch information of previously constructed words and, if the error with respect to the closest pitch information falls within a fixed valid limit, judges the input to be a word and controls the feature vector extraction unit to continue extracting feature vectors; and otherwise, if the error with respect to the closest pitch information falls within a fixed valid limit, judges the input voice to be noise and controls the feature vector extraction unit to stop extracting feature vectors.

4. The voice recognition system of claim 2, wherein the end point extraction and pitch information detection unit finds the 'start point' and 'end point' of the input voice for each input unit and, in the interval between the start point and the end point of the voice, extracts the pitch information for each frame using a predetermined number of frames of data.

5. The voice recognition system of claim 4, wherein the end point extraction and pitch information detection unit uses a first-order filter with the autocorrelation method to extract pitch information comprising the mean, the variance, and the difference between the maximum and minimum of the pitch over all frames of the input voice, and the rate of change of the pitch between preceding and following frames.