KR20050098349A

KR20050098349A - Apparatus and method for automatic dialling in a mobile portable telephone

Info

Publication number: KR20050098349A
Application number: KR1020040023355A
Authority: KR
Inventors: 김강열; 강상기
Original assignee: 삼성전자주식회사
Priority date: 2004-04-06
Filing date: 2004-04-06
Publication date: 2005-10-12
Also published as: KR100827074B1

Abstract

In a mobile communication terminal according to the present invention, voice feature patterns of key words, each key word being a preset number of syllables taken from a word having a specific structure, are stored together with voice feature patterns of numeric indices representing numbers, and a plurality of items of counterpart information are stored mapped to the numeric indices. When the feature pattern of the first input voice matches the voice feature pattern of any one of the key words, the input voice is determined to be that key word. The items of counterpart information containing the determined key word, and the numeric indices mapped to them, are then displayed. When the voice feature pattern of a voice input after this display matches the voice feature pattern of any one of the numeric indices, the terminal dials automatically according to the counterpart information mapped to the matching numeric index.

Description

Apparatus and method for automatic dialing of mobile communication terminal {APPARATUS AND METHOD FOR AUTOMATIC DIALLING IN A MOBILE PORTABLE TELEPHONE}

The present invention relates to an automatic dialing apparatus and method in a mobile communication terminal, and more particularly to an automatic dialing apparatus and method using key word speech recognition.

In general, speech recognition technology is classified into speaker-dependent speech recognition, which recognizes only the voice of a specific person on whom it has been trained, and speaker-independent speech recognition, which recognizes the voices of all people. In the voice recognition mode of a mobile communication terminal that uses speaker-dependent speech recognition, automatic dialing is possible only for dialing voices that match the dialing voices preset by the user of the terminal. As a result, speaker-dependent speech recognition has a considerably low recognition rate and is complicated and inconvenient to use. In addition, because the voice to be mapped to a telephone number must be stored separately when the number is registered, the memory usage of the mobile communication terminal increases.

To solve these problems of speaker-dependent speech recognition, a method that dials automatically using speaker-independent speech recognition, that is, a method that searches for a telephone number with a phoneme-unit speaker-independent recognizer and then dials automatically, has recently come into wide use. The phoneme-unit speaker-independent recognizer is designed so that an unspecified number of users can use the mobile communication terminal. It is built by collecting speech corresponding to linguistically constructed words or phrases from as many people as possible. So many voices must be collected because each person utters words differently, and intonation and stress differ by region even for the same word. A speech recognition database made up of words covering these varied utterances must therefore be built and applied to the phoneme-unit speaker-independent recognizer. Accordingly, to use speaker-independent speech recognition, words are divided into syllables and then into phonemes for analysis, after which a word network optimized for each phoneme must be constructed. Because speaker-independent speech recognition recognizes speech in this manner, the memory usage of the mobile communication terminal increases markedly whenever a new word is entered.

Furthermore, when speaker-independent speech recognition yields several recognition results, the user must select one of them with the keys provided on the mobile communication terminal before dialing. In addition, because the recognition result is shown on the screen of the terminal, a user who is driving must look at the screen to complete voice-recognition automatic dialing, which is both inconvenient and increasingly dangerous.

Accordingly, an object of the present invention is to provide an automatic dialing apparatus and method that use key word recognition in a mobile communication terminal.

To achieve the above object, an apparatus according to the present invention is an automatic dialing apparatus of a mobile communication terminal, comprising: a memory that stores voice feature patterns of key words, each key word being a preset number of syllables in a word having a specific structure, and voice feature patterns of numeric indices representing numbers, and that stores a plurality of items of counterpart information mapped to the numeric indices; an audio processor that analyzes the feature pattern of an input voice; a display unit that displays the counterpart information and the numeric indices under predetermined control; and a controller that determines the input voice to be a key word when the feature pattern of the first input voice matches the voice feature pattern of any one of the key words, controls the display unit to display the items of counterpart information that contain the determined key word together with the numeric indices mapped to them, and, when the feature pattern of a subsequently input voice matches the voice feature pattern of any one of the numeric indices, controls the terminal to dial automatically according to the counterpart information mapped to the matching numeric index.

To achieve the above object, a method according to the present invention is an automatic dialing method of a mobile communication terminal, comprising the steps of: storing voice feature patterns of key words, each key word being a preset number of syllables in a word having a specific structure, and voice feature patterns of numeric indices representing numbers, and storing a plurality of items of counterpart information mapped to the numeric indices; determining the input voice to be a key word when the feature pattern of the first input voice matches the voice feature pattern of any one of the key words; displaying the items of counterpart information that contain the determined key word together with the numeric indices mapped to them; and, when the voice feature pattern of a voice input after the counterpart information and numeric indices are displayed matches the voice feature pattern of any one of the numeric indices, dialing automatically according to the counterpart information mapped to the matching numeric index.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Note that the following description covers only the parts needed to understand the operation of the present invention; description of other parts is omitted so as not to obscure the gist of the invention.

FIG. 1 is a diagram illustrating the configuration of a mobile communication terminal using voice recognition according to an embodiment of the present invention.

Referring to FIG. 1, the RF unit 102 processes data transmitted and received in a radio frequency band. The RF unit 102 includes an RF transmitter that up-converts and amplifies the frequency of a signal to be transmitted, and an RF receiver that low-noise amplifies a received signal and down-converts its frequency.

The data processor 104 includes a transmitter that encodes and modulates the signal to be transmitted and a receiver that demodulates and decodes the received signal. That is, the data processor 104 may consist of a modem (MODEM) and a codec (CODEC).

The audio processor 106 reproduces the received voice output from the data processor 104 and forwards the transmission audio signal generated by the microphone to the data processor 104. In addition, according to an embodiment of the present invention, the audio processor 106 receives, through the data processor 104, the user's index, the user's name, the representative telephone number (mobile phone, office, home, other), and similar information stored in the mobile communication terminal, and outputs it through the speaker.

The keypad 108 includes keys for entering numeric and character information and function keys for setting various functions.

The memory 110 may consist of a program memory and data memories. The program memory stores programs for controlling the general operation of the portable terminal. According to an embodiment of the present invention, the memory 110 also stores a key word database and a numeric index database for voice-recognition automatic dialing. The memory 110 further stores a telephone number database in which counterpart information, that is, the counterpart's name, is stored mapped to telephone number items such as a home telephone number, an office telephone number, and a mobile communication terminal telephone number. In this embodiment the counterpart name is assumed to contain a key word. Here, a key word is a preset number of syllables in a word having a specific structure. For example, if the word is a counterpart's name, the key word may be the family name. Although this embodiment is described using a word having a specific structure as the example, voice-recognition automatic dialing based on a key word is of course equally possible for a sentence having a specific structure.

In an embodiment of the present invention, the memory 110 also stores a numeric index database corresponding to the numeric indices. Here, a numeric index is a number assigned sequentially to the counterpart information retrieved for a key word, and it is stored in the mobile communication terminal together with the counterpart information.
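The mapping described above, from key words to counterpart entries that each carry a numeric index, can be sketched as a small in-memory structure. This is only an illustration under stated assumptions: the names `Entry`, `key_word`, and `build_keyword_map` are hypothetical, the key word is taken to be the first syllable of a Korean name stored as precomposed Hangul (one character per syllable), and the text does not describe the terminal's actual storage layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    index: int   # numeric index the user speaks to select this entry
    name: str    # counterpart name; its leading syllable(s) form the key word
    number: str  # telephone number
    kind: str    # telephone number item, e.g. "mobile", "home", "office"


def key_word(name: str, syllables: int = 1) -> str:
    """Return the first `syllables` characters of a name (assumed key word)."""
    return name[:syllables]


def build_keyword_map(entries):
    """Group entries by key word so one recognized key word yields its candidates."""
    table = {}
    for entry in entries:
        table.setdefault(key_word(entry.name), []).append(entry)
    return table
```

With a table built this way, recognizing a single family name is enough to retrieve every matching entry along with the index the user would speak next.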

The data memory temporarily stores data generated while the terminal operates.

The controller 100 controls the overall operation of the portable terminal. The controller 100 may also include the data processor 104. According to an embodiment of the present invention, the controller 100 analyzes the input voice, extracts a feature vector, and recognizes the pattern by comparing it with the key words in the key word database stored in the memory 110. When the recognition result is recognized as a key word, that is, when the result indicates a family name, the controller 100 searches the telephone number database stored in the memory 110 for counterpart names that use that key word. The controller 100 then controls the display unit 118 to display the retrieved counterpart names and the numeric indices mapped to them. For example, if ten counterpart names are retrieved, there are ten numeric indices. The retrieved counterpart names and numeric indices may of course also be output through the speaker.

The camera 112 captures image data, and the signal processor 114 converts the video signal output from the camera 112 into an image signal. The image processor 116 generates screen data for displaying the image signal output from the signal processor 114. Under the control of the controller 100, the received video signal and data are transmitted in a form that conforms to the specifications of the display unit 118.

The operation of the mobile communication terminal is now described with reference to FIG. 1. Voice received through the microphone is passed to the audio processor 106. The audio processor 106 removes the noise mixed into the received voice and detects only the region in which the speech to be used is present. It does so by endpoint detection, a method that finds the start point and end point of the input speech so that only the information needed for recognition is extracted. Human speech is an analog waveform with certain periodic characteristics, and speech recognition mainly uses vector quantization, a method that encodes vectors of input samples. Because a speech signal on the time axis has little discriminating power, the speech is divided into frames, which are short-interval signals, and each frame is converted into a feature vector with greater discriminating power. The feature vectors obtained through vector quantization are used to form individually distinguishable patterns. Each pattern is then input to the controller 100 of the mobile communication terminal. The controller 100 compares the pattern output by the audio processor 106 with the patterns in the key word database stored in the memory 110 and recognizes which pattern in the key word database it matches. The construction of the key word database is described below.
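The framing and endpoint detection steps described above can be illustrated with a minimal sketch. The text names the technique but not its details, so everything specific here is an assumption: the short-time energy criterion, the function names, and the `frame_len`, `hop`, and `threshold` values are hypothetical choices for illustration only.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def detect_endpoints(samples, frame_len=4, hop=2, threshold=0.01):
    """Return (start, end) sample positions of the speech region, or None.

    A frame counts as speech when its energy exceeds `threshold`
    (an assumed criterion; the source does not specify one).
    """
    frames = frame_signal(samples, frame_len, hop)
    active = [i for i, f in enumerate(frames) if frame_energy(f) > threshold]
    if not active:
        return None
    start = active[0] * hop
    end = active[-1] * hop + frame_len
    return start, end
```

Only the samples between the returned start and end positions would then be passed on to feature extraction; a real recognizer would use much longer frames and a more robust criterion.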

Words, syllables, or phonemes may be used as the basic unit of pattern analysis in the speech recognition process. Phonemes, however, have the advantage that there are fewer distinct types of them than of words or syllables and that they let acoustic characteristics be reflected evenly in the recognizer. The mobile communication terminal therefore adopts a phoneme-unit speaker-independent recognizer and analyzes the user's voice as phoneme-unit vectors.

The result obtained by performing the above pattern recognition on the feature vectors extracted from the input voice is output on the display unit 118 together with the numeric index, counterpart name, and representative telephone number (mobile phone, office, home, other) stored in the mobile communication terminal. The data shown on the display unit 118 may also be converted to speech by a text-to-speech synthesizer (TTS: Text-To-Speech, hereinafter TTS) and delivered to the user through the speaker. A user who has received the information, including the numeric indices, through the speaker speaks the numeric index of the counterpart to be called. The spoken digit is recognized by the CH_STA_REQ tone recognizer of the mobile communication terminal, after which the terminal dials the desired counterpart automatically. Alternatively, the user may look at the data shown on the display unit 118 and use the keypad 108 to have the terminal dial the desired counterpart automatically.

FIG. 2 is a diagram illustrating the process of constructing a key word database for speech recognition according to an embodiment of the present invention.

Referring to FIG. 2, because the present invention proposes a speaker-independent speech recognition method using key words, key word utterances must first be collected from an unspecified number of speakers. Speech features are extracted from the vector values of the collected key word utterances (200). The collected speech is also examined in a phoneme analysis step, in which the sounds that occur are analyzed at the phoneme level, below the syllable level, and compiled into a list (206). Assigning a syllable or phoneme symbol to each part of the speech in this list is called labeling. The labeling unit may also be a word or a sentence, and a unit smaller than a phoneme may be used as well; in the present invention, however, the input speech is processed by phoneme labeling (208). Using the feature values extracted from the key word utterances and the phoneme-labeled speech data, probabilistic statistical training is performed by iterating over their distributions (202). The trained information is then used to build a key word hidden Markov model (HMM: Hidden Markov Model, hereinafter HMM), a pattern-matching approach based on probability statistics (204). An HMM is a stochastic process in which an unobservable process gives rise to observable symbols. It is therefore one of the modeling methods suited to representing a process such as speech, which is highly variable and whose generation cannot be observed directly. In addition, the speech that has undergone phoneme labeling forms a key word phoneme network (210). This network is built before recognition takes place: phonological changes between words are applied in advance so that phoneme context is attached before and after each word, and the result is combined with the labeled speech information.
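To make the idea of scoring speech against a key word HMM concrete, the sketch below computes the likelihood of an observation sequence with the standard forward algorithm for a discrete HMM. The text does not say which HMM variant, observation type, or scoring algorithm is used, so this is an assumed illustration; the function name and the toy parameters in the usage note are hypothetical.

```python
import math


def forward_log_likelihood(obs, start_p, trans_p, emit_p):
    """Log-probability of a non-empty symbol sequence under a discrete HMM.

    obs      -- list of observation symbol indices
    start_p  -- start_p[s]: probability of starting in state s
    trans_p  -- trans_p[r][s]: probability of moving from state r to state s
    emit_p   -- emit_p[s][o]: probability that state s emits symbol o
    Assumes the sequence has non-zero probability under the model.
    """
    n = len(start_p)
    # alpha[s]: probability of the observations so far, ending in state s
    alpha = [start_p[s] * emit_p[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans_p[r][s] for r in range(n)) * emit_p[s][o]
                 for s in range(n)]
    return math.log(sum(alpha))
```

For a two-state left-to-right model with `start_p = [1.0, 0.0]`, `trans_p = [[0.5, 0.5], [0.0, 1.0]]`, and `emit_p = [[0.9, 0.1], [0.2, 0.8]]`, the sequence `[0, 1]` scores higher than `[1, 0]`, which is the kind of comparison a recognizer makes across competing key word models.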

The key word HMM and the key word phoneme network are then combined to construct a phoneme-unit key word database, which is built before being stored in the terminal. Because the phoneme-unit key word database is organized by phoneme, combinations at or above the phoneme level are possible.

In the present invention, a key word such as a Korean family name has only about 400 distinct values. Constructing the key word database therefore takes less time than constructing the database of a conventional speaker-independent speech recognition system, and the burden of the search needed for recognition is reduced. By using key words whose number and variety are limited, the mobile communication terminal using voice recognition can thus build its key word database more efficiently.

FIG. 3 is a diagram showing the control flow for automatic dialing according to an embodiment of the present invention.

Referring to FIG. 3, the mobile communication terminal receives the user's voice. The input voice may be the name of the counterpart the user wants to call, or a digit the user speaks in response to a search result. The terminal performs voice detection, removing the noise contained in the input voice and finding the endpoints of the actual speech interval to be used for recognition (302). Feature vectors are then extracted from the detected speech (304). Because a phoneme-unit speaker-independent recognizer is applied, phoneme-unit features must be extracted at this step. Detecting the speech and extracting the feature vectors after the user's voice is input is called the preprocessing stage of speech recognition. The preprocessed speech is compared with the patterns stored in the memory of the mobile communication terminal to recognize the pattern of the input voice (306). Likewise, a spoken digit received from the user and preprocessed is compared with the patterns stored in the memory of the terminal to recognize its pattern (306).

Because the pattern values are recognized through the probability distributions of the vectors, they lie between 0 and 1. The logarithm of each value is taken to spread the values over a wider numeric range. The pattern-recognized output values (P) obtained through these steps are compared with the recognition rejection threshold (R) (308). This comparison is shown in Equation 1.

If the output values are smaller than the recognition rejection threshold (R) required by the mobile communication terminal, a re-input message is sent to the user (316). If the output values are larger than the recognition rejection threshold (R), the recognized result is used to search the memory of the mobile communication terminal for the list of telephone numbers the user wants (310). The retrieved list consists of information such as the numeric indices mapped to the key word, names, and representative telephone numbers, and it is shown on the liquid crystal display or delivered to the user through an output device such as TTS (312). When the information is shown on the display, the user selects the counterpart to call through an input device such as a keypad or touch pad. When the information is output through TTS, the user selects the counterpart to call by speaking the numeric index.
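The rejection test in step 308 can be sketched as follows. This is a hedged illustration rather than a reproduction of Equation 1, which is not included in this text: the function name `accept` is hypothetical, the natural logarithm is an assumed choice of base, and treating a score exactly equal to R as a rejection is an assumption, since the description covers only the "smaller" and "larger" cases.

```python
import math


def accept(probability: float, reject_threshold: float) -> bool:
    """Return True when the log-domain score exceeds the rejection threshold R.

    `probability` is a pattern-match value between 0 and 1; its logarithm
    is the score P that is compared with R.
    """
    if probability <= 0.0:
        return False  # log is undefined at 0; treated here as a rejection
    score = math.log(probability)
    return score > reject_threshold
```

A caller would prompt the user to speak again when `accept` returns False and proceed to the telephone number search when it returns True.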

As an example, consider the case in which Korean family names are used as key words, and suppose the user wants to dial the mobile communication terminal of Kim Dong-soo (김동수) automatically. The user speaks the name 'Kim Dong-soo'. The mobile communication terminal receives the spoken name and extracts from it the feature vectors corresponding to the key word. The extracted feature vectors, that is, those corresponding to 'Kim' (김), are compared with the feature vector pattern for the family name 'Kim' in the database of Korean family names, and pattern recognition is performed. The terminal then compares the recognized pattern value with the preset recognition rejection threshold. If the output pattern value is smaller than the threshold, the terminal issues a message asking the user to speak again; if it is larger, the terminal searches for the numeric indices, names, and representative telephone numbers (mobile phone, office, home, other) of the stored counterparts whose family name is the recognized 'Kim'. The terminal then outputs the search results for counterparts with the family name 'Kim'. Suppose the results are '10:Kim Dong-soo:011-2222-1234:mobile phone', '11:Kim Dong-soo:02-222-1234:home', '12:Kim Young-hee:016-222-3456:mobile phone', '13:Kim Young-hee:02-222-3456:home', '14:Kim Chul-soo:031-333-5678:home', and '15:Kim Chul-soo:031-444-5678:office'. These results are shown on the liquid crystal display or output to the user through TTS. A user who has checked the results and wants to call Kim Dong-soo's mobile phone speaks the digits '10'. After recognizing the spoken digits, the terminal dials Kim Dong-soo's mobile communication terminal automatically; alternatively, it performs automatic dialing on receiving the user's input through an input device such as the keypad. If only one counterpart is retrieved, the terminal dials automatically without a separate selection step, or dials automatically after confirming whether to dial.
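The two-step selection in this example, key word first and numeric index second, can be sketched with the sample entries from the text. The function names, the tuple layout, and the use of Korean-script names with a prefix match are assumptions made for illustration; speech recognition itself is outside the sketch, which starts from an already recognized key word and index.

```python
# Sample entries from the example: (numeric index, name, number, item)
PHONEBOOK = [
    (10, "김동수", "011-2222-1234", "mobile"),
    (11, "김동수", "02-222-1234", "home"),
    (12, "김영희", "016-222-3456", "mobile"),
    (13, "김영희", "02-222-3456", "home"),
    (14, "김철수", "031-333-5678", "home"),
    (15, "김철수", "031-444-5678", "office"),
]


def candidates(keyword, phonebook=PHONEBOOK):
    """Entries whose name begins with the recognized key word."""
    return [row for row in phonebook if row[1].startswith(keyword)]


def number_to_dial(keyword, spoken_index=None, phonebook=PHONEBOOK):
    """Resolve the number to dial, or None if no selection can be made."""
    found = candidates(keyword, phonebook)
    if len(found) == 1:
        return found[0][2]  # single match: no separate selection step
    for index, _name, number, _item in found:
        if index == spoken_index:
            return number
    return None
```

Calling `number_to_dial("김", 10)` selects the first entry's mobile number, mirroring the user speaking '10' after the list for 'Kim' is presented.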

Although the detailed description of the present invention has described specific embodiments, various modifications are of course possible without departing from the scope of the invention. Therefore, the scope of the invention should not be limited to the described embodiments, but should be defined by the claims below and by their equivalents.

As described above, the dialing method of the present invention, which uses key word voice recognition in a mobile communication terminal, is easier to implement than conventional voice recognition dialing methods when the voice recognition database is constructed, and, because recognition is performed on specific key words, it uses data capacity efficiently. When the database is stored in the terminal, the memory of the terminal is used more efficiently, and the search process also becomes more efficient. The voice recognition of the mobile communication terminal uses a phoneme-based speaker-independent recognizer; from a linguistic standpoint, phonemes are few in kind, so the necessary training data can be obtained easily. The search results are displayed on the LCD or conveyed to the user using TTS, and automatic dialing is then performed through the spoken numeric index stored in the mobile communication terminal.

FIG. 1 is a diagram illustrating the configuration of a mobile communication terminal using voice recognition according to an embodiment of the present invention.

FIG. 2 is a diagram illustrating the process of constructing a key word database for voice recognition according to an embodiment of the present invention.

FIG. 3 is a diagram illustrating a control flow for automatic dialing according to an embodiment of the present invention.

Claims

An automatic dialing device of a mobile communication terminal, comprising:

a memory that stores voice feature patterns of key words, each key word being a preset number of syllables from a word having a specific structure, and voice feature patterns of numeric indices representing numbers, and that stores a plurality of pieces of counterpart information mapped to the numeric indices;

an audio processor that analyzes a feature pattern of an input voice;

a display unit that displays the counterpart information and the numeric indices under predetermined control; and

a control unit that, when the feature pattern of a first input voice matches the voice feature pattern of any one of the key words, determines the input voice to be that key word and controls the display unit to display the counterpart information having the determined key word, among the stored counterpart information, together with the numeric indices mapped thereto, and that, when the feature pattern of a subsequently input voice matches the voice feature pattern of any one of the numeric indices, controls automatic dialing according to the counterpart information mapped to the matching numeric index.

The device of claim 1,

wherein the control unit recognizes the feature pattern of the input voice by comparing it with the voice feature patterns of the key words stored in the memory using a phoneme-based speaker-independent recognition method.

An automatic dialing method of a mobile communication terminal, comprising:

storing voice feature patterns of key words, each key word being a preset number of syllables from a word having a specific structure, and voice feature patterns of numeric indices representing numbers, and storing a plurality of pieces of counterpart information mapped to the numeric indices;

determining a first input voice to be a key word when the feature pattern of the first input voice matches the voice feature pattern of any one of the key words;

displaying the counterpart information having the determined key word, among the stored counterpart information, together with the numeric indices mapped thereto; and

when the voice feature pattern of a voice input after the counterpart information and its numeric indices are displayed matches the voice feature pattern of any one of the numeric indices, automatically dialing according to the counterpart information mapped to the matching numeric index.

The method of claim 3,

wherein whether the voice feature pattern of the input voice matches the voice feature pattern of a key word is determined using a phoneme-based speaker-independent recognition method.

An automatic dialing method of a mobile communication terminal, comprising:

receiving a voice and extracting a key word from the received voice;

detecting counterpart information corresponding to the key word among counterpart information stored in advance, and outputting the detected counterpart information; and

automatically dialing according to the detected counterpart information.

The method of claim 5,

wherein the key word is a preset number of syllables.

An automatic dialing method of a mobile communication terminal, comprising:

when a voice is input, extracting a key word of a predetermined structure from the input voice;

detecting a plurality of pieces of counterpart information corresponding to the key word among counterpart information stored in advance, and outputting the detected counterpart information;

receiving a selection of the counterpart information to be automatically dialed from among the plurality of pieces of counterpart information; and

automatically dialing according to the selected counterpart information.

The method of claim 7,

wherein outputting the plurality of pieces of counterpart information including the key word comprises displaying the plurality of pieces of counterpart information on a screen, or synthesizing them into speech and outputting the speech.

The method of claim 7,

wherein, in receiving the selection of the counterpart information to be automatically dialed, one of the plurality of pieces of counterpart information is selected by voice input.