KR20050120014A

KR20050120014A - Reference and display method of electron dictionary using voice

Info

Publication number: KR20050120014A
Application number: KR1020040045197A
Authority: KR
Inventors: 이미정; 이상철
Original assignee: 이미정
Priority date: 2004-06-18
Filing date: 2004-06-18
Publication date: 2005-12-22

Abstract

The present invention uses speech entered letter by letter, syllable by syllable, or word by word to search an electronic dictionary hosted on a local computer or on a network server reachable over wired or wireless communication, retrieves the various information held for the corresponding word, and outputs (displays) it as text and/or synthesized speech. As a result, even a terminal that has only a minimal set of keys, such as a mobile phone or PDA that can communicate with a local computer or a network server, can be used to enter a search word for the electronic dictionary more easily and more accurately.

Description

{Reference and display method of electron dictionary using voice}

The present invention relates to a method of searching for words in an electronic dictionary and displaying the results by means of voice recognition. More specifically, it relates to a method in which speech entered in units of letters, syllables, or words is used to search an electronic dictionary hosted on a local computer or on a network server reachable over wired or wireless communication, and the information retrieved for the corresponding word is output (displayed) as text and/or synthesized speech.

Conventional printed dictionaries, such as Korean, English, and Japanese dictionaries, list their entries in alphabetical order. Such dictionaries are bulky, and the user must go to the trouble of finding each word manually among the alphabetically ordered entries.

With advances in computer and communication technology, electronic dictionaries have appeared in which the information on each word is stored in a database on a personal computer, laptop, or web server. With such a dictionary, anyone who has a personal computer, a laptop, or a terminal capable of Internet communication can look up the information on a word more quickly, regardless of place or time, simply by typing the word on the keys.

However, terminals such as personal computers and laptops have keys for the alphabets needed to enter various languages such as Korean, English, and Japanese, which makes word entry easy. Highly portable terminals such as mobile phones and PDAs, by contrast, have only a very small number of keys and are therefore poorly suited to electronic dictionaries that require each word to be typed in by key operation.

Korean Laid-Open Patent Publication No. 2000-30825 (published June 5, 2000), "Search engine with voice output of foreign-language words, idioms, and sentences on the Internet", discloses a technique for retrieving from an Internet electronic dictionary not only simple word information but also phonetic symbols and information on words and idioms, and for listening to the results as speech. However, it still does not free the user from the inconvenience of typing each search word by key operation.

Electronic dictionaries have also appeared that are built on a speech database covering a very large number of recognition target words. These have the advantage that many words can be recognized and searched, but because so many of the words sound alike, the recognition rate for the pronunciation (word) being searched drops markedly.

To overcome these limitations of the prior art, an object of the present invention is to let a user easily enter by voice, regardless of time or place, the word to be looked up in the electronic dictionary, without relying solely on key input, even when using a terminal such as a mobile phone or PDA that is connected to a local computer or to a server on a network capable of wired or wireless communication.

Another object of the present invention is to allow speech to be recognized more accurately by having the speech for the word to be looked up entered separately, letter by letter, syllable by syllable, or word by word.

To achieve these objects, the present invention comprises: a first step of selecting one voice recognition mode from among the spelling, syllable, and word modes, and one or more output modes from among the text and/or synthesized speech modes for outputting information on a word; a second step of recognizing speech entered as letters, syllables, or words unit by unit, temporarily storing the recognized units in a buffer, and combining them into a single word; a third step of searching the electronic dictionary database for information on the word once the speech input for the letters, syllables, or word constituting that word has ended; and a fourth step of outputting the retrieved information as text and/or synthesized speech according to the output mode selected in the first step.
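The four steps can be illustrated with a minimal Python sketch. It is not the patented implementation: the `LookupSession` class, the `DICTIONARY` mapping, and the mode names are hypothetical stand-ins, and speech recognition itself is assumed to have already produced each unit as a string.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the electronic dictionary database.
DICTIONARY = {"cat": "a small domesticated feline"}


@dataclass
class LookupSession:
    recognition_mode: str        # step 1: "spelling", "syllable", or "word"
    output_modes: frozenset      # step 1: subset of {"text", "speech"}
    buffer: list = field(default_factory=list)

    def accept(self, unit: str) -> None:
        """Step 2: temporarily store one recognized letter, syllable, or word."""
        self.buffer.append(unit)

    def finish(self) -> dict:
        """Steps 3 and 4: assemble the word, look it up, and route the output."""
        word = "".join(self.buffer)
        self.buffer.clear()
        meaning = DICTIONARY.get(word)   # step 3: dictionary search
        # Step 4: one entry per output channel selected in step 1.
        outputs = {mode: meaning for mode in sorted(self.output_modes)}
        return {"word": word, "meaning": meaning, "outputs": outputs}
```

In this sketch a caller would create a session with the chosen modes, call `accept` once per recognized unit, and call `finish` when input ends.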

In the present invention, the second step preferably further includes a step in which, after the pronunciation of a word formed from one or more letters or syllables, or of a single whole word, has been entered, homonyms (同音異義語) of that word are output as text or synthesized speech for confirmation.

In the present invention, it is also preferable that, in the third step, the speech input for the word to be searched is judged to have ended when, during entry of the speech for the letters, syllables, or word constituting that word, a predetermined time elapses with no speech signal, a specific key signalling the end of speech input is operated, or an arbitrarily designated utterance signalling completion of speech input is entered.
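The three end-of-input conditions can be expressed as one predicate. The sketch below is illustrative only: the function name, the 3-second default (chosen to match the 3 to 4 seconds mentioned later in the description), and the end phrase "done" are assumptions.

```python
from typing import Optional


def input_finished(silence_seconds: float,
                   end_key_pressed: bool = False,
                   last_utterance: Optional[str] = None,
                   timeout_seconds: float = 3.0,
                   end_phrase: str = "done") -> bool:
    """Return True when any of the three end-of-input conditions holds."""
    timed_out = silence_seconds >= timeout_seconds   # no speech for a set time
    spoke_end = last_utterance == end_phrase         # designated end utterance
    return timed_out or end_key_pressed or spoke_end
```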

In the present invention, it is also preferable that the input and recognition of speech for the word search, the word search itself, and the output of the information on the word take place on a local computer or on a network server connected to it for communication.

Hereinafter, the present invention is described in detail.

First, with reference to FIGS. 1 and 2, the configuration and process for retrieving information on a specific word, through voice recognition, from an electronic dictionary database installed on a local computer or the like are described.

Referring to FIG. 1, a conventional microphone for capturing speech, a speaker for outputting speech, and a monitor for displaying text are connected to the control unit (1) of a local computer. The control unit (1) recognizes speech entered in units of letters, syllables, or words, searches the electronic dictionary, and controls the various processes for displaying the result. The control unit (1) also has a buffer (3) for temporarily storing, during speech processing, the data for the speech recognized in units of letters, syllables, or words.

The voice recognition unit (2) connected to the control unit (1) recognizes which letter, syllable, or word the speech entered through the microphone corresponds to and sends the data for that letter, syllable, or word to the control unit (1). The voice recognition unit (2) is partitioned so that it can recognize incoming speech selectively according to the spelling, syllable, or word mode. Each mode has HMM (Hidden Markov Model) parameters that are either shared between modes or individually optimized, grammar information, a word dictionary for the mode, and a word list for the mode, and is run in real time by the speech recognition engine.
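One way to picture the per-mode resources is as a small bundle selected by mode name. The following Python sketch is purely illustrative: the `ModeResources` type, the `load_mode` function, and every file path are hypothetical, and no real recognition engine is loaded.

```python
from dataclasses import dataclass

MODES = ("spelling", "syllable", "word")
SHARED_HMM = "hmm/shared.params"   # hypothetical path to shared HMM parameters


@dataclass(frozen=True)
class ModeResources:
    hmm_parameters: str   # shared or individually optimized HMM parameters
    grammar: str          # grammar information for the mode
    lexicon: str          # word dictionary for the mode
    word_list: str        # word list for the mode


def load_mode(mode: str, shared_hmm: bool = True) -> ModeResources:
    """Pick the resource bundle for one recognition mode."""
    if mode not in MODES:
        raise ValueError(f"unknown recognition mode: {mode}")
    hmm = SHARED_HMM if shared_hmm else f"hmm/{mode}.params"
    return ModeResources(hmm, f"grammar/{mode}.txt",
                         f"lexicon/{mode}.txt", f"wordlist/{mode}.txt")
```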

Also connected to the control unit (1) are an electronic dictionary database (4), from which information on the words recognized by the voice recognition unit (2) can be retrieved, and a speech synthesis unit (5), which generates synthesized speech for the word information so that the information retrieved from the electronic dictionary database (4) can be output through the speaker as well as displayed on the monitor.

With reference to FIG. 2, the process of searching for a word in the electronic dictionary through voice recognition and displaying the result on a local computer configured as in FIG. 1 is described.

First, in response to external input, the control unit (1) selects one voice recognition mode from among the spelling, syllable, and word modes provided in the voice recognition unit (2) for the voice-based dictionary search and result display, and at the same time selects one or more output modes from among the text and/or synthesized speech modes for outputting information on a specific word (S11).

If the selected voice recognition mode is the spelling mode (S12), the control unit (1) drives the voice recognition unit (2) to recognize the user's utterances letter by letter and loads the spelling mode resources, namely the HMM parameters (shared between modes or individually optimized), the corresponding grammar information, the spelling mode word dictionary, and the spelling mode word list (S13).

When the user's utterance is received (S15) while the system is waiting for it in spelling mode (S14), the control unit (1) dynamically loads the speech recognition engine of the voice recognition unit (2) and the spelling mode database in real time and performs voice recognition (S16). The control unit (1) then outputs the recognized letter as text (monitor) or synthesized speech (speaker) so that the user can check whether the letter recognized by the voice recognition unit (2) matches the letter that was spoken (S17).

If the voice recognition unit (2) fails to recognize the spoken letter, that is, if the speech is too quiet or contains too much noise to be recognized as a particular letter, the control unit (1) asks the user to speak that letter again (S18).

Next, once the pronunciation of one letter has been entered, the control unit (1) checks whether the utterances for all the letters making up the word have been entered (S19). If the speech input for the letters of the word has ended, it searches the electronic dictionary database (4) for information on (the meaning of) the word (S21).

If, however, not all the letters of the word have been entered, the control unit (1) temporarily stores the data for the letters entered so far in the buffer (3) (S20) and repeats steps S14 to S19 until the last letter is entered, thereby recognizing all the letters that make up the word.

Here, the control unit (1) can treat the spoken input of the letters of a word as finished when a predetermined waiting time (3 to 4 seconds) passes after the last letter is spoken, when a specific key is pressed, or when a specific utterance is entered. If, however, 3 to 4 seconds have passed since the last letter was spoken but the letters entered so far do not form a word, the control unit (1) may ask for further letters until a word is formed.
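The spelling mode loop of steps S14 to S21 might be sketched as follows. This is a simplified illustration, not the patented implementation: recognition outcomes are assumed to arrive as already-decoded events, `None` stands for a failed recognition, and the hypothetical token `"<END>"` stands for any of the end-of-input conditions (timeout, key, or end utterance).

```python
END = "<END>"   # hypothetical marker for "input finished"


def spelling_mode(events, dictionary):
    """Consume recognition events; return (word, meaning, log of prompts)."""
    buffer = []   # plays the role of the buffer (3)
    log = []      # what the system would show or say to the user
    for event in events:
        if event is None:
            log.append("retry")          # S18: ask the user to repeat
        elif event == END:               # S19: input reported as finished
            word = "".join(buffer)
            if word in dictionary:
                return word, dictionary[word], log   # S21: lookup succeeded
            log.append("need-more")      # letters so far do not form a word
        else:
            log.append(f"echo:{event}")  # S17: echo the letter for confirmation
            buffer.append(event)         # S20: store the letter temporarily
    return "".join(buffer), None, log
```

The same loop shape would apply to syllable mode if syllables were fed in as events instead of letters.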

Preferably, after the pronunciation of a word formed from one or more letters has been entered, one or more words that share its pronunciation (homonyms) are output as text or synthesized speech for the user to choose from, so that the word the user wants can be recognized and searched more accurately.
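Offering same-sounding candidates for selection could be sketched with a pronunciation index. The data, the pronunciation strings, and both function names below are hypothetical; the source does not specify how candidates are stored or matched.

```python
from collections import defaultdict


def build_pronunciation_index(entries):
    """Map each pronunciation string to every word that shares it."""
    index = defaultdict(list)
    for word, pronunciation in entries:
        index[pronunciation].append(word)
    return index


def candidates_for(index, pronunciation):
    """Return the words the user would be asked to choose among."""
    return list(index.get(pronunciation, []))


# Hypothetical entries: (word, pronunciation key).
ENTRIES = [("night", "N AY T"), ("knight", "N AY T"), ("day", "D EY")]
INDEX = build_pronunciation_index(ENTRIES)
```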

Having retrieved the information on (the meaning of) the word formed from one or more letters from the electronic dictionary database (4), the control unit (1) checks whether the output mode selected in step S11 was the text mode alone or the text and synthesized speech modes together (S22). If only the text mode was selected, the control unit (1) displays the meaning of the word as text on the monitor (S24). If both the text and synthesized speech modes were selected, it displays the meaning as text and also drives the speech synthesis unit (5) to output the meaning as synthesized speech (S23).
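The output branch can be sketched as a dispatch over the selected modes. The `present` function and its `display` and `synthesize` callbacks are hypothetical placeholders for the monitor and the speech synthesis unit.

```python
def present(meaning, output_modes, display, synthesize):
    """Send the meaning to every output channel selected in step S11."""
    used = []
    if "text" in output_modes:
        display(meaning)        # show on the monitor
        used.append("text")
    if "speech" in output_modes:
        synthesize(meaning)     # hand to the speech synthesis unit
        used.append("speech")
    return used
```

Passing the two channels in as callbacks keeps the sketch independent of any particular display or text-to-speech library.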

If the voice recognition mode selected in step S11 is the syllable mode (S26), the control unit (1) drives the voice recognition unit (2) to recognize the user's utterances syllable by syllable and loads the syllable mode resources, namely the shared or individually optimized HMM parameters, the corresponding grammar information, the syllable mode word dictionary, and the syllable mode word list (S27).

When the user's utterance is received (S29) while the system is waiting for it in syllable mode (S28), the control unit (1) dynamically loads the speech recognition engine of the voice recognition unit (2) and the syllable mode database in real time and performs voice recognition (S30). The control unit (1) then outputs the recognized syllable as text (monitor) or synthesized speech (speaker) so that the user can check whether the syllable recognized by the voice recognition unit (2) matches the syllable that was spoken (S31).

If the voice recognition unit (2) fails to recognize the spoken syllable, that is, if the speech is too quiet or contains too much noise to be recognized as a particular syllable, the control unit (1) asks the user to speak that syllable again (S32).

Next, once the pronunciation of one syllable has been entered, the control unit (1) checks whether the utterances for all the syllables making up the word have been entered (S33). If the speech input for the syllables of the word has ended, it searches the electronic dictionary database (4) for information on (the meaning of) the word (S35).

If, however, not all the syllables of the word have been entered, the control unit (1) temporarily stores the data for the syllables entered so far in the buffer (3) (S34) and repeats steps S28 to S33 until the last syllable is entered, thereby recognizing all the syllables that make up the word.

Here, the control unit (1) can treat the spoken input of the syllables of a word as finished when a predetermined waiting time (about 3 to 4 seconds) passes after the last syllable is spoken, when a specific key is pressed, or when a specific utterance is entered. If, however, 3 to 4 seconds have passed since the last syllable was spoken but the syllables entered so far do not form a word, the control unit (1) may ask for further syllables until a word is formed.

Preferably, after the pronunciation of a word formed from one or more syllables has been entered, one or more words that share its pronunciation are output as text or synthesized speech for the user to choose from, so that the word the user wants can be recognized and searched more accurately.

Having retrieved the information on (the meaning of) the word formed from one or more syllables from the electronic dictionary database (4), the control unit (1) checks whether the output mode selected in step S11 was the text mode alone or the text and synthesized speech modes together (S36). If only the text mode was selected, the control unit (1) displays the meaning of the word as text on the monitor (S38). If both the text and synthesized speech modes were selected, it displays the meaning as text and also drives the speech synthesis unit (5) to output the meaning as synthesized speech (S37).

Finally, if the voice recognition mode selected in step S11 is the word mode (S40), the control unit (1) drives the voice recognition unit (2) to recognize the user's utterance as a whole word and loads the word mode resources, namely the shared or individually optimized HMM parameters, the corresponding grammar information, the word mode word dictionary, and the word mode word list (S41).

When the user's utterance is received (S43) while the system is waiting for it in word mode (S42), the control unit (1) dynamically loads the speech recognition engine of the voice recognition unit (2) and the word mode database in real time and performs voice recognition (S44). The control unit (1) then outputs the recognized word as text (monitor) or synthesized speech (speaker) so that the user can check whether the word recognized by the voice recognition unit (2) matches the word that was spoken (S45).

If the voice recognition unit (2) fails to recognize the spoken word, that is, if the speech is too quiet or contains too much noise to be recognized as a particular word, the control unit (1) asks the user to speak that word again (S46).

Next, once about 3 to 4 seconds have passed after the pronunciation of a word has been entered, the control unit (1) searches the electronic dictionary database (4) for information on (the meaning of) the word (S47).

Here, the control unit (1) can treat the spoken input of a word as finished when a predetermined waiting time passes after the word is spoken, when a specific key is pressed, or when a specific utterance is entered. If, however, 3 to 4 seconds have passed since the last word was spoken but the input so far does not form a word, the control unit (1) may ask for a new word until a word is formed.

Having retrieved the information on (the meaning of) the word from the electronic dictionary database (4), the control unit (1) checks whether the output mode selected in step S11 was the text mode alone or the text and synthesized speech modes together (S48). If only the text mode was selected, the control unit (1) displays the meaning of the word as text on the monitor (S50). If both the text and synthesized speech modes were selected, it displays the meaning as text and also drives the speech synthesis unit (5) to output the meaning as synthesized speech (S49).

The description so far has covered the case where the electronic dictionary database is installed on a local computer. The following describes, with reference to FIGS. 3 and 4, the configuration and process for retrieving information on a specific word, through voice recognition, from an electronic dictionary database installed on a network server (or remote server) that is connected for communication with a terminal such as a mobile phone or PDA.

Referring to FIG. 3, a terminal such as a mobile phone, a landline telephone, or a local computer capable of Internet communication is connected through a wireless network, the PSTN, or the Internet to the control unit (11) on the network server (10). The control unit (11) recognizes speech entered in units of letters, syllables, or words, searches the electronic dictionary, and controls the various processes for displaying the result. The control unit (11) also has a buffer (13) for temporarily storing, during speech processing, the data for the speech recognized in units of letters, syllables, or words.

In particular, because the server, unlike a local computer, can be accessed by anyone with a communication-capable terminal, it is preferable to provide a separate authentication unit (16) so that only registered members can search the electronic dictionary and view the results. Naturally, this requires a separate member information database that stores member information such as each member's ID and password. A mobile phone on a wireless network or a landline telephone on the PSTN may instead be verified as belonging to a member through CID authentication, without an ID or password.

The voice recognition unit (12), which is connected to the control unit (11) on the network server (10), recognizes which letter, syllable, or word the speech entered through a mobile phone, PDA, landline telephone, local computer, or the like (hereinafter, the "terminal") corresponds to and sends the data for that letter, syllable, or word to the control unit (11). The voice recognition unit (12) is partitioned so that it can analyze and recognize incoming speech selectively according to the spelling, syllable, or word mode. Each mode has shared or individually optimized HMM parameters, grammar information, a word dictionary for the mode, and a word list for the mode, and is run in real time by the speech recognition engine.

The voice recognition unit (12) on the network server (10) can receive speech for recognition in units of letters, syllables, or words in three ways. In the first, the user's speech is not packaged as a standalone audio file; instead, it is streamed piece by piece in real time to the voice recognition unit (12) and recognized by the speech recognition engine. In the second, when the user speaks one syllable or one word, that unit is detected, encoded as a standalone audio file such as WAV or PCM, and sent to the voice recognition unit (12) for recognition by the speech recognition engine. In the third, when the user speaks one word or one syllable, only the features needed for recognizing that word or syllable are extracted and sent as a file to the voice recognition unit (12) for recognition by the speech recognition engine.

In particular, the speech feature transmission method has the advantage of a smaller file size than the audio file transmission method. With the audio streaming method, when network conditions or the wired or wireless link are poor, some packets may be lost in transit, and recognition performance may fall compared with the other two methods.
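The three transmission options and their relative payload sizes can be illustrated with Python's standard library. This is only a sketch under stated assumptions: the audio is taken to be 16-bit mono PCM, the chunk and frame sizes are arbitrary, and `as_features` computes a placeholder feature (mean absolute amplitude per frame), not the features a real recognizer would use.

```python
import io
import struct
import wave


def stream_chunks(pcm: bytes, chunk_size: int = 320):
    """Option 1: send the raw audio piece by piece as it is captured."""
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


def as_wav_file(pcm: bytes, sample_rate: int = 8000) -> bytes:
    """Option 2: wrap one utterance in a standalone WAV container."""
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setnchannels(1)     # mono
        writer.setsampwidth(2)     # 16-bit samples
        writer.setframerate(sample_rate)
        writer.writeframes(pcm)
    return out.getvalue()


def as_features(pcm: bytes, frame_samples: int = 160) -> bytes:
    """Option 3: send only per-frame features (placeholder feature here)."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[:count * 2])
    features = []
    for start in range(0, count, frame_samples):
        frame = samples[start:start + frame_samples]
        features.append(sum(abs(s) for s in frame) // len(frame))
    return struct.pack(f"<{len(features)}H", *features)
```

With these assumptions the feature payload is far smaller than the WAV payload for the same utterance, which mirrors the size advantage noted above.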

Connected to the control unit (11) on the network server (10) are an electronic dictionary database (14), from which information on the words recognized by the voice recognition unit (12) can be retrieved, and a speech synthesis unit (15). The speech synthesis unit (15) generates synthesized speech so that the letter, syllable, or word recognized by the voice recognition unit (12), or the word information retrieved from the electronic dictionary database (14) by the control unit (11), can be output through the terminal as synthesized speech as well as displayed as text.

With reference to FIG. 4, the process of searching for a word in the electronic dictionary through voice recognition and displaying the result is described for a terminal such as a mobile phone, PDA, landline telephone, or local computer and the network server (or remote server) connected to it, configured as in FIG. 3.

First, when a user connects to the network server (10) from a terminal such as a mobile phone, PDA, landline telephone, or local computer (S57), the control unit (11) on the network server (10) asks the user to enter user information such as an ID and password (S58). It then checks the entered information against the member information stored in the member information database (S59) to confirm that the connecting user is a legitimate member (S60). If the user connects from a mobile phone on a wireless network or from a landline telephone on the PSTN, membership may instead be confirmed by CID authentication, without an ID or password.
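
The membership check in steps S58 to S60 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the in-memory tables, the sample credentials, and the function name `is_valid_member` are hypothetical, and a real service would store password hashes rather than plain text.

```python
from typing import Optional

# Hypothetical in-memory stand-ins for the member information database.
MEMBERS_BY_ID = {"alice": "s3cret"}
MEMBERS_BY_CID = {"+82-10-0000-0000"}


def is_valid_member(user_id: Optional[str] = None,
                    password: Optional[str] = None,
                    caller_id: Optional[str] = None) -> bool:
    """Return True if the connecting user is a legitimate member (S59, S60)."""
    # A mobile or PSTN caller may be verified by CID alone.
    if caller_id is not None and caller_id in MEMBERS_BY_CID:
        return True
    # Otherwise use the ID and password the user was asked to enter (S58).
    if user_id is None or password is None:
        return False
    return MEMBERS_BY_ID.get(user_id) == password
```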

If the user is confirmed to be a legitimate member (for example, by the user's CID), the control unit (11) on the network server (10), in response to external input, selects one of the spelling, syllable, or word recognition modes provided in the speech recognition unit (12) for dictionary search and result display by voice recognition. At the same time, it selects one or more output modes, text and/or synthesized sound, for outputting information on a given word (S61).
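
One way to represent the selection made in step S61 is an exclusive recognition mode plus a combinable set of output modes. The sketch below is an assumption about how this could be modeled; the type names `RecognitionMode`, `OutputMode`, and `SessionModes` do not come from the patent.

```python
from dataclasses import dataclass
from enum import Enum, Flag, auto


class RecognitionMode(Enum):
    """Exactly one of these is active per session."""
    SPELLING = auto()
    SYLLABLE = auto()
    WORD = auto()


class OutputMode(Flag):
    """Output modes can be combined, e.g. TEXT | SPEECH."""
    TEXT = auto()
    SPEECH = auto()  # synthesized sound


@dataclass(frozen=True)
class SessionModes:
    recognition: RecognitionMode
    output: OutputMode

    def __post_init__(self) -> None:
        # At least one output mode has to be selected (S61).
        if not self.output:
            raise ValueError("select at least one output mode")
```

Using a `Flag` for the output side keeps "text and/or synthesized sound" as a single value that can be tested with `in`.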

If the selected recognition mode is the spelling mode (S62), the control unit (11) on the network server (10) starts the speech recognition unit (12) so that the user's utterances are recognized letter by letter, and loads the spelling-mode resources: shared or individually optimized HMM parameters, the corresponding grammar information, the spelling-mode word dictionary, the spelling-mode word list, and so on (S63).
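
The per-mode resource set named above (HMM parameters, grammar, word dictionary, word list) could be described by a small lookup such as the following. The file paths, the `ModeResources` type, and the `resources_for` function are purely hypothetical; the sketch only resolves which files a mode would load and caches the answer, it does not read any model data.

```python
from dataclasses import dataclass
from functools import lru_cache

MODES = ("spelling", "syllable", "word")


@dataclass(frozen=True)
class ModeResources:
    hmm_params: str   # shared across modes or optimized for one mode
    grammar: str
    dictionary: str
    word_list: str


@lru_cache(maxsize=None)
def resources_for(mode: str, shared_hmm: bool = True) -> ModeResources:
    """Resolve the (hypothetical) files to load for a recognition mode."""
    if mode not in MODES:
        raise ValueError(f"unknown recognition mode: {mode}")
    hmm = "hmm/shared.bin" if shared_hmm else f"hmm/{mode}.bin"
    return ModeResources(
        hmm_params=hmm,
        grammar=f"{mode}/grammar.txt",
        dictionary=f"{mode}/dictionary.txt",
        word_list=f"{mode}/words.txt",
    )
```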

While the system waits for the user to speak in spelling mode (S64), when an utterance arrives from the terminal over one of the communication networks (S65), the control unit (11) on the network server (10) dynamically loads the speech recognition engine of the speech recognition unit (12) and the spelling-mode database in real time and performs recognition (S66). The control unit (11) then outputs the recognized letter as text or synthesized sound so that the user can confirm (S67) that the letter recognized by the speech recognition unit (12) matches the letter spoken.

If the speech recognition unit (12) fails to recognize the spoken letter, that is, if the speech is too quiet or contains too much noise to be recognized as a specific letter, the control unit (11) on the network server (10) asks the user to speak that letter again (S68).

Next, once about 3 to 4 seconds have passed after a letter was spoken, the control unit (11) on the network server (10) checks whether the user has finished speaking all the letters that make up a word (S69). If the input for the word is complete, it searches the electronic dictionary database (14) for information on (the meaning of) that word (S71).

If, however, not all letters of the word have been entered, the control unit (11) on the network server (10) temporarily stores the letters entered so far in the buffer (13) (S70) and repeats steps S64 to S69 until the last letter is entered, thereby recognizing every letter of the word.
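
The buffering loop of steps S64 to S70 can be sketched as a function that accumulates recognized letters until an end-of-input check succeeds. This is a simplified model under stated assumptions: the `DICTIONARY` contents, the `END` sentinel, and the `collect_word` function are hypothetical, and recognition itself is replaced by a ready-made sequence of events.

```python
from typing import Iterable, List, Optional

# Hypothetical stand-in for the electronic dictionary database (14).
DICTIONARY = {"cat": "a small domesticated feline", "car": "a road vehicle"}

END = object()  # sentinel: the end-of-input check (S69) fired


def collect_word(events: Iterable[object]) -> Optional[str]:
    """Assemble one word from recognized letters.

    Each event is a recognized letter (str), None when recognition failed
    and the user is asked to repeat the utterance (S68), or END.
    """
    buffer: List[str] = []  # plays the role of the buffer (13), step S70
    for event in events:
        if event is END:
            word = "".join(buffer)
            if word in DICTIONARY:
                return word  # input complete: ready for lookup (S71)
            continue  # no word formed yet, so keep waiting for letters
        if event is None:
            continue  # unrecognized utterance: the user repeats it
        buffer.append(str(event))
    return None
```

For example, `collect_word(["c", None, "a", END, "t", END])` ignores the failed recognition, finds that "ca" is not a word at the first check, and returns "cat" at the second. The same loop would apply to syllable mode with syllables in place of letters.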

Here, the control unit (11) on the network server (10) may treat the spelling of a word as finished when a set waiting time has passed after the last letter was spoken, when a specific key is pressed, or when a specific voice command is given. However, if 3 to 4 seconds have passed since the last letter was spoken and the letters entered so far still do not form a word, the control unit (11) may request further letters until a word is formed.
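
The three end-of-input triggers described above (elapsed waiting time, a specific key, a specific voice command) reduce to a single predicate. In the sketch below the 3-second threshold follows the "3 to 4 seconds" mentioned in the text, while the `#` key and the spoken command "done" are hypothetical placeholders.

```python
from typing import Optional

SILENCE_TIMEOUT_S = 3.0  # the text mentions roughly 3 to 4 seconds
END_KEY = "#"            # hypothetical end-of-input key
END_COMMAND = "done"     # hypothetical spoken end-of-input command


def input_finished(silence_s: float,
                   key: Optional[str] = None,
                   utterance: Optional[str] = None) -> bool:
    """True if any one of the three end-of-input conditions holds."""
    return (
        silence_s >= SILENCE_TIMEOUT_S
        or key == END_KEY
        or utterance == END_COMMAND
    )
```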

Preferably, after the pronunciation of a word composed of one or more letters has been entered, one or more words with the same pronunciation as that word are output as text or synthesized sound so that the user can choose among them. This allows the word the user wants to be recognized and searched for more accurately.
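
Offering same-pronunciation candidates amounts to grouping dictionary words by a pronunciation key. The following sketch assumes a small hypothetical lexicon with ad hoc pronunciation strings; the names `PRONUNCIATIONS`, `build_homophone_index`, and `candidates_for` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical lexicon: word -> pronunciation key.
PRONUNCIATIONS = {"knight": "nait", "night": "nait", "cat": "kaet"}


def build_homophone_index(lexicon: Dict[str, str]) -> Dict[str, List[str]]:
    """Group words that share a pronunciation key."""
    index: Dict[str, List[str]] = defaultdict(list)
    for word, pronunciation in lexicon.items():
        index[pronunciation].append(word)
    return {pron: sorted(words) for pron, words in index.items()}


def candidates_for(word: str, lexicon: Dict[str, str]) -> List[str]:
    """Words to offer the user: all entries pronounced like `word`."""
    pronunciation = lexicon.get(word)
    if pronunciation is None:
        return [word]  # unknown word: nothing else to offer
    return build_homophone_index(lexicon)[pronunciation]
```

The returned list would then be shown as text or read out as synthesized sound so the user can pick the intended entry.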

Having retrieved the information on (the meaning of) the word composed of one or more letters from the electronic dictionary database (14), the control unit (11) on the network server (10) checks whether the output mode selected in step S61 is text mode or text plus synthesized-sound mode (S72). If only text mode was selected, the control unit (11) displays the meaning of the word as text on the monitor (S74). If both text and synthesized-sound modes were selected, the control unit (11) displays the meaning as text and, at the same time, drives the speech synthesis unit (15) to output the meaning as synthesized sound as well (S73).
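
The output branch (S72 to S74) can be sketched as a dispatcher that calls a text display and, when selected, a speech synthesizer. The `OutputMode` flag and the `present` function are assumptions made for illustration; the two callbacks stand in for the monitor and the speech synthesis unit (15).

```python
from enum import Flag, auto
from typing import Callable


class OutputMode(Flag):
    TEXT = auto()
    SPEECH = auto()  # synthesized sound


def present(meaning: str, mode: OutputMode,
            show_text: Callable[[str], None],
            speak: Callable[[str], None]) -> None:
    """Send the retrieved meaning to every selected output channel."""
    if OutputMode.TEXT in mode:
        show_text(meaning)   # text display (S74)
    if OutputMode.SPEECH in mode:
        speak(meaning)       # synthesized sound output (S73)
```

For example, `present("a small feline", OutputMode.TEXT | OutputMode.SPEECH, print, print)` would invoke both channels, while `OutputMode.TEXT` alone invokes only the display.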

If the recognition mode selected in step S61 is the syllable mode (S76), the control unit (11) on the network server (10) starts the speech recognition unit (12) so that the user's utterances are recognized syllable by syllable, and loads the syllable-mode resources: shared or individually optimized HMM parameters, the corresponding grammar information, the syllable-mode word dictionary, the syllable-mode word list, and so on (S77).

While the system waits for the user to speak in syllable mode (S78), when an utterance arrives from the terminal over one of the communication networks (S79), the control unit (11) on the network server (10) dynamically loads the speech recognition engine of the speech recognition unit (12) and the syllable-mode database in real time and performs recognition (S80). The control unit (11) then outputs the recognized syllable as text or synthesized sound so that the user can confirm (S81) that the syllable recognized by the speech recognition unit (12) matches the syllable spoken.

If the speech recognition unit (12) fails to recognize the spoken syllable, that is, if the speech is too quiet or contains too much noise to be recognized as a specific syllable, the control unit (11) on the network server (10) asks the user to speak that syllable again (S82).

Next, once about 3 to 4 seconds have passed after a syllable was spoken, the control unit (11) on the network server (10) checks whether the user has finished speaking all the syllables that make up a word (S83). If the input for the word is complete, it searches the electronic dictionary database (14) for information on (the meaning of) that word (S85).

If, however, not all syllables of the word have been entered, the control unit (11) on the network server (10) temporarily stores the syllables entered so far in the buffer (13) (S84) and repeats steps S78 to S83 until the last syllable is entered, thereby recognizing every syllable of the word.

Here, the control unit (11) on the network server (10) may treat the syllable input for a word as finished when a set waiting time has passed after the last syllable was spoken, when a specific key is pressed, or when a specific voice command is given. However, if 3 to 4 seconds have passed since the last syllable was spoken and the syllables entered so far still do not form a word, the control unit (11) may request further syllables until a word is formed.

Preferably, after the pronunciation of a word composed of one or more syllables has been entered, one or more words with the same pronunciation as that word are output as text or synthesized sound so that the user can choose among them. This allows the word the user wants to be recognized and searched for more accurately.

Having retrieved the information on (the meaning of) the word composed of one or more syllables from the electronic dictionary database (14), the control unit (11) on the network server (10) checks whether the output mode selected in step S61 is text mode or text plus synthesized-sound mode (S86). If only text mode was selected, the control unit (11) displays the meaning of the word as text (S88). If both text and synthesized-sound modes were selected, the control unit (11) displays the meaning as text and, at the same time, drives the speech synthesis unit (15) to output the meaning as synthesized sound as well (S87).

Finally, if the recognition mode selected in step S61 is the word mode (S90), the control unit (11) on the network server (10) starts the speech recognition unit (12) so that the user's utterance is recognized as a whole word, and loads the word-mode resources: shared or individually optimized HMM parameters, the corresponding grammar information, the word-mode word dictionary, the word-mode word list, and so on (S91).

While the system waits for the user to speak in word mode (S92), when an utterance arrives from the terminal over one of the communication networks (S93), the control unit (11) on the network server (10) dynamically loads the speech recognition engine of the speech recognition unit (12) and the word-mode database in real time and performs recognition (S94). The control unit (11) then outputs the recognized word as text or synthesized sound so that the user can confirm (S95) that the word recognized by the speech recognition unit (12) matches the word spoken.

If the speech recognition unit (12) fails to recognize the spoken word, that is, if the speech is too quiet or contains too much noise to be recognized as a specific word, the control unit (11) on the network server (10) asks the user to speak that word again (S96).

Next, once about 3 to 4 seconds have passed after a word was spoken, the control unit (11) on the network server (10) searches the electronic dictionary database (14) for information on (the meaning of) that word (S97).

Here, the control unit (11) on the network server (10) may treat the input of a word as finished when a set waiting time has passed after the word was spoken, when a specific key is pressed, or when a specific voice command is given. However, if 3 to 4 seconds have passed since the last utterance and the input so far still does not form a word, the control unit (11) may request new input until a word is formed.

Having retrieved the information on (the meaning of) the word from the electronic dictionary database (14), the control unit (11) on the network server (10) checks whether the output mode selected in step S61 is text mode or text plus synthesized-sound mode (S98). If only text mode was selected, the control unit (11) displays the meaning of the word as text (S90). If both text and synthesized-sound modes were selected, the control unit (11) displays the meaning as text and, at the same time, drives the speech synthesis unit (15) to output the meaning as synthesized sound as well (S99).

Accordingly, the method of word search and result display of an electronic dictionary through voice recognition according to the present invention provides the following effects.

1) Without relying on key input alone, a user can easily enter by voice the word to be looked up in the electronic dictionary, regardless of time or place, even from a terminal such as a mobile phone or PDA connected to a local computer or to a network server (remote server) capable of wired or wireless communication.

2) In addition, because the speech for the word to be looked up is entered separately by letter, syllable, or word, the speaker-independent speech recognition rate is improved.

Although the present invention has been described in detail only with respect to the specific embodiments set forth, it will be apparent to those skilled in the art that various changes and modifications are possible within the technical scope of the invention, and such changes and modifications naturally fall within the appended claims.

FIG. 1 is a block diagram of the main parts of a local computer capable of searching for a word in an electronic dictionary by voice recognition and displaying the result according to the present invention.

FIG. 2 is a flowchart of a method of searching for a word in the electronic dictionary by voice recognition and displaying the result on a local computer configured as shown in FIG. 1.

FIG. 3 is a block diagram of the main parts of a terminal such as a mobile phone or PDA and the network server connected to it, capable of searching for a word in an electronic dictionary by voice recognition and displaying the result according to the present invention.

FIG. 4 is a flowchart of a method of searching for a word in the electronic dictionary by voice recognition and displaying the result on a terminal such as a mobile phone or PDA and the network server connected to it, configured as shown in FIG. 3.

- Explanation of reference numerals for the main parts of the drawings -

1, 11: control unit
2, 12: speech recognition unit
3, 13: buffer
4, 14: electronic dictionary database
5, 15: speech synthesis unit
10: network server
16: authentication unit

Claims

A method of word search and result display of an electronic dictionary through voice recognition, comprising:

a first step of selecting one voice recognition mode from among spelling, syllable, and word modes, and at least one output mode from among text and synthesized-sound modes for outputting information on a word;

a second step of recognizing speech entered by letter, syllable, or word, unit by unit, temporarily storing the recognized units in a buffer, and combining them into a word;

a third step of retrieving information on the word from an electronic dictionary database when the voice input by letter, syllable, or word for forming a single word has ended; and

a fourth step of outputting the retrieved information on the word as text and/or synthesized sound according to the output mode selected in the first step.

The method of claim 1,

wherein the second step further comprises, after the pronunciation of a word composed of one or more letters or syllables, or of a single word, has been entered, outputting homophones of the word as text or synthesized sound for confirmation.

The method of claim 1,

wherein, in the third step, the voice input for the word to be searched is determined to have ended when, during input of the letters, syllables, or word forming the single word to be searched, a specific time elapses without any voice signal being input, a specific key signaling the end of voice input is operated, or a voice arbitrarily set to signal the end of voice input is input.

The method of claim 1,

wherein the input and recognition of the voice for the word search, the search for the word, and the output of information on the word are performed on a local computer or on a network server communicatively connected to it.