KR20130053690A

KR20130053690A - Speech recognition system using hangul

Info

Publication number: KR20130053690A
Application number: KR1020110119254A
Authority: KR
Inventors: 최성환; 최용호; 최철환
Original assignee: 최성환; 최용호
Priority date: 2011-11-16
Filing date: 2011-11-16
Publication date: 2013-05-24

Abstract

PURPOSE: A speech recognition system using Hangul is provided to convert a user's voice into text by converting the voice into phonetic characters according to a sound law set for the speech recognition engine. CONSTITUTION: A noise removing unit (20) removes noise from a user's voice input through an audio unit (10). A speech recognition engine unit (30) analyzes pronunciation information in the voice signal delivered from the noise removing unit and detects the minimum sound units. A character generating unit (40) generates phonetic characters that represent the sounds based on the minimum sound units. A control unit (50) executes a voice command through the phonetic characters or outputs them through a monitor or speaker. [Reference numerals] (10) Audio unit; (20) Noise removing unit; (32) Morpheme analysis unit; (33) Sound unit detecting unit; (41) Language selecting unit; (42) Character matching unit; (50) Control unit; (60) Transceiving unit; (70) Voice conversion engine

Description

Speech Recognition System Using Korean Characters {SPEECH RECOGNITION SYSTEM USING HANGUL}

The present invention relates to a speech recognition system using Hangul and, more particularly, to a speech recognition system using Hangul that extracts basic consonants and basic vowels from Hangul, sets a sound law from them, and converts a user's voice into phonetic characters according to that sound law, so that the user's voice can be converted into text simply.

Recently, various kinds of computers, communication devices, and multimedia devices, such as personal computers, tablet PCs, notebook computers, mobile phones and personal digital assistants (PDAs), have been widely used.

Such multimedia devices are indispensable in an information society, and how skillfully and effectively a person uses them is regarded as a measure of that person's ability to live in modern society.

In particular, a computer that handles diverse information processes input received through devices such as a keyboard, mouse, joystick, or tablet, and outputs the results through devices such as a monitor or speakers. Because it deals with binary digital information, it follows menu-driven operation and requires accurate input.

More recently, for user convenience, voice recognition functions that allow a device to be driven by the user's voice commands alone have been developed, and computers driven by voice commands are also being produced. Speech recognition technology continues to advance, reaching recognition rates of 98% or more for some isolated words.

However, such a computer system converts a voice command input by the user into a corresponding system drive command and provides that command to the system. Therefore, voice commands are possible only when a voice recognition program that converts voice commands into system drive commands is supported.

In addition, speech recognition engines applied to communication devices allow a call to be connected to a phone number spoken by the user instead of one entered with the keypad. More recently, a service is under development in which, when a caller sends a message by voice, a server equipped with a speech recognition engine converts the voice into text and forwards the resulting text message to the recipient's mobile communication device through a Short Message Service Center (SMSC).

In this way, computers and communication devices are increasingly offering voice-based services. However, even within the same language, speech varies greatly with the speaker's gender and age and the state of pronunciation, and the properties of a sound also change depending on whether it is pronounced in isolation or within a word or sentence.

Because of these complex characteristics, a speech recognition engine can produce a sentence from speech only after detecting feature vectors of the speech and going through the complex processes of pattern recognition, morphological analysis, pre-matching, and syntax recognition.

Accordingly, speech-to-text conversion using a speech recognition engine must run on a server that has the resources for the complex process above. As a result, the speech-to-text conversion is delayed, and operating the server incurs additional maintenance costs.

Furthermore, the speech recognition engines and programs applied to computers are mostly basic, recognizing only program launches and basic operating-system commands. In particular, the recognition rate of a system equipped with a speaker-dependent speech recognition engine varies greatly with the user's tone of voice, so before the product is used in earnest after shipment, the user must repeatedly input his or her voice during a training period to raise the recognition rate. This repeated voice training demands patience and considerable effort from the user.

These problems arise because speech is expressed in ideographic characters. For reference, an ideograph is a character that represents a meaning, so each meaning is fixed to one particular character.

Consequently, when converting speech into text, the speech recognition engine must perform a complex process to determine the single correct character, that is, to produce text with a low error rate.

On the other hand, when conveying meaning matters more than exact spelling, phonetic characters can be used instead of ideographs. Phonetic characters represent sounds and are written according to the sound being pronounced.

Phonetic scripts are further divided into syllabic scripts, which are written according to the syllables pronounced, and phonemic scripts (alphabets), which are written according to the phonemes pronounced. For reference, the representative syllabic script is Japanese, and the representative phonemic script is Korean (Hangul).

Japanese, the typical syllabic script, consists of only 50 syllables, which limits its ability to represent all the sounds of the world. Hangul, by contrast, can write countless sounds by combining consonants and vowels, and however many letters are combined, anyone who understands the principles by which its letters were formed can learn it without difficulty.

Therefore, there is an urgent need for research into a speech recognition engine that uses the basic principles of Hangul, a phonemic script, to accurately recognize Korean, English, Chinese, and other languages and to accurately represent all the sounds of the world, and that can be applied to multimedia devices to enable voice commands and voice-based text services.

Accordingly, an object of the present invention is to provide a speech recognition system using Hangul that extracts basic consonants and basic vowels from Hangul, sets a sound law, and executes a speech recognition engine function that converts the user's voice into phonetic characters according to that sound law, so that the user's voice can be converted into text simply, and that applies this speech recognition engine function to multimedia devices such as computers and communication devices so that they can be operated by voice alone.

To achieve the above object, a speech recognition system using Hangul according to the present invention comprises: an audio unit to which a user's voice signal is input; a noise removing unit that removes noise from the user's voice input through the audio unit to extract only the voice signal; a speech recognition engine unit that analyzes pronunciation information in the voice signal delivered from the noise removing unit and detects the minimum sound units pronounced, based on Hangul; a character generator that generates phonetic characters representing the sound based on the minimum sound units detected by the speech recognition engine unit; and a control unit that, according to the user's selection, executes a voice command through the phonetic characters generated by the character generator or outputs them through a monitor or a speaker.

Here, the speech recognition engine unit includes a sound law database that extracts the basic consonants and basic vowels required for the minimum sound units from Hangul and generates and stores a specific sound law; a morpheme analysis unit that separates the pronunciation information into morphemes as they are pronounced; and a sound unit detector that detects the minimum sound units pronounced in the morphemes separated by the morpheme analysis unit, according to the sound law stored in the sound law database.

In the sound law of the sound law database, the basic consonants are ㅇ, ㅁ, ㄱ, ㄷ, ㅂ, ㅅ, and ㅈ, and the basic vowels are ㅡ, ㅣ, ㅏ, ㅓ, ㅗ, ㅜ, and ㅐ.

The character generator may further include a language selector that comprehensively analyzes the minimum sound units detected by the speech recognition engine unit to select the user's language, such as Korean, English, or Chinese; and a character matching unit that reads the sound law stored in the sound law database for the language selected by the language selector and generates phonetic characters representing the sound based on the minimum sound units detected by the speech recognition engine unit.

The system may further include a voice command execution key for driving the system according to the user's voice commands.

The controller may recognize voice command execution, voice command execution wait, release of the execution wait, and end of voice command execution according to the number of times the voice command execution key is pressed or how long it is held.

When the voice command execution key is operated, the controller may display, on one side of the screen, a pop-up window listing user commands related to the various voice commands.

With the speech recognition system using Hangul described above, basic consonants and basic vowels are extracted from Hangul to set a sound law, and a speech recognition engine function converts the user's voice into phonetic characters according to that sound law. This not only allows the user's voice to be converted into text, but, when the function is applied to multimedia devices such as computers and communication devices, also allows user-friendly functions such as operating the system, sending text messages, and sending voice mail to be performed by voice alone.

In addition, the present invention can execute predetermined commands by the user's voice alone, without hand-operated input devices such as a keyboard or mouse, to run programs and various functions on a computer or communication device. Because operation can be performed freely with voice commands only, multimedia devices such as computers and communication devices become easier to use not only for the general public but also for the visually impaired and people with various other disabilities. The invention also allows a multimedia device to be used from a distance or while the user is doing other work.

In addition, because the sound law defined by extracting the basic consonants and basic vowels of Hangul can convert not only Hangul but also foreign languages into sounds, the present invention can be used worldwide.

FIG. 1 is an internal block diagram of a speech recognition system using Korean characters according to an embodiment of the present invention.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the invention is not intended to be limited to the particular embodiments, but includes all modifications, equivalents, and alternatives falling within the spirit and scope of the invention. Like reference numerals are used for like elements in describing each drawing.

Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries are to be interpreted as having a meaning consistent with their contextual meaning in the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in the present application.

The present invention can be embodied as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes all kinds of recording devices that store data readable by a computer system. Examples of the computer-readable recording medium include a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, and an optical data storage device, and the medium may also be implemented in the form of a carrier wave (for example, transmission over the Internet). The computer-readable recording medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Referring to FIG. 1, a speech recognition system using Korean characters according to an exemplary embodiment of the present invention includes an audio unit 10 to which a user's voice signal is input; a noise removing unit 20 that removes ambient noise signals from the user's voice input through the audio unit 10 to extract only the voice signal; a speech recognition engine unit 30 that analyzes pronunciation information in the voice signal delivered from the noise removing unit 20 and detects the minimum sound units pronounced, based on Hangul; a character generator 40 that generates phonetic characters representing the sound based on the minimum sound units detected by the speech recognition engine unit 30; a control unit 50 that, according to the user's selection, executes a voice command through the phonetic characters generated by the character generator 40 or outputs them through a monitor or a speaker (SP); and a transceiver 60 that transmits and receives data according to user commands under the control of the control unit 50. However, the configuration is not limited thereto.
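The data flow of FIG. 1 can be pictured as a chain of stages. The following Python sketch only illustrates that wiring under assumed interfaces: the class name `SpeechPipeline`, the stage signatures, and the mode names are hypothetical, and each unit is a pluggable function rather than the actual processing of the embodiment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Samples = List[float]


@dataclass
class SpeechPipeline:
    """Hypothetical wiring of the units in FIG. 1 (20 -> 30 -> 40 -> 50)."""
    remove_noise: Callable[[Samples], Samples]      # noise removing unit 20
    detect_units: Callable[[Samples], List[str]]    # speech recognition engine unit 30
    generate_text: Callable[[List[str]], str]       # character generator 40
    outputs: Dict[str, Callable[[str], str]] = field(default_factory=dict)  # control unit 50 routes

    def process(self, samples: Samples, mode: str) -> str:
        """Pass one utterance from the audio unit 10 through every stage."""
        clean = self.remove_noise(samples)
        units = self.detect_units(clean)
        text = self.generate_text(units)
        # Control unit 50: route the phonetic text by the user's selection,
        # e.g. "command" to execute it or "display" to show it on the monitor.
        return self.outputs[mode](text)
```

Keeping each unit behind a plain function makes it possible to swap one stage, such as the noise remover, without touching the others.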

The speech recognition engine unit 30 consists of a sound law database 31 that extracts the basic consonants and basic vowels required for the minimum sound units from Hangul and generates and stores a specific sound law; a morpheme analysis unit 32 that separates the pronunciation information into morphemes as they are pronounced; and a sound unit detector 33 that detects the minimum sound units pronounced in the morphemes separated by the morpheme analysis unit 32, according to the sound law stored in the sound law database 31.
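The description does not say how the sound unit detector 33 represents minimum sound units. One plausible reading, assumed here, is that they correspond to individual Hangul jamo. The sketch below works on already-transcribed Hangul text rather than audio and splits precomposed syllables into jamo with the standard Unicode Hangul syllable arithmetic (base U+AC00, 19 initials × 21 medials × 28 finals); the function name `to_sound_units` is hypothetical.

```python
from typing import List

# Unicode Hangul syllable block constants (Unicode Standard, Conjoining Jamo Behavior).
S_BASE = 0xAC00
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per initial consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables

# Compatibility jamo listed in Unicode index order.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")
assert len(INITIALS) == L_COUNT and len(MEDIALS) == V_COUNT and len(FINALS) == T_COUNT


def to_sound_units(text: str) -> List[str]:
    """Split Hangul syllables into jamo; keep other non-space characters as-is."""
    units: List[str] = []
    for ch in text:
        idx = ord(ch) - S_BASE
        if 0 <= idx < S_COUNT:
            units.append(INITIALS[idx // N_COUNT])              # initial consonant
            units.append(MEDIALS[(idx % N_COUNT) // T_COUNT])   # vowel
            final = FINALS[idx % T_COUNT]                       # optional final consonant
            if final:
                units.append(final)
        elif not ch.isspace():
            units.append(ch)
    return units
```

A real detector would obtain these units from acoustic analysis; the text-based decomposition only shows the target representation.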

If the sound law stored in the sound law database 31 is called the on-sound (onsori), the basic consonants of the on-sound are ㅇ, ㅁ, ㄱ, ㄷ, ㅂ, ㅅ, and ㅈ, and the basic vowels are ㅡ, ㅣ, ㅏ, ㅓ, ㅗ, ㅜ, and ㅐ.

Table 1 lists the basic consonants as classified in Hunminjeongeum, in Hangul, and in the on-sound (Onsori), in that order.

TABLE 1

In Table 2, the consonants classed as plain ("flatness") in Hunminjeongeum and as plain sounds in Hangul are named the elementary sound; those classed as "giants" in Hunminjeongeum and as aspirated sounds in Hangul are named "rock"; and those classed as "Sangsang" in Hunminjeongeum and as strong (tense) sounds in Hangul are named Hinsori.

The sound values are then set so that 'ㅎ' is θ, …, and 'ㅍ' is represented by 'ㅂ', …. Table 2 also shows that an elementary sound pronounced with 'ㅡ' becomes a derived sound, and an elementary sound pronounced with 'ㅣ' becomes a Hinsori.

Table 2 further shows that if the 'ㅢ' sound is discarded, the four compound vowels 'ㅐ, ㅔ, ㅒ, ㅖ' are expressed as the sound 'ㅐ', and 'ㅙ, ㅞ' are expressed as …, then the seven short vowels can represent both the compound vowels and the sounds composed of compound vowels.
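The folding of several vowels into one representative vowel can be expressed as a lookup table. This is a deliberately partial sketch: it encodes only the substitutions the description states outright ('ㅔ', 'ㅒ', 'ㅖ' to 'ㅐ' and 'ㅍ' to 'ㅂ'), because Tables 1 and 2 are not reproduced; the names `SOUND_LAW` and `apply_sound_law` are hypothetical.

```python
from typing import Dict, List

# Only the substitutions stated explicitly in the description; the rest of
# Tables 1-2 is not available, so every other unit passes through unchanged.
SOUND_LAW: Dict[str, str] = {
    "ㅔ": "ㅐ", "ㅒ": "ㅐ", "ㅖ": "ㅐ",  # 'ㅐ, ㅔ, ㅒ, ㅖ' are all written as 'ㅐ'
    "ㅍ": "ㅂ",                        # 'ㅍ' is written as 'ㅂ'
}

# The seven short vowels named as the basic vowels of the on-sound.
BASIC_VOWELS = frozenset("ㅡㅣㅏㅓㅗㅜㅐ")


def apply_sound_law(units: List[str]) -> List[str]:
    """Replace each unit by its representative on-sound, if the table has one."""
    return [SOUND_LAW.get(u, u) for u in units]
```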

As shown in Tables 1 and 2 above, Korean as well as foreign languages such as English and Chinese can be expressed as sounds based on the on-sound, which consists of minimum sound units.

The character generator 40 consists of a language selector 41 that comprehensively analyzes the minimum sound units detected by the speech recognition engine unit 30 to select the user's language, such as Korean, English, or Chinese; and a character matching unit 42 that reads the sound law stored in the sound law database 31 for the language selected by the language selector 41 and generates phonetic characters representing the sound based on the minimum sound units detected by the speech recognition engine unit 30.
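The description leaves open how the language selector 41 "comprehensively analyzes" the sound units. One simple assumption, used below, is to score each language by how many word-level unit sequences appear in its table and to pick the best; the character matching unit 42 then looks each sequence up. The per-language tables are invented placeholders standing in for the sound law database 31 and Table 3, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

Units = Tuple[str, ...]

# Invented placeholder tables; real entries would come from the sound law database 31.
LEXICONS: Dict[str, Dict[Units, str]] = {
    "ko": {("ㅂ", "ㅜ", "ㄹ"): "불"},
    "en": {("ㅂ", "ㅣ", "ㅇ", "ㅏ", "ㄴ", "ㅣ", "ㅅ", "ㅡ", "ㅌ", "ㅡ"): "pianist"},
}


def select_language(words: List[Units], lexicons: Dict[str, Dict[Units, str]]) -> str:
    """Language selector 41 (sketch): pick the table covering the most words.

    Ties go to the first language in dictionary order.
    """
    return max(lexicons, key=lambda lang: sum(w in lexicons[lang] for w in words))


def match_characters(words: List[Units], lexicon: Dict[Units, str]) -> List[str]:
    """Character matching unit 42 (sketch): look up each word, else keep its phonetic characters."""
    return [lexicon.get(w, "".join(w)) for w in words]
```

Falling back to the raw phonetic characters when no entry matches reflects the idea in the background section that phonetic text can still convey the meaning without an exact spelling.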

For example, if the user's language is English, a sound law that matches English, the on-sound, and phonetic symbols can be created, as shown in Table 3.

For a computer, communication device, or other multimedia device to be controlled by the user's voice alone, a voice command execution key (not shown) may be provided to drive the system according to the user's voice. In addition to the voice command execution key, special function keys such as a voice text transmission key, a voice mail input key, and an Internet execution key may be provided as needed.

These special function keys, such as the voice command execution key, the voice text transmission key, the voice mail input key, and the Internet execution key, may be added to a touch pad, keypad, or keyboard, or installed at another location on the device chosen for the user's convenience.

The controller 50 may recognize voice command execution, voice command execution wait, release of the execution wait, and end of voice command execution according to a preset number of presses or holding time of the voice command execution key.

For example, the controller 50 may be set so that holding the voice command execution key for a predetermined time starts voice command execution; pressing the key while voice command execution is in progress puts execution on standby (wait); pressing the key twice in the standby state releases the wait; and holding the key for a certain time while voice command execution is in progress ends voice command execution.
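The key behavior in this example amounts to a small state machine. The sketch assumes that a single press held for at least `HOLD_SECONDS` counts as a hold and that two presses count as a double press; the one-second threshold and all names are hypothetical, since the description only speaks of a "predetermined time".

```python
from enum import Enum, auto

HOLD_SECONDS = 1.0  # hypothetical value for the "predetermined time"


class State(Enum):
    IDLE = auto()       # no voice command session
    EXECUTING = auto()  # voice command execution in progress
    WAITING = auto()    # voice command execution on standby


def classify(press_count: int, held_seconds: float) -> str:
    """Turn a raw key event into a gesture name."""
    if press_count >= 2:
        return "double"
    if held_seconds >= HOLD_SECONDS:
        return "hold"
    return "press"


# (current state, gesture) -> next state; any other pair leaves the state unchanged.
TRANSITIONS = {
    (State.IDLE, "hold"): State.EXECUTING,       # start voice command execution
    (State.EXECUTING, "press"): State.WAITING,   # put execution on standby
    (State.WAITING, "double"): State.EXECUTING,  # release the standby
    (State.EXECUTING, "hold"): State.IDLE,       # end voice command execution
}


def step(state: State, press_count: int, held_seconds: float) -> State:
    """Apply one key event to the current state."""
    return TRANSITIONS.get((state, classify(press_count, held_seconds)), state)
```

For example, a long press, a short press, a double press, and another long press take the controller from `IDLE` through `EXECUTING` and `WAITING`, back to `EXECUTING`, and finally to `IDLE`.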

The controller 50 may also define various user commands by combining the voice command execution key with existing keys.

When the voice command execution key is operated, the controller 50 may display, on one side of the screen, a pop-up window listing user commands related to the various voice commands. The pop-up window shows main items such as Internet execution, voice text transmission, voice mail input, Hangul execution, music listening, and movie viewing, and lists the detailed items under each main item.

For example, the main item Internet execution has detailed items such as Internet address input, refresh, go to favorites, mouse function execution, and web page movement. If the user says "Internet" and then "Internet address" in sequence, the controller 50 recognizes the voice commands, launches the Internet browser, moves the cursor to the Internet address input window, and waits for the user's voice input.
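The two-level pop-up menu and the sequential "main item, then detailed item" utterances could be dispatched as follows. This is a hypothetical sketch: the `VoiceMenu` class, the action strings, and exact matching of utterances against item names are assumptions; only the menu items themselves come from the description.

```python
from typing import Dict, List, Optional, Tuple

# Main items and detailed items as listed in the description.
MENU: Dict[str, List[str]] = {
    "Internet execution": ["Internet address input", "refresh", "go to favorites",
                           "mouse function", "move web page"],
    "voice text transmission": [],
    "voice mail input": [],
    "Hangul execution": [],
    "music listening": [],
    "movie viewing": [],
}

# Hypothetical action script for one (main, detail) pair; other pairs get a generic action.
ACTIONS: Dict[Tuple[str, str], List[str]] = {
    ("Internet execution", "Internet address input"):
        ["launch browser", "move cursor to address window", "wait for voice input"],
}


class VoiceMenu:
    def __init__(self, menu: Dict[str, List[str]], actions: Dict[Tuple[str, str], List[str]]):
        self.menu = menu
        self.actions = actions
        self.selected: Optional[str] = None  # main item chosen by the previous utterance

    def hear(self, utterance: str) -> List[str]:
        """Handle one recognized utterance and return the actions to perform."""
        if self.selected is None:
            if utterance not in self.menu:
                return []                           # not a main item: ignore
            if not self.menu[utterance]:
                return ["run: " + utterance]        # no detailed items: act at once
            self.selected = utterance
            return ["show: " + ", ".join(self.menu[utterance])]
        main, self.selected = self.selected, None   # consume the pending main item
        if utterance not in self.menu[main]:
            return []
        return self.actions.get((main, utterance), ["run: " + utterance])
```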

Hereinafter, the operation of the speech recognition system using Korean characters according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

According to an exemplary embodiment of the present invention, when the user speaks "light up" into a microphone (MIC) while operating the voice command execution key, the "light up" sound is input to the audio unit 10. The noise removing unit 20 then removes the noise around the user, such as car sounds or a third party's voice, and passes the user's voice to the speech recognition engine unit 30.

In particular, the noise removing unit 20 compares quantities such as the sound pressure level, treats the voice input from the position closest to the microphone as valid data, and removes other voices and sounds as ambient noise.
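The description says only that the noise removing unit 20 compares sound pressure and keeps the voice closest to the microphone. A crude stand-in for that idea, assumed here rather than taken from the embodiment, is frame-level gating: frames whose RMS level falls well below the loudest frame are treated as distant sources and zeroed. The function names, the frame length (20 ms at an assumed 16 kHz sampling rate), and the 0.5 ratio are illustrative choices.

```python
import math
from typing import List


def rms(frame: List[float]) -> float:
    """Root-mean-square level, used here as a stand-in for sound pressure."""
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def gate_by_sound_pressure(samples: List[float], frame_len: int = 320,
                           ratio: float = 0.5) -> List[float]:
    """Zero every frame whose level is below `ratio` times the loudest frame.

    320 samples correspond to 20 ms at an assumed 16 kHz sampling rate.
    """
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    levels = [rms(f) for f in frames]
    peak = max(levels, default=0.0)
    gated: List[float] = []
    for frame, level in zip(frames, levels):
        keep = peak > 0.0 and level >= ratio * peak   # loud frame: assume nearby speaker
        gated.extend(frame if keep else [0.0] * len(frame))
    return gated
```

Practical systems typically rely on techniques such as voice activity detection, beamforming, or spectral subtraction; simple level gating fails when ambient noise is as loud as the speaker.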

The speech recognition engine unit 30 separates the voice signal delivered from the noise removing unit 20 into morphemes as they are pronounced, based on the sound law database 31, and then detects the minimum sound units pronounced in the separated morphemes.

Then, the character generator 40 generates phonetic characters representing the sound based on the minimum sound units detected by the speech recognition engine unit 30.

In other words, "light up the light" is written as it is pronounced and broken down into Hangul phonemes, and, according to the sound law of the sound law database 31, phonetic characters containing the sequence 'ㅜ ㄹ ㅡ ㄹ' are generated.

Similarly, when the user says "Pianist", it is pronounced "pianist" and the phonetic characters 'ㅂ ㅇ ㅏ니 ㅅ ㅡ ㅌ ㅡ' are generated; as shown in Table 3, these phonetic characters are matched with phonetic symbols to complete the word.

Accordingly, when the user commands the characters to be displayed on the monitor, the controller 50 matches the generated phonetic characters to the language selected by the user, such as English, Korean, or Chinese, and displays the completed characters.

In addition, the controller 50 may output a text message or mail sent by the other party, or a sentence on a web page selected by the user, to the speaker SP through the voice conversion engine 70. When the user requests a text or mail service, the user's voice is converted into phonetic characters and transmitted through the transceiver 60.

The above describes an embodiment in which the speech recognition function is implemented in a computer such as a personal computer, notebook computer, or tablet PC. However, the speech recognition system using Hangul of the present invention is not limited to computers and can also be applied to multimedia devices such as communication devices and televisions.

It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

The present invention relates to a speech recognition system using Hangul that extracts basic consonants and basic vowels from Hangul, sets a sound law, and executes a speech recognition engine function that converts a user's voice into phonetic characters according to that sound law. This not only allows the user's voice to be converted into text, but, when the function is applied to multimedia devices such as computers and communication devices, also allows user-friendly functions such as system operation, text transmission, and voice mail transmission to be performed by voice alone.

10: audio unit 20: noise removal unit
30: speech recognition engine unit 31: sound law database
40: character generator 50: controller
60: transceiver 70: voice conversion engine

Claims

An audio unit to which a user's voice signal is input;
A noise removing unit extracting only a voice signal by removing noise from a user voice input from the audio unit;
A speech recognition engine unit that analyzes pronunciation information in the voice signal delivered from the noise removing unit and detects the minimum sound units pronounced, based on Hangul;
A character generator configured to generate a phonetic character for representing a sound based on a minimum sound unit detected by the speech recognition engine unit;
A control unit that, according to the user's selection, executes a voice command through the phonetic characters generated by the character generator or performs various control functions such as outputting them through a monitor or a speaker; the foregoing units constituting a speech recognition system using Hangul.

The system of claim 1, wherein the speech recognition engine unit comprises:
A sound law database that extracts basic consonants and basic vowels required for the minimum sound unit from Korean and creates and stores specific sound laws;
A morpheme analysis unit that separates the pronunciation information into morphemes as they are pronounced; and
A sound unit detector that detects the minimum sound units pronounced in the morphemes separated by the morpheme analysis unit, according to the sound law stored in the sound law database.

The system of claim 1, wherein, in the sound law of the sound law database, the basic consonants are ㅇ, ㅁ, ㄱ, ㄷ, ㅂ, ㅅ, and ㅈ, and the basic vowels are ㅡ, ㅣ, ㅏ, ㅓ, ㅗ, ㅜ, and ㅐ.

The system of claim 2, wherein the character generating unit comprises:
A language selection unit for comprehensively analyzing the minimum sound units detected by the speech recognition engine unit to select a user language such as Korean, English, and Chinese;
A character matching unit that reads the sound law stored in the sound law database for the language selected by the language selection unit and generates phonetic characters representing the sound based on the minimum sound units detected by the speech recognition engine unit.

The system of claim 1, further comprising a voice command execution key for operating the system according to the user's voice commands.

The system of claim 5, wherein the controller recognizes voice command execution, voice command execution wait, release of the execution wait, or end of voice command execution according to the number of presses or the holding time of the voice command execution key, and, when the voice command execution key is operated, outputs a pop-up window in which user commands related to various voice commands are displayed on one side of the screen.