KR20130053690A - Speech recognition system using hangul - Google Patents
- Publication number
- KR20130053690A
- Authority
- KR
- South Korea
- Prior art keywords
- unit
- sound
- voice
- user
- speech recognition
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/221—Announcement of recognition results
Abstract
Description
The present invention relates to a speech recognition system using Hangul, and more particularly to a speech recognition system that extracts basic consonants and basic vowels from Hangul, sets a sound law, and converts a user's voice into phonetic characters according to the sound law, thereby enabling simple conversion of the user's voice into text.
Recently, various kinds of computers, communication devices, and multimedia devices, such as personal computers, tablet PCs, notebook computers, mobile phones and personal digital assistants (PDAs), have been widely used.
Such multimedia devices are indispensable to life in an information society, and the ability to use them skillfully and effectively is regarded as a basic competence of modern people living in that society.
In particular, a computer, which handles a wide variety of information, processes information received through input devices such as a keyboard, mouse, joystick, or tablet and outputs the results through output devices such as a monitor and speakers. Because it operates through menus and deals with binary digital information, it requires accurate input.
In recent years, for the user's convenience, voice recognition functions that can be driven by the user's voice commands alone have been developed, and computers driven by such commands are also being produced. Speech recognition technology continues to advance; for example, recognition rates of 98% or higher are achieved for some isolated words.
However, such a computer system converts a voice command input by the user into a corresponding system drive command and provides that command to the system. Voice commands are therefore possible only when a voice recognition program that converts voice commands into system drive commands is supported.
In addition, a voice recognition engine applied to a communication device supports connecting a call to a phone number by voice instead of keying in the number. More recently, a service has been under development in which, when a caller sends a message by voice, a server having a voice recognition engine converts the transmitted voice into text and transmits the resulting text message to the recipient's mobile communication device through a Short Message Service Center (SMSC).
As such, efforts are being made to provide various voice-based services on computers and communication devices. However, even within the same language, voices vary greatly with the speaker, that is, with the speaker's gender, age, and state at the time of pronunciation. Moreover, the properties of a sound change depending on whether it is pronounced in isolation or within a word or sentence.
Because of these complex characteristics, a speech recognition engine detects the feature vectors of the speech and can generate a sentence from the speech only after a complicated process of pattern recognition, morphological analysis, pre-matching, and syntax recognition.
Accordingly, converting speech into text with such a speech recognition engine requires a server having the resources to perform this complicated process. As a result, the speech-to-text conversion is delayed, and additional maintenance costs are incurred in operating the server.
In addition, speech recognition engines and programs applied to computers are mostly at a basic level, recognizing only commands for executing programs and basic operating system commands. In particular, in a system equipped with a speaker-dependent speech recognition engine, the recognition rate varies greatly with the user's tone. To raise the recognition rate, the user must therefore go through a learning period of repeatedly inputting his or her voice before using the product in earnest after shipment, which demands considerable patience and effort from the user.
These problems arise from attempting to express voice as ideograms. For reference, an ideogram is a character that indicates meaning, with the meaning defined by a single character.
In other words, when converting voice into text, the speech recognition engine performs a complicated process in order to express each character exactly, that is, to produce text with a low error rate.
By contrast, where conveying meaning matters more than exact spelling, phonetic letters can be used instead of ideograms. Phonetic letters are letters that represent sound and are written according to the sound as it is pronounced.
Phonetic letters are further divided into syllabic letters, written according to the syllables pronounced, and phonemic letters, written according to the phonemes pronounced. For reference, the representative language of syllabic letters is Japanese, and the representative language of phonemic letters is Korean.
Japanese, the typical syllabic script, consists of only 50 syllables and is therefore limited in its ability to represent all the sounds of the world. Hangul, by contrast, can write countless sounds through combinations of consonants and vowels, and however many letters are combined, anyone who understands the principle by which the letters are formed has no difficulty learning them.
Therefore, there is an urgent need for research that uses the basic principle of Hangul, a phonemic script, to develop a speech recognition engine capable of recognizing Korean, English, Chinese, and other languages and of accurately representing all the sounds of the world, and that applies the engine to multimedia devices to enable voice commands and voice-based text services.
Accordingly, an object of the present invention is to provide a speech recognition system using Hangul that extracts basic consonants and basic vowels from Hangul, sets a sound law, and executes a speech recognition engine function that converts a user's voice into phonetic characters according to the sound law, so that the user's voice can be converted into text simply, and that applies the speech recognition engine function to various multimedia devices such as computers and communication devices so that the system can be driven by voice alone.
To achieve the above object, a speech recognition system using Hangul according to the present invention comprises: an audio unit to which a user's voice signal is input; a noise removing unit that extracts only the voice signal by removing noise from the user's voice input from the audio unit; a speech recognition engine unit that analyzes pronunciation information from the voice signal transmitted from the noise removing unit and detects Hangul-based minimum sound units for each pronunciation; a character generator that generates phonetic characters representing the sound on the basis of the minimum sound units detected by the speech recognition engine unit; and a controller that, according to the user's selection, executes a voice command through the phonetic characters generated by the character generator or performs various control functions for outputting them through a monitor or a speaker.
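The flow of data between the units described above can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in chosen for illustration: the names `RecognitionPipeline`, `amplitude_gate`, `toy_detector`, `join_units`, and `control` do not appear in the patent, the amplitude gate is only a placeholder for real noise removal, and the detector emits a fixed symbol rather than performing any recognition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class RecognitionPipeline:
    """Chains the three processing stages named in the text (hypothetical API)."""
    remove_noise: Callable[[Sequence[float]], List[float]]
    detect_units: Callable[[Sequence[float]], List[str]]
    generate_text: Callable[[Sequence[str]], str]

    def run(self, samples: Sequence[float]) -> str:
        speech = self.remove_noise(samples)   # noise removing unit
        units = self.detect_units(speech)     # speech recognition engine unit
        return self.generate_text(units)      # character generator


def amplitude_gate(samples: Sequence[float], threshold: float = 0.1) -> List[float]:
    """Placeholder noise removal: keep only samples at or above the threshold."""
    return [s for s in samples if abs(s) >= threshold]


def toy_detector(speech: Sequence[float]) -> List[str]:
    """Placeholder detector: emits one fixed sound unit per retained sample."""
    return ["ㅏ" for _ in speech]


def join_units(units: Sequence[str]) -> str:
    """Placeholder character generation: concatenate the sound units."""
    return "".join(units)


def control(text: str, mode: str,
            commands: Dict[str, Callable[[], str]]) -> Optional[str]:
    """Controller stand-in: run a matching command, or return text for display."""
    if mode == "command":
        handler = commands.get(text)
        return handler() if handler is not None else None
    return text
```

The stages are passed in as plain functions so that each placeholder could be swapped for a real implementation without changing the surrounding flow.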
Here, the speech recognition engine unit comprises: a sound law database that extracts the basic consonants and basic vowels required for the minimum sound units from Hangul and generates and stores a specific sound law; a morpheme analysis unit that separates morphemes according to the sound as pronounced; and a sound unit detector that detects the minimum sound units pronounced, on the basis of the morphemes separated by the morpheme analysis unit, according to the sound law stored in the sound law database.
In the sound law of the sound law database, the basic consonants are ㅇ, ㅁ, ㄱ, ㄷ, ㅂ, ㅅ, and ㅈ, and the basic vowels are ㅡ, ㅣ, ㅏ, ㅓ, ㅗ, ㅜ, and ㅐ.
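Breaking written Hangul into its individual consonant and vowel letters, the kind of units from which basic consonants and vowels are drawn, can be sketched with the standard Unicode arithmetic for precomposed Hangul syllables. This is an illustration only: the patent does not describe its implementation, and the function name `decompose_syllable` is an assumption. The three tables follow the Unicode ordering of initial, medial, and final letters.

```python
from typing import Tuple

# Letter tables in Unicode order (19 initials, 21 medials, 27 finals plus "none").
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ",
          "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ",
          "ㅋ", "ㅌ", "ㅍ", "ㅎ"]

SYLLABLE_BASE = 0xAC00   # code point of the first precomposed syllable, "가"
SYLLABLE_LAST = 0xD7A3   # code point of the last precomposed syllable


def decompose_syllable(syllable: str) -> Tuple[str, str, str]:
    """Split one precomposed Hangul syllable into (initial, medial, final)."""
    code = ord(syllable)
    if not SYLLABLE_BASE <= code <= SYLLABLE_LAST:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    offset = code - SYLLABLE_BASE
    # Each initial spans 21 * 28 = 588 code points; each medial spans 28.
    initial = INITIALS[offset // 588]
    medial = MEDIALS[(offset % 588) // 28]
    final = FINALS[offset % 28]
    return initial, medial, final
```

For example, `decompose_syllable("한")` yields the initial ㅎ, the vowel ㅏ, and the final ㄴ, while a syllable without a final consonant returns an empty string in the third position.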
The character generator comprises: a language selector that comprehensively analyzes the minimum sound units detected by the speech recognition engine unit to select the user's language, such as Korean, English, or Chinese; and a character matching unit that reads the sound law stored in the sound law database for the language selected by the language selector and generates phonetic characters representing the sound on the basis of the minimum sound units detected by the speech recognition engine unit.
The system may further comprise a voice command execution key that allows the system to be driven according to the user's voice instructions.
The controller may recognize voice command execution, voice command execution wait, voice command execution wait release, and voice command execution end according to the number of times the voice command execution key is pressed or whether it is held down.
When the voice command execution key is operated, the controller may output, on one side of the screen, a pop-up window displaying user commands related to the various voice commands.
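One way to picture how a controller might distinguish the four states from key activity is the sketch below. The text does not say which press count or hold duration corresponds to which state, so the mapping used here (one, two, or three presses, and a hold of at least one second) is purely an assumption, as are the names `KeyAction` and `classify_key_event`.

```python
from enum import Enum
from typing import Optional


class KeyAction(Enum):
    """The four states named in the text."""
    EXECUTE = "voice command execution"
    WAIT = "voice command execution wait"
    RELEASE_WAIT = "voice command execution wait release"
    END = "voice command execution end"


# Hypothetical mapping from press count to state; not specified in the source.
_PRESS_MAP = {
    1: KeyAction.EXECUTE,
    2: KeyAction.WAIT,
    3: KeyAction.RELEASE_WAIT,
}


def classify_key_event(press_count: int,
                       held_seconds: float = 0.0,
                       hold_threshold: float = 1.0) -> Optional[KeyAction]:
    """Return the state for a key event, or None if the event is unrecognized."""
    # A long hold takes priority over the press count (assumed behavior).
    if held_seconds >= hold_threshold:
        return KeyAction.END
    return _PRESS_MAP.get(press_count)
```

Returning `None` for unrecognized input leaves the decision about how to handle stray key activity to the caller.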
According to the speech recognition system using Hangul described above, basic consonants and basic vowels are extracted from Hangul, a sound law is set, and a speech recognition engine function converts the user's voice into phonetic characters according to the sound law. This not only makes it possible to convert the user's voice into text simply, but also, by applying the speech recognition engine function to various multimedia devices such as computers and communication devices, allows user-friendly functions such as system driving, text transmission, and voice mail transmission to be executed by voice alone.
In addition, the present invention allows programs and various functions on a computer or communication device to be executed through predetermined commands given by the user's voice alone, without hand-operated input devices such as a keyboard or mouse. Because operation is possible with voice commands only, multimedia devices such as computers and communication devices become easier and freer to use not only for the general public but also for visually impaired people and people with various other disabilities. The invention also has the effect of allowing a multimedia device to be used from a distance while the user is working on a computer or doing other work.
In addition, because the sound law defined by extracting the basic consonants and basic vowels from Hangul allows not only Korean but also foreign languages to be converted into sounds, the present invention can be used worldwide.
FIG. 1 is an internal block diagram of a speech recognition system using Hangul according to an embodiment of the present invention.
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the invention is not intended to be limited to the particular embodiments, but includes all modifications, equivalents, and alternatives falling within the spirit and scope of the invention. Like reference numerals are used for like elements in describing each drawing.
Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries are to be interpreted as having a meaning consistent with their meaning in the context of the related art and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in the present application.
The present invention can be embodied as computer readable code on a computer readable recording medium. The computer readable recording medium includes all kinds of recording devices in which data that can be read by a computer system is stored. Examples of the computer readable recording medium include a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, and an optical data storage device, and also include implementations in the form of a carrier wave (for example, transmission over the Internet). The computer readable recording medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Referring to FIG. 1, a speech recognition system using Korean characters according to an exemplary embodiment of the present invention includes an
The speech
In this case, when the sound law stored in the
Table 1 shows the basic consonants in order of Hunminjeongeum, Hangeul, and Onsori.
<Table 1>
In Table 2, the flatness of Hunminjeongeum and the main sound of Hangeul are named elementary, the giants of Hunminjeongeum and Hangeul of Hangeul are named as rock, and the Sangsang of Hunminjeongeum and the strong sound of Hangeul are named Hinsori.
And the sound value is set to 'ㅎ' with 'ㅎ' as θ,
silver Moya, and ㅂ , ㅍ is ㅂ, silver . In addition, looking at Table 2, it can be seen that when the 'ㅡ' sounds in the elementary sound, it becomes a sound, and when the 'ㅣ' sound sounds in the elementary sound, it becomes a hinting sound.
<Table 2>
Looking at the above Table 2, the 'ㅢ' sound is rejected, four double vowels of 'ㅐ, ㅔ, ㅒ, ㅖ' are expressed as the sound of 'ㅐ', and 'ㅙ, ㅞ'
If it is expressed as, it can be seen that seven short vowels can represent both the double vowel and the sound consisting of a double vowel.
As shown in Tables 1 and 2 above, Korean as well as foreign languages such as English and Chinese may be expressed as sounds based on the on-sound consisting of minimum sound units.
On the other hand, the
For example, if the user language is English, as shown in Table 3, a sound law matched with English, on-sound, and phonetic symbols can be created.
<Table 3>
Meanwhile, a computer, communication device, or other multimedia device may be provided with a voice command execution key (not shown) that drives the system according to the user's voice, so that the system can be controlled by the user's voice alone. In addition to the voice command execution key, special function keys such as a voice text transmission key, a voice mail input key, and an Internet execution key may be provided as necessary.
These special function keys, such as the voice command execution key, the voice text transmission key, the voice mail input key, and the Internet execution key, may be added to a touch pad, keypad, keyboard, or the like, or may be installed at a specific location on the device other than the keypad or keyboard according to the user's convenience.
The
For example, the
In addition, the
Meanwhile, when the voice command execution key operates, the
For example, the main item "Internet execution" has detailed items such as inputting an Internet address, refreshing, going to the favorites bar, executing a mouse function, and moving between web pages. If the user sequentially inputs by voice the Internet execution and the Internet address input, the
Hereinafter, a speech recognition system using Korean characters according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.
According to an exemplary embodiment of the present invention, if a user recognizes a sound through a microphone (MIC) while the user operates a voice command execution key, the user recognizes the voice in the
In particular, the
The speech
Then, the
In other words, "light up the light" is written "pronounced wealth," according to the phoneme of the Hangul 'ㅂ ㄹ ㄹ ㄹ バ ダ キ', according to the sound of the sound law database (31) '
ㅜㄹ ㅡ ㄹ Phonetic characters will be generated.
On the other hand, when the user says "Pianist", it is pronounced "pianist", and the phonetic letters are generated as 'ㅂ o ㅏ니 ㅅ ㅡ ㅌ ㅡ', as shown in Table 3. The phonetic letters are matched with phonetic symbols to complete the word.
Therefore, when the user commands the display of characters on the monitor, the
In addition, the
The above describes an embodiment in which the speech recognition function is implemented in a computer such as a personal computer, notebook computer, or tablet PC. However, the speech recognition system using Hangul of the present invention is not limited to computers and can also be applied to multimedia devices such as televisions and other devices.
It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
The present invention relates to a speech recognition system using Hangul, and more particularly to a system that extracts basic consonants and basic vowels from Hangul, sets a sound law, and executes a speech recognition engine function that converts a user's voice into phonetic characters according to the sound law. This not only makes it possible to convert the user's voice into text simply, but also, by applying the speech recognition engine function to various multimedia devices such as computers and communication devices, allows user-friendly functions such as system driving, text transmission, and voice mail transmission to be performed by voice alone.
10: audio unit 20: noise removal unit
30: speech recognition engine unit 31: sound law database
40: character generator 50: controller
60: transceiver 70: voice conversion engine
Claims (6)
A noise removing unit that extracts only the voice signal by removing noise from the user's voice input from the audio unit;
A speech recognition engine unit that analyzes pronunciation information from the voice signal transmitted from the noise removing unit and detects Hangul-based minimum sound units for each pronunciation;
A character generator that generates phonetic characters representing the sound on the basis of the minimum sound units detected by the speech recognition engine unit;
And a controller that, according to the user's selection, executes a voice command through the phonetic characters generated by the character generator or performs various control functions for outputting them through a monitor or a speaker, the speech recognition system using Hangul being characterized by comprising these units.
A sound law database that extracts the basic consonants and basic vowels required for the minimum sound units from Hangul and generates and stores a specific sound law;
A morpheme analysis unit that separates morphemes from the pronunciation information according to the sound as pronounced;
And a sound unit detector that detects the minimum sound units pronounced, on the basis of the morphemes separated by the morpheme analysis unit, according to the sound law stored in the sound law database, the speech recognition system using Hangul being characterized in that the speech recognition engine unit consists of these units.
A language selection unit that comprehensively analyzes the minimum sound units detected by the speech recognition engine unit to select the user's language, such as Korean, English, or Chinese;
And a character matching unit that reads the sound law stored in the sound law database for the language selected by the language selection unit and generates phonetic characters representing the sound on the basis of the minimum sound units detected by the speech recognition engine unit, the speech recognition system using Hangul being characterized in that the character generator comprises these units.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020110119254A KR20130053690A (en) | 2011-11-16 | 2011-11-16 | Speech recognition system using hangul |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020110119254A KR20130053690A (en) | 2011-11-16 | 2011-11-16 | Speech recognition system using hangul |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20130053690A (en) | 2013-05-24 |
Family
ID=48662849
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020110119254A KR20130053690A (en) | 2011-11-16 | 2011-11-16 | Speech recognition system using hangul |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20130053690A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101994780B1 (en) * | 2018-10-08 | 2019-09-30 | 넷마블 주식회사 | Method and apparatus for registering shortcut key and excuting the shortcut key |
-
2011
- 2011-11-16 KR KR1020110119254A patent/KR20130053690A/en not_active Application Discontinuation
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WITN | Withdrawal due to no request for examination |