KR101355284B1

KR101355284B1 - Method for Recommending Words and Completing Sentences in Touch Screen Devices

Info

Publication number: KR101355284B1
Application number: KR1020120019525A
Authority: KR
Inventors: 김판구; 최동진; 고병규; 고미아
Original assignee: 조선대학교산학협력단
Priority date: 2012-02-27
Filing date: 2012-02-27
Publication date: 2014-01-28
Also published as: KR20130097888A

Abstract

The present invention provides a word recommendation and sentence completion system for a touch screen environment, and a method therefor. The system comprises: a touch screen unit on which characters are entered using a virtual keyboard, recommended words selected on the basis of the entered characters are displayed, and a displayed recommended word can be selected; a recommended word extraction unit that extracts the n-gram data to be compared on the basis of the entered characters, selects recommended words in descending order of occurrence frequency, and outputs them to the touch screen unit; and an n-gram DB construction unit that is connected to the recommended word extraction unit to supply n-gram data and that builds an n-gram database by extracting n-grams from a refined specific domain.
According to the system and method of the present invention, n-gram data matching the ordinal position of the word the user is entering and its syllables is selected from n-gram data built by choosing only specific domains from highly reliable web documents. Recommended words are output in descending order of occurrence frequency, that is, most frequently used words first, and the user selects the recommended word that matches his or her intention. As a result, accurate words can be entered with a minimum of keystrokes and sentences can be completed easily.

Description

Method for Recommending Words and Completing Sentences in Touch Screen Devices

The present invention relates to a word recommendation and sentence completion system for a touch screen environment and a method therefor, and more particularly to a system and method that recommend words and complete sentences when characters are entered on a device equipped with a touch screen.

With the recent rapid development of smartphones and the spread of devices equipped with touch screens, text entry has moved away from physically pressing keys toward entering words on a touch screen with a virtual keyboard.

Such virtual keyboards use a QWERTY layout for entering words, but typing errors occur frequently because of the small size of the terminal, the absence of physical key boundaries, touch screen sensor problems, and the like. In addition, the difficulty of pressing virtual buttons on a small screen lengthens text entry time and inconveniences the user. Moreover, for people with physical impairments in particular, there is a growing need to enter the desired word with a minimum of keystrokes.

Korean Patent Publication No. 10-2008-0039009 (published 2008.05.07)

Korean Registered Patent No. 10-0892003 (announced 2009.04.07)

The present invention has been devised to solve the problems of the prior art.

An object of the present invention is to provide a word recommendation and sentence completion system for a touch screen environment, and a method therefor, in which n-gram data matching the ordinal position of the word the user is entering and its syllables is selected from n-gram data built by choosing only specific domains from highly reliable web documents, recommended words are output in descending order of occurrence frequency, and the user selects the recommended word that matches his or her intention, so that accurate words can be entered with a minimum of keystrokes and sentences can be completed easily.

To achieve the above object, a word recommendation and sentence completion system for a touch screen environment according to one aspect of the present invention extracts the n-gram data to be compared on the basis of the words and syllables entered on the touch screen, recommends and outputs words from the extracted n-gram data in descending order of occurrence frequency, and completes a sentence through selection of the output recommended words.

The system comprises: a touch screen unit on which characters can be entered and recommended words selected; a recommended word extraction unit that extracts the n-gram data to be compared on the basis of the entered characters and recommends and outputs words in descending order of occurrence frequency; and an n-gram DB construction unit that builds an n-gram database for supplying n-gram data to the recommended word extraction unit.

Here, the touch screen unit comprises a text input unit for entering characters, an input word output unit that outputs the entered characters and the completed sentence, and a recommended word output unit that outputs the recommended words.

The n-gram DB construction unit divides its data into unigram data consisting of single words, bigram data consisting of combinations of two consecutive words, and trigram data consisting of combinations of three consecutive words, and calculates and stores the occurrence frequency of each word.

The recommended word extraction unit determines whether the first syllable of the entered characters belongs to the first, second, or third word. It searches the unigram data if the syllable is the first syllable of the first word, the bigram data if it is the first syllable of the second word, and the trigram data if it is the first syllable of the third word.

In addition, the n-gram DB construction unit builds the n-gram data through a preprocessing procedure that selects only specific domains from web documents, extracts the abstracts of the selected domains, and removes special characters.

A word recommendation and sentence completion system for a touch screen environment according to another aspect of the present invention comprises: a touch screen unit on which characters are entered using a virtual keyboard, recommended words selected on the basis of the entered characters are displayed, and a displayed recommended word can be selected; a recommended word extraction unit that extracts the n-gram data to be compared on the basis of the entered characters, selects recommended words in descending order of occurrence frequency, and outputs them to the touch screen unit; and an n-gram DB construction unit that is connected to the recommended word extraction unit to supply n-gram data and that builds an n-gram database by extracting n-grams from a refined specific domain.

The touch screen unit comprises a text input unit for entering characters using a virtual keyboard, a recommended word output unit that displays the recommended words and allows a displayed recommended word to be selected, and an input word output unit that displays the characters entered through the text input unit.

Here, when a word is selected from the recommended words displayed on the recommended word output unit, the selected word is displayed on the input word output unit.

The recommended word extraction unit comprises a comparison n-gram determination unit that selects the n-grams to be compared on the basis of the entered words and syllables, and a recommended word selection unit that selects recommended words from the selected n-gram data in descending order of occurrence frequency.

The n-gram DB construction unit comprises a unigram data unit that holds single words and calculates and stores the frequency of each word, a bigram data unit that holds combinations of two consecutive words and calculates and stores the frequency of each word, and a trigram data unit that holds combinations of three consecutive words and calculates and stores the frequency of each word.

In addition, the n-gram DB construction unit includes a preprocessing unit that reduces a large volume of web documents. The preprocessing unit consists of a specific domain selection unit that selects the main fields of interest, an extraction unit that extracts abstracts, and a filtering unit that removes special characters.

According to the word recommendation and sentence completion system and method of the present invention, n-gram data matching the ordinal position of the word the user is entering and its syllables is selected from n-gram data built by choosing only specific domains from highly reliable web documents. Recommended words are output in descending order of occurrence frequency, that is, most frequently used words first, and the user selects the recommended word that matches his or her intention. This has the effect that accurate words can be entered with a minimum of keystrokes and sentences can be completed easily.

Furthermore, because accurate words can be entered and sentences completed easily with a minimum of keystrokes, typing errors are prevented, text entry time is shortened, and user convenience is improved.

FIG. 1 is a block diagram of the word recommendation and sentence completion system for a touch screen environment according to the present invention.
FIG. 2 is a conceptual diagram showing n-gram construction based on specific domain selection, and the word recommendation system that uses it, in the word recommendation and sentence completion system according to the present invention.
FIG. 3 is a flowchart of the word recommendation and sentence completion method for a touch screen environment according to the present invention.
FIG. 4 is a flowchart of the process of extracting recommended words in the word recommendation and sentence completion method according to the present invention.

The above objects, features, and other advantages of the present invention will become more apparent from the following detailed description of preferred embodiments with reference to the accompanying drawings. The word recommendation and sentence completion system for a touch screen environment and its method are described in detail below with reference to the drawings. In this specification, like reference numerals in the drawings denote like components unless otherwise indicated.

FIG. 1 is a block diagram of the word recommendation and sentence completion system for a touch screen environment according to the present invention, and FIG. 2 is a conceptual diagram showing n-gram construction based on specific domain selection and the word recommendation system that uses it.

The word recommendation and sentence completion system of the present invention extracts the n-gram data to be compared on the basis of the words and syllables entered on the touch screen, recommends and outputs words from the extracted n-gram data in descending order of occurrence frequency, and completes a sentence through selection of the output recommended words.

As shown in FIGS. 1 and 2, the system of the present invention includes a touch screen unit 100, a recommended word extraction unit 200, and an n-gram DB construction unit 300.

The touch screen unit 100 is the screen window on which input is made by touch on a device equipped with a touch screen. It consists of a text input unit 110, to which a virtual keyboard is applied for entering characters; a recommended word output unit 120, which displays the recommended words and allows a displayed recommended word to be selected; and an input word output unit 130, which displays the characters entered through the text input unit 110 and also displays a recommended word once it has been selected on the recommended word output unit 120. The virtual keyboard of the text input unit 110 preferably uses a QWERTY layout.

The recommended word extraction unit 200 consists of a comparison n-gram determination unit 210 and a recommended word selection unit 220.

The comparison n-gram determination unit 210 extracts and selects, from the n-gram DB construction unit 300, the n-gram data to be compared on the basis of the words and syllables entered through the text input unit 110.

The recommended word selection unit 220 selects recommended words from the n-gram data chosen by the comparison n-gram determination unit 210 in descending order of occurrence frequency, that is, most frequently used words first, and outputs the selected recommended words to the recommended word output unit 120 of the touch screen unit 100.

The n-gram DB construction unit 300 reduces a large volume of web documents 310 by means of a preprocessing unit 320. The preprocessing unit 320 consists of a specific domain selection unit 321 that selects the main fields of interest, an abstract extraction unit 322 that extracts abstracts, and a filtering unit 323 that removes special characters.

The specific domain selection unit 321 first selects specific domains of high reliability. That is, it selects only web documents whose sentences are structurally clear, such as Wikipedia or news articles, and web documents in the main fields of interest. Because the web documents 310 cover a wide variety of fields, it is necessary to focus on a specific field: the kinds of words used differ from field to field, so the web documents needed to build the n-grams are restricted to a specific domain.

The abstract extraction unit 322 extracts the abstract from each selected domain document. For Wikipedia articles in particular, the abstract is the most essential part, so the abstract is extracted.

The filtering unit 323 removes all special characters, such as quotation marks and commas, from the extracted abstracts. Meaningless words and special characters appear frequently in the web documents 310. Because such words and characters can introduce errors when building accurate n-gram data, the special characters are removed before the n-gram database unit 330 is built.
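The three preprocessing stages described above (domain selection 321, abstract extraction 322, filtering 323) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the patent: the `WebDocument` fields, the `TRUSTED_SOURCES` set, the rule that the abstract is the first non-empty paragraph, and the definition of "special character" as anything that is neither a word character nor whitespace are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class WebDocument:
    source: str  # hypothetical label such as "wikipedia" or "news"
    topic: str   # hypothetical field-of-interest tag
    text: str    # raw text, paragraphs separated by blank lines


# Assumption: these labels stand in for "highly reliable" sources.
TRUSTED_SOURCES = {"wikipedia", "news"}


def select_domain(documents, topic):
    """Stage 321: keep only trusted documents in the field of interest."""
    return [d for d in documents
            if d.source in TRUSTED_SOURCES and d.topic == topic]


def extract_abstract(document):
    """Stage 322: take the first non-empty paragraph as the abstract (assumption)."""
    for paragraph in document.text.split("\n\n"):
        if paragraph.strip():
            return paragraph.strip()
    return ""


def remove_special_characters(text):
    """Stage 323: replace quotation marks, commas, etc. with spaces and split into words."""
    return re.sub(r"[^\w\s]", " ", text).split()


def preprocess(documents, topic):
    """Run the three stages and return one word list per selected document."""
    return [remove_special_characters(extract_abstract(d))
            for d in select_domain(documents, topic)]
```

The word lists returned by `preprocess` are the kind of input from which unigram, bigram, and trigram counts could then be built.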

The n-gram database unit 330 built through the preprocessing unit is divided into a unigram data unit 331, which holds single words and calculates and stores the frequency of each word; a bigram data unit 332, which holds combinations of two consecutive words and calculates and stores the frequency of each word; and a trigram data unit 333, which holds combinations of three consecutive words and calculates and stores the frequency of each word.
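As a rough illustration of the unigram, bigram, and trigram data units, the sketch below counts word sequences of length one, two, and three. The function name `build_ngram_tables`, the use of `collections.Counter`, and the tuple keys are assumptions made for this example. The sketch stores the frequency of each word sequence, which is one reading of the stored frequency.

```python
from collections import Counter


def build_ngram_tables(sentences):
    """Count unigrams, bigrams, and trigrams.

    `sentences` is an iterable of word lists (already preprocessed).
    Returns a dict mapping n (1, 2, 3) to a Counter keyed by word tuples.
    """
    tables = {1: Counter(), 2: Counter(), 3: Counter()}
    for words in sentences:
        for n, table in tables.items():
            # Slide a window of n consecutive words over the sentence.
            for start in range(len(words) - n + 1):
                table[tuple(words[start:start + n])] += 1
    return tables
```

Keying every table by tuples keeps the three data units uniform, so the same lookup code can serve all three.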

In the system configured as above, when the first syllable of a word is entered through the text input unit 110 of the touch screen unit 100, the comparison n-gram determination unit 210 of the recommended word extraction unit 200 determines whether that syllable belongs to the first, second, or third word. If it is the first syllable 211 of the first word, the unit searches the unigram data 331 of the n-gram DB construction unit 300 for unigrams beginning with the entered syllable. For example, when the English syllable T is entered, words beginning with T are retrieved from the unigram data. The recommended word selection unit 220 then sorts the words beginning with T in descending order of occurrence frequency and outputs them to the recommended word output unit 120. If no recommended word exists, the next syllable is received in order to select a more accurate recommended word. For example, when the syllable h is entered after T, the comparison n-gram determination unit 210 searches for unigrams beginning with Th, and the recommended word selection unit 220 sorts the words beginning with Th in descending order of occurrence frequency and outputs them to the recommended word output unit 120. When the user then selects, from the recommended words shown on the recommended word output unit 120, the word that matches his or her intention, the selected word is output to and displayed on the input word output unit 130.
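The unigram lookup for the first word can be sketched as a prefix filter followed by a sort in descending order of frequency. The function name, the `limit` parameter, the alphabetical tie-break, and the counts in `EXAMPLE_UNIGRAMS` are made up for illustration and do not come from the patent.

```python
# Hypothetical unigram frequencies, invented for this example.
EXAMPLE_UNIGRAMS = {"The": 50, "This": 30, "That": 20, "To": 10, "An": 5}


def recommend_first_word(unigram_counts, prefix, limit=4):
    """Return up to `limit` words starting with `prefix`, most frequent first."""
    matches = [(word, count) for word, count in unigram_counts.items()
               if word.startswith(prefix)]
    # Sort by descending frequency; break ties alphabetically (assumption).
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in matches[:limit]]
```

With these invented counts, the prefix `T` yields `The`, `This`, `That`, `To`, and the longer prefix `Th` narrows the list to `The`, `This`, `That`.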

When the syllable entered through the text input unit 110 is the first syllable 212 of the second word, the comparison n-gram determination unit 210 of the recommended word extraction unit searches the bigram data 332 of the n-gram DB construction unit 300 for bigrams that begin with the previously entered first word followed by the currently entered first syllable. For example, when the previously entered first word is This and the currently entered first syllable is i, it searches the bigram data for bigrams whose first word is This and whose second word begins with i. The recommended word selection unit 220 then sorts the bigrams beginning with This i in descending order of occurrence frequency and outputs them to the recommended word output unit 120. Examples at this stage might be This information, This interest, This is, and This investigation. If no recommended word beginning with This i exists, the syllable following This i is received in order to select a more accurate recommended word. When the user then selects This information, the entry matching his or her intention, from among This information, This interest, This is, and This investigation shown on the recommended word output unit 120, the selected This information is displayed on the input word output unit 130.
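A minimal sketch of the bigram lookup for the second word follows, using the This i example from the text. The function name, the tuple-keyed table, and every count in `EXAMPLE_BIGRAMS` are assumptions for illustration. The patent does not state actual frequencies, so the resulting order is not meant to match the order in which the examples are listed above.

```python
# Hypothetical bigram frequencies, invented for this example.
EXAMPLE_BIGRAMS = {
    ("This", "is"): 40,
    ("This", "information"): 25,
    ("This", "interest"): 5,
    ("This", "investigation"): 3,
    ("This", "was"): 10,
    ("That", "is"): 30,
}


def recommend_second_word(bigram_counts, first_word, prefix, limit=4):
    """Return bigrams whose first word is `first_word` and whose second word
    starts with `prefix`, most frequent first, as display strings."""
    matches = [(second, count)
               for (first, second), count in bigram_counts.items()
               if first == first_word and second.startswith(prefix)]
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [f"{first_word} {second}" for second, _ in matches[:limit]]
```

Returning the whole phrase mirrors how the text shows candidates such as This information on the recommended word output unit.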

When the syllable entered through the text input unit 110 is the first syllable 213 of the third word, the comparison n-gram determination unit 210 of the recommended word extraction unit searches the trigram data 333 of the n-gram DB construction unit 300 for trigrams that begin with the previously entered first and second words followed by the currently entered first syllable. For example, when the two previously entered words are This information and the currently entered first syllable is w, the comparison n-gram determination unit 210 searches the trigram data for trigrams whose first two words are This information and whose third word begins with w. The recommended word selection unit 220 then sorts the trigrams beginning with This information w in descending order of occurrence frequency and outputs them to the recommended word output unit 120. If no recommended word beginning with This information w exists, the syllable following This information w is received in order to select a more accurate recommended word. When the user then selects, from the recommended words shown on the recommended word output unit 120, the word that matches his or her intention, the selected word is output to and displayed on the input word output unit 130.

In the system of the present invention, when characters are entered through the text input unit 110 in a touch screen environment, words are recommended on the basis of the entered words and syllables, and the sentence is completed as the user selects, from the recommended words, those that match his or her intention. Accurate words can therefore be entered with a minimum of keystrokes and sentences can be completed easily.

FIG. 3 is a flowchart of the word recommendation and sentence completion method for a touch screen environment according to the present invention, and FIG. 4 is a flowchart of the process of extracting recommended words in that method.

As shown in FIGS. 3 and 4, the word recommendation and sentence completion method of the present invention includes: entering characters using a virtual keyboard in a touch screen environment (S100); extracting the n-gram data to be compared on the basis of the entered characters (S200); selecting and outputting recommended words from the extracted n-gram data in descending order of occurrence frequency (S300); and completing a sentence by selecting, from the output recommended words, the word that matches the user's intention (S400).

In step S100, words and syllables are entered using the virtual keyboard of the touch screen environment.

In step S200, the n-gram data is extracted from an n-gram database built through a preprocessing procedure in which, from a large volume of web documents, only specific domains are selected (web documents whose sentences are structurally clear and web documents in the main fields of interest needed for n-gram construction), abstracts are extracted, and all special characters are removed.

The n-gram database is built in three parts, namely unigram data consisting of single words, bigram data consisting of combinations of two consecutive words, and trigram data consisting of combinations of three consecutive words, and the occurrence frequency of each word is calculated and stored.

As shown in FIG. 4, to choose the type of n-gram data to be compared, the n-gram extraction of step S200 determines, on the basis of the words and syllables entered in step S100, whether the entered first syllable belongs to the first word (S211), the second word (S221), or the third word (S231).
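The decision in FIG. 4 amounts to choosing the n-gram order from the number of words already entered and then running the same prefix search on the chosen table. The sketch below illustrates this under assumptions: the function names, the tuple-keyed tables, and the counts are hypothetical, and raising an error after the third word is a choice made here because the text describes only the first three word positions.

```python
def select_ngram_order(previous_words):
    """Return 1, 2, or 3 depending on whether the syllable being typed
    belongs to the first, second, or third word."""
    position = len(previous_words) + 1
    if position > 3:
        # The source describes only the first three word positions.
        raise ValueError("only the first three word positions are covered")
    return position


def recommend(tables, previous_words, prefix, limit=4):
    """Search the unigram, bigram, or trigram table chosen by word position.

    `tables` maps n (1, 2, 3) to a dict from word tuples to frequencies.
    """
    n = select_ngram_order(previous_words)
    context = tuple(previous_words)
    matches = [(gram[-1], count) for gram, count in tables[n].items()
               if gram[:-1] == context and gram[-1].startswith(prefix)]
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in matches[:limit]]


# Hypothetical tables with invented counts.
EXAMPLE_TABLES = {
    1: {("The",): 50, ("This",): 30},
    2: {("This", "is"): 40, ("This", "information"): 25},
    3: {("This", "information", "is"): 9,
        ("This", "information", "was"): 7,
        ("This", "information", "will"): 3},
}
```

For instance, `recommend(EXAMPLE_TABLES, ["This", "information"], "w")` searches the trigram table and, with these invented counts, offers `was` before `will`.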

If the entered syllable is determined to be the first syllable of the first word (S211), the unigram data of the n-gram database is searched for unigrams beginning with the entered syllable (S212). If the search finds no entry for the entered syllable, the next syllable is received in order to select a more accurate recommended word. For example, when the first syllable of the first word is T, the unigram data is searched for words beginning with T, and it is determined whether any such word exists (S241). If no word beginning with T is found (S241), the next syllable is received (S242). For example, the syllable h following the first syllable T is received, and the unigram data is searched for words beginning with Th.
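The loop between S241 and S242 can be illustrated by simulating a user who types one syllable at a time until the intended word is offered. This sketch rests on an interpretation that the patent does not state explicitly: it treats "no recommended word exists" as "the intended word is not among the displayed candidates", and it assumes that only the top `limit` candidates are displayed. The function name and the counts are hypothetical.

```python
# Hypothetical unigram frequencies, invented for this example.
EXAMPLE_COUNTS = {"The": 50, "This": 30, "That": 20,
                  "Then": 15, "To": 10, "Those": 8}


def keystrokes_until_offered(unigram_counts, intended_word, limit=3):
    """Type `intended_word` syllable by syllable and stop as soon as it
    appears among the top `limit` candidates.

    Returns (number of syllables typed, candidates shown at that point).
    If the word is never offered, returns (len(intended_word), []).
    """
    for typed in range(1, len(intended_word) + 1):
        prefix = intended_word[:typed]
        matches = sorted(
            (word for word in unigram_counts if word.startswith(prefix)),
            key=lambda word: (-unigram_counts[word], word))
        shown = matches[:limit]
        if intended_word in shown:
            return typed, shown  # The user can now select the word.
        # Otherwise the next syllable is received and the search repeats.
    return len(intended_word), []
```

With these invented counts, `This` is offered after one syllable, while `Those` requires three, since more frequent words fill the three displayed slots until the prefix reaches `Tho`.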

If the input syllable is the first syllable of the second word (S221), the bigram data of the n-gram database is searched for bigrams that begin with the previously entered first word followed by the first syllable currently entered (S222). For example, when the previously entered first word is This and the first syllable currently entered is i, the bigram data is searched for bigrams whose first word is This and whose second word begins with i. This information, This interest, This is, and This investigation are examples of what may be found at this stage. If no recommended word beginning with This i exists, the syllable following This i is received and the search is repeated (S242).

Likewise, if the input syllable is the first syllable of the third word (S231), the trigram data of the n-gram database is searched for trigrams that begin with the previously entered first and second words followed by the first syllable currently entered (S232). For example, when the two words already entered are This information and the first syllable currently entered is w, the trigram data is searched for trigrams whose first two words are This information and whose third word begins with w. If no recommended word beginning with This information w exists (S241), the syllable following This information w is received and the search is repeated (S242).
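The choice between unigram, bigram, and trigram data by word position (S211, S221, S231) can be sketched as a single lookup. The function name `candidates`, the tuple-keyed dictionaries, and the sample counts are assumptions for illustration. The source only describes the first three words; using the last two completed words as context for any later word is an assumption of this sketch, not something the source states.

```python
def candidates(db, previous_words, prefix):
    """Return {next_word: count} for n-grams matching the context and prefix.

    db maps n (1, 2, 3) to a dict of word tuples and their counts.
    previous_words holds the words already completed in the sentence.
    """
    # Assumption: at most the last two completed words are used as context.
    context = tuple(previous_words[-2:])
    n = len(context) + 1  # 1 = unigram, 2 = bigram, 3 = trigram
    return {
        gram[-1]: count
        for gram, count in db[n].items()
        if gram[:-1] == context and gram[-1].startswith(prefix)
    }

# Illustrative data only.
db = {
    1: {("This",): 3, ("The",): 5},
    2: {("This", "is"): 4, ("This", "information"): 2, ("This", "was"): 1},
    3: {("This", "information", "was"): 2, ("This", "information", "is"): 1},
}
print(candidates(db, ["This"], "i"))
print(candidates(db, ["This", "information"], "w"))
```

An empty result corresponds to the S241 "no recommended word" branch, after which the caller would read the next character and call `candidates` again with the longer prefix.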

In step S300, if the search of step S200 finds recommended words, they are selected and output in descending order of occurrence frequency.
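The frequency ordering of step S300 can be sketched as a simple sort. The function name `rank_by_frequency`, the `limit` parameter, the alphabetical tie-break, and the sample counts are assumptions for the example; the source specifies only that higher-frequency words come first.

```python
def rank_by_frequency(candidate_counts, limit=5):
    """Return candidate words ordered by descending occurrence count.

    limit (how many words to show) and the alphabetical tie-break are
    assumptions of this sketch; the source does not specify either.
    """
    ordered = sorted(candidate_counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _count in ordered[:limit]]

# Illustrative counts for bigrams starting with "This i".
counts = {"information": 12, "is": 40, "interest": 7, "investigation": 3}
print(rank_by_frequency(counts, limit=3))
```

Capping the list is a practical choice for a small touch screen, where only a few recommendations can be displayed at once.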

In step S400, the word that matches the user's intention is selected from the output recommended words. The selected word is output and displayed on the touch screen as a completed word and sentence.

As described above, the word recommendation and sentence completion method in a touch screen environment according to the present invention uses n-gram data built from only specific domains selected from highly reliable web documents. From that data it selects the n-gram data that matches the position of the word being entered by the user and the syllable entered, and it outputs words with a high occurrence frequency, that is, frequently used words, on the touch screen. The user can thus enter the correct word with a minimum of key input and easily complete a sentence by selecting the words that match the user's intention.

Although preferred embodiments of the present invention have been described above, the present invention is not limited to the specific embodiments described. Those of ordinary skill in the art to which the invention pertains can make numerous changes and modifications to the invention without departing from the spirit and scope of the appended claims, and all such appropriate changes and modifications are to be regarded as equivalents falling within the scope of the invention.

100: touch screen unit 110: text input unit
120: recommended word output unit 130: input word output unit
200: recommended word extraction unit 210: comparison n-gram determination unit
220: recommended word selection unit 300: n-gram construction unit
310: web document 320: preprocessing unit
321: specific domain selection unit 322: abstract extraction unit
323: filtering unit 331: unigram data unit
332: bigram data unit 333: trigram data unit

Claims

delete

A word recommendation and sentence completion method in a touch screen environment, comprising:
inputting characters using a virtual keyboard in a touch screen environment;
extracting n-gram data to be compared based on the input characters, wherein the n-gram data is extracted from an n-gram database constructed by extracting n-grams from a refined specific domain;
selecting and outputting recommended words from the extracted n-gram data in descending order of occurrence frequency; and
completing a sentence by selecting a word from the output recommended words,
wherein, in the step of extracting the n-gram data to be compared based on the input characters, the n-gram data is extracted from an n-gram database constructed by extracting n-grams from a refined specific domain,
the n-gram database constructs the n-gram data through a preprocessing process that simplifies a large volume of web documents, the preprocessing process selecting web documents in the major fields of interest needed for n-gram construction as the specific domain and extracting all special characters from them,
the n-gram database is built in separate parts as unigram data consisting of one word, bigram data consisting of two consecutive words, and trigram data consisting of three consecutive words, with the occurrence frequency of each word calculated and stored,
the n-gram data is extracted by determining the position of the word to which the first syllable of the input characters belongs, and searching the unigram data when it is the first syllable of the first word, the bigram data when it is the first syllable of the second word, and the trigram data when it is the first syllable of the third word, and
when no recommended word is determined from the first syllable of the input characters, the second syllable is received and the search is repeated.

delete