KR20160139484A

KR20160139484A - Method and apparatus for extracting words

Info

Publication number: KR20160139484A
Application number: KR1020150074288A
Authority: KR
Inventors: 정동훈
Original assignee: 삼성에스디에스 주식회사
Priority date: 2015-05-27
Filing date: 2015-05-27
Publication date: 2016-12-07
Also published as: KR102326105B1

Abstract

Provided are a method and an apparatus for extracting words. The method includes: receiving a term including a plurality of characters; sequentially extracting n characters (n is an integer of 1 or more) from the term according to a predetermined proceeding direction; searching a word dictionary for a first word consisting of the n extracted characters; when the search succeeds, excluding the n characters corresponding to the first word and sequentially extracting m characters (m is an integer of 1 or more) from the term according to the proceeding direction; and searching the word dictionary for a second word consisting of the m extracted characters.

Description

[0001] The present invention relates to a method and apparatus for extracting words. More particularly, it relates to a method and apparatus for efficiently extracting one or more words from a term containing a plurality of characters by using certain rules.

A term (terminology) can be separated into one or more words. Here, a word can be defined as a set of characters that carries some meaning. A word may be a simple word, that is, a minimum unit (for example, a morpheme) that expresses a single meaning by itself, or a compound word in which two or more minimum units combine to express another meaning. In other words, a term may be formed from one or more words that are, for example, concatenated in sequence.

When words are extracted from terms, and especially when a computing system is used to extract the constituent words of a large number of terms, simply enumerating every possible combination of the characters in each term not only wastes the resources of the computing system but also takes an impractical amount of computation time. A way to extract words from terms efficiently and quickly is therefore needed.

The technical problem to be solved by the present invention is to provide a word extracting method that extracts one or more words from a term effectively and quickly by using a predetermined rule for extracting words from the term.

Another technical problem to be solved by the present invention is to provide a word extracting apparatus that extracts one or more words from a term effectively and quickly by using a predetermined rule for extracting words from the term.

The technical problems of the present invention are not limited to those mentioned above, and other technical problems not mentioned here will be clearly understood by those skilled in the art from the following description.

According to an embodiment of the present invention for solving the above technical problem, a word extracting method includes: receiving a term including a plurality of characters; sequentially extracting n characters (n is an integer of 1 or more) from the term according to a predetermined proceeding direction; searching a word dictionary for a first word consisting of the n extracted characters; when the search succeeds, excluding the n characters corresponding to the first word and sequentially extracting m characters (m is an integer of 1 or more) from the term according to the proceeding direction; and searching the word dictionary for a second word consisting of the m extracted characters.

In some embodiments of the present invention, searching the word dictionary for the first word may include inserting the first word into a stack when the search succeeds.

In some embodiments of the present invention, searching the word dictionary for the first word may include: when the search fails, extracting one more character from the term according to the proceeding direction and appending it to the first word; and searching the word dictionary for the first word, which now consists of n + 1 characters.
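The forward procedure described above (extract n characters, search the dictionary, append one more character on failure, and push a found word onto a stack) can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of the invention: the function name `extract_forward`, the use of a Python `set` as the word dictionary, and the lower-case matching are all assumptions.

```python
def extract_forward(term, dictionary, n=1):
    """Left-to-right extraction that grows the candidate by one character on failure.

    Hypothetical helper: `dictionary` is assumed to be a set of lower-case words.
    Returns the words found (in order) and any unprocessed remainder of the term.
    """
    stack = []   # words for which the dictionary search succeeded
    start = 0    # index of the first character not yet consumed
    while start < len(term):
        end = start + n
        found = False
        while end <= len(term):
            candidate = term[start:end]
            if candidate in dictionary:
                stack.append(candidate)  # search succeeded: push the word
                start = end              # exclude its characters from the term
                found = True
                break
            end += 1                     # search failed: take one more character
        if not found:
            break  # this sketch simply stops when nothing matches
    return stack, term[start:]


words, rest = extract_forward("processaccountid", {"process", "account", "id"})
print(words, repr(rest))  # ['process', 'account', 'id'] ''
```

Because the candidate grows from the shortest length, this sketch returns the first (shortest) dictionary match at each position; handling of characters that match nothing is left out here.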

In some embodiments of the present invention, the method may further include: when the search fails, excluding the first character of the term with respect to the proceeding direction; sequentially extracting l characters (l is an integer from 1 to n - 1) according to the proceeding direction from the term from which the first character has been excluded; and searching the word dictionary for a third word consisting of the l extracted characters.

In some embodiments of the present invention, excluding the first character of the term with respect to the proceeding direction may include inserting the excluded first character into a stack.

In some embodiments of the present invention, inserting the excluded first character into the stack may include: when the top of the stack contains a word for which the search succeeded, inserting the first character into the stack as a new element.

In some embodiments of the present invention, inserting the excluded first character into the stack may include: when the top of the stack contains a character that was previously excluded because a search failed, popping the top and inserting the character that was stored in the top together with the first character into the stack as a single element.

In some embodiments of the present invention, the method may further include: when a new element containing a word for which the search succeeded is inserted into the stack, outputting the one or more characters stored in the element containing the first character as a new word.

In some embodiments of the present invention, outputting the one or more characters stored in the element containing the first character as a new word may further include storing the new word in the word dictionary.
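The stack handling of excluded characters described in the preceding paragraphs might look like the sketch below. It is a simplification under stated assumptions: the name `extract_with_unknowns`, the `(text, known)` tuple layout of stack elements, and the `learn` flag are hypothetical, and new words are collected once at the end rather than at the moment the next found word is pushed.

```python
def extract_with_unknowns(term, dictionary, learn=False):
    """Forward extraction that merges unmatched characters into one stack element.

    Hypothetical helper: each stack element is a (text, known) tuple, where
    `known` is True for dictionary words and False for runs of excluded characters.
    """
    stack = []
    pos = 0
    while pos < len(term):
        match = None
        for end in range(pos + 1, len(term) + 1):
            if term[pos:end] in dictionary:
                match = term[pos:end]
                break
        if match is not None:
            stack.append((match, True))           # found word: new element
            pos += len(match)
        else:
            ch = term[pos]                        # exclude the first character
            if stack and not stack[-1][1]:
                prev, _ = stack.pop()             # top holds excluded characters:
                stack.append((prev + ch, False))  # pop and push them back merged
            else:
                stack.append((ch, False))         # top is a found word (or empty)
            pos += 1
    new_words = [text for text, known in stack if not known]
    if learn:
        dictionary.update(new_words)              # optionally store new words
    return [text for text, _ in stack], new_words


print(extract_with_unknowns("processxyzid", {"process", "id"}))
# (['process', 'xyz', 'id'], ['xyz'])
```

With `learn=True`, the run "xyz" would be added to the dictionary so that later extractions can match it directly.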

In some embodiments of the present invention, searching the word dictionary for the second word may include: when the search fails, extracting one more character from the term according to the proceeding direction and appending it to the second word; and searching the word dictionary for the second word, which now consists of the m + 1 extracted characters.

In some embodiments of the present invention, the term may contain no separator that delimits the first word and the second word.

According to another embodiment of the present invention for solving the above technical problem, a word extracting method includes: receiving a term including a plurality of characters; sequentially excluding n characters (n is an integer of 1 or more) from the term according to a predetermined proceeding direction; searching a word dictionary for a first word consisting of the plurality of characters that remain after the n characters are excluded; sequentially excluding m characters (m is an integer from 1 to n) from the n characters according to the proceeding direction; and searching the word dictionary for a second word consisting of the n characters minus the m excluded characters.

In some embodiments of the present invention, the method may further include: when the search fails, excluding the last character of the term with respect to the proceeding direction; sequentially excluding l characters (l is an integer from 1 to n - 1) according to the proceeding direction from the term from which the last character has been excluded; and searching the word dictionary for a third word consisting of the n - l - 1 characters that remain after the l characters are excluded.

In some embodiments of the present invention, excluding the last character of the term with respect to the proceeding direction may include inserting the excluded last character into a stack.
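The exclusion-based variant above can be read as a longest-candidate-first search: characters are dropped from the far end until what remains is a dictionary word, and the dropped characters are then processed the same way. The sketch below follows that reading. The name `extract_longest_first` is hypothetical, and trying the zero-exclusion case first (the whole remainder as a candidate) is an assumption that the text does not spell out.

```python
def extract_longest_first(term, dictionary):
    """Search the longest candidate first by excluding characters from the end.

    Hypothetical helper: returns the words found and any unmatched remainder.
    """
    words = []
    rest = term
    while rest:
        for excluded in range(0, len(rest)):
            candidate = rest[:len(rest) - excluded]  # characters left after exclusion
            if candidate in dictionary:
                words.append(candidate)
                rest = rest[len(candidate):]         # continue with excluded characters
                break
        else:
            break  # no prefix of `rest` is a dictionary word; stop in this sketch
    return words, rest


d = {"process", "account", "accountid", "id"}
print(extract_longest_first("processaccountid", d))
# (['process', 'accountid'], '')
```

Compared with growing the candidate one character at a time, this order prefers a longer word such as "accountid" over its shorter prefix "account" when both are in the dictionary.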

According to still another embodiment of the present invention for solving the above technical problem, a word extracting method includes: receiving a term including a plurality of characters; determining whether the term includes a separator; when the term does not include a separator, sequentially extracting a first word and a second word from the term according to a predetermined first proceeding direction; sequentially extracting a third word and a fourth word from the term according to a predetermined second proceeding direction opposite to the first proceeding direction; and determining whether the first word and the fourth word are identical and whether the second word and the third word are identical.

In some embodiments of the present invention, determining whether the term includes a separator may include: when the term includes a separator, extracting a plurality of words from the term by using the separator, without searching the word dictionary.

In some embodiments of the present invention, sequentially extracting the first word and the second word from the term according to the predetermined first proceeding direction may include: sequentially extracting n characters (n is an integer of 1 or more) from the term according to the first proceeding direction and searching a word dictionary for a first word consisting of the n extracted characters; and excluding the n characters corresponding to the first word, sequentially extracting m characters (m is an integer of 1 or more) from the term according to the first proceeding direction, and searching the word dictionary for a second word consisting of the m extracted characters.

In some embodiments of the present invention, sequentially extracting the first word and the second word from the term according to the predetermined first proceeding direction may include: sequentially excluding n characters (n is an integer of 1 or more) from the term according to the first proceeding direction and searching a word dictionary for a first word consisting of the plurality of characters that remain after the n characters are excluded; and sequentially excluding m characters (m is an integer from 1 to n) from the n characters according to the first proceeding direction and searching the word dictionary for a second word consisting of the n characters minus the m excluded characters.

In some embodiments of the present invention, the extraction order of the first word and the second word and the extraction order of the third word and the fourth word may be opposite to each other.
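The separator check and the two-direction comparison described above can be sketched as follows. This is an illustrative sketch only: the helper names, the default separator characters `_`, `-` and space, and the generalization from two words per direction to a whole word list are assumptions.

```python
import re


def split_forward(term, dictionary):
    """Shortest-match extraction from the start; None if some part matches nothing."""
    words, pos = [], 0
    while pos < len(term):
        for end in range(pos + 1, len(term) + 1):
            if term[pos:end] in dictionary:
                words.append(term[pos:end])
                pos = end
                break
        else:
            return None
    return words


def split_backward(term, dictionary):
    """Shortest-match extraction from the end; words come out in reverse order."""
    words, end = [], len(term)
    while end > 0:
        for start in range(end - 1, -1, -1):
            if term[start:end] in dictionary:
                words.append(term[start:end])
                end = start
                break
        else:
            return None
    return words


def extract_checked(term, dictionary, separators="_- "):
    """Split on separators if present; otherwise compare both directions."""
    if any(sep in term for sep in separators):
        # A separator is present: no dictionary search is needed.
        parts = re.split("[" + re.escape(separators) + "]+", term)
        return [p for p in parts if p], True
    forward = split_forward(term, dictionary)
    backward = split_backward(term, dictionary)
    # The first forward word should equal the last backward word, and so on.
    agree = (forward is not None and backward is not None
             and forward == backward[::-1])
    return forward, agree


print(extract_checked("process_account_id", set()))       # (['process', 'account', 'id'], True)
print(extract_checked("abc", {"a", "ab", "bc", "c"}))     # (['a', 'bc'], False)
```

In the second call the forward pass yields "a" and "bc" while the backward pass yields "c" and "ab", so the comparison flags the term as ambiguous.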

According to an embodiment of the present invention for solving the above technical problem, a word extracting apparatus includes: a term input module that receives a term including a plurality of characters; a rule input module that receives a rule on the proceeding direction in which one or more characters are extracted from the term; a word extraction module that sequentially extracts n characters (n is an integer of 1 or more) from the term according to the proceeding direction predetermined in the rule; and a dictionary search module that searches a word dictionary for a first word consisting of the n extracted characters. The word extraction module excludes from the term the n characters corresponding to the first word for which the search succeeded and sequentially extracts m characters (m is an integer of 1 or more) from the term according to the proceeding direction, and the dictionary search module searches the word dictionary for a second word consisting of the m extracted characters.

In some embodiments of the present invention, the rule may further include a rule on whether words are extracted from the term on a simple-word basis or on a compound-word basis.

In some embodiments of the present invention, the rule may further include a rule on the extraction unit of the term.

In some embodiments of the present invention, the extraction unit of the term may be defined in bytes.

According to another embodiment of the present invention for solving the above technical problem, a word extracting apparatus includes: one or more processors; a network interface; a memory; and a storage device on which an executable file of a computer program is recorded, the computer program being loaded into the memory and executed by the processor. The computer program includes: a series of instructions for receiving a term including a plurality of characters; a series of instructions for sequentially extracting n characters (n is an integer of 1 or more) from the term according to a predetermined proceeding direction; a series of instructions for searching a word dictionary for a first word consisting of the n extracted characters; a series of instructions for excluding from the term the n characters corresponding to the first word for which the search succeeded and sequentially extracting m characters (m is an integer of 1 or more) from the term according to the proceeding direction; and a series of instructions for searching the word dictionary for a second word consisting of the m extracted characters.

The details of other embodiments are included in the detailed description and the drawings.

FIG. 1 is a conceptual diagram for explaining a word extracting apparatus according to an embodiment of the present invention.
FIG. 2 is a schematic diagram for explaining a word extracting apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining rules for extracting words in connection with a word extracting method and apparatus according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining a word extracting method according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining a word extracting method according to another embodiment of the present invention.
FIG. 6 is a diagram for explaining a word extracting method according to still another embodiment of the present invention.
FIG. 7 is a diagram for explaining a word extracting method according to still another embodiment of the present invention.
FIG. 8 is a diagram for explaining a word extracting method according to still another embodiment of the present invention.
FIG. 9 is a diagram for explaining a word extracting method according to still another embodiment of the present invention.
FIG. 10 is a schematic diagram for explaining a word extracting apparatus according to some embodiments of the present invention.
FIG. 11 is a flowchart for explaining a word extracting method according to an embodiment of the present invention.
FIG. 12 is a flowchart for explaining a word extracting method according to another embodiment of the present invention.
FIG. 13 is a flowchart for explaining a word extracting method according to another embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the invention pertains are fully informed of the scope of the invention; the invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the meaning commonly understood by those of ordinary skill in the art to which the present invention pertains. Terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless they are clearly and specifically defined. The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural form unless the context specifically states otherwise.

FIG. 1 is a conceptual diagram for explaining a word extracting apparatus according to an embodiment of the present invention.

Referring to FIG. 1, a word extracting apparatus 10 according to an embodiment of the present invention receives a term 30 and a rule 40, and extracts one or more words 50 from the term 30 by using a word dictionary 20 and the rule 40. For example, the word extracting apparatus 10 may extract five words (W1 to W5) from the term 30.

The word extracting apparatus 10 is a computing system that can receive a text or script containing the term 30 and perform a word extraction operation. It may be, for example, one or more personal computers, one or more server computers, one or more portable computers, or a combination thereof, but is not limited thereto. Here, the term 30 includes a plurality of characters, and some of the characters included in the term 30 may form a word that has a meaning.

For example, when the term 30 is "processaccountid", the word extracting apparatus 10 may receive the term "processaccountid" and output, for example, the three words "Process", "Account" and "Id". Depending on the specific implementation conditions of the present invention, the word extracting apparatus 10 may instead receive the term "processaccountid" and output, for example, the two words "Process" and "Accountid". For convenience of explanation, this specification writes terms expressed in the alphabet entirely in lower case, and writes words expressed in the alphabet with an upper-case first letter and lower-case remaining letters.

The word dictionary 20 may be a storage or a database that stores a plurality of words in advance. For example, the word dictionary 20 may already store words such as "Process", "Accountid" and "Id". The word extracting apparatus 10 uses the word dictionary 20 when it extracts a plurality of characters from the term 30 and matches the extracted characters against a word. For example, the word extracting apparatus 10 can search whether the characters "proc" extracted from the term "processaccountid" are stored as a word in the word dictionary 20. If a word matching the searched characters exists, that word may be output to the user as a word 50 extracted from the term 30. If no word matching the searched characters is found in the word dictionary, the word extracting apparatus 10 may repeat the word extraction with a different approach. The operations that the word extracting apparatus 10 performs to extract words follow the rule 40, which is described in detail below with reference to FIG. 3.

In some embodiments of the present invention, the word extracting apparatus 10 and the word dictionary 20 may be connected directly or through a network and exchange various data with each other. In some embodiments of the present invention, the network may include wired networks such as a LAN (Local Area Network) and a WAN (Wide Area Network), and wireless networks such as a WiFi network, a cellular network and Bluetooth, but is not limited thereto.

FIG. 2 is a schematic diagram for explaining a word extracting apparatus according to an embodiment of the present invention.

Referring to FIG. 2, a word extracting apparatus 100 according to an embodiment of the present invention may include a term input module 110, a rule input module 120, a word extraction module 130, a dictionary search module 140, a memory module 150 and a word output module 160.

The term input module 110 may receive a term 30 including a plurality of characters. In some embodiments of the present invention, the term input module 110 may receive the term 30 in the form of a document file such as a text file or a word-processor file, or in an encrypted or compressed form of such a file. The term input module 110 provides the received term 30 to the word extraction module 130.

The rule input module 120 may receive a predetermined rule 40 for extracting words from the term 30. For example, the rule input module 120 receives a rule 40 that includes a rule specifying the proceeding direction in which one or more characters are extracted from the term, and provides it to the word extraction module 130 so that the word extraction module 130 can refer to the rule 40 when performing word extraction.

The word extraction module 130 extracts a plurality of characters from the term 30 received from the term input module 110 by using the rule 40 received from the rule input module 120. The characters extracted by the word extraction module 130 are passed to the dictionary search module 140, which searches whether they exist in the word dictionary 20. If the characters received from the word extraction module 130 are found in the word dictionary 20, the dictionary search module 140 may report the result to the word extraction module 130. In some embodiments of the present invention, the characters may also be compared with words stored in advance in the memory module 150. That is, whether the characters extracted from the term 30 constitute a word may be determined by using a database such as the word dictionary 20, or by using data stored in the memory module 150 in a volatile or non-volatile manner.

The memory module 150 may be used, in particular, to temporarily or permanently store the intermediate result data or final result data that the word extraction module 130 produces while extracting words from the term 30. In some embodiments of the present invention, the memory module 150 may include a data structure, for example a stack, for inserting, deleting and maintaining the intermediate result data or final result data.

The word output module 160 outputs the words 52 and 54 that the word extraction module 130 has extracted from the term 30. For example, the word output module 160 may output the words 52 and 54 in the form of a text data file or a document data file, or through an input/output device such as a display or a printer. In some embodiments of the present invention, the word output module 160 may output a word 52 that matched a word already stored in the word dictionary 20, and may also output a word 54 that was not stored in the word dictionary 20 but was determined to be a new word while the word extraction module 130 was searching for words in the term 30. In some embodiments of the present invention, the word 54 may be stored in the word dictionary 20 and used in later word extraction performed by the word extraction module 130.

FIG. 3 is a diagram for explaining the rules for extracting words in connection with the word extraction method and apparatus according to an embodiment of the present invention.

Referring to FIG. 3, the rule 40 for extracting words in connection with the word extraction method and apparatus according to an embodiment of the present invention may include information on a language type, a recognition form, a recognition direction, and the like.

The language type defines, for example, the unit that the word extraction module 130 uses when extracting characters from the term 30, depending on the language of the term 30. For example, if the language type is "alphabet", the word extraction module 130 may extract from the term 30 letter by letter. If the language type is "Hangul", the word extraction module 130 may instead extract from the term 30 syllable by syllable. In particular, the language type may determine the size of the unit used when extracting characters from the term 30, for example its size in bytes.
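The effect of the language type on the extraction unit can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_units`, the use of UTF-8 as the encoding, and the NFC normalization step for Hangul are all assumptions introduced here.

```python
import unicodedata


def split_units(term: str, language: str) -> list[tuple[str, int]]:
    """Return (unit, size in bytes) pairs for a term.

    language is 'A' (alphabet, one letter per unit) or
    'H' (Hangul, one syllable per unit). UTF-8 is an assumed encoding.
    """
    if language not in ("A", "H"):
        raise ValueError(f"unknown language type: {language!r}")
    if language == "H":
        # Assumption: compose conjoining jamo into precomposed syllables
        # so that one Python character corresponds to one syllable.
        term = unicodedata.normalize("NFC", term)
    return [(unit, len(unit.encode("utf-8"))) for unit in term]
```

Under these assumptions a Latin letter occupies one byte and a precomposed Hangul syllable occupies three bytes, which is one way the unit size in bytes could follow from the language type.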

The recognition form specifies whether only single words, or compound words as well, are considered as words extracted from the term 30; that is, whether the term 30 is extracted on a single-word basis or on a compound-word basis. As described in detail in the various embodiments discussed below with reference to FIGS. 4 and 9, the concrete procedure of the word extraction method may vary according to this rule.

The recognition direction may be set to right priority or left priority. Here, the recognition direction means the positional order in which words are extracted from the term 30. For example, when the three words "Process", "Account", and "Id" are extracted from "processaccountid", they may be extracted in the order "Id", "Account", "Process" under right priority. Under left priority, they may instead be extracted in the order "Process", "Account", "Id".

In some embodiments of the present invention, the rule 40 described above may be passed to the word extraction module 130 as parameter values. For example, a first parameter may take the value 'A' when the language type is alphabet and 'H' when it is Hangul; a second parameter may take the value 'S' when the recognition form is single word and 'P' when it is compound word; and a third parameter may take the value 'R' when the recognition direction is right priority and 'L' when it is left priority. However, the format of the parameters passed to the word extraction module 130 may be defined differently depending on the specific implementation purpose.
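One way to carry the three parameter values is a small immutable record. The sketch below is illustrative only: the class name `Rule`, its field names, and the helper `parse_rule` are hypothetical, and the textual form "(A, S, R)" is taken from the examples in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    language: str   # 'A' = alphabet, 'H' = Hangul
    form: str       # 'S' = single word, 'P' = compound word
    direction: str  # 'R' = right priority, 'L' = left priority

    def __post_init__(self):
        # Reject values outside the three code sets described in the text.
        if self.language not in ("A", "H"):
            raise ValueError(f"bad language type: {self.language!r}")
        if self.form not in ("S", "P"):
            raise ValueError(f"bad recognition form: {self.form!r}")
        if self.direction not in ("R", "L"):
            raise ValueError(f"bad recognition direction: {self.direction!r}")


def parse_rule(text: str) -> Rule:
    """Parse a rule written as '(A, S, R)' into a Rule (hypothetical helper)."""
    parts = [p.strip() for p in text.strip().strip("()").split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected three parameters, got {len(parts)}")
    return Rule(*parts)
```

Validating at construction time keeps an invalid combination from reaching the extraction step; other parameter formats are equally possible, as the description notes.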

FIG. 4 is a diagram for explaining a word extraction method according to an embodiment of the present invention.

Referring to FIG. 4, the term 300 "processaccountid" is input to the word extraction module 130, and the rule 400 is specified as "(A, S, R)". That is, in this embodiment the rule used by the word extraction module 130 is alphabet, single word, right priority. Meanwhile, the word dictionary 20 stores in advance the words 201, 203, 205, 207, 209, namely "Process", "Account", "Id", "Processaccount", and "Accountid".

The term input module 110 receives the term 300 containing a plurality of characters, namely "processaccountid", and the rule input module 120 receives the rule 400, namely "(A, S, R)".

The word extraction module 130 receives "processaccountid" from the term input module 110, sequentially extracts n characters (where n is an integer of 1 or more) from "processaccountid" according to a predetermined traveling direction, and searches the word dictionary 20 for a word consisting of the extracted n characters. The predetermined traveling direction is tied to the recognition direction of the rule 400. In this embodiment the recognition direction is "right priority", and to implement it the traveling direction runs from the right end of the term 300 toward the left.

Specifically, following the traveling direction from the right of the term 300 toward the left, the word extraction module 130 extracts one character, "d", from "processaccountid" and searches the word dictionary 20 for "d". Because "d" is not stored in the word dictionary 20, the search fails. The word extraction module 130 then extracts one more character, "i", from the term 300 in the same traveling direction and searches the word dictionary 20 for "id". The search succeeds because "id" in the term 300 matches the word 205 stored in the word dictionary, namely "Id". In some embodiments of the present invention, the word extraction module 130 may push the matched word, "Id", onto the stack 170.

When the search succeeds, the word extraction module 130 excludes from the term 300 the n characters corresponding to the matched word, sequentially extracts m characters (where m is an integer of 1 or more) from the remainder of the term 300 in the same traveling direction, and searches the word dictionary 20 for a word consisting of the extracted m characters.

Specifically, the word extraction module 130 excludes the two matched characters, "id", extracts one character, "t", from "processaccount" in the same traveling direction, and searches the word dictionary 20 for "t". Because "t" is not stored in the word dictionary 20, the search fails. The word extraction module 130 then extracts one more character, "n", from the term 300 in the same traveling direction and searches the word dictionary 20 for "nt". Repeating this process, the search succeeds when "account" in the term 300 matches the word 203 stored in the word dictionary, namely "Account". In some embodiments of the present invention, the word extraction module 130 may push the matched word, "Account", onto the stack 170.

When the search succeeds, the word extraction module 130 excludes from the term 300 the matched "id" and "account", extracts "process" in the same traveling direction, and matches it against the word 201 stored in the word dictionary, namely "Process". In some embodiments of the present invention, the word extraction module 130 may push the matched word, "Process", onto the stack 170.

Accordingly, the words are recognized and pushed onto the stack 170 in the order "Id", "Account", "Process", and the word output module 160 may pop "Process", "Account", "Id" from the stack 170 and output them as the final result.
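The FIG. 4 walk-through can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `extract_single_right` is hypothetical, matching is done case-insensitively, a Python set stands in for the word dictionary 20, a Python list stands in for the stack 170, and a term that cannot be fully matched simply raises an error (the fallback for unknown words is described with FIG. 5).

```python
def extract_single_right(term, dictionary):
    """Single-word, right-priority extraction: grow the candidate one
    character at a time from the right end until the dictionary matches."""
    known = {w.lower() for w in dictionary}
    stack = []                      # stands in for the stack 170
    rest = term.lower()
    while rest:
        for n in range(1, len(rest) + 1):
            candidate = rest[-n:]   # the n rightmost characters
            if candidate in known:
                stack.append(candidate.capitalize())
                rest = rest[:-n]    # exclude the matched characters
                break
        else:
            # Assumption: this sketch does not handle unknown words.
            raise ValueError(f"no dictionary word at the end of {rest!r}")
    # Popping the stack reverses the recognition order.
    return list(reversed(stack))


words = extract_single_right(
    "processaccountid",
    {"Process", "Account", "Id", "Processaccount", "Accountid"},
)
# words == ["Process", "Account", "Id"]
```

Because the shortest matching suffix is taken first, "Id" is recognized before "Accountid" is ever tried, which is why the single-word rule yields three words here.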

FIG. 5 is a diagram for explaining a word extraction method according to another embodiment of the present invention.

Referring to FIG. 5, as in the embodiment of FIG. 4, the term 300 "processaccountid" is input to the word extraction module 130, and the rule 400 is specified as "(A, S, R)". That is, in this embodiment the rule used by the word extraction module 130 is alphabet, single word, right priority. Unlike the embodiment of FIG. 4, however, the word dictionary 20 stores in advance only the words 201, 205, 209, namely "Process", "Id", and "Accountid".

The procedure differs from the embodiment of FIG. 4 after the matched word "Id" has been pushed onto the stack 170. Unlike the embodiment of FIG. 4, the word "Account" does not exist in the word dictionary 20.

After pushing "Id" onto the stack 170, the word extraction module 130 excludes the two matched characters, "id", extracts one character, "t", from "processaccount" following the traveling direction from right to left, and searches the word dictionary 20 for "t". Because "t" is not stored in the word dictionary 20, the search fails. The word extraction module 130 then extracts one more character, "n", from the term 300 in the same traveling direction and searches the word dictionary 20 for "nt". Even though this process is repeated, every search fails, up to and including the search for "processaccount" in the word dictionary 20.

In this case, the word extraction module 130 excludes from the term 300 the first character with respect to the traveling direction. That is, it excludes "t", which is the first character when viewed in the traveling direction from right to left. The word extraction module 130 then sequentially extracts l characters (where l is an integer from 1 to n - 1) in the traveling direction from the term 300 with the first character excluded, and searches the word dictionary 20 for a word consisting of the extracted l characters. In other words, the above process is repeated for "processaccoun". In some embodiments of the present invention, the word extraction module 130 may push the excluded first character "t" onto the stack 170.

In some embodiments of the present invention, when the excluded first character is pushed onto the stack 170 and the top of the stack 170 holds a matched word, namely "Id", the first character may be pushed onto the stack 170 as a new element.

However, word extraction also fails for "processaccoun", so the step of excluding the first character with respect to the traveling direction is repeated, and the above process is repeated for "processaccou", "processacco", and so on. In some embodiments of the present invention, the word extraction module 130 may push the first characters "n" and "u" excluded during this repetition onto the stack 170.

In some embodiments of the present invention, when the excluded first character is pushed onto the stack 170 and the top of the stack 170 holds a character previously excluded after a failed search, namely "t", the top may be popped, and the character "t" that was stored at the top may be pushed back onto the stack 170 together with the first character, namely "n" (and "u"), as a single element.
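The two stack rules above (push a new element when the top is a matched word, merge when the top holds previously excluded characters) can be sketched as follows. The representation of a stack element as a `(text, matched)` pair and the function names `push_matched` and `push_excluded` are assumptions made for illustration; the merge order shown is for a right-to-left traveling direction.

```python
def push_matched(stack, word):
    """Push a word that was found in the dictionary."""
    stack.append((word, True))


def push_excluded(stack, ch):
    """Push a character excluded after a failed search (right-to-left scan).

    If the top already holds excluded characters, pop it and push the
    merged text back as one element; otherwise push a new element.
    """
    if stack and not stack[-1][1]:
        previous, _ = stack.pop()
        # The newly excluded character lies to the left of the earlier ones.
        stack.append((ch + previous, False))
    else:
        stack.append((ch, False))


stack = []
push_matched(stack, "id")
for ch in "tnuocca":            # characters excluded from right to left
    push_excluded(stack, ch)
# stack == [("id", True), ("account", False)]
```

Keeping the unmatched run as a single element is what later allows it to be treated as one candidate new word.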

Eventually the exclusion step is repeated until "process" remains. With "[account]" already on the stack 170, the word extraction module 130 succeeds in matching "process" against the word 201 stored in advance in the word dictionary 20, namely "Process", by the method described above. In some embodiments of the present invention, the word extraction module 130 may push the matched word, "Process", onto the stack 170. The element "[account]", whose insertion was completed before "Process" was pushed onto the stack 170, is treated as a new word.

Accordingly, the words are recognized and pushed onto the stack 170 in the order "Id", "[Account]", "Process", and the word output module 160 may pop "Process", "[Account]", "Id" from the stack 170 and output them as the final result. Because "[Account]" is not a word stored in advance in the word dictionary 20, it may additionally be stored in the word dictionary 20 as needed.

FIG. 6 is a diagram for explaining a word extraction method according to yet another embodiment of the present invention.

Referring to FIG. 6, the term 300 "processaccountid" is input to the word extraction module 130, and the rule 400 is specified as "(A, P, R)". That is, in this embodiment the rule used by the word extraction module 130 is alphabet, compound word, right priority. Meanwhile, the word dictionary 20 stores in advance the words 201, 203, 205, 207, 209, namely "Process", "Account", "Id", "Processaccount", and "Accountid".

The term input module 110 receives the term 300 containing a plurality of characters, namely "processaccountid", and the rule input module 120 receives the rule 400, namely "(A, P, R)".

The word extraction module 130 receives "processaccountid" from the term input module 110, sequentially excludes n characters (where n is an integer of 1 or more) from "processaccountid" according to a predetermined traveling direction, and searches the word dictionary 20 for a word consisting of the characters that remain after the n characters are excluded. The predetermined traveling direction is tied to the recognition direction of the rule 400. In this embodiment the recognition direction is "right priority", and to implement it the traveling direction runs from the left end of the term 300 toward the right.

Specifically, following the traveling direction from the left of the term 300 toward the right, the word extraction module 130 excludes one character, "p", from "processaccountid" and searches the word dictionary 20 for "rocessaccountid". Because "rocessaccountid" is not stored in the word dictionary 20, the search fails. The word extraction module 130 then excludes one more character, "r", from the term 300 in the same traveling direction and searches the word dictionary 20 for "ocessaccountid". Repeating this process, the word extraction module 130 eventually searches for "accountid", and the search succeeds because "accountid" in the term 300 matches the word 209 stored in the word dictionary, namely "Accountid". In some embodiments of the present invention, the word extraction module 130 may push the matched word, "Accountid", onto the stack 170.

When the search succeeds, the word extraction module 130 sequentially excludes m characters (where m is an integer from 1 to n) from the n characters, and searches the word dictionary 20 for a word consisting of the n characters from which the m characters have been excluded.

Specifically, the word extraction module 130 performs the above operation on "process". Here "process" matches the word 201 stored in advance in the word dictionary 20, namely "Process". In some embodiments of the present invention, the word extraction module 130 may push the matched word, "Process", onto the stack 170.

Accordingly, the words are recognized and pushed onto the stack 170 in the order "Accountid", "Process", and the word output module 160 may pop "Process", "Accountid" from the stack 170 and output them as the final result.
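The compound-word, right-priority procedure of FIG. 6 amounts to taking the longest matching suffix first. The sketch below illustrates this under stated assumptions: the function name `extract_compound_right` is hypothetical, matching is case-insensitive, the whole remaining text is also tried as a candidate (as the flow of FIG. 12 suggests), and unknown words are not handled.

```python
def extract_compound_right(term, dictionary):
    """Compound-word, right-priority extraction: drop characters from the
    left one at a time and look up what remains, so the longest suffix
    found in the dictionary wins."""
    known = {w.lower() for w in dictionary}
    stack = []                          # stands in for the stack 170
    rest = term.lower()
    while rest:
        for skipped in range(len(rest)):
            candidate = rest[skipped:]  # text left after excluding `skipped` chars
            if candidate in known:
                stack.append(candidate.capitalize())
                rest = rest[:skipped]   # continue with the excluded characters
                break
        else:
            # Assumption: this sketch does not handle unknown words.
            raise ValueError(f"no dictionary word at the end of {rest!r}")
    return list(reversed(stack))        # pop order


words = extract_compound_right(
    "processaccountid",
    {"Process", "Account", "Id", "Processaccount", "Accountid"},
)
# words == ["Process", "Accountid"]
```

With the same dictionary as FIG. 4, the longest-first scan returns the compound "Accountid" instead of "Account" and "Id", which is the difference between the 'S' and 'P' recognition forms.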

FIG. 7 is a diagram for explaining a word extraction method according to yet another embodiment of the present invention.

Referring to FIG. 7, as in the embodiment of FIG. 6, the term 300 "processaccountid" is input to the word extraction module 130, and the rule 400 is specified as "(A, P, R)". That is, in this embodiment the rule used by the word extraction module 130 is alphabet, compound word, right priority. Unlike the embodiment of FIG. 6, however, the word dictionary 20 stores in advance only the words 201, 205, namely "Process" and "Id".

After pushing "Id" onto the stack 170, the word extraction module 130 searches the word dictionary 20 for the "processaccount" that remains once the two matched characters, "id", are excluded, but the search fails. In this case, the word extraction module 130 excludes from the term 300 the last character with respect to the traveling direction. That is, it excludes "t", which is the last character when viewed in the traveling direction from left to right. The word extraction module 130 then sequentially excludes l characters (where l is an integer from 1 to n - 1) in the traveling direction from the term 300 with the last character excluded, and searches the word dictionary for a word consisting of the n - l - 1 characters that remain after the l characters are excluded. In other words, the above process is repeated for "processaccoun". In some embodiments of the present invention, the word extraction module 130 may push the excluded last character "t" onto the stack 170.

In some embodiments of the present invention, when the excluded last character is pushed onto the stack 170 and the top of the stack 170 holds a matched word, namely "Id", the last character may be pushed onto the stack 170 as a new element.

However, word extraction also fails for "processaccoun", so the step of excluding the last character with respect to the traveling direction is repeated, and the above process is repeated for "processaccou", "processacco", and so on. In some embodiments of the present invention, the word extraction module 130 may push the last characters "n" and "u" excluded during this repetition onto the stack 170.

In some embodiments of the present invention, when the excluded last character is pushed onto the stack 170 and the top of the stack 170 holds a character previously excluded after a failed search, namely "t", the top may be popped, and the character "t" that was stored at the top may be pushed back onto the stack 170 together with the last character, namely "n" (and "u"), as a single element.

FIG. 8 is a diagram for explaining a word extraction method according to yet another embodiment of the present invention.

Referring to FIG. 8, the only difference from FIG. 6 is that the recognition direction is "L", that is, left priority. Accordingly, the words are recognized in the order "Processaccount", "Id" by the method described above and pushed onto the stack 170. The word output module 160 may then pop "Id" and "Processaccount" from the stack 170 and output them as the final result.
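The left-priority variant of FIG. 8 mirrors the FIG. 6 procedure and can be sketched as a longest-prefix-first scan. As before, this is an illustration under assumptions: the function name `extract_compound_left` is hypothetical, matching is case-insensitive, and unknown words are not handled.

```python
def extract_compound_left(term, dictionary):
    """Compound-word, left-priority extraction: drop characters from the
    right one at a time and look up what remains, so the longest prefix
    found in the dictionary wins."""
    known = {w.lower() for w in dictionary}
    stack = []                         # stands in for the stack 170
    rest = term.lower()
    while rest:
        for end in range(len(rest), 0, -1):
            candidate = rest[:end]     # the `end` leftmost characters
            if candidate in known:
                stack.append(candidate.capitalize())
                rest = rest[end:]      # continue with the excluded characters
                break
        else:
            # Assumption: this sketch does not handle unknown words.
            raise ValueError(f"no dictionary word at the start of {rest!r}")
    # Popping the stack yields the words in reverse recognition order.
    return list(reversed(stack))


words = extract_compound_left(
    "processaccountid",
    {"Process", "Account", "Id", "Processaccount", "Accountid"},
)
# words == ["Id", "Processaccount"]
```

Note that the same term and dictionary now split as "Processaccount" plus "Id" rather than "Process" plus "Accountid", so the recognition direction changes the result and not only the order.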

FIG. 9 is a diagram for explaining a word extraction method according to yet another embodiment of the present invention.

Referring to FIG. 9, the only differences from FIG. 6 are that the language type is "Hangul", so that extraction proceeds syllable by syllable, and that the word dictionary 20 stores the words 211, 213, 215, 217, 219, namely "처리" (processing), "프로세스" (process), "명" (name), "처리프로세스", and "프로세스명". Accordingly, for the term 300 "처리프로세스명", the words are recognized in the order "프로세스명", "처리" by the method described above and pushed onto the stack 170. The word output module 160 may then pop "처리" and "프로세스명" from the stack 170 and output them as the final result.
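The Hangul case of FIG. 9 can be sketched with the same longest-suffix-first scan applied to syllables. The function name `extract_hangul_compound_right` is hypothetical, and the NFC normalization step is an assumption made so that each Python character corresponds to one precomposed syllable; unknown words are not handled in this sketch.

```python
import unicodedata


def extract_hangul_compound_right(term, dictionary):
    """Rule (H, P, R): syllable units, compound words, right priority."""
    # Assumption: normalize to NFC so one character equals one syllable.
    known = {unicodedata.normalize("NFC", w) for w in dictionary}
    rest = unicodedata.normalize("NFC", term)
    stack = []                          # stands in for the stack 170
    while rest:
        for skipped in range(len(rest)):
            candidate = rest[skipped:]  # syllables left after exclusion
            if candidate in known:
                stack.append(candidate)
                rest = rest[:skipped]
                break
        else:
            raise ValueError(f"no dictionary word at the end of {rest!r}")
    return list(reversed(stack))        # pop order


words = extract_hangul_compound_right(
    "처리프로세스명",
    {"처리", "프로세스", "명", "처리프로세스", "프로세스명"},
)
# words == ["처리", "프로세스명"]
```

Only the unit of extraction changes relative to the alphabet case; the scan itself is the same.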

FIG. 10 is a schematic diagram for explaining a word extraction apparatus according to some embodiments of the present invention.

Referring to FIG. 10, the word extraction apparatus 600 includes a processor 601, a memory 603, a storage 605, a network interface 607, an input interface 609, and an output interface 611. The processor 601, the memory 603, the storage 605, the network interface 607, the input interface 609, and the output interface 611 can exchange data with one another through a bus 613.

The various operations of the word extraction apparatus described so far can be implemented using the processor 601, the memory 603, the storage 605, the network interface 607, the input interface 609, and the output interface 611.

For example, the word extraction apparatus 600 may include one or more processors 601, a network interface 607, a memory 603, and a storage device 605 on which is recorded an executable file of a computer program that is loaded into the memory 603 and executed by the processor 601. The computer program may include: a series of instructions for receiving a term containing a plurality of characters; a series of instructions for sequentially extracting n characters (where n is an integer of 1 or more) from the term according to a predetermined traveling direction; a series of instructions for searching a word dictionary for a first word consisting of the extracted n characters; a series of instructions for excluding from the term the n characters corresponding to the first word for which the search succeeded and sequentially extracting m characters (where m is an integer of 1 or more) from the term according to the traveling direction; and a series of instructions for searching the word dictionary for a second word consisting of the extracted m characters.

FIG. 11 is a flowchart for explaining a word extraction method according to an embodiment of the present invention.

Referring to FIG. 11, in the word extraction method according to an embodiment of the present invention, if the term 30 contains delimiter characters, for example symbols or spaces, the term 30 may be split into a plurality of words using the delimiter characters without searching the word dictionary 20.
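The delimiter shortcut can be sketched as a pre-check that runs before any dictionary lookup. Which characters count as delimiters is not specified beyond "symbols, spaces, and the like", so the sketch assumes that any character other than a letter or digit (including the underscore) is a delimiter; the function name `split_on_delimiters` is hypothetical.

```python
import re

# Assumption: every non-alphanumeric character, plus '_', is a delimiter.
_DELIMITERS = re.compile(r"[\W_]+")


def split_on_delimiters(term):
    """Return the words separated by delimiters, or None when the term
    contains no delimiter and must go through dictionary-based extraction."""
    if not _DELIMITERS.search(term):
        return None
    return [part for part in _DELIMITERS.split(term) if part]
```

Returning `None` for a delimiter-free term lets the caller fall through to the dictionary-based steps described next.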

Assuming that the term 30 contains no delimiter characters, the method includes receiving the term 30 (S1101) and determining the priority direction for recognizing words (S1103). The value of n is set to 1 (S1105), and n characters are extracted according to the priority direction (S1107). It is determined whether the extracted n characters exist in the word dictionary 20 (S1109); if they do, the found word is pushed onto the stack 170 and deleted from the term 30 (S1111). If they do not, the value of n is increased (S1113) and the length of the term 30 is compared with the value of n (S1115). If the value of n is less than or equal to the length of the term 30, the process proceeds to step S1115; otherwise, the last character according to the priority direction is pushed onto the stack 170 and deleted from the term 30 (S1117). It is then determined whether the length of the term 30 has reached 0 (S1119), and if so, the process ends.
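The flow of FIG. 11 can be sketched as a pair of nested loops, with the step labels shown as comments. This is an illustration under assumptions rather than the patented implementation: the function name `extract_words` is hypothetical, a failed length check is taken to lead back to the extraction step, stack elements are `(text, matched)` pairs, and excluded characters are merged into the top element as described for FIG. 5.

```python
def extract_words(term, dictionary, direction="R"):
    """Shortest-match extraction with a fallback for unknown text.
    Returns the stack contents in push order as (text, matched) pairs."""
    known = {w.lower() for w in dictionary}
    from_right = direction == "R"              # S1103: priority direction
    rest = term.lower()                        # S1101: receive the term
    stack = []
    while rest:                                # S1119: stop at length 0
        n = 1                                  # S1105
        while n <= len(rest):                  # S1115
            candidate = rest[-n:] if from_right else rest[:n]   # S1107
            if candidate in known:             # S1109
                stack.append((candidate, True))                 # S1111
                rest = rest[:-n] if from_right else rest[n:]
                break
            n += 1                             # S1113
        else:
            # S1117: no candidate matched, so move one character to the stack.
            ch = rest[-1] if from_right else rest[0]
            rest = rest[:-1] if from_right else rest[1:]
            if stack and not stack[-1][1]:
                previous, _ = stack.pop()
                merged = ch + previous if from_right else previous + ch
                stack.append((merged, False))
            else:
                stack.append((ch, False))
    return stack


result = extract_words("processaccountid", {"Process", "Id", "Accountid"})
# result == [("id", True), ("account", False), ("process", True)]
```

Elements flagged as unmatched correspond to candidate new words such as "[Account]" in FIG. 5 and could be offered for addition to the dictionary.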

FIG. 12 is a flowchart for explaining a word extraction method according to another embodiment of the present invention.

Referring to FIG. 12, and again assuming that the term 30 contains no delimiter characters, the word extraction method according to another embodiment of the present invention includes receiving the term 30 (S1201) and determining the priority direction for recognizing words (S1203). The value of n is set to the length of the term 30 (S1205), and n characters are extracted according to the priority direction (S1207). It is determined whether the extracted n characters exist in the word dictionary 20 (S1209); if they do, the found word is pushed onto the stack 170 and deleted from the term 30 (S1211). If they do not, the value of n is decreased (S1213) and the length of the term 30 is compared with the value of n (S1215). If the value of n is less than or equal to the length of the term 30, the process proceeds to step S1215; otherwise, the last character according to the priority direction is pushed onto the stack 170 and deleted from the term 30 (S1217). It is then determined whether the length of the term 30 has reached 0 (S1219), and if so, the process ends.

FIG. 13 is a flowchart illustrating a word extraction method according to another embodiment of the present invention.

Referring to FIG. 13, a word extraction method according to another embodiment of the present invention may extract words 50 from the term 30 according to a first rule (S1301), extract words 50 from the term 30 according to a second rule different from the first rule (S1303), and determine whether the two results are identical (S1305). If the two results are identical, the result is output (S1309); if they differ, a failure message may be output (S1307).
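The cross-check of S1301 to S1309 can be sketched as follows. This is an illustration only: it assumes the two rules are scans in opposite directions (as in the claim that compares a first and second word against a fourth and third word), and it uses a hypothetical helper `segment` that performs a simple shortest-match scan. The names `segment` and `cross_check`, and returning `None` in place of a failure message, are assumptions rather than anything specified in the text.

```python
def segment(term, dictionary, direction):
    """Hypothetical helper: shortest-match scan from one end of the term."""
    found = []
    while term:
        size = 1                                  # unmatched characters are taken one at a time
        for n in range(1, len(term) + 1):
            chunk = term[:n] if direction == "ltr" else term[-n:]
            if chunk in dictionary:
                size = n
                break
        if direction == "ltr":
            found.append(term[:size])
            term = term[size:]
        else:
            found.append(term[-size:])
            term = term[:-size]
    return found


def cross_check(term, dictionary):
    """Return the extracted words if both rules agree, otherwise None."""
    forward = segment(term, dictionary, "ltr")          # S1301: first rule
    backward = segment(term, dictionary, "rtl")[::-1]   # S1303: second rule, put back in reading order
    return forward if forward == backward else None     # S1305, then S1309 or S1307


print(cross_check("database", {"data", "base"}))        # ['data', 'base']
print(cross_check("abcd", {"ab", "abc", "cd", "d"}))    # None
```

In the second call the left-to-right scan yields `['ab', 'cd']` while the right-to-left scan yields `['abc', 'd']` in reading order, so the term is ambiguous under this dictionary and the check reports failure.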

Although embodiments of the present invention have been described above with reference to the accompanying drawings, the present invention is not limited to these embodiments and may be implemented in various different forms. Those of ordinary skill in the art to which the present invention pertains will understand that the invention may be embodied in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

Claims

1. A word extraction method comprising:
receiving a term including a plurality of characters;
sequentially extracting n characters (where n is an integer of 1 or more) from the term according to a predetermined progress direction;
searching a word dictionary for a first word consisting of the extracted n characters;
when the search succeeds, sequentially extracting m characters (where m is an integer of 1 or more) from the term according to the progress direction, excluding the n characters corresponding to the first word; and
searching the word dictionary for a second word consisting of the extracted m characters.

2. The method of claim 1,
wherein searching the word dictionary for the first word comprises
inserting the first word into a stack when the search succeeds.

3. The method of claim 1,
wherein searching the word dictionary for the first word comprises:
when the search fails, further extracting one character from the term according to the progress direction and adding the one character to the first word; and
searching the word dictionary for the first word consisting of n + 1 characters.

4. The method of claim 1, further comprising:
when the search fails, excluding from the term the first character with respect to the progress direction;
sequentially extracting l characters (where l is an integer equal to or greater than 1 and equal to or less than n - 1) according to the progress direction from the term from which the first character has been excluded; and
searching the word dictionary for a third word consisting of the extracted l characters.

5. The method of claim 4,
wherein excluding from the term the first character with respect to the progress direction comprises
inserting the excluded first character into the stack.

6. The method of claim 5,
wherein inserting the excluded first character into the stack comprises
inserting the first character into the stack as a new element if the top of the stack contains a word for which the search succeeded.

7. The method of claim 5,
wherein inserting the excluded first character into the stack comprises,
if the top of the stack contains a character that was previously excluded after a failed search, popping the top and inserting the character stored in the top together with the first character into the stack as a single element.

8. The method of claim 5,
further comprising outputting, as a new word, the one or more characters stored in the element including the first character when a new element including a successfully found word is inserted into the stack.

9. The method of claim 8,
Wherein outputting one or more characters stored in an element containing the first character as a new word further comprises storing the new word in the word dictionary.

10. The method of claim 1,
wherein searching the word dictionary for the second word comprises:
when the search fails, further extracting one character from the term according to the progress direction and adding the one character to the second word; and
searching the word dictionary for the second word consisting of m + 1 characters.

11. The method of claim 1,
wherein the term includes a delimiter separating the first word and the second word.

12. A word extraction method comprising:
receiving a term including a plurality of characters;
sequentially excluding n characters (where n is an integer of 1 or more) from the term according to a predetermined progress direction;
searching a word dictionary for a first word consisting of the plurality of characters excluding the n characters;
sequentially excluding m characters (where m is an integer equal to or greater than 1 and equal to or less than n) from the n characters according to the progress direction; and
searching the word dictionary for a second word consisting of the n characters excluding the m characters.

13. The method of claim 12,
wherein searching the word dictionary for the first word comprises
inserting the first word into a stack when the search succeeds.

14. The method of claim 12, further comprising:
when the search fails, excluding from the term the last character with respect to the progress direction;
sequentially excluding l characters (where l is an integer equal to or greater than 1 and equal to or less than n - 1) according to the progress direction from the term from which the last character has been excluded; and
searching the word dictionary for a third word consisting of n - l - 1 characters excluding the l characters.

15. The method of claim 14,
wherein excluding from the term the last character with respect to the progress direction comprises
inserting the excluded last character into the stack.

16. A word extraction method comprising:
receiving a term including a plurality of characters;
determining whether the term includes a delimiter;
when the term does not include a delimiter, sequentially extracting a first word and a second word from the term according to a predetermined first progress direction;
sequentially extracting a third word and a fourth word from the term according to a predetermined second progress direction opposite to the first progress direction; and
determining whether the first word and the fourth word are the same and whether the second word and the third word are the same.

17. The method of claim 16,
wherein determining whether the term includes a delimiter comprises,
when the term includes a delimiter, extracting a plurality of words from the term using the delimiter without searching the word dictionary.

18. The method of claim 16,
wherein sequentially extracting the first word and the second word from the term according to the predetermined first progress direction comprises:
sequentially extracting n characters (where n is an integer of 1 or more) from the term according to the first progress direction, and searching a word dictionary for a first word consisting of the extracted n characters; and
sequentially extracting m characters (where m is an integer of 1 or more) from the term according to the first progress direction, excluding the n characters corresponding to the first word, and searching the word dictionary for a second word consisting of the m characters.

19. The method of claim 16,
wherein sequentially extracting the first word and the second word from the term according to the predetermined first progress direction comprises:
sequentially excluding n characters (where n is an integer of 1 or more) from the term according to the first progress direction, and searching a word dictionary for a first word consisting of the plurality of characters excluding the n characters; and
sequentially excluding m characters (where m is an integer equal to or greater than 1 and equal to or less than n) from the n characters according to the first progress direction, and searching the word dictionary for a second word consisting of the n characters excluding the m characters.

20. The method of claim 16,
Wherein the extraction order of the first word and the second word and the extraction order of the third word and the fourth word are opposite to each other.

21. A word extraction apparatus comprising:
a term input module that receives a term including a plurality of characters;
a rule input module that receives a rule on a progress direction for extracting one or more characters from the term;
a word extraction module that sequentially extracts n characters (where n is an integer of 1 or more) from the term according to the progress direction defined in the rule; and
a dictionary search module that searches a word dictionary for a first word consisting of the extracted n characters,
wherein the word extraction module sequentially extracts m characters (where m is an integer of 1 or more) from the term according to the progress direction, excluding the n characters corresponding to the first word for which the search succeeded, and
wherein the dictionary search module searches the word dictionary for a second word consisting of the extracted m characters.

22. The apparatus of claim 21,
wherein the rule further comprises a rule on whether to extract words from the term on a single-word basis or on a compound-word basis.

23. The apparatus of claim 21,
wherein the rule further comprises a rule on an extraction unit for the term.

24. The apparatus of claim 23,
wherein the extraction unit for the term is determined in units of bytes.

25. A word extraction apparatus comprising:
one or more processors;
a network interface;
a memory; and
a storage device storing an executable file of a computer program that is loaded into the memory and executed by the one or more processors,
wherein the computer program comprises:
a series of instructions for receiving a term including a plurality of characters;
a series of instructions for sequentially extracting n characters (where n is an integer of 1 or more) from the term according to a predetermined progress direction;
a series of instructions for searching a word dictionary for a first word consisting of the extracted n characters;
a series of instructions for sequentially extracting m characters (where m is an integer of 1 or more) from the term according to the progress direction, excluding the n characters corresponding to the first word for which the search succeeded; and
a series of instructions for searching the word dictionary for a second word consisting of the extracted m characters.