KR20180113849A

KR20180113849A - Method for semantic rules generation and semantic error correction based on mass data, and error correction system implementing the method

Info

Publication number: KR20180113849A
Application number: KR1020170045497A
Authority: KR
Inventors: 박일남; 백슬예; 안애림; 서가은
Original assignee: 주식회사 카카오
Priority date: 2017-04-07
Filing date: 2017-04-07
Publication date: 2018-10-17
Also published as: KR101965887B1

Abstract

The present invention relates to a method for generating semantic error correction rules in an error correction system operated by at least one processor. The method includes extracting error data pairs from mass data; generating, based on each error data pair, a semantic error correction rule that includes a correction target in which a semantic error occurs, a context, a direction in which to search for the context from the correction target, and correction content with which to correct the correction target when the context is detected; and storing the semantic error correction rule in a semantic error correction rule dictionary. Each error data pair is a data pair whose left and right contexts are the same but which has a phoneme difference of a certain length. Semantic error rules can thereby be searched efficiently.

Description

Method for generating semantic error correction rules and correcting semantic errors based on mass data, and error correction system implementing the method

The present invention relates to a technique for correcting Korean spelling errors.

As technology advances and users expect more of Korean spelling error correction, there is a growing need to detect semantic errors and correct them accurately. Semantic error correction has been studied before, but the earlier methods relied on probabilistic, statistics-based error correction. Compared with rule-based semantic error correction, statistics-based correction has higher recall but lower precision. Rule-based semantic error correction, on the other hand, is highly precise, but a rule can be created only when the vocabulary of the surrounding words (eojeol) matches exactly. Consequently, without access to large-scale data it is hard to collect the data pairs needed for rule generation, and therefore hard to create rules.

The problem to be solved by the present invention is to provide a method and system that generate semantic error correction rules from large-scale data, extend the semantic error correction rules using part-of-speech information, and correct errors by efficiently searching the semantic error rules on the basis of semantic error rules generalized into information representing context structure.

According to one embodiment, a method for generating a semantic error correction rule in an error correction system operated by at least one processor includes: extracting error data pairs from a large amount of data; generating, based on each of the error data pairs, a semantic error correction rule that includes a correction target in which a semantic error occurred, a context, a direction in which to search for the context from the correction target, and correction content with which to correct the correction target when the context is detected; and storing the semantic error correction rule in a semantic error correction rule dictionary. Each error data pair is a data pair whose left and right contexts are the same but which has a phoneme difference of a certain length.

Each of the correction target, the correction content, and the context of the semantic error correction rule is described by at least one of a word (eojeol), a morpheme, and part-of-speech information, and the part-of-speech information may be either a representative part-of-speech tag or a detailed part-of-speech tag.

The direction in which the context is searched may be any one of a first direction, which instructs detection of a context that precedes the correction target; a second direction, which instructs detection of a context that follows the correction target; and a third direction, which instructs detection of contexts both before and after the correction target.

The semantic error correction rule may further include information indicating whether the rule is extended. This information may include first information, which instructs that the rule be applied only to the cases the rule explicitly specifies, and second information, which instructs that the rule be applied to dependency relations identified through syntactic parsing.
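By way of non-limiting illustration, the rule fields described above (correction target, context, search direction, correction content, and extension information) might be represented as in the following sketch. The names `SemanticRule`, `Direction`, `Expansion`, and `applies` are assumptions introduced for this example, as is the choice to search the whole sentence on the indicated side and, for the third direction, to require the first context item before the target and the second after it.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    PREV = "prev"   # first direction: context precedes the correction target
    NEXT = "next"   # second direction: context follows the correction target
    BOTH = "both"   # third direction: context before and after the target


class Expansion(Enum):
    EXPLICIT = "explicit"      # first information: apply only as explicitly specified
    DEPENDENCY = "dependency"  # second information: apply to parsed dependency relations


@dataclass(frozen=True)
class SemanticRule:
    target: str        # correction target (word, morpheme, or POS tag)
    context: tuple     # context items to detect
    direction: Direction
    correction: str    # correction content
    expansion: Expansion = Expansion.EXPLICIT

    def applies(self, words, i):
        """Return True if words[i] is the target and the context is detected."""
        if words[i] != self.target:
            return False
        before, after = words[:i], words[i + 1:]
        if self.direction is Direction.PREV:
            return all(c in before for c in self.context)
        if self.direction is Direction.NEXT:
            return all(c in after for c in self.context)
        # Assumed reading of BOTH: context[0] before, context[1] after.
        return (len(self.context) == 2
                and self.context[0] in before
                and self.context[1] in after)
```

For instance, `SemanticRule("빗는", ("만두",), Direction.PREV, "빚는")` would apply to the third word of "어머니의 만두 빗는 손놀림이 어찌나".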

The method for generating a semantic error correction rule may further include: extracting, as a correction pair, a specific correction target and specific correction content included in a generated semantic error correction rule; collecting, from a large amount of data, extended error data pairs in which the left and right contexts of the correction pair are identical; and extracting the contexts of the extended error data pairs to generate new semantic error correction rules for the correction pair.

The extracting of the error data pairs may include refining a large amount of data using n-gram patterns and extracting, as the error data pairs, data pairs in the refined data whose left and right contexts are the same but which have a phoneme difference of a certain length; or it may include extracting the error data pairs from dependency relation pairs obtained by syntactically parsing a large amount of data.

According to another embodiment, a semantic error correction method of an error correction system operated by at least one processor includes: classifying an input sentence into words (eojeol), morphemes, and part-of-speech information, and extracting, based on the classified information, a correction target of the input sentence in which a semantic error may have occurred; extracting, from a first rule dictionary, a specific semantic error correction rule corresponding to the correction target and to a context included in the input sentence; and correcting the correction target with the correction content of the specific semantic error correction rule. The first rule dictionary stores a plurality of semantic error correction rules including the specific semantic error correction rule, and the specific semantic error correction rule specifies that the correction target is to be corrected to the correction content when the correction target appears together with the context. The context is expressed by at least one of a word, a morpheme, and part-of-speech information.

The extracting of the semantic error correction rule may include generating, based on the context of the input sentence, at least one candidate semantic error rule that can be generated from the input sentence, and searching the first rule dictionary for the candidate semantic error rule to extract the specific semantic error correction rule that corresponds to it.
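The candidate-generation and lookup step just described might be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: the first rule dictionary is modeled as a plain Python `dict` keyed by `(target, direction, context)`, only the words immediately adjacent to the correction target are used as context, and the function names `candidate_keys` and `correct` are hypothetical.

```python
def candidate_keys(words, i):
    """Enumerate (target, direction, context) keys the input could instantiate."""
    target = words[i]
    keys = []
    if i > 0:
        keys.append((target, "prev", (words[i - 1],)))
    if i + 1 < len(words):
        keys.append((target, "next", (words[i + 1],)))
    if 0 < i < len(words) - 1:
        keys.append((target, "both", (words[i - 1], words[i + 1])))
    return keys


def correct(words, i, rule_dict):
    """Replace words[i] using the first candidate key found in rule_dict."""
    for key in candidate_keys(words, i):
        if key in rule_dict:
            return words[:i] + [rule_dict[key]] + words[i + 1:]
    return list(words)  # no matching rule: leave the input unchanged


# One stored rule: '빗는' preceded by '만두' is corrected to '빚는'.
rule_dict = {("빗는", "prev", ("만두",)): "빚는"}
```

Because candidates are derived from the input and then looked up by key, the cost of a correction depends on the number of candidates rather than on the size of the dictionary.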

The extracting of the semantic error correction rule may include: extracting, from a second rule dictionary, at least one generalized rule related to the correction target; selecting, based on the information into which the input sentence was classified, a generalized rule that is valid for the context structure of the input sentence from among the extracted generalized rules; applying the input sentence to the context structure of the valid generalized rule to generate a semantic error rule for the input sentence (a first semantic error rule); and extracting, from the first rule dictionary, the specific semantic error correction rule corresponding to the first semantic error rule of the input sentence. The second rule dictionary may store generalized rules obtained by generalizing the plurality of semantic error correction rules into information representing context structure.

The method may further include generating the second rule dictionary. Generating the second rule dictionary may include converting at least one semantic error correction rule related to the correction target into a generalized rule, and storing, in association with the correction target, an encoded value computed from the context detection direction, the number of contexts, and the type of context included in each generalized rule.

Generating the second rule dictionary may include computing the encoded value of each generalized rule based on a binary mask in which binary values are assigned according to the context detection direction, the number of contexts, and the context type, thereby determining the priority of that generalized rule. Extracting the at least one generalized rule may include extracting, from the second rule dictionary and in order of priority, at least one encoded value related to the correction target, and decoding the extracted encoded value to obtain the generalized rule.
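The following sketch shows one way such a binary-mask encoding could work. The bit layout is purely hypothetical and is not taken from the specification: two bits for the direction, two bits for the number of contexts (1 to 3), two bits for the context type, and the assumption that a larger encoded value means a higher priority.

```python
# Hypothetical bit layout: [direction: 2 bits][context count: 2 bits][type: 2 bits]
DIRECTIONS = {"prev": 0b01, "next": 0b10, "both": 0b11}
CONTEXT_TYPES = {"pos": 0b00, "morpheme": 0b01, "word": 0b10}
DIRECTION_NAMES = {v: k for k, v in DIRECTIONS.items()}
TYPE_NAMES = {v: k for k, v in CONTEXT_TYPES.items()}

DIR_SHIFT, COUNT_SHIFT = 4, 2
FIELD_MASK = 0b11


def encode(direction, n_contexts, context_type):
    """Pack the three properties of a generalized rule into one integer."""
    if not 1 <= n_contexts <= 3:
        raise ValueError("this sketch stores the context count in two bits")
    return ((DIRECTIONS[direction] << DIR_SHIFT)
            | (n_contexts << COUNT_SHIFT)
            | CONTEXT_TYPES[context_type])


def decode(code):
    """Recover (direction, number of contexts, context type) from the integer."""
    return (DIRECTION_NAMES[code >> DIR_SHIFT],
            (code >> COUNT_SHIFT) & FIELD_MASK,
            TYPE_NAMES[code & FIELD_MASK])


def by_priority(codes):
    """Assumed ordering: larger encoded value first."""
    return sorted(codes, reverse=True)
```

Storing one small integer per generalized rule lets the rules for a given correction target be sorted and decoded without string comparison, which is one plausible reason for encoding them.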

Selecting a rule valid for the context structure of the input sentence may include determining whether a first rule, which has the highest priority among the extracted generalized rules, is valid for the context structure of the input sentence; and, if the first rule is not valid for that context structure, or if the first semantic error rule generated for the input sentence from the first rule does not exist in the first rule dictionary, determining whether a second rule, which ranks next after the first rule among the extracted generalized rules, is valid for the context structure of the input sentence.
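The priority-ordered fallback described above can be sketched as a simple loop. The representation is an assumption made for illustration: each generalized rule is a `(direction, kind)` template, the input is a list of `(word, POS tag)` tuples, and the tags `NNG` and `VV` in the example are illustrative labels only.

```python
def instantiate(template, tokens, i):
    """Fill a generalized rule template from the input; None if it does not fit."""
    direction, kind = template          # e.g. ("prev", "word") or ("prev", "pos")
    j = i - 1 if direction == "prev" else i + 1
    if not 0 <= j < len(tokens):
        return None                     # context structure does not fit the input
    word, pos = tokens[j]
    return (tokens[i][0], direction, word if kind == "word" else pos)


def find_rule(tokens, i, templates, rule_dict):
    """Try templates in priority order; fall through on a misfit or a dictionary miss."""
    for template in templates:
        key = instantiate(template, tokens, i)
        if key is not None and key in rule_dict:
            return rule_dict[key]
    return None


tokens = [("만두", "NNG"), ("빗는", "VV")]
templates = [("next", "word"), ("prev", "word"), ("prev", "pos")]  # highest priority first
rule_dict = {("빗는", "prev", "만두"): "빚는"}
```

Here the first template does not fit because no word follows the target, so the search falls through to the second template, whose instantiated rule is found in the dictionary.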

The method may further include, before extracting the specific semantic error correction rule, determining, based on the result of syntactically parsing the input sentence, whether any semantic error correction rule can be applied to the input sentence.

According to yet another embodiment, an error correction system operated by at least one processor includes: an error data pair collection device that extracts, as error data pairs, data pairs from a large amount of data whose left and right contexts are the same but which have a phoneme difference of a certain length; a semantic error correction rule description device that generates, based on each of the error data pairs, a semantic error correction rule including a correction target in which a semantic error occurred, a context, a direction in which to search for the context from the correction target, and correction content with which to correct the correction target when the context is detected; and a semantic error correction device that corrects errors in an input sentence based on the semantic error correction rules described by the semantic error correction rule description device.

The semantic error correction rule description device may describe each of the correction target, the correction content, and the context using at least one of a word (eojeol), a morpheme, and part-of-speech information, and the part-of-speech information may be either a representative part-of-speech tag or a detailed part-of-speech tag.

The semantic error correction rule description device may describe the direction in which the context is searched as any one of a first direction, which instructs detection of a context that precedes the correction target; a second direction, which instructs detection of a context that follows the correction target; and a third direction, which instructs detection of contexts both before and after the correction target.

The semantic error correction device may classify the input sentence into words, morphemes, and part-of-speech information; extract, based on the classified information, a correction target of the input sentence in which a semantic error may have occurred; extract, from among the semantic error correction rules described by the semantic error correction rule description device, a specific semantic error correction rule corresponding to the correction target of the input sentence and to a context included in the input sentence; and correct the correction target of the input sentence with the correction content of that rule.

The semantic error correction device may generate, based on the context of the input sentence, at least one candidate semantic error rule that can be generated from the input sentence, and extract, from among the semantic error correction rules described by the semantic error correction rule description device, the specific semantic error correction rule corresponding to the candidate semantic error rule.

The error correction system may further include a first rule dictionary that stores the semantic error correction rules described by the semantic error correction rule description device, and a second rule dictionary that stores generalized rules obtained by generalizing the semantic error correction rules stored in the first rule dictionary into information representing context structure. The semantic error correction device may extract, from the second rule dictionary, at least one generalized rule related to the correction target of the input sentence; select, based on the information into which the input sentence was classified, a generalized rule valid for the context structure of the input sentence from among the extracted generalized rules; and then apply the input sentence to the context structure of the valid generalized rule to generate the candidate semantic error rule.

The second rule dictionary may store, in association with each generalized rule, an encoded value produced by assigning binary values to the context detection direction and context structure included in that rule. The binary values in the binary mask may be assigned so that the encoded value of each generalized rule expresses a priority determined by its context detection direction and context structure.

According to embodiments of the present invention, a large number of semantic error candidates and semantic error data pairs can be collected using large-scale data, so highly accurate semantic error rules can be generated. According to embodiments of the present invention, rules are described using part-of-speech information as context, so a wide variety of errors can be corrected with a minimal number of rules, saving time and cost. According to embodiments of the present invention, the recall of error correction based on semantic error correction rules can be increased; in particular, because part-of-speech information is used as context, rules apply across sentences of varying length and composition, which raises the recall of error correction.

FIG. 1 is a schematic block diagram of an error correction system according to one embodiment of the present invention.
FIG. 2 is an exemplary diagram illustrating a rule search path according to one embodiment of the present invention.
FIG. 3 is a flowchart of a semantic error correction method of a semantic error correction device according to one embodiment of the present invention.
FIG. 4 is a flowchart of an error correction method of the error correction system according to one embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. To describe the invention clearly, parts of the drawings unrelated to the description are omitted, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components, not that it excludes them, unless specifically stated otherwise. In addition, terms such as "...unit", "...er", and "module" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

FIG. 1 is a schematic block diagram of an error correction system according to one embodiment of the present invention, and FIG. 2 is an exemplary diagram illustrating a rule search path according to one embodiment of the present invention.

Referring to FIG. 1, an error correction system 10 operated by at least one processor (hereinafter simply the 'error correction system') is a device that corrects Korean spelling errors using rule-based semantic error correction. It collects error candidates and correction pairs from large-scale logs (rule discovery), extends semantic error correction rules using the correction pairs (rule extension), uses part-of-speech information as context information so that a minimal set of rule descriptions covers many corrections (rule generalization), and encodes the error correction rules so that matching rules can be extracted quickly.

To this end, the error correction system 10 includes an error data pair collection device 100, a semantic error correction rule description device 200, and a semantic error correction device 300. The error correction system 10 may further include a semantic error correction rule dictionary 400 (simply the 'rule dictionary') that stores the rules described by the semantic error correction rule description device 200. The error correction system 10 may further include a generalized semantic error rule dictionary 500 (simply the 'generalized rule dictionary') that stores generalized semantic error rules (simply 'generalized semantic error correction rules'), obtained by converting semantic error correction rules into information representing context structure, together with their encoded values.

In addition to semantic error correction, the error correction system 10 may further include a basic error correction device 600 that performs at least one of the following checks and corrections: preprocessing of the input sentence, morphological analysis, spacing error checking, spelling error checking, grammar error checking, and probability-based checking. The semantic error correction device 300 and the basic error correction device 600 may work together to produce the final corrected output sentence. How they interoperate can be configured in various ways according to the order of error correction, and the two devices may also be implemented as a single device. The description below focuses on the operation of the semantic error correction device 300.

The error data pair collection device 100 collects error data pairs from a large volume of unstructured documents. Error correction rules are generated from the error data pairs. The error data pair collection device 100 refines the data using n-gram patterns (for example, 5-grams) and, from the refined data, collects error data pairs whose left and right contexts are identical but which are at edit distance 1 (a difference of one phoneme). For example, error data pairs such as those in Table 1 may be collected, and rules are generated from them. Meaningful error correction rules are derived from the collected data pairs. The rules may be generated automatically by machine or by a language expert well versed in orthography.

Table 1

| No. | Error data pair | Error correction rule |
|---|---|---|
| 1 | 어머니의 만두 빗는 손놀림이 어찌나 / 어머니의 만두 빚는 손놀림이 어찌나 | When the context '만두' (dumpling) precedes the word '빗는' (combing), generate a rule that corrects it to '빚는' (shaping dough). |
| 2 | 어깨에 가방을 매고 이른 아침 / 어깨에 가방을 메고 이른 아침 | When the context '가방' (bag) precedes the word '매고' (tying), generate a rule that corrects it to '메고' (carrying on the shoulder). |
| 3 | 처음에는 선홍색을 띄다가 차츰 양이 / 처음에는 선홍색을 띠다가 차츰 양이 | When the context '선홍색' (bright red) precedes the word '띄다가', generate a rule that corrects it to '띠다가' (taking on a color). |
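As a rough illustration of the n-gram based collection described above, the sketch below groups 5-grams by their left and right context and pairs the middle words whose phoneme-level edit distance is 1. It rests on several assumptions that are not stated in the specification: the compared word is the center of the 5-gram, a phoneme is a Hangul letter (jamo) compared without regard to whether it is an initial or a final consonant, and the helper names `to_jamo`, `edit_distance`, and `collect_pairs` are hypothetical.

```python
from collections import defaultdict

LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAILS = " ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"  # index 0: no final


def to_jamo(text):
    """Split precomposed Hangul syllables (U+AC00..U+D7A3) into letters."""
    out = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:
            lead, rest = divmod(code, 588)      # 588 = 21 vowels * 28 finals
            vowel, tail = divmod(rest, 28)
            out.append(LEADS[lead] + VOWELS[vowel] + TAILS[tail].strip())
        else:
            out.append(ch)                      # leave non-Hangul characters as-is
    return "".join(out)


def edit_distance(a, b):
    """Plain Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def collect_pairs(five_grams):
    """Pair middle words that share both contexts and differ by one phoneme."""
    by_context = defaultdict(set)
    for gram in five_grams:
        words = tuple(gram.split())
        if len(words) == 5:
            by_context[words[:2] + words[3:]].add(words[2])
    pairs = []
    for middles in by_context.values():
        ordered = sorted(middles)
        for k, a in enumerate(ordered):
            for b in ordered[k + 1:]:
                if edit_distance(to_jamo(a), to_jamo(b)) == 1:
                    pairs.append((a, b))
    return pairs
```

Applied to the two sentences of the first pair in Table 1, `collect_pairs` would return the pair ('빗는', '빚는'), since the two words differ only in their final consonant.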

The error data pair collection device 100 may parse a large amount of data with a Korean syntactic parser and extract error data pairs from pairs in a dependency (head-dependent) relation. The error data pair collection device 100 generates error correction rules based on the error data pairs. Table 2 lists the parse labels of the Korean syntactic parser. In Table 2, '←' indicates a dependency relation between words (eojeol) and is written in the form 'dependent ← head'.

Table 2

| Label | Category | Example 1 | Example 2 |
|---|---|---|---|
| NP | Nominal phrase | Nominals (noun, pronoun, numeral) | 학교가, 그를, 셋으로 |
| VP | Predicate phrase | Predicates (verb, adjective, auxiliary predicate) | 먹었다, 따뜻하다, 싶다 |
| VNP | Affirmative copula phrase | Affirmative copula '이다' | 아들이다, 사람이고, 물건이야 |
| SBJ | Subject | Nominative nominal phrase NP_SBJ; nominative nominalized predicate phrase VP_SBJ | 학교가←좋다. NP_SBJ; 공부하기가←힘들다. VP_SBJ |
| OBJ | Object | Accusative nominal phrase NP_OBJ; accusative nominalized predicate phrase VP_OBJ | 당신을←사랑해요. NP_OBJ; 아름다움을←추구한다. VP_OBJ |
| MOD | Nominal modifier | Adnominal (genitive) nominal phrase NP_MOD; adnominal predicate phrase VP_MOD | 그녀의←동생을 NP_MOD; 헤어진←연인들이 VP_MOD |
| AJT | Predicate modifier | Adverbial nominal phrase NP_AJT | 서울에서←살았다. NP_AJT |

The error data pair collection device 100 extracts, from a large amount of data, data pairs parsed as a nominal phrase (dependent)-predicate phrase (head) relation. When the parse label is one of the nominal-related labels NP (nominal phrase), NP_SBJ (nominative nominal phrase), or NP_OBJ (accusative nominal phrase), the error data pair collection device 100 collects error data pairs in which the head is identical and the dependents are at edit distance 1, or in which the dependent is identical and the heads are at edit distance 1.

The error data pair collection device 100 extracts, from a large amount of data, data pairs parsed as a predicate phrase (dependent)-nominal phrase (head) relation. When the parse label is VP_MOD (adnominal predicate phrase), which modifies a nominal, the error data pair collection device 100 collects error data pairs in which the head is identical and the dependents are at edit distance 1.

The error data pair collection device 100 extracts, from a large amount of data, data pairs parsed as a nominal phrase (dependent)-nominal phrase (head) relation. When the parse label is one of the nominal-related labels NP (nominal phrase), NP_SBJ (nominative nominal phrase), NP_OBJ (accusative nominal phrase), or NP_MOD (adnominal nominal phrase), it collects error data pairs in which the head is identical and the dependents are at edit distance 1, or in which the dependent is identical and the heads are at edit distance 1.

Table 3 shows examples of error data pairs parsed as nominal phrase (dependent)-predicate phrase (head), predicate phrase (dependent)-nominal phrase (head), and nominal phrase (dependent)-nominal phrase (head) relations.

Table 3

| Relation | Error data pair | Explanation |
|---|---|---|
| Nominal phrase (dependent)-predicate phrase (head) | 속 썩히고 / 속 썩이고 | NP; 속 (dependent), 썩히고/썩이고 (head); the dependent is identical and the heads are at edit distance 1 |
| Nominal phrase (dependent)-predicate phrase (head) | 낮이 익다 / 낯이 익다 | NP_SBJ; 낮이/낯이 (dependent), 익다 (head); the head is identical and the dependents are at edit distance 1 |
| Nominal phrase (dependent)-predicate phrase (head) | 화를 삭히다 / 화를 삭이다 | NP_OBJ; 화를 (dependent), 삭이다/삭히다 (head); the dependent is identical and the heads are at edit distance 1 |
| Predicate phrase (dependent)-nominal phrase (head) | 헤어진 연인 / 해어진 연인 | VP_MOD; 헤어진/해어진 (dependent), 연인 (head); the head is identical and the dependents are at edit distance 1 |
| Predicate phrase (dependent)-nominal phrase (head) | 쳐진 눈과 / 처진 눈과 | VP_MOD; 쳐진/처진 (dependent), 눈과 (head); the head is identical and the dependents are at edit distance 1 |
| Predicate phrase (dependent)-nominal phrase (head) | 절인 배추를 / 저린 배추를 | VP_MOD; 절인/저린 (dependent), 배추를 (head); the head is identical and the dependents are at edit distance 1 |
| Nominal phrase (dependent)-nominal phrase (head) | 생태계를 보존 / 생태계를 보전 | NP_OBJ; 생태계를 (dependent), 보존/보전 (head); the dependent is identical and the heads are at edit distance 1 |
| Nominal phrase (dependent)-nominal phrase (head) | 밀집 모자 / 밀짚 모자 | NP; 밀집/밀짚 (dependent), 모자 (head); the head is identical and the dependents are at edit distance 1 |
| Nominal phrase (dependent)-nominal phrase (head) | 건축물의 자제 / 건축물의 자재 | NP_MOD; 건축물의 (dependent), 자제/자재 (head); the dependent is identical and the heads are at edit distance 1 |

To increase the number of error data pairs, the error data pair collecting apparatus 100 collects, from the large amount of data, n-gram data for the correction pairs that have already been collected. The apparatus then uses this n-gram data to extend the error correction rules.

Referring to Table 4, when the collected error data pair is '빗는' and '빚는' following the context '만두', a rule is generated that corrects '빗는' to '빚는' when the context '만두' precedes the word '빗는'. Here the correction pair is <빗는, 빚는>. The error data pair collecting apparatus 100 can collect data pairs in which the left and right contexts of the correction pair '빗는' and '빚는' are identical, and can thereby discover additional context information (for example, '머리'). The apparatus adds an error correction rule containing the new context information (for example, a rule that corrects '빚는' to '빗는' when the context '머리' precedes the word '빚는'). In this way, the apparatus increases the number of error correction rules by discovering further context information for error data pairs and correction pairs it has already collected. This is possible because the apparatus can collect error data pairs from a large amount of data.

Table 4. Rule extension from collected correction pairs

1. Error data pair and rule: 빗는 - 빚는. Rule: when the context '만두' precedes the word '빗는', correct it to '빚는'.
   Added error data pair: 머리 빗는 모습 / 머리 빚는 모습.
   Added error correction rule: when the context '머리' precedes the word '빚는', correct it to '빗는'.

2. Error data pair and rule: 메고 - 매고. Rule: when the context '가방' precedes the word '매고', correct it to '메고'.
   Added error data pair: 밭을 메고 있었다. / 밭을 매고 있었다.
   Added error correction rule: when the context '밭' precedes the word '메고', correct it to '매고'.
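The context discovery step described for Table 4 can be illustrated with a short Python sketch. It is only an illustration under assumptions: the n-gram data is modeled as hypothetical `(left_word, word)` bigram tuples, and the helper name `shared_left_contexts` is not from the source. The sketch only finds contexts that occur with both members of a correction pair; deciding which form is correct in each context is outside its scope.

```python
def shared_left_contexts(bigrams, pair):
    # bigrams: iterable of (left_word, word) tuples from a corpus (hypothetical format).
    # pair: the correction pair, e.g. ("빗는", "빚는").
    first, second = pair
    left_of_first = {left for left, word in bigrams if word == first}
    left_of_second = {left for left, word in bigrams if word == second}
    # Contexts seen with both members are candidates for new rules.
    return sorted(left_of_first & left_of_second)


corpus_bigrams = [
    ("만두", "빗는"), ("만두", "빚는"),
    ("머리", "빗는"), ("머리", "빚는"),
    ("도자기", "빚는"),
]
candidates = shared_left_contexts(corpus_bigrams, ("빗는", "빚는"))
```

Here `candidates` contains '만두' and '머리', the two contexts used in Table 4, while '도자기' is left out because it was observed with only one member of the pair.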

The semantic error correction rule description device 200 describes the semantic error correction rules for the error data pairs collected by the error data pair collecting apparatus 100 in a form that the semantic error correcting apparatus 300 can reference, and stores them in the rule dictionary 400. A semantic error correction rule is expressed as [correction target (key)] [context search direction] [context description (context)] [correction content (value)] [rule extension flag], and any of these items may be omitted. Using these rules, the semantic error correcting apparatus 300 determines that an input sentence containing the correction target (key) has a semantic error if it also contains the context, and corrects the error by replacing the correction target with the correction content (value).

The semantic error correction rule description device 200 can describe the correction target (key) at the word (어절) level or the morpheme (형태소) level. A rule whose correction target is described as a word applies only to a word identical to the correction target in the rule. A rule whose correction target is described as a morpheme applies to every inflected form of the correction target. Referring to Table 5, a semantic error correction rule is expressed as [correction target (key)] [context search direction] [context description (context)] [correction content (value)] [rule extension flag]. The context description specifies the context that must be found in order to replace the correction target with the correction content. A '*' in the tag field of the correction target or of a context description means that the morpheme may have any part of speech. The 'C' or 'P' attached to a rule is the rule extension flag: 'C' means that the rule is applied only to the explicitly specified preceding or following context, and 'P' means that the rule is extended and applied through syntactic parsing.

Continuing with Table 5, the rule that corrects '해어진' to '헤어진' when the context '연인' follows the word '해어진' can be written as '[해어진 어절 *] [이후] [형태소=연인 *] [헤어진 *] [C]'. This rule can correct '해어진 연인들이' to '헤어진 연인들이', but it cannot correct '해어지는 연인들이' to '헤어지는 연인들이', because '해어지는' is not identical to '해어진'. The rule that corrects '빗다' to '빚다' when the context '만두' precedes the stem '빗다' can be written as '[빗다 형태소 *] [이전] [형태소=만두 *] [빚다 *] [C]'. This rule corrects every inflected form of the stem '빗다', such as '빗는다', '빗고', '빗어서', '빗으니까', and '빗었는데', to the corresponding inflected form of '빚다' when '만두' precedes it.

Table 5. Semantic error correction rules (describing the correction target)

General form: [correction target (key, 어절/형태소)] [context search direction (이전 before / 이후 after / 전후 before and after)] [context description (context)] [correction content (value)] [rule extension flag (C/P)]
Word (어절): [해어진 어절 *] [이후] [형태소=연인 *] [헤어진 *] [C]
Morpheme (형태소): [빗다 형태소 *] [이전] [형태소=만두 *] [빚다 *] [C]
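The bracketed rule notation can be read mechanically. The following Python sketch parses one rule line into its five fields; it is a sketch under assumptions, not the patent's parser. The `Rule` class and the `parse_rule` and `parse_items` helpers are hypothetical names, and the sketch assumes all five fields are present even though the description allows fields to be omitted.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    key: str        # correction target, e.g. "빗다"
    key_unit: str   # "어절" (word) or "형태소" (morpheme)
    key_tag: str    # part of speech of the key, "*" for any
    direction: str  # "이전" (before), "이후" (after), "전후" (both)
    before: list    # context items searched before the key
    after: list     # context items searched after the key
    value: str      # correction content
    flag: str       # "C" (strict) or "P" (extend by parsing)


def parse_items(text):
    # "형태소=만두 *,세부태그=보조사" -> [("형태소", "만두 *"), ("세부태그", "보조사")]
    items = []
    for part in text.split(","):
        if part.strip():
            method, rest = part.split("=", 1)
            items.append((method.strip(), rest.strip()))
    return items


def parse_rule(line):
    fields = re.findall(r"\[([^\]]*)\]", line)
    if len(fields) != 5:
        raise ValueError("this sketch expects all five rule fields")
    key, unit, tag = fields[0].split(None, 2)
    direction = fields[1].strip()
    if direction == "전후":
        before, after = fields[2].split("|", 1)  # '|' separates before/after
    elif direction == "이전":
        before, after = fields[2], ""
    else:
        before, after = "", fields[2]
    value = fields[3].split()[0]
    return Rule(key, unit, tag, direction,
                parse_items(before), parse_items(after), value, fields[4].strip())
```

For example, `parse_rule("[빗다 형태소 *] [이전] [형태소=만두 *] [빚다 *] [C]")` yields a rule whose key is '빗다', whose single context item is searched before the key, and whose value is '빚다'.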

In the context description, the semantic error correction rule description device 200 can combine words, morphemes, representative tags (대표태그), and detailed tags (세부태그) to describe a context. Because a context can be described with part-of-speech information rather than only with specific words or morphemes, a small number of rules can cover many erroneous sentences.

Part-of-speech information classifies Korean morphemes into categories such as nominals, predicates, determiners, adverbs, interjections, particles, endings, affixes, roots, and symbols. It consists of a representative tag, which indicates the part of speech of the word, and a detailed tag, which carries further information within the same part of speech, such as left and right connection information. Detailed tags are the tags used in the morphological analysis stage, and many kinds of detailed tags exist for the same part of speech. For example, '이순신', '서울', and '뉴욕' (Yi Sun-sin, Seoul, New York) share the representative tag 'proper noun' but have the detailed tags 'domestic person name', 'domestic place name', and 'overseas place name', respectively.

Rules that describe a context with part-of-speech information can be expressed as in Table 6. The context search direction can be set to before, after, or before and after the correction target. A plurality of contexts (n contexts) can be defined, and each context is expressed as one of a word, a morpheme, a representative tag, or a detailed tag. Each context is written as [description method of context x (어절/형태소/대표태그/세부태그)]=[context x], and several of these are concatenated to define the whole context related to the correction target. When the search direction is 'before and after' (전후), the contexts before and after the correction target are separated by '|'.

Table 6. Semantic error correction rules (context details)

General form: [correction target] [context search direction (이전/이후/전후)] {context description part} [correction content] [rule extension flag (C/P)]

{Context description part} searched before or after the correction target:
  [description method of context 1 (어절/형태소/대표태그/세부태그)]=[context 1], ..., [description method of context n (어절/형태소/대표태그/세부태그)]=[context n]

{Context description part} searched before and after the correction target:
  before: [description method of context 1 (어절/형태소/대표태그/세부태그)]=[context 1], ..., [description method of context n (어절/형태소/대표태그/세부태그)]=[context n] | after: [description method of context 1 (어절/형태소/대표태그/세부태그)]=[context 1], ..., [description method of context n (어절/형태소/대표태그/세부태그)]=[context n]

Using the rule format of Table 6, the error correction rule for '빗다' - '빚다' can be written in various ways, as shown in Table 7. In the rules of Table 7 the search direction is 'before' (이전), so each rule replaces the correction target with the correction content if the context specified in the context description part appears before the correction target.

Table 7. Error correction rules and correction results (search direction: before)

Rule 1: [빗다 형태소 *] [이전] [형태소=만두 *] [빚다 *] [C]
  Context 1: 형태소=만두 *
  Correction result: 만두 빗었다. → 만두 빚었다.

Rule 2: [빗다 형태소 *] [이전] [어절=만두를 *] [빚다 *] [C]
  Context 1: 어절=만두를
  Correction result: 만두를 빗었다. → 만두를 빚었다.

Rule 3: [빗다 형태소 *] [이전] [형태소=만두 *,세부태그=보조사] [빚다 *] [C]
  Context 1: 형태소=만두 *
  Context 2: 세부태그=보조사 (auxiliary particle)
  Correction result: 만두 + auxiliary particle (도/만/는) 빗었다. → 만두도/만두만/만두는 빚었다.

Rule 4: [빗다 형태소 *] [이전] [형태소=만두 *,대표태그=부사] [빚다 *] [C]
  Context 1: 형태소=만두 *
  Context 2: 대표태그=부사 (adverb)
  Correction result: 만두 + adverb (많이/잘/함께/빨리/잔뜩) 빗었다. → 만두 많이/잘/함께/빨리/잔뜩 빚었다.

Rule 5: [빗다 형태소 *] [이전] [어절=만두를 *,세부태그=형용사, 형태소=게 어미] [빚다 *] [C]
  Context 1: 어절=만두를 *
  Context 2: 세부태그=형용사 (adjective)
  Context 3: 형태소=게 어미 (ending -게)
  Correction result: 만두를 + adjective stem + ending -게 (맛있게/예쁘게/멋지게/정성스럽게) 빗었다. → 만두를 맛있게/예쁘게/멋지게/정성스럽게 빚었다.
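How a 'before' rule with several context items might be matched against an analyzed sentence can be sketched as follows. This is a simplified illustration, not the method of the apparatus: it assumes the input has already been split into morphemes represented as hypothetical dictionaries with `form`, `tag` (representative tag), and `detail` (detailed tag) fields, that the context items must appear immediately before the key and in order, and it leaves out re-inflection of the corrected stem.

```python
# Maps a rule's description method to the field of an analyzed morpheme.
FIELD = {"형태소": "form", "대표태그": "tag", "세부태그": "detail"}


def context_matches(context, morphs):
    # Every context item must match the morpheme at the same position.
    return len(context) == len(morphs) and all(
        m.get(FIELD[method]) == value
        for (method, value), m in zip(context, morphs)
    )


def apply_before_rule(morphs, key, context, value):
    # Returns a corrected copy; the input list is left untouched.
    out = [dict(m) for m in morphs]
    n = len(context)
    for i, m in enumerate(out):
        if m["form"] == key and i >= n and context_matches(context, out[i - n:i]):
            m["form"] = value
    return out


# Rule 3 of Table 7: morpheme 만두 followed by an auxiliary particle.
rule3_context = [("형태소", "만두"), ("세부태그", "보조사")]
sentence = [
    {"form": "만두", "tag": "명사", "detail": "일반명사"},  # tag values are illustrative
    {"form": "도", "tag": "조사", "detail": "보조사"},
    {"form": "빗다", "tag": "동사", "detail": "동사"},
]
corrected = apply_before_rule(sentence, "빗다", rule3_context, "빚다")
```

In this example the last morpheme of `corrected` becomes '빚다', while a sentence starting with '머리' instead of '만두' would be returned unchanged.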

Using the rule format of Table 6, the error correction rule for '찌게' - '찌개' can be written in various ways, as shown in Table 8. In the rules of Table 8 the search direction is 'after' (이후), so each rule replaces the correction target with the correction content if the context specified in the context description part appears after the correction target.

Table 8. Error correction rules and correction results (search direction: after)

Rule 1: [찌게 형태소 *] [이후] [형태소=끓이다 *] [찌개 *] [C]
  Context 1: 형태소=끓이다 *
  Correction result: 저녁에는 찌게 끓였어. → 저녁에는 찌개 끓였어.

Rule 2: [찌게 형태소 *] [이후] [대표태그=보조사, 형태소=끓이다 *] [찌개 *] [C]
  Context 1: 대표태그=보조사 (auxiliary particle)
  Context 2: 형태소=끓이다 *
  Correction result: 찌게 + auxiliary particle (도/만/는) 끓였네. → 찌개도/찌개만/찌개는 끓였네.

Rule 3: [찌게 형태소 *] [이후] [대표태그=형용사, 형태소=게 어미, 형태소=끓이다 *] [찌개 *] [C]
  Context 1: 대표태그=형용사 (adjective)
  Context 2: 형태소=게 어미 (ending -게)
  Context 3: 형태소=끓이다 *
  Correction result: 찌게 + adjective stem + ending -게 (맛있게/뜨겁게/맵게/맛깔나게) 끓이려면 → 찌개 맛있게/뜨겁게/맵게/맛깔나게 끓이려면

Table 9 shows rules that search the context both before and after the target, written in the rule format of Table 6, together with their correction results. In the rules of Table 9 the search direction is 'before and after' (전후), so each rule replaces the correction target with the correction content if the contexts specified in the context description part appear before and after the correction target.

Table 9. Error correction rules and correction results (search direction: before and after)

Rule 1: [길러 어절 *] [전후] [어절=물을 * | 형태소=마시다 *] [길어 *] [C]
  Context 1: 어절=물을 *
  Context 2: 형태소=마시다 *
  Correction result: 우물에서 물을 길러 마셨다. → 우물에서 물을 길어 마셨다.

Rule 2: [낮게 어절 *] [전후] [어절=병자를 * | 형태소=하다 *] [낫게 *] [C]
  Context 1: 어절=병자를 *
  Context 2: 형태소=하다 *
  Correction result: 의사는 병자를 낮게 했다. → 의사는 병자를 낫게 했다.

Rule 3: [묶을 어절 *] [전후] [대표태그=주격 조사 | 형태소=호텔 *] [묵을 *] [C]
  Context 1: 대표태그=주격 조사 (subject particle)
  Context 2: 형태소=호텔 *
  Correction result: 우리가/그들이 묶을 호텔은 → 우리가/그들이 묵을 호텔은

Rule 4: [꾀 어절 *] [전후] [형태소=이 주격 조사 | 형태소=나다 *] [꽤 *] [C]
  Context 1: 형태소=이 주격 조사 (subject particle 이)
  Context 2: 형태소=나다 *
  Correction result: 석탄이/열이 꾀 난다. → 석탄이/열이 꽤 난다.

The rules described so far with reference to Tables 7 to 9 replace the correction target with the correction content when the context specified in the rule is present. Applied this way, the rules cannot handle contexts that have not yet been discovered. The position of the context may also fall outside the range that a semantic error correction rule can describe. To address these problems, the semantic error correction rule description device 200 attaches a rule extension flag (for example, 'C' or 'P') to each rule to state whether the rule is to be applied strictly ('C') or applied adaptively through syntactic parsing ('P'). For example, as in Table 10, a rule can be written with the value that indicates rule extension ('P').

Table 10. Rules with the rule extension flag 'P'

Rule 1: [거스르다 형태소 동사] [이전] [어절=눈에 *] [거슬리다 동사] [P]
Rule 2: [폐색 형태소 *] [이후] [형태소=짙다 *] [패색 *] [P]

When the error correction rule found for an input sentence contains the value indicating rule extension ('P'), the semantic error correcting apparatus 300 parses the input sentence with a syntactic parser and decides whether to apply the rule.

The scope of correction through syntactic parsing can differ depending on whether the context is a word-level or a morpheme-level context. For example, when the rule found for the input sentence contains a word-level context (어절=눈에), as in [거스르다 형태소 동사] [이전] [어절=눈에 *] [거슬리다 동사] [P], the rule may be applied to every dependency relation type (눈에[NP-AJT] ← 거스르다) when correcting the error.

When the rule found for the input sentence contains a morpheme-level context, the correction target (key) is the dependent, and the context is the head, as in [폐색 형태소 *] [이후] [형태소=짙다 *] [패색 *] [P], the rule may likewise be applied to every dependency relation type (폐색이[NP-SBJ] ← 짙다) when correcting the error.

On the other hand, when the rule found for the input sentence contains a morpheme-level context, the correction target (key) is the head, and the context is the dependent, the rule is applied only for designated dependency relation types. The designated types may include, for example, at least one of NP, NP_SBJ, NP_OBJ, NP_MOD, VP, VP_SBJ, VP_OBJ, VP_MOD, VNP_SBJ, VNP_OBJ, and VNP_MOD. For example, as in Table 11, the rule [짖다 형태소 동사] [이전] [형태소=집 *] [짓다 동사] [P] is intended to correct '집을 크게 짖다' to '집을 크게 짓다'. If this rule were applied as-is to '강아지가 집에서 시끄럽게 짖는다', the sentence would be changed to '강아지가 집에서 시끄럽게 짓는다', which introduces an error. In this case, therefore, the rule is applied only when the dependency relation is one of the designated types.

Table 11. Input sentences, dependency relations, and correction results for 'P' rules

Input sentence: 까치가 하루 종일 끊임없이 깍깍 짓는다
  key = 짓다 (head), context = 까치 (dependent)
  Dependency relation: 까치가 → 짓는다 [NP_SBJ]
  Error correction rule: [짓다 형태소 동사] [이전] [형태소=까치 *] [짖다 동사] [P]
  Correction result: 까치가 하루 종일 끊임없이 깍깍 짖는다

Input sentence: 강아지가 집에서 시끄럽게 짖는다.
  key = 집에서 (dependent), context = 짖다 (head)
  Dependency relation: 집에서 → 짖는다 [NP_AJT]
  Error correction rule: [짖다 형태소 동사] [이전] [형태소=집 *] [짓다 동사] [P]
  Correction result: [NP_AJT] is not a type to which the semantic error rule applies, so the sentence is output unchanged.
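The decision of whether a 'P' rule applies can be summarized in a few lines of Python. This is a sketch of the three cases described above, not the parser-based implementation itself: the function name `should_apply`, the `key_role` strings, and the normalization of 'NP-AJT' to 'NP_AJT' are assumptions made for illustration.

```python
# Dependency relation types designated in the description for the case
# where the key is the head and the context is the dependent.
DESIGNATED_TYPES = frozenset({
    "NP", "NP_SBJ", "NP_OBJ", "NP_MOD",
    "VP", "VP_SBJ", "VP_OBJ", "VP_MOD",
    "VNP_SBJ", "VNP_OBJ", "VNP_MOD",
})


def should_apply(context_unit, key_role, dep_label):
    # context_unit: "어절" (word-level) or "형태소" (morpheme-level) context
    # key_role: "head" or "dependent" (role of the correction target)
    # dep_label: dependency relation type produced by a syntactic parser
    label = dep_label.replace("-", "_")  # the text writes both NP-AJT and NP_AJT
    if context_unit == "어절":
        return True   # word-level context: all relation types
    if key_role == "dependent":
        return True   # morpheme-level context, key is the dependent
    return label in DESIGNATED_TYPES  # key is the head: designated types only
```

Under these assumptions, `should_apply("형태소", "head", "NP_SBJ")` is true for the '까치가 → 짓는다' relation, and `should_apply("형태소", "head", "NP_AJT")` is false for the '집에서 → 짖는다' relation, so the second sentence is left unchanged.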

The semantic error correcting apparatus 300 corrects semantic errors in an input sentence based on the rules described by the semantic error correction rule description device 200, that is, the rules stored in the rule dictionary 400.

First, the semantic error correcting apparatus 300 determines whether the words and morphemes of the input sentence include a correction target (key) that may be a semantic error. The apparatus extracts from the rule dictionary 400 at least one semantic error correction rule that contains the correction target (key) and a context of the input sentence. It then replaces the correction target (key) with the correction content (value) of the extracted rule. Finally, the apparatus checks whether the corrected word or morpheme can combine grammatically (a grammatical error check), generates the inflected surface form when the target is a morpheme, and produces the corrected output sentence.

Because the rule dictionary 400 is expressed with context combinations of words, morphemes, and part-of-speech information (representative tags and detailed tags), a very large number of rules can be written for correcting a single correction target (key) to a correction content (value). To correct an input sentence, the semantic error correcting apparatus 300 must retrieve at least one rule from the rule dictionary 400, which requires checking whether rules corresponding to the various combinations of the correction target (key) and the contexts of the input sentence exist in the dictionary. However, if the apparatus generated every rule candidate that could exist for the correction target from the context combinations of the input sentence and then checked whether each candidate actually exists in the rule dictionary 400, system complexity would become very high. The semantic error correcting apparatus 300 therefore defines, as generalized semantic error rules, the paths along which context information (words, morphemes, part-of-speech information) is searched in a sentence, according to the type (word or morpheme) of the correction target (key). A plurality of generalized semantic error rules can be defined, and each can be assigned a priority. The generalized semantic error rules can be stored in the generalized rule dictionary 500.

Referring to FIG. 2(a), when the input sentence is '힘도 새다', the correction target (key) that may be a semantic error among the words and morphemes of the input sentence can be determined to be '새다'. The semantic error correcting apparatus 300 must extract from the rule dictionary 400 at least one rule containing the correction target '새다' and a context of the input sentence. The context that may exist in a rule can be any combination of the word '힘도', the morpheme '힘' and its part-of-speech tag, and the auxiliary particle '도' and its part-of-speech tag.

In this case, the semantic error correcting apparatus 300 does not need to check whether every possible rule candidate for the input sentence exists in the rule dictionary 400. Instead, as shown in FIG. 2(b), it first looks up in the generalized rule dictionary 500 a valid path along which the context can be searched from the verb morpheme corresponding to the correction target (key) '새다' (here, a path that searches in the 'before' direction of the correction target for a context composed of a 'morpheme' and a 'tag' indicating part of speech).

The semantic error correcting apparatus 300 then builds a semantic error correction rule for the input sentence from the context information of the valid path (the context combination of the morpheme '힘' and the part-of-speech tag 'auxiliary particle'), checks whether this rule actually exists in the rule dictionary 400, and obtains the correction content of the matching rule (for example, '세다').

To this end, the semantic error correcting apparatus 300 generalizes the semantic error correction rules of the rule dictionary 400 into information that represents their context structure, producing generalized semantic error rules. A generalized semantic error rule can be encoded and stored. Here, generalization means that a semantic error rule (for example, the rule that corrects '빗다' to '빚다') is reduced to context structure information such as part-of-speech tags.

A plurality of generalized semantic error rules can be defined for each generalized correction target (key), and each can be encoded and assigned a priority. The generalized semantic error rules and their encoded values can be stored in the generalized rule dictionary 500. The semantic error correcting apparatus 300 can read the rule dictionary 400 and convert each correction target (key) and its rules into generalized semantic error rules in advance.

For example, referring to Table 12, suppose there are three rules for correcting '빗다' to '빚다'. Rule #1 applies when the context is [the morpheme '송편' whose part of speech is noun, a representative tag of object particle, a representative tag of adverb], for example to correct '송편을 잘 빗는구나'. Rule #2 applies when the context is [the morpheme '송편' whose part of speech is noun, a representative tag of object particle, the word '예쁘게'], for example to correct '송편을 예쁘게 빗네'. Rule #3 applies when the context is ['송편' regardless of part of speech, a representative tag of auxiliary particle], for example to correct '송편만 빗어라'. Generalizing these rules gives Table 13.

Table 12. Example semantic error rules for '빗다 -> 빚다'

No. | Semantic error rule
#1 | [빗다 morpheme verb] [before] [morpheme=송편 noun, representative tag=objective case particle, representative tag=adverb] [빚다 verb] [C]
#2 | [빗다 morpheme verb] [before] [morpheme=송편 noun, representative tag=objective case particle, word=예쁘게 *] [빚다 verb] [C]
#3 | [빗다 morpheme verb] [before] [morpheme=송편 *, representative tag=auxiliary particle] [빚다 verb] [C]

Table 13. Generalized semantic error rules

No. | Key (word/morpheme) | Direction | Context (word/morpheme/part-of-speech information)
#1 | 빗다 morpheme verb | before | morpheme_tag, representative tag, representative tag
#2 | 빗다 morpheme verb | before | morpheme_tag, representative tag, word_all
#3 | 빗다 morpheme verb | before | morpheme_all, representative tag
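
The generalization from the concrete rules of Table 12 to the context structures of Table 13 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the tuple layout `(unit, value, pos)`, the function name `generalize`, the constants `RULE_1` to `RULE_3`, and label strings such as `rep_tag` are all hypothetical names chosen for the example.

```python
# Each context item is (unit, value, pos). All names here are hypothetical.
# unit: "morpheme", "word", "rep_tag" (representative tag) or "detail_tag"
# pos:  a POS tag, "*" for "any POS", or None when the item is itself a tag
RULE_1 = [("morpheme", "송편", "noun"),
          ("rep_tag", "objective case particle", None),
          ("rep_tag", "adverb", None)]
RULE_2 = [("morpheme", "송편", "noun"),
          ("rep_tag", "objective case particle", None),
          ("word", "예쁘게", "*")]
RULE_3 = [("morpheme", "송편", "*"),
          ("rep_tag", "auxiliary particle", None)]


def generalize(context):
    """Drop the concrete vocabulary and keep only the kind of each context item."""
    kinds = []
    for unit, _value, pos in context:
        if unit in ("rep_tag", "detail_tag"):
            kinds.append(unit)              # tag-only items stay as they are
        elif pos == "*":
            kinds.append(unit + "_all")     # surface form with any POS
        else:
            kinds.append(unit + "_tag")     # surface form with a fixed POS
    return kinds


for name, rule in (("#1", RULE_1), ("#2", RULE_2), ("#3", RULE_3)):
    print(name, generalize(rule))
```

Only the kinds of the context items survive generalization, which is why the three concrete rules above reduce to the three rows of Table 13.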

The semantic error correcting apparatus 300 may assign priorities to the generalized semantic error rules based on their context search direction and context information, and encode each rule as a value that expresses that priority. By placing the more important items of the context search direction and context information in the higher-order bits, the apparatus can express the priority of each generalized semantic error rule by its encoded value alone.

Referring to Table 14, the priority of a semantic error rule is determined by the context search direction, the window size, and the rank of each context unit, and each item is placed in higher-order bits according to its importance. Referring to Table 15, when up to five windows before and after the correction target are searched, each generalized semantic error rule can be represented by a 32-bit encoded value. The context search direction is assigned to the top two bits. Direction priority is given in the order before-and-after, before, after; because the before-and-after direction is the most important, it receives bit value 11. P5-P1 and N1-N5 indicate, in window order, the kind of context (word/morpheme/part-of-speech information, etc.) located before or after the correction target. Each slot receives a value according to the importance of its context kind, expressed by the bit values assigned in Table 14. If 'word_tag' is given the highest importance, its bit value is larger than that of every other context kind (for example, 110). The values assigned to word/morpheme/part-of-speech information may differ depending on the correction target (key) or on the contribution of each kind within the context.

Table 14. Priority and bit representation

Priority: [direction], [window size], [context unit], in that order
- [direction]: before-and-after, before, after, in that order of priority
- [window size]: longer first (5, 4, 3, 2, 1)
- [context unit]: word_tag, word_*(all), morpheme_tag, morpheme_*(all), representative tag, detailed tag

Bit representation
2 bits: direction
- before-and-after: 11 (3)
- before: 10 (2)
- after: 01 (1)

30 bits: context (3 bits each, five contexts before and five after)
- word_tag: 110 (6)
- word_*(all): 101 (5)
- morpheme_tag: 100 (4)
- morpheme_*(all): 011 (3)
- representative tag: 010 (2)
- detailed tag: 001 (1)

Table 15. Binary mask

Direction (D) | P5 | P4 | P3 | P2 | P1 | N5 | N4 | N3 | N2 | N1
2 bits | 3 bits | 3 bits | 3 bits | 3 bits | 3 bits | 3 bits | 3 bits | 3 bits | 3 bits | 3 bits

For example, encoding the generalized semantic error rules of Table 13 with the binary mask of Table 15 gives Table 16, and their encoded values and priorities are calculated as in Table 17. Rules #1, #2, and #3 all search in the 'before' direction, so their top two bits are all '10'. Rules #1 and #2 have a context window size of 3, so bits are assigned to P3, P2, and P1; rule #3 has a context window size of 2, so bits are assigned to P2 and P1. The bit values follow the context kinds of Table 14, for example '100' (morpheme_tag), '011' (morpheme_*), and '010' (representative tag).

As a result of this encoding, for the semantic error rule '빗다 -> 빚다' of Table 12, rule #2 has the largest encoded value, which means that rule #2 has the highest priority.

Table 16. Encoded generalized rules

Rule | D | P5 | P4 | P3 | P2 | P1 | N5 | N4 | N3 | N2 | N1
Bits | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3
#1 | 10 | 000 | 000 | 100 | 010 | 010 | 000 | 000 | 000 | 000 | 000
#2 | 10 | 000 | 000 | 100 | 010 | 101 | 000 | 000 | 000 | 000 | 000
#3 | 10 | 000 | 000 | 000 | 011 | 010 | 000 | 000 | 000 | 000 | 000

Table 17. Encoded values and priorities

Rule | Encoded value | Priority
#1 | 2156462080 | 2
#2 | 2156560384 | 1
#3 | 2148335616 | 3
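
The encoding described above can be reproduced with a few bit operations. The sketch below packs the 2-bit direction and ten 3-bit context slots (P5..P1, N5..N1) into a single integer, using the bit values of Table 14. The function name `encode`, the label strings, and the convention that context items listed after the key start at N1 are assumptions made for illustration; the source only shows 'before' examples.

```python
# Bit values from Table 14. Label strings are hypothetical names for this sketch.
DIRECTION_BITS = {"both": 0b11, "before": 0b10, "after": 0b01}  # "both" = before-and-after
CONTEXT_BITS = {
    "word_tag": 0b110, "word_all": 0b101,
    "morpheme_tag": 0b100, "morpheme_all": 0b011,
    "rep_tag": 0b010, "detail_tag": 0b001,
}
WINDOW = 5  # up to five context slots on each side of the key


def encode(direction, before=(), after=()):
    """Pack a generalized rule into a 32-bit value: D | P5..P1 | N5..N1."""
    slots = [0] * (2 * WINDOW)                    # indices 0-4 = P5..P1, 5-9 = N5..N1
    for i, kind in enumerate(reversed(before)):   # the last listed item sits at P1
        slots[WINDOW - 1 - i] = CONTEXT_BITS[kind]
    for i, kind in enumerate(after):              # assumption: first listed item sits at N1
        slots[2 * WINDOW - 1 - i] = CONTEXT_BITS[kind]
    value = DIRECTION_BITS[direction]
    for bits in slots:
        value = (value << 3) | bits
    return value


# Generalized rules of Table 13 (all search in the 'before' direction).
RULES = {
    "#1": ["morpheme_tag", "rep_tag", "rep_tag"],
    "#2": ["morpheme_tag", "rep_tag", "word_all"],
    "#3": ["morpheme_all", "rep_tag"],
}
ENCODED = {name: encode("before", ctx) for name, ctx in RULES.items()}
RANKED = sorted(ENCODED, key=ENCODED.get, reverse=True)  # largest value = highest priority

for name in RANKED:
    print(name, ENCODED[name], format(ENCODED[name], "032b"))
```

Because the direction occupies the top bits and nearer-to-P5 slots occupy higher bits than nearer-to-N1 slots, a plain integer comparison is enough to rank the rules; no separate priority field is needed.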

The following describes how the semantic error correcting apparatus 300 corrects a semantic error in an input sentence using the generalized semantic error rules.

FIG. 3 is a flowchart of the semantic error correction method of the semantic error correcting apparatus according to an embodiment of the present invention.

Referring to FIG. 3, the semantic error correcting apparatus 300 analyzes the input sentence into words (eojeol), morphemes, and part-of-speech information (representative tags and detailed tags), and extracts from this information a correction target (key) that may contain a semantic error (S110). When the input sentence is '송편을 예쁘게 빗었다' ("combed the songpyeon prettily"), its morphemes and part-of-speech information are as shown in Table 18. The apparatus may determine the morpheme '빗다' to be a correction target (key) that may contain a semantic error. The word '빗었다' could also be taken as the correction target (key), but the following describes the case where the morpheme '빗다' is the key.

Table 18. Morphemes and part-of-speech information of the input sentence (one row per morpheme)

Word (word tag) | Morpheme | Morpheme part of speech | Representative tag | Detailed tag
송편 (noun_phrase) | 송편 | noun | noun | noun
송편 (noun_phrase) | 을 | objective case particle | objective case particle | objective case particle
예쁘게 (pa ec) | 예쁘다 | adjective | adjective | adjective
예쁘게 (pa ec) | 게 | ending | ending | ending
빗었다 (pv ec) | 빗다 (--> key) | verb | verb | verb
빗었다 (pv ec) | 었 | prefinal ending | prefinal ending | prefinal ending
빗었다 (pv ec) | 다 | ending | ending | ending

The semantic error correcting apparatus 300 extracts from the generalized rule dictionary 500 the generalized semantic error correction rules whose correction target (key) is the morpheme '빗다' with the part-of-speech tag verb (S120). Because the generalized rule dictionary 500 stores encoded values, the apparatus may first extract the encoded value '2156560384' as an encoded rule and then decode it to obtain generalized semantic error correction rule #2. Referring to Table 12, there are three sets of context information for correcting the correction pair <빗다 morpheme verb, 빚다 verb>, and the generalized error correction rules that can exist for them are the three shown in Table 13. Thus, although the context of the input sentence in Table 18 could be combined in many ways, the structurally valid context information for each correction target is fixed in advance as generalized error correction rules, so the apparatus need only extract the generalized rules related to the key [빗다 morpheme verb] as candidate rules.
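
Decoding an encoded value back into a generalized rule is the inverse of the packing described for Tables 14 and 15. The following sketch unpacks the 2-bit direction and the ten 3-bit slots; the function name `decode`, the slot names used as dictionary keys, and the label strings are hypothetical choices for the example.

```python
# Inverse lookup tables for the bit values of Table 14 (label strings are hypothetical).
DIRECTIONS = {0b11: "both", 0b10: "before", 0b01: "after"}  # "both" = before-and-after
KINDS = {
    0b110: "word_tag", 0b101: "word_all",
    0b100: "morpheme_tag", 0b011: "morpheme_all",
    0b010: "rep_tag", 0b001: "detail_tag",
}
SLOTS = ["P5", "P4", "P3", "P2", "P1", "N5", "N4", "N3", "N2", "N1"]


def decode(value):
    """Split a 32-bit encoded rule into its direction and its non-empty context slots."""
    direction = DIRECTIONS[value >> 30]           # top two bits
    context = {}
    for i, name in enumerate(SLOTS):
        shift = 3 * (len(SLOTS) - 1 - i)          # P5 is highest, N1 is lowest
        bits = (value >> shift) & 0b111
        if bits:                                  # 000 means the slot is unused
            context[name] = KINDS[bits]
    return direction, context


print(decode(2156560384))
```

For the encoded value 2156560384, this yields the 'before' direction with morpheme_tag at P3, representative tag at P2, and word_all at P1, which is generalized rule #2 of Table 13.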

The semantic error correcting apparatus 300 verifies whether each extracted generalized error correction rule is valid for the context structure of the input sentence (S130). For example, based on the word and morpheme analysis results of Table 18, the apparatus checks whether a context search path corresponding to the generalized error correction rule can be formed. Referring to Tables 13 and 18, rule #1 has the context morpheme_tag, representative tag, representative tag in the before direction; a matching context path can be found for all of these, so rule #1 is a valid generalized error correction rule. Rule #2 has the context morpheme_tag, representative tag, word_all in the before direction; a matching path can be found, so rule #2 is also valid. Rule #3 has the context morpheme_all, representative tag in the before direction; a matching path can be found, so rule #3 is also valid. When generalized error correction rules are output as encoded values as in Table 17, the apparatus decodes them and then checks the context information of each generalized rule. When the apparatus is programmed to process the extracted generalized rules sequentially by priority, and a higher-priority rule is not valid for the context structure of the input sentence, the apparatus may check whether the next-ranked generalized rule is valid.

From each generalized error correction rule verified as valid, the semantic error correcting apparatus 300 generates an error correction rule for the input sentence that fits the input sentence's context (S140). For example, referring to Table 18, the input-sentence rule generated from rule #1 is '[빗다 morpheme verb] [before] [morpheme=을 particle, representative tag=adjective, representative tag=ending]', the one generated from rule #2 is '[빗다 morpheme verb] [before] [morpheme=송편 noun, representative tag=objective case particle, word=예쁘게 *]', and the one generated from rule #3 is '[빗다 morpheme verb] [before] [morpheme=예쁘다 *, representative tag=ending]'.
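
Steps S130 and S140 can be sketched together as a backward walk from the key over the analysis of Table 18. The sketch assumes, as an interpretation of the examples above, that a morpheme or tag slot consumes one morpheme and that a word slot consumes a whole word and must end at a word boundary. The function name `instantiate`, the tuple layout, and the output strings are hypothetical, and only the 'before' direction is handled.

```python
# One tuple per morpheme of Table 18:
# (word id, word, word tag, morpheme, morpheme POS, representative tag)
SENTENCE = [
    (0, "송편", "noun_phrase", "송편", "noun", "noun"),  # word as listed in Table 18
    (0, "송편", "noun_phrase", "을", "objective case particle", "objective case particle"),
    (1, "예쁘게", "pa ec", "예쁘다", "adjective", "adjective"),
    (1, "예쁘게", "pa ec", "게", "ending", "ending"),
    (2, "빗었다", "pv ec", "빗다", "verb", "verb"),       # correction target (key)
    (2, "빗었다", "pv ec", "었", "prefinal ending", "prefinal ending"),
    (2, "빗었다", "pv ec", "다", "ending", "ending"),
]
KEY_INDEX = 4


def instantiate(kinds, sentence, key_index):
    """Fill a 'before' context structure from the input; None means no valid path."""
    pos = key_index - 1
    items = []
    for kind in reversed(kinds):                  # nearest slot (P1) first
        if pos < 0:
            return None                           # ran out of context
        wid, word, wtag, morph, tag, rep = sentence[pos]
        if kind in ("word_all", "word_tag"):
            if sentence[pos + 1][0] == wid:
                return None                       # not at a word boundary
            items.append(f"word={word} " + ("*" if kind == "word_all" else wtag))
            while pos >= 0 and sentence[pos][0] == wid:
                pos -= 1                          # a word slot consumes the whole word
        elif kind in ("morpheme_all", "morpheme_tag"):
            items.append(f"morpheme={morph} " + ("*" if kind == "morpheme_all" else tag))
            pos -= 1
        elif kind == "rep_tag":
            items.append(f"representative tag={rep}")
            pos -= 1
        else:
            raise ValueError(f"unsupported context kind: {kind}")
    return list(reversed(items))


print(instantiate(["morpheme_tag", "rep_tag", "word_all"], SENTENCE, KEY_INDEX))
```

A return value of `None` corresponds to a generalized rule that is not valid for the input sentence; any other return value is the concrete context of the input-sentence rule that is then looked up in the rule dictionary 400.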

To find the correction content (value), the semantic error correcting apparatus 300 searches the rule dictionary 400 for the error correction rule generated for the input sentence (S150).

If the rule dictionary 400 contains the input sentence's error correction rule, the semantic error correcting apparatus 300 extracts the correction content (value) of that rule (S160). If the rule dictionary 400 does not contain it, the apparatus returns to the validity verification step (S130) and generates a new error correction rule for the input sentence.

The semantic error correcting apparatus 300 corrects the correction target (key) with the extracted correction content (value) (S170).

The semantic error correcting apparatus 300 performs a grammar check to confirm that the corrected word or morpheme can combine grammatically, generates the inflected word form when the correction is a morpheme, and produces the semantically corrected output sentence (S180).

In particular, because the generalized error correction rules are ordered by priority, the semantic error correcting apparatus 300 can generate input-sentence error correction rules in descending priority order and, as soon as one of them is found in the rule dictionary 400, extract its correction content (value) and finish. That is, referring to Table 17, the apparatus first checks whether the input-sentence rule generated from the highest-priority rule #2 exists in the rule dictionary 400; if it does, the apparatus extracts that rule's correction content ('빚다') and ends the semantic error correction procedure. If it does not, the apparatus checks whether the input-sentence rule generated from rule #1, the next in priority, exists in the rule dictionary 400. Thus, even when there are multiple generalized error correction rules, the apparatus does not need to instantiate all of them for the input sentence and look up each one in the rule dictionary 400; it can end the rule search as soon as correction content is returned.
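
The priority-ordered search with early termination can be sketched as below. The dictionary layout, the function name `lookup`, and the candidate strings are assumptions for illustration; the encoded values are those of Table 17, and the single dictionary entry stands in for rule #2 of Table 12.

```python
# Hypothetical stand-in for the rule dictionary 400: concrete context -> correction content.
RULE_DICTIONARY = {
    ("빗다", "before",
     ("morpheme=송편 noun", "representative tag=objective case particle", "word=예쁘게 *")): "빚다",
}

# Input-sentence rules keyed by the encoded value of the generalized rule that produced them.
CANDIDATES = {
    2156462080: ("빗다", "before",
                 ("morpheme=을 particle", "representative tag=adjective",
                  "representative tag=ending")),
    2156560384: ("빗다", "before",
                 ("morpheme=송편 noun", "representative tag=objective case particle",
                  "word=예쁘게 *")),
    2148335616: ("빗다", "before",
                 ("morpheme=예쁘다 *", "representative tag=ending")),
}


def lookup(candidates, dictionary):
    """Try candidates from the highest encoded value down; stop at the first hit."""
    tried = []
    for value in sorted(candidates, reverse=True):
        tried.append(value)
        correction = dictionary.get(candidates[value])
        if correction is not None:
            return correction, tried              # early exit: lower priorities are skipped
    return None, tried


print(lookup(CANDIDATES, RULE_DICTIONARY))
```

In this example only the candidate derived from rule #2 is looked up, since it is found immediately; the other two candidates are never consulted.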

Meanwhile, the semantic error correcting apparatus 300 may parse the input sentence with a syntactic parser to decide whether to apply semantic error correction rules at all. In this case, before verifying whether the extracted generalized error correction rules are valid for the context structure of the input sentence (step S130), the apparatus may use the parse information to verify whether the input sentence is one for which a semantic error correction rule can exist (that is, whether it is subject to semantic error correction rules). For example, if correction is applied only for specified dependency relation types, a sentence that matches none of them would not be corrected even if a rule existed, so there is no need to search for semantic error correction rules in the first place.
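
Such a pre-check can be sketched as a simple filter over parser output. The triple format `(head, dependent, relation)`, the relation names, and the function name `rules_applicable` are hypothetical; the source does not specify which parser or which dependency relation types are used.

```python
def rules_applicable(key, dependencies, allowed_relations):
    """Return True if the key heads at least one dependency of an allowed type.

    dependencies: iterable of (head, dependent, relation) triples from some parser.
    """
    return any(head == key and relation in allowed_relations
               for head, _dependent, relation in dependencies)


# Hypothetical parse of '송편을 예쁘게 빗었다'; relation names are made up for the example.
PARSE = [("빗다", "송편", "object"), ("빗다", "예쁘다", "adverbial")]

if rules_applicable("빗다", PARSE, {"object"}):
    print("search the rule dictionaries")
else:
    print("skip semantic error correction for this key")
```

Running the gate before S130 avoids decoding and instantiating rules for sentences that could never be corrected under the configured relation types.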

FIG. 4 is a flowchart of the error correction method of the error correction system according to an embodiment of the present invention.

Referring to FIG. 4, when an input sentence arrives, the error correction system 10 performs preprocessing checks and morphological analysis (S210). The error correction system 10 may also identify subject-predicate relations, dependency relations, and the like through syntactic analysis.

The error correction system 10 performs text error checks (spacing error check, spelling error check, etc.) (S220).

The error correction system 10 extracts, from the words and morphemes of the input sentence, a correction target (key) that may contain a semantic error (S230). Referring to Table 18, candidate correction targets (keys) may be the word '빗었다', the morpheme '빗다', or the representative or detailed tags that express their part-of-speech information.

Based on the rule dictionary 400 and the generalized rule dictionary 500, the error correction system 10 extracts at least one semantic error correction rule that can correct the semantic error of the correction target (key) (S240). A semantic error correction rule includes a context search direction and a context; the context is described not only by morphemes or words but also by part-of-speech information, and may combine multiple pieces of context information.

The error correction system 10 corrects the correction target (key) with the correction content (value) of the semantic error correction rule (S250).

The error correction system 10 performs a grammar check to confirm that the corrected word or morpheme can combine grammatically, generates the inflected word form when the correction is a morpheme, and produces the semantically corrected output sentence (S260).

As described above, according to embodiments of the present invention, large-scale data can be used to collect large numbers of error candidates and error data pairs, so that highly accurate semantic error rules can be discovered and generated and construction time can be shortened. Because rules are described using part-of-speech information as context, a minimal set of rules can correct a wide variety of errors, saving time and cost. The recall of rule-based error correction can also be increased; in particular, because part-of-speech information is used as context, rules can be applied across sentences of varying length and composition.

The embodiments of the present invention described above are not implemented only through an apparatus and a method; they may also be implemented through a program that realizes functions corresponding to the configurations of the embodiments, or through a recording medium on which such a program is recorded.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto; various modifications and improvements made by those skilled in the art using the basic concept of the present invention defined in the following claims also fall within its scope.

Claims

A method for generating a semantic error correction rule in an error correction system operated by at least one processor, the method comprising:
extracting error data pairs from a large amount of data;
generating a semantic error correction rule that includes a correction target in which a semantic error has occurred, a context, a direction in which to search for the context from the correction target, and correction content with which to correct the correction target when the context is detected; and
storing the semantic error correction rule in a semantic error correction rule dictionary,
wherein each error data pair is a pair of data items that have the same left and right context but a phoneme difference of a certain length.

The method of claim 1,
wherein each of the correction target, the correction content, and the context of the semantic error correction rule is described as at least one of a word, a morpheme, and part-of-speech information, and
wherein the part-of-speech information is either a representative part-of-speech tag or a detailed part-of-speech tag.

The method of claim 1,
wherein the direction in which to search for the context is one of
a first direction indicating detection of a context located before the correction target, a second direction indicating detection of a context located after the correction target, and a third direction indicating detection of contexts located before and after the correction target.

The method of claim 1,
wherein the semantic error correction rule further includes information indicating whether to extend the rule, and
wherein the information indicating whether to extend the rule is one of
first information indicating that the rule applies only to cases explicitly specified in the rule, and second information indicating that the rule also applies to dependency relations identified through syntactic parsing.

The method of claim 1, further comprising:
extracting, as a correction pair, a specific correction target and specific correction content included in a generated specific semantic error correction rule;
collecting, from the large amount of data, extended error data pairs that have the same left and right context as the correction pair; and
extracting the context of the extended error data pairs and generating a new semantic error correction rule for the correction pair.

The method of claim 1,
wherein extracting the error data pairs comprises
extracting, as the error data pairs, pairs of data in refined data that have the same left and right context but a phoneme difference of a predetermined length, or
extracting the error data pairs from dependency pairs obtained by syntactically analyzing the large amount of data.

A semantic error correction method of an error correction system operated by at least one processor, the method comprising:
analyzing an input sentence into words, morphemes, and part-of-speech information, and extracting, based on the analyzed information, a correction target of the input sentence that may contain a semantic error;
extracting, from a first rule dictionary, a specific semantic error correction rule corresponding to the correction target and to a context included in the input sentence; and
correcting the correction target with the correction content of the specific semantic error correction rule,
wherein the first rule dictionary stores a plurality of semantic error correction rules including the specific semantic error correction rule, and the specific semantic error correction rule indicates that the correction target is to be corrected with the correction content when it appears together with the context, and
wherein the context is expressed as at least one of a word, a morpheme, and part-of-speech information.

The method of claim 7,
wherein extracting the specific semantic error correction rule comprises
generating, based on the context of the input sentence, at least one candidate semantic error rule that the input sentence can produce, searching the first rule dictionary for the candidate semantic error rule, and extracting the specific semantic error correction rule.

The method of claim 7,
wherein extracting the specific semantic error correction rule comprises:
extracting, from a second rule dictionary, at least one generalized rule related to the correction target;
selecting, from the extracted generalized rules and based on the information obtained by analyzing the input sentence, a generalized rule that is valid for the context structure of the input sentence;
generating a first semantic error rule of the input sentence by applying the input sentence to the context structure of the selected generalized rule; and
extracting, from the first rule dictionary, the specific semantic error correction rule corresponding to the first semantic error rule of the input sentence,
wherein the second rule dictionary stores generalized rules in which the plurality of semantic error correction rules are generalized into information representing context structure.

The method of claim 9,
further comprising generating the second rule dictionary,
wherein generating the second rule dictionary comprises
converting at least one semantic error correction rule related to the correction target into a generalized rule, calculating an encoded value according to the context detection direction, the number of contexts, and the contexts included in each generalized rule, and storing the encoded value in correspondence with the generalized rule.

The method of claim 10,
wherein generating the second rule dictionary comprises
determining the priority of each generalized rule by calculating its encoded value with a binary mask that assigns binary values according to the context detection direction, the number of contexts, and the type of each context, and
wherein extracting the at least one generalized rule comprises
extracting, from the second rule dictionary and in priority order, at least one encoded value related to the correction target, and decoding the extracted encoded value to obtain the generalized rule.

The method of claim 11,
wherein selecting the generalized rule that is valid for the context structure of the input sentence comprises:
determining whether a first rule, which has the highest priority among the extracted generalized rules, is valid for the context structure of the input sentence; and
when the first rule is not valid for the context structure of the input sentence, or when the first semantic error rule of the input sentence generated from the first rule is not present in the first rule dictionary, determining whether a second rule, which is next in priority after the first rule, is valid for the context structure of the input sentence.

The method of claim 7,
further comprising, before extracting the specific semantic error correction rule,
determining, based on a result of syntactically parsing the input sentence, whether any semantic error correction rule can be applied to the input sentence.

An error correction system operated by at least one processor, comprising:
an error data pair collecting device that extracts, from a large amount of data, pairs of data having the same left and right context but a phoneme difference of a certain length as error data pairs;
a semantic error correction rule description device that generates semantic error correction rules, each including a correction target in which a semantic error has occurred, a context, a direction in which to search for the context from the correction target, and correction content with which to correct the correction target when the context is detected; and
a semantic error correcting device that corrects errors in an input sentence based on the semantic error correction rules described by the semantic error correction rule description device.

The error correction system of claim 14,
wherein the semantic error correction rule description device
describes the correction target, the correction content, and the context as at least one of a word, a morpheme, and part-of-speech information, and
wherein the part-of-speech information is either a representative part-of-speech tag or a detailed part-of-speech tag.

The error correction system of claim 14,
wherein the semantic error correction rule description device
describes the direction in which to search for the context as one of
a first direction indicating detection of a context located before the correction target, a second direction indicating detection of a context located after the correction target, and a third direction indicating detection of contexts located before and after the correction target.

The error correction system of claim 14,
wherein the semantic error correcting device
classifies the input sentence into word phrases, morphemes, and part-of-speech information, and extracts, based on the classified information, a correction target of the input sentence in which a semantic error may have occurred,
extracts, from the semantic error correction rules described by the semantic error correction rule description device, a specific semantic error correction rule corresponding to a context contained in the input sentence and to the correction target of the input sentence, and
corrects the correction target of the input sentence with the correction content of the specific semantic error correction rule.
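
The correction flow of this claim can be sketched as follows. The sketch is illustrative only: it matches surface words split on whitespace rather than morphemes or part-of-speech tags, it looks only at the immediately adjacent word in each direction, and the names `Direction`, `Rule`, `context_matches`, and `correct` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    BEFORE = "before"  # context precedes the correction target
    AFTER = "after"    # context follows the correction target
    BOTH = "both"      # context must appear on both sides


@dataclass(frozen=True)
class Rule:
    target: str           # word in which the semantic error occurs
    context: tuple        # (word,) for BEFORE/AFTER, (before, after) for BOTH
    direction: Direction  # where to search for the context from the target
    correction: str       # replacement applied when the context is detected


def context_matches(rule, words, i):
    """Check the rule's context against the neighbors of words[i]."""
    before = words[i - 1] if i > 0 else None
    after = words[i + 1] if i + 1 < len(words) else None
    if rule.direction is Direction.BEFORE:
        return before == rule.context[0]
    if rule.direction is Direction.AFTER:
        return after == rule.context[0]
    return (before, after) == rule.context


def correct(sentence, rules):
    """Replace each correction target whose context is detected."""
    words = sentence.split()
    by_target = {}
    for rule in rules:
        by_target.setdefault(rule.target, []).append(rule)
    for i, word in enumerate(words):
        for rule in by_target.get(word, []):
            if context_matches(rule, words, i):
                words[i] = rule.correction
                break
    return " ".join(words)
```

With a single rule `Rule("there", ("house",), Direction.AFTER, "their")`, the sketch rewrites "there" only when "house" follows it and leaves other occurrences untouched, which is the behavior a context-conditioned rule is meant to give.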

The error correction system of claim 17,
wherein the semantic error correcting device
generates, based on the context of the input sentence, at least one candidate semantic error rule that can be generated from the input sentence, and
extracts, from the semantic error correction rules described by the semantic error correction rule description device, the specific semantic error correction rule corresponding to the candidate semantic error rule.

The error correction system of claim 17, further comprising:
a first rule dictionary storing the semantic error correction rules described by the semantic error correction rule description device; and
a second rule dictionary storing generalized rules in which the semantic error correction rules stored in the first rule dictionary are generalized into information indicating a context structure,
wherein the semantic error correcting device
extracts, from the second rule dictionary, at least one generalized rule relating to the correction target of the input sentence, extracts a valid generalized rule from among the extracted generalized rules, and generates the candidate semantic error rule by applying the input sentence to the context structure of the valid generalized rule.
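
One possible reading of the two-dictionary lookup in this claim is sketched below. It assumes that a generalized rule is just a (correction target, direction) pair, that a generalized rule is "valid" when the input sentence has a word in every context slot it requires, and that the first rule dictionary is keyed by (target, direction, context). These representations and the names `candidate_rules` and `lookup` are assumptions made for illustration.

```python
def candidate_rules(words, i, generalized):
    """Instantiate generalized rules for words[i] with the sentence's own context."""
    candidates = []
    before = words[i - 1] if i > 0 else None
    after = words[i + 1] if i + 1 < len(words) else None
    for target, direction in generalized:
        if target != words[i]:
            continue  # this generalized rule concerns a different correction target
        context = {
            "before": (before,),
            "after": (after,),
            "both": (before, after),
        }[direction]
        if None not in context:  # assumed validity check: every slot can be filled
            candidates.append((target, direction, context))
    return candidates


def lookup(candidates, first_dictionary):
    """Return the correction content of the first candidate found in the dictionary."""
    for key in candidates:
        if key in first_dictionary:
            return first_dictionary[key]
    return None
```

The point of the indirection is that only the few candidate rules derivable from the input sentence are looked up, instead of scanning every stored rule for the correction target.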

The error correction system of claim 19,
wherein the second rule dictionary
stores, for each generalized rule, an encoded value obtained by assigning binary values to the context detection direction and the context structure included in that generalized rule, and
wherein each generalized rule is assigned a binary value in a binary mask so as to indicate a priority determined by the context detection direction and the context structure.
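
The binary-mask encoding of this claim could look like the following sketch. The bit layout (two direction bits above three context-structure bits), the specific bit values, and the convention that a larger encoded value means a higher priority are all assumptions; the claim states only that binary values encode the direction and the context structure and that the result indicates a priority.

```python
# Hypothetical bit layout: [direction: 2 bits][context structure: 3 bits]
DIRECTION_BITS = {"before": 0b01, "after": 0b10, "both": 0b11}
STRUCTURE_BITS = {"pos": 0b001, "morpheme": 0b010, "word": 0b100}


def encode(direction, structure):
    """Pack a direction and a set of context-structure units into one integer."""
    mask = 0
    for unit in structure:
        mask |= STRUCTURE_BITS[unit]
    return (DIRECTION_BITS[direction] << 3) | mask


def decode(code):
    """Recover the direction and the context-structure units from an encoded value."""
    direction = next(name for name, bits in DIRECTION_BITS.items() if bits == code >> 3)
    structure = {name for name, bits in STRUCTURE_BITS.items() if code & bits}
    return direction, structure


def by_priority(codes):
    """Assumed ordering: a larger encoded value means a higher priority."""
    return sorted(codes, reverse=True)
```

Under this layout, a rule that requires context on both sides and matches on the surface word phrase encodes to `0b11100`, which sorts ahead of an after-only rule that matches on part of speech alone. Storing the encoded value lets rules be compared and ordered with integer operations.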