KR101747924B1

KR101747924B1 - Method of correcting Korean utterance and apparatus performing the same

Info

Publication number: KR101747924B1
Application number: KR1020160052500A
Authority: KR
Inventors: 이효행; 김다경; 오병훈
Original assignee: (주)기술공감
Priority date: 2016-04-28
Filing date: 2016-04-28
Publication date: 2017-06-27

Abstract

A Korean spell-checking method according to an embodiment of the present invention includes: reading a text file and analyzing the words of the text in the text file with reference to a word database to check spelling; analyzing the eojeols (space-delimited units) of the word-analyzed text using a sentence component database to check spelling; and analyzing the sentences of the eojeol-analyzed text with reference to a sentence database to check the meaning of the sentences. Because the present invention checks spelling by analyzing text in eojeol units, it can also check the spelling of words consisting of multiple eojeols, which conventional morpheme-only analysis cannot handle.

Description

Embodiments of the present invention relate to a Korean spell-checking method and an apparatus for executing the method.

In general, a spell checker in the narrow sense provides the information needed to correct simple spelling and grammar errors. In a broader sense, it checks for stylistic errors to support effective writing of documents or sentences, suggests refined terms, indicates whether punctuation marks are used appropriately, provides learning information about errors and corrections according to error frequency or severity, and supplies usage examples for words.

In modern society, computer-based document editing, desktop publishing (DTP), electronic formatting (CTS), and various other word processors have reduced the time and effort needed to create and edit documents and have improved document quality.

However, proofreading of written documents still relies on manual work, which makes proofreading a bottleneck in document preparation. In addition, misspelled or mistyped data greatly complicates document retrieval, storage, and processing. The economic and social utility of, and the need for, an automatic proofreading system are therefore growing day by day.

Conventional spell-checking methods commonly analyze the sentence to be corrected into morphemes and check only the target morphemes. However, a method that checks spelling through morphological analysis alone cannot treat a word consisting of multiple eojeols as a target of the spell check.

Korean Patent Laid-Open Publication No. 10-2005-0111920 relates to a system for identifying suspected erroneous eojeols and providing correction information, and to a method of implementing it. It can correct particles according to the independent morpheme they attach to, but because it verifies only the spelling of particles attached to independent morphemes, it cannot spell-check words consisting of multiple eojeols or compound nouns.

Korean Patent Laid-Open Publication No. 10-2005-0111920

An object of the present invention is to provide a Korean spell-checking method, and an apparatus for executing it, that check spelling by analyzing text in eojeol (space-delimited) units and can therefore also check the spelling of words consisting of multiple eojeols, which conventional methods cannot.

Another object of the present invention is to provide a Korean spell-checking method, and an apparatus for executing it, that improve the performance of a Korean document proofreader by correcting context-sensitive spelling errors.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the following description.

Among the embodiments, a Korean spell-checking method includes: reading a text file and analyzing the words of the text in the text file with reference to a word database to check spelling; analyzing the eojeols of the word-analyzed text using a sentence component database to check spelling; and analyzing the sentences of the eojeol-analyzed text with reference to a sentence database to check the meaning of the sentences.

Among the embodiments, a Korean spell-checking apparatus includes: a word database storing a plurality of standard words; a sentence component database storing a plurality of sentence components; a sentence database storing a plurality of sentences; a word analysis unit that reads a text file and analyzes the words of the text in the text file with reference to the word database to check spelling; an eojeol analysis unit that uses the sentence component database to analyze the eojeols of the text analyzed by the word analysis unit and check spelling; and a sentence analysis unit that, with reference to the sentence database, analyzes the sentences of the text analyzed by the eojeol analysis unit to check the meaning of the sentences.

Specific details of other embodiments are included in the detailed description and the accompanying drawings.

The advantages and/or features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the invention pertains are fully informed of the scope of the invention, which is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

According to the present invention, spelling is checked by analyzing text in eojeol units, so the spelling of words consisting of multiple eojeols can also be checked, which conventional methods cannot do.

In addition, according to the present invention, correcting context-sensitive spelling errors improves the performance of a Korean document proofreader.

FIG. 1 is a block diagram illustrating the internal structure of a Korean spell-checking apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating an embodiment of the Korean spell-checking method according to the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating the internal structure of a Korean spell-checking apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the Korean spell-checking apparatus 100 includes a word database 110, a sentence component database 120, a sentence database 130, a word analysis unit 140, an eojeol analysis unit 150, and a sentence analysis unit 160.

The word database 110 stores a plurality of standard words. The standard words stored in the word database 110 are the words contained in the Standard Korean Language Dictionary (표준국어대사전). The word analysis unit 140 can therefore compare each of the words extracted from the text with the words stored in the word database 110 to check whether each word is misspelled. This process is described in more detail below.

The sentence component database 120 stores a plurality of sentence components. If the spelling of a sentence component extracted from the text differs from the spelling of the sentence components stored in advance in the sentence component database 120, the eojeol analysis unit 150 can identify a spelling error in the extracted sentence component. This process is described in more detail below.

The sentence database 130 stores a plurality of sentences. The sentence analysis unit 160 can therefore analyze the meaning of a sentence extracted from the text with reference to the sentences stored in advance in the sentence database 130. This process is described in more detail below.

The word analysis unit 140 reads a text file and analyzes the words of the text in the text file with reference to the word database 110 to check spelling.

First, the word analysis unit 140 analyzes the text in part-of-speech units (that is, nouns, pronouns, verbs, adjectives, determiners, numerals, adverbs, particles, and interjections) to generate a plurality of words. A word is the smallest unit that has a definite grammatical meaning and can stand alone, and it coincides with the part-of-speech unit, which is why the word analysis unit 140 analyzes the text in part-of-speech units.

For example, when the text is “그 꽃은 매우 예뻤다” ("That flower was very pretty"), the word analysis unit 140 analyzes the text in part-of-speech units and generates the words “그”, “꽃”, “은”, “매우”, and “예뻤다”.
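The part-of-speech split in this example can be sketched roughly as follows. This is only a toy illustration: the `PARTICLES` tuple and the `split_into_words` function are hypothetical names, and stripping a trailing particle is an assumed simplification. The source does not describe how the word analysis unit segments text, and real Korean text would need a proper morphological analyzer.

```python
# Toy sketch: split each space-delimited eojeol into a stem and a trailing particle.
# PARTICLES is a hypothetical, deliberately tiny list; it mis-splits many real words.
PARTICLES = ("은", "는", "이", "가", "을", "를")


def split_into_words(text: str) -> list[str]:
    """Return part-of-speech-like units for a sentence (illustrative only)."""
    words = []
    for eojeol in text.split():
        for particle in PARTICLES:
            # Only split when something is left over as the stem.
            if len(eojeol) > len(particle) and eojeol.endswith(particle):
                words.extend([eojeol[: -len(particle)], particle])
                break
        else:
            # No known particle at the end: keep the eojeol as a single word.
            words.append(eojeol)
    return words


print(split_into_words("그 꽃은 매우 예뻤다"))
```

For the example sentence this yields the same five units as the text above. An eojeol such as “맛있는” would be split incorrectly, which is why the particle list is labeled as a toy.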

Next, the word analysis unit 140 compares each of the words generated by the part-of-speech analysis with the words stored in advance in the word database 110 to check whether each word is misspelled, and can then recommend a correction for a misspelled word using a replacement word.

That is, if the spelling of a word extracted from the text differs from the spelling of the words stored in advance in the word database 110, the word analysis unit 140 determines that the extracted word is misspelled.

If exactly one of the words stored in advance in the word database 110 can replace the misspelled word, the word analysis unit 140 extracts that replacement word from the word database 110 and can recommend correcting the spelling of the word extracted from the text with it.

If, on the other hand, several of the words stored in advance in the word database 110 can replace the misspelled word, the word analysis unit 140 extracts those replacement words from the word database 110 and generates a replacement word list.

The word analysis unit 140 can then recommend correcting the spelling of the word extracted from the text with any one of the replacement words in the replacement word list.
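The word-level check described above (database lookup, then one replacement or a list of replacements) might look like the following minimal sketch. The source does not state how replacement candidates are found, so selecting them by string similarity with `difflib` is an assumption, as are the `WORD_DB` contents, the `check_word` name, and the `cutoff` value. Words are decomposed into jamo with Unicode NFD so that a single wrong letter counts as a small difference.

```python
import difflib
import unicodedata

# Hypothetical stand-in for the word database (110); a real one would hold a full dictionary.
WORD_DB = ["가르치다", "가리키다", "결재", "결제", "꽃"]


def to_jamo(word: str) -> str:
    """Decompose Hangul syllables into jamo so one wrong letter costs less than a syllable."""
    return unicodedata.normalize("NFD", word)


def check_word(word: str, word_db=WORD_DB, cutoff: float = 0.6) -> dict:
    """Flag a word missing from the database and collect replacement candidates."""
    jamo_to_word = {to_jamo(w): w for w in word_db}
    key = to_jamo(word)
    if key in jamo_to_word:
        return {"word": word, "error": False, "candidates": [], "suggestion": None}
    # Assumption: candidates are chosen by string similarity; the source does not say how.
    close = difflib.get_close_matches(key, list(jamo_to_word), n=5, cutoff=cutoff)
    candidates = [jamo_to_word[j] for j in close]
    # One candidate: recommend it. Several: recommend one from the list (here, the closest).
    suggestion = candidates[0] if candidates else None
    return {"word": word, "error": True, "candidates": candidates, "suggestion": suggestion}


for w in ("꽃", "꼿", "가르키다"):
    print(check_word(w))
```

With this toy database, “꼿” has the single candidate “꽃”, while “가르키다” has two candidates (“가르치다” and “가리키다”), which corresponds to the replacement word list case.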

The eojeol analysis unit 150 uses the sentence component database 120 to analyze the eojeols (space-delimited units) of the word-analyzed text and check spelling. The eojeol analysis unit 150 divides the text at the spaces to generate a plurality of sentence components, and determines the type of each sentence component as one of a main component, a subordinate component, and an independent component. The main components include the subject, predicate, object, and complement; the subordinate components include the adnominal modifier and the adverbial; and the independent component includes the independent word.

For example, the eojeol analysis unit 150 divides the text “나는 맛있는 빵을 먹었다” ("I ate delicious bread") at the spaces to generate the sentence components “나는”, “맛있는”, “빵을”, and “먹었다”, and determines their types as follows: “나는” is a main component (subject), “맛있는” is a subordinate component (adnominal modifier), “빵을” is a main component (object), and “먹었다” is a main component (predicate).

If a sentence component whose type has not been determined exists among the plurality of sentence components, the eojeol analysis unit 150 compares that sentence component with the sentence components stored in advance in the sentence component database 120 to check whether it is misspelled.

That is, if the spelling of a sentence component extracted from the text differs from the spelling of the sentence components stored in advance in the sentence component database 120, the eojeol analysis unit 150 determines that the extracted sentence component is misspelled.

If exactly one of the sentence components stored in advance in the sentence component database 120 can replace the misspelled sentence component, the eojeol analysis unit 150 extracts that replacement sentence component from the sentence component database 120 and can recommend correcting the spelling of the sentence component extracted from the text with it.

If, on the other hand, several of the sentence components stored in advance in the sentence component database 120 can replace the misspelled sentence component, the eojeol analysis unit 150 extracts those replacement sentence components from the sentence component database 120 and generates a replacement sentence component list. The eojeol analysis unit 150 can then recommend correcting the spelling of the sentence component extracted from the text with any one of the entries in that list.
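The eojeol-level analysis described above can be sketched as a whitespace split followed by a lookup in the sentence component database. Everything named here is hypothetical: the `COMPONENT_DB` contents, the `analyze_eojeols` function, and the similarity `cutoff`. How the apparatus actually assigns types or finds replacement components is not specified in the source, so the table lookup and the `difflib` matching are assumptions.

```python
import difflib
import unicodedata

# Hypothetical sentence component database (120): eojeol -> (component class, role).
COMPONENT_DB = {
    "나는": ("main", "subject"),
    "맛있는": ("subordinate", "adnominal"),
    "빵을": ("main", "object"),
    "먹었다": ("main", "predicate"),
}


def analyze_eojeols(text: str, db=COMPONENT_DB, cutoff: float = 0.7) -> list:
    """Split text at spaces and return (eojeol, type or None, replacement candidates)."""
    jamo_index = {unicodedata.normalize("NFD", k): k for k in db}
    results = []
    for eojeol in text.split():
        if eojeol in db:
            # Type determined: nothing to correct.
            results.append((eojeol, db[eojeol], []))
            continue
        # Type undetermined: treat as a possible misspelling and look for close entries.
        key = unicodedata.normalize("NFD", eojeol)
        close = difflib.get_close_matches(key, list(jamo_index), n=3, cutoff=cutoff)
        results.append((eojeol, None, [jamo_index[j] for j in close]))
    return results


for row in analyze_eojeols("나는 맛있는 빵를 먹었다"):
    print(row)
```

Here “빵를” has no entry, so its type stays undetermined and the only close database entry, “빵을”, is offered as the replacement.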

In other words, conventional methods analyze text in morpheme units for spell checking, and checking spelling through morphological analysis alone cannot treat words consisting of multiple eojeols as spell-check targets. The present invention checks spelling by analyzing text in eojeol units, so it can also check the spelling of words consisting of multiple eojeols.

The sentence analysis unit 160 first checks whether a subject and a predicate exist among the sentence components analyzed by the eojeol analysis unit 150. If the subject or the predicate is absent, it determines that the sentence contains an error and can report that the subject or predicate is missing.

In addition, the sentence analysis unit 160 combines the sentence components analyzed by the eojeol analysis unit 150 to generate a sub-sentence, and checks the meaning of the sub-sentence with reference to the sentences stored in advance in the sentence database 130.

That is, the sentence analysis unit 160 forms a sub-sentence by combining at least two of the sentence components analyzed by the eojeol analysis unit 150, and checks the meaning of the sentence by determining whether any sentence stored in advance in the sentence database 130 contains that sub-sentence.

In one embodiment, the sentence analysis unit 160 checks the meaning of the sentence according to whether a sentence combining at least two of the subject, object, complement, and predicate among the sentence components analyzed by the eojeol analysis unit 150 exists in the sentence database 130.

In this embodiment, if a sentence combining at least two of the subject, object, complement, and predicate among the sentence components analyzed by the eojeol analysis unit 150 exists in the sentence database 130, the sentence analysis unit 160 can confirm that there is no error in the meaning of the sentence.

For example, if a sub-sentence combining “나는” (subject) + “밥을” (object) + “먹었다” (predicate), meaning "I ate a meal", exists among the sentences stored in advance in the sentence database 130, the sentence analysis unit 160 can confirm that there is no error in the meaning of the sentence.

If, on the other hand, no sentence combining at least two of the subject, object, complement, and predicate among the sentence components analyzed by the eojeol analysis unit 150 exists in the sentence database 130, the sentence analysis unit 160 extracts from the sentence database 130 a sentence that matches at least part of the sentence in the text, and can recommend correcting the sentence in the text with reference to the extracted sentence.

For example, if no sub-sentence combining “사장님께” (subject) + “결제를” (object) + “올렸습니다” (predicate) exists among the sentences stored in advance in the sentence database 130, the sentence analysis unit 160 extracts from the sentence database 130 the sentence “사장님께 결재를 올렸습니다” ("I submitted it to the president for approval"), which contains “사장님께” (subject) and “올렸습니다” (predicate). With reference to that sentence, it can recommend correcting the object of the sentence in the text from “결제” (payment) to “결재” (approval).

As another example, if no sub-sentence combining “나는” (subject) + “수학을” (object) + “가르켰다” (predicate) exists among the sentences stored in advance in the sentence database 130, the sentence analysis unit 160 extracts from the sentence database 130 the sentence “나는 수학을 가르쳤다” ("I taught mathematics"), which contains “나는” (subject) + “수학을” (object). With reference to that sentence, it can recommend correcting the predicate of the sentence in the text from “가르켰다” to “가르쳤다”.
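The sentence-level checks described above (the subject/predicate presence test, the sub-sentence lookup, and the recommendation drawn from a partially matching stored sentence) can be sketched as below. This is a simplified illustration under stated assumptions: `SENTENCE_DB`, `CORE_ROLES`, and `check_sentence` are hypothetical names, sentences are stored as tuples of eojeols, and a partial match is taken to be the same-length stored sentence sharing the most eojeols position by position. The source does not specify the matching rule.

```python
# Hypothetical sentence database (130): each sentence stored as a tuple of eojeols.
SENTENCE_DB = [
    ("사장님께", "결재를", "올렸습니다"),
    ("나는", "수학을", "가르쳤다"),
    ("나는", "밥을", "먹었다"),
]
CORE_ROLES = {"subject", "object", "complement", "predicate"}


def check_sentence(components, sentence_db=SENTENCE_DB) -> dict:
    """components: list of (eojeol, role) pairs produced by the eojeol analysis."""
    roles = {role for _, role in components}
    missing = [r for r in ("subject", "predicate") if r not in roles]
    if missing:
        # A sentence without a subject or predicate is reported as erroneous.
        return {"ok": False, "missing": missing, "corrections": []}
    # Sub-sentence: the core components combined in order.
    core = tuple(e for e, role in components if role in CORE_ROLES)
    if core in sentence_db:
        return {"ok": True, "missing": [], "corrections": []}
    # Assumed rule: pick the stored sentence of equal length with the most shared eojeols.
    best, best_overlap = None, 0
    for stored in sentence_db:
        if len(stored) != len(core):
            continue
        overlap = sum(a == b for a, b in zip(core, stored))
        if overlap > best_overlap:
            best, best_overlap = stored, overlap
    corrections = [(a, b) for a, b in zip(core, best) if a != b] if best else []
    return {"ok": False, "missing": [], "corrections": corrections}


print(check_sentence([("사장님께", "subject"), ("결제를", "object"), ("올렸습니다", "predicate")]))
print(check_sentence([("나는", "subject"), ("수학을", "object"), ("가르켰다", "predicate")]))
```

With this toy database, the two calls recommend “결제를” to “결재를” and “가르켰다” to “가르쳤다” respectively, mirroring the two examples in the text.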

FIG. 2 is a flowchart illustrating an embodiment of the Korean spell-checking method according to the present invention.

Referring to FIG. 2, the Korean spell-checking apparatus 100 reads a text file and analyzes the words of the text in the text file with reference to the word database to check spelling (step S210).

In one embodiment of step S210, the Korean spell-checking apparatus 100 analyzes the text in part-of-speech units to generate a plurality of words, and compares each of the words with the words stored in advance in the word database to check whether each word is misspelled.

If exactly one of the words stored in advance in the word database 110 can replace the misspelled word, the Korean spell-checking apparatus 100 extracts that replacement word from the word database 110 and can recommend correcting the spelling of the word extracted from the text with it.

If, on the other hand, several of the words stored in advance in the word database 110 can replace the misspelled word, the Korean spell-checking apparatus 100 extracts those replacement words from the word database 110 and generates a replacement word list. It can then recommend correcting the spelling of the word extracted from the text with any one of the replacement words in the list.

The Korean spell-checking apparatus 100 uses the sentence component database to analyze the eojeols of the word-analyzed text and check spelling (step S220).

In one embodiment of step S220, the Korean spell-checking apparatus 100 divides the text at the spaces to generate a plurality of sentence components, and determines the type of each sentence component as one of a main component, a subordinate component, and an independent component. The main components include the subject, predicate, object, and complement; the subordinate components include the adnominal modifier and the adverbial; and the independent component includes the independent word.

If exactly one of the sentence components stored in advance in the sentence component database can replace the misspelled sentence component, the Korean spell-checking apparatus 100 extracts that replacement sentence component from the sentence component database and can recommend correcting the spelling of the sentence component extracted from the text with it.

If, on the other hand, several of the sentence components stored in advance in the sentence component database can replace the misspelled sentence component, the Korean spell-checking apparatus 100 extracts those replacement sentence components from the sentence component database 120 and generates a replacement sentence component list. It can then recommend correcting the spelling of the sentence component extracted from the text with any one of the entries in that list.

The Korean spell-checking apparatus 100 analyzes the sentences of the eojeol-analyzed text with reference to the sentence database to check the meaning of the sentences (step S230).

In one embodiment of step S230, the Korean spell-checking apparatus 100 checks whether a subject and a predicate exist among the sentence components generated by the eojeol analysis. If the subject or the predicate is absent, it determines that the sentence contains an error and can report that the subject or predicate is missing.

In addition, the Korean spell-checking apparatus 100 combines the sentence components produced by the eojeol analysis to generate a sub-sentence, and checks the meaning of the sub-sentence with reference to the sentences stored in advance in the sentence database.
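The order of steps S210, S220, and S230 can be summarized in a small structural sketch. The `KoreanSpellChecker` class, its field names, and its method names are hypothetical, and each stage is reduced to a plain membership test, so this shows only how the three databases feed the three stages, not how the apparatus is actually implemented. The word list for S210 is passed in separately because part-of-speech segmentation is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass
class KoreanSpellChecker:
    """Hypothetical container mirroring apparatus 100 and its three databases."""
    word_db: set        # 110: standard words
    component_db: dict  # 120: eojeol -> sentence component role
    sentence_db: set    # 130: known sentences as tuples of eojeols

    def s210_check_words(self, words):
        # Words absent from the word database are flagged as misspelled.
        return [w for w in words if w not in self.word_db]

    def s220_check_eojeols(self, text):
        # Split at spaces; eojeols with no known role are flagged.
        eojeols = text.split()
        unknown = [e for e in eojeols if e not in self.component_db]
        return eojeols, unknown

    def s230_check_sentence(self, eojeols):
        # The sentence passes if the combined eojeols match a stored sentence.
        return tuple(eojeols) in self.sentence_db

    def run(self, text, words):
        eojeols, unknown = self.s220_check_eojeols(text)
        return {
            "S210": self.s210_check_words(words),
            "S220": unknown,
            "S230": self.s230_check_sentence(eojeols),
        }


checker = KoreanSpellChecker(
    word_db={"나", "는", "밥", "을", "먹었다"},
    component_db={"나는": "subject", "밥을": "object", "먹었다": "predicate"},
    sentence_db={("나는", "밥을", "먹었다")},
)
print(checker.run("나는 밥을 먹었다", ["나", "는", "밥", "을", "먹었다"]))
```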

Specific embodiments of the present invention have been described above, but various modifications are of course possible without departing from the scope of the present invention. The scope of the present invention should therefore not be limited to the described embodiments, but should be determined by the claims set out below and their equivalents.

Although the present invention has been described above with reference to limited embodiments and drawings, it is not limited to those embodiments, and those of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description. The spirit of the present invention should therefore be understood only from the claims set out below, and all equal or equivalent modifications thereof fall within the scope of the spirit of the present invention.

100: Korean spell-checking apparatus
110: word database
120: sentence component database
130: sentence database
140: word analysis unit
150: eojeol analysis unit
160: sentence analysis unit

Claims

A spell-checking method performed by a Korean spell-checking apparatus, the method comprising:
reading a text file and analyzing the words of the text in the text file with reference to a word database to check spelling;
analyzing the phrases (eojeol, the space-delimited units) of the word-analyzed text using a sentence component database to check spelling; and
analyzing the sentences of the phrase-analyzed text with reference to a sentence database to check the meaning of the sentences,
wherein the reading of the text file and the analyzing of the words of the text with reference to the word database comprises:
analyzing the text in part-of-speech units, including nouns, pronouns, verbs, adjectives, determiners, numerals, adverbs, particles, and interjections, to generate a plurality of words;
comparing each of the plurality of words with words stored in advance in the word database to check whether each of the plurality of words is misspelled; and
recommending, according to the result of the check, correction of the word to a word stored in advance in the word database, and
wherein the analyzing of the phrases to check spelling comprises:
dividing the text at spaces to generate a plurality of sentence components; and
determining the type of each of the plurality of sentence components as one of a main component including a subject, a predicate, an object, and a complement, an ancillary component including an adnominal modifier and an adverbial modifier, and an independent component including an independent word.
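
As a non-limiting illustration only, the word-level check recited above (compare each word with the word database, then recommend a stored word) might be sketched in Python as follows. The database contents, the function name check_words, and the use of difflib string similarity to pick a recommendation are all assumptions made for this sketch; the claims do not name a similarity measure. The sketch also assumes the text has already been segmented into part-of-speech units by some upstream analyzer.

```python
from difflib import get_close_matches

# Hypothetical stand-in for the word database (110); entries are illustrative.
WORD_DB = {"학교", "친구", "선생님", "도서관"}


def check_words(words, word_db=WORD_DB):
    """Return {word: suggestion or None} for every word not in the database.

    `words` is assumed to be the output of a part-of-speech segmentation
    step that is outside the scope of this sketch.
    """
    report = {}
    candidates = sorted(word_db)  # sorted for deterministic behaviour
    for word in words:
        if word in word_db:
            continue  # spelled as stored, nothing to report
        # Assumption: difflib similarity stands in for the unspecified
        # rule used to choose which stored word to recommend.
        matches = get_close_matches(word, candidates, n=1, cutoff=0.5)
        report[word] = matches[0] if matches else None
    return report
```

A word that has no sufficiently similar entry is still reported, with None in place of a recommendation, so the caller can flag it without proposing a correction.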

(Deleted)

The method according to claim 1,
wherein the determining of the type of each of the plurality of sentence components as one of a main component, an ancillary component, and an independent component comprises:
if the plurality of sentence components includes a sentence component whose type is not determined, comparing the spelling of that sentence component with the spelling of sentence components stored in advance in the sentence component database to check whether it is misspelled; and
recommending, according to the result of the check, correction of the spelling of that sentence component to the spelling of a sentence component stored in advance in the sentence component database.
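
The following Python sketch illustrates one way the step above could behave, under stated assumptions. Here the type of a sentence component is decided by a direct lookup in a toy sentence component database, and a component missing from it counts as "type not determined"; that lookup rule, the database contents, the function name classify_components, and the difflib-based recommendation are hypothetical and are not taken from the specification.

```python
from difflib import get_close_matches

# Hypothetical stand-in for the sentence component database (120):
# each stored component is mapped to its type.
COMPONENT_DB = {
    "나는": "main",            # subject
    "책을": "main",            # object
    "읽었다": "main",          # predicate
    "도서관에서": "ancillary",  # adverbial modifier
    "아": "independent",       # independent word
}


def classify_components(text, component_db=COMPONENT_DB):
    """Split text at spaces and return (component, type, suggestion) tuples.

    `type` is None when the component is not in the database; in that
    case `suggestion` holds the closest stored spelling, if any.
    """
    results = []
    candidates = sorted(component_db)
    for component in text.split():
        kind = component_db.get(component)
        suggestion = None
        if kind is None:
            # Assumption: string similarity selects the recommended spelling.
            matches = get_close_matches(component, candidates, n=1, cutoff=0.5)
            suggestion = matches[0] if matches else None
        results.append((component, kind, suggestion))
    return results
```

Because the split is on spaces, a multi-syllable component such as "도서관에서" is checked as a whole unit rather than as separate dictionary words.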

The method according to claim 1,
wherein the analyzing of the sentences of the text with reference to the sentence database comprises:
checking whether a sub-sentence combining at least two of the plurality of sentence components is present in a sentence stored in advance in the sentence database; and
recommending, according to the result of the check, correction of the sub-sentence combining the at least two sentence components with reference to a sentence stored in advance in the sentence database.
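
A minimal Python sketch of the sub-sentence check above is given below for illustration. It assumes that a "sub-sentence" is a pair of adjacent sentence components and that the recommendation is the most similar pair found in the stored sentences; both choices, along with the names SENTENCE_DB, stored_pairs, and check_sub_sentences, are assumptions of this sketch rather than details from the claims.

```python
from difflib import get_close_matches

# Hypothetical stand-in for the sentence database (130).
SENTENCE_DB = ["나는 책을 읽었다", "친구가 밥을 먹었다"]


def stored_pairs(sentences):
    """Collect every pair of adjacent components from the stored sentences."""
    pairs = set()
    for sentence in sentences:
        parts = sentence.split()
        pairs.update(" ".join(parts[i:i + 2]) for i in range(len(parts) - 1))
    return pairs


def check_sub_sentences(components, sentence_db=SENTENCE_DB):
    """Return (sub_sentence, suggestion) for each adjacent pair not stored."""
    known = stored_pairs(sentence_db)
    candidates = sorted(known)
    report = []
    for i in range(len(components) - 1):
        pair = " ".join(components[i:i + 2])
        if pair in known:
            continue  # this combination appears in a stored sentence
        # Assumption: the closest stored pair is offered as the correction.
        matches = get_close_matches(pair, candidates, n=1, cutoff=0.6)
        report.append((pair, matches[0] if matches else None))
    return report
```

Restricting the check to adjacent pairs keeps the sketch short; combinations of three or more components would satisfy the "at least two" wording equally well.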

A Korean spell-checking apparatus comprising:
a word database storing a plurality of standard words;
a sentence component database storing a plurality of sentence components;
a sentence database storing a plurality of sentences;
a word analysis unit that reads a text file and analyzes the words of the text in the text file with reference to the word database to check spelling;
a phrase analysis unit that analyzes the phrases (eojeol, the space-delimited units) of the text analyzed by the word analysis unit, using the sentence component database, to check spelling; and
a sentence analysis unit that analyzes the sentences of the text analyzed by the phrase analysis unit with reference to the sentence database to check the meaning of the sentences,
wherein the word analysis unit generates a plurality of words by analyzing the text in part-of-speech units, including nouns, pronouns, verbs, adjectives, determiners, numerals, adverbs, particles, and interjections, checks whether each of the plurality of words is misspelled, and recommends, according to the result of the check, correction of the word to a word stored in advance in the word database, and
wherein the phrase analysis unit generates a plurality of sentence components by dividing the text at spaces and determines the type of each of the plurality of sentence components as one of a main component, an ancillary component, and an independent component.
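
For orientation only, the arrangement of the three databases and three analysis units (reference numerals 100 to 160) could be modeled as a single Python class, as sketched below. The class name, method names, and return format are hypothetical. The stages only flag items missing from each database and make no recommendations, and the part-of-speech segmentation is assumed to be supplied by the caller.

```python
from dataclasses import dataclass, field


@dataclass
class KoreanSpellChecker:  # apparatus 100 (illustrative model only)
    word_db: set = field(default_factory=set)         # word database 110
    component_db: dict = field(default_factory=dict)  # sentence component database 120
    sentence_db: list = field(default_factory=list)   # sentence database 130

    def analyze_words(self, words):
        """Word analysis unit 140: list words absent from the word database."""
        return [w for w in words if w not in self.word_db]

    def analyze_phrases(self, text):
        """Phrase analysis unit 150: split at spaces and look up each type."""
        return [(p, self.component_db.get(p)) for p in text.split()]

    def analyze_sentence(self, text):
        """Sentence analysis unit 160: list adjacent pairs found in no stored sentence."""
        parts = text.split()
        pairs = [" ".join(parts[i:i + 2]) for i in range(len(parts) - 1)]
        return [p for p in pairs if not any(p in s for s in self.sentence_db)]

    def check(self, text, words):
        """Run the three stages in order; `words` is the pre-segmented text."""
        return {
            "unknown_words": self.analyze_words(words),
            "components": self.analyze_phrases(text),
            "unknown_sub_sentences": self.analyze_sentence(text),
        }
```

Keeping each unit as a separate method mirrors the claim structure, so any one stage can be replaced (for example with a real morphological analyzer) without touching the others.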

(Deleted)

The apparatus according to claim 6,
wherein the phrase analysis unit, if the plurality of sentence components includes a sentence component whose type is not determined, compares the spelling of that sentence component with the spelling of sentence components stored in advance in the sentence component database to check whether it is misspelled, and recommends, according to the result, correction of the spelling of that sentence component to the spelling of a sentence component stored in advance in the sentence component database.

The apparatus according to claim 6,
wherein the sentence analysis unit checks whether a sub-sentence combining at least two of the plurality of sentence components is present in a sentence stored in advance in the sentence database, and recommends correction of the sub-sentence combining the at least two sentence components.