KR102604758B1 - System and Method for Spell Checking using User Information

Info

Publication number: KR102604758B1
Application number: KR1020200176593A
Authority: KR
Inventors: 권혁철; 김민호
Original assignee: 부산대학교 산학협력단
Priority date: 2019-12-16
Filing date: 2020-12-16
Publication date: 2023-11-22
Also published as: KR20210076877A

Abstract

The present invention relates to a spelling check system and method using user information, which can efficiently check unregistered words that are absent from the morphological analysis dictionary by using user information such as personal names, job titles, and organization names. The system includes: a general dictionary-based analysis unit that determines whether there is an error based on general vocabulary such as nouns, verbs, and adjectives; a user dictionary creation unit that acquires user information such as personal names, job titles, and organization names from a database and compiles it into a dictionary; and a user dictionary-based analysis unit that determines whether there is an error based on the user dictionary.

Description

System and Method for Spell Checking using User Information

The present invention relates to spelling checking, and more specifically to a spelling check system and method using user information, which can efficiently check unregistered words that are absent from the morphological analysis dictionary by using user information such as personal names, job titles, and organization names.

In general, a spell checker in the narrow sense provides the information needed to correct simple spelling and grammar errors.

In a broader sense, it supports the effective writing of documents or sentences by checking for stylistic errors, suggesting refined terms, indicating whether punctuation marks are used appropriately, providing learning information about errors and correction information according to the frequency or severity of errors, and providing usage examples of words.

Such a spell checker is a system that automatically checks for and corrects incorrect words, such as spelling errors, in sentences entered by a user or in written documents.

In general, a spelling check splits the text under inspection into tokens at the morpheme or eojeol (space-delimited word unit) level and judges a token to be an error word if it does not exist in the dictionary.

Therefore, unregistered words that do not exist in the spell checker's dictionary, such as neologisms and proper nouns, are judged to be error words.
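
The limitation described above can be illustrated with a minimal sketch. The dictionary contents and the function name `naive_spell_check` are hypothetical, and the lookup is a plain set membership test rather than real morphological analysis.

```python
# Hypothetical toy dictionary; a real checker would use a full morphological lexicon.
GENERAL_DICTIONARY = {"오늘", "회의", "참석"}

def naive_spell_check(tokens, dictionary):
    """Return the tokens that are missing from the dictionary (judged as error words)."""
    return [token for token in tokens if token not in dictionary]

# "김민호" is a correctly spelled personal name, but it is not in the dictionary.
flagged = naive_spell_check(["오늘", "김민호", "회의", "참석"], GENERAL_DICTIONARY)
print(flagged)
```

Under these assumptions the personal name is reported as an error even though it is spelled correctly, which is the false positive the invention aims to avoid.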

Accordingly, there is a need for a new technique that efficiently checks the spelling of unregistered words, such as neologisms and proper nouns, that do not exist in the spell checker's dictionary.

Republic of Korea Patent Publication No. 10-2009-0090840
Republic of Korea Patent Publication No. 10-2019-0129701
Republic of Korea Patent Publication No. 10-2005-0026732

The present invention is intended to solve the problems of prior-art spelling check systems, and an object thereof is to provide a spelling check system and method using user information, which can efficiently check unregistered words absent from the morphological analysis dictionary by using user information such as personal names, job titles, and organization names.

Another object of the present invention is to provide a spelling check system and method using user information that improves accuracy by checking spelling with proper nouns extracted from user information, so that proper nouns such as personal names, job titles, and organization names are not judged to be error words.

A further object of the present invention is to provide a spelling check system and method using user information that can improve checking accuracy through a spelling check that uses a user dictionary automatically generated from a user database containing user information.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the description below.

To achieve the above objects, a spelling check system using user information according to the present invention includes: a general dictionary-based analysis unit that determines whether there is an error based on general vocabulary such as nouns, verbs, and adjectives; a user dictionary creation unit that acquires user information such as personal names, job titles, and organization names from a database and compiles it into a dictionary; and a user dictionary-based analysis unit that determines whether there is an error based on the user dictionary.

Here, the general dictionary-based analysis unit uses a general dictionary in which individual words have identification numbers that are distinguished according to morphological or syntactic characteristics.

The user dictionary creation unit automatically extracts user information from the database and compiles it into a dictionary.

The user dictionary-based analysis unit uses both a user dictionary manually created by the user and a user dictionary automatically generated from the database.

The user dictionary creation unit reads the relevant table, analyzes the value in each cell using the general dictionary-based analysis unit, and adds every word that comes out as an unregistered word to the user dictionary.

To achieve another object, a spelling check method using user information according to the present invention includes: tokenizing a sentence to be checked into eojeols, which are spacing units, when the sentence is input; analyzing, in a general dictionary-based analysis unit, each eojeol into morphemes using a general dictionary, and judging the eojeol to be an error word if the analysis fails; analyzing, in a user dictionary-based analysis unit, the words judged to be errors by the general dictionary-based analysis unit, based on a user dictionary; and automatically generating, in a user dictionary creation unit, the user dictionary using a user database.

Here, the user dictionary creation unit reads the relevant table, analyzes the value in each cell using the general dictionary-based analysis unit, and adds every word that comes out as an unregistered word to the user dictionary.

The spelling check system and method using user information according to the present invention, as described above, have the following effects.

First, unregistered words that are absent from the morphological analysis dictionary can be checked efficiently by using user information such as personal names, job titles, and organization names.

Second, accuracy is improved by checking spelling with proper nouns extracted from user information, so that proper nouns such as personal names, job titles, and organization names are not judged to be error words.

Third, checking accuracy can be improved through a spelling check that uses a user dictionary automatically generated from a user database containing user information.

Figure 1 is a configuration diagram of a spelling check system using user information according to the present invention.
Figure 2 is an example of a user database containing personal names, job titles, and organization names.
Figure 3 is a flow chart showing a spelling check method using user information according to the present invention.

Hereinafter, preferred embodiments of the spelling check system and method using user information according to the present invention are described in detail.

The features and advantages of the spelling check system and method using user information according to the present invention will become apparent from the detailed description of each embodiment below.

Figure 1 is a configuration diagram of a spelling check system using user information according to the present invention, and Figure 2 is an example of a user database containing personal names, job titles, and organization names.

The spelling check system and method using user information according to the present invention make it possible to efficiently check unregistered words that are absent from the morphological analysis dictionary by using user information such as personal names, job titles, and organization names.

To this end, the present invention may include a configuration that performs a spelling check using proper nouns extracted from user information, so that proper nouns such as personal names, job titles, and organization names are not judged to be error words.

The present invention may include a configuration in which the user dictionary creation unit automatically extracts user information from a database and compiles it into a dictionary.

The present invention may include a configuration in which the user dictionary-based analysis unit uses both a user dictionary manually created by the user and a user dictionary automatically generated from the database.

The present invention may include a configuration in which the user dictionary creation unit reads the relevant table, analyzes the value in each cell using the general dictionary-based analysis unit, and adds every word that comes out as an unregistered word to the user dictionary.

Figure 1 is a configuration diagram of a spelling check system according to the present invention that uses a user dictionary automatically generated from a user database containing user information.

The spelling check system using user information according to the present invention includes a tokenizing unit 100 that tokenizes a sentence to be checked into eojeols, which are spacing units, when the sentence is input; a general dictionary-based analysis unit 101 that determines whether there is an error based on general vocabulary such as nouns, verbs, and adjectives; a user dictionary creation unit 102 that acquires user information such as personal names, job titles, and organization names from a database and compiles it into a dictionary; and a user dictionary-based analysis unit 103 that determines whether there is an error based on the user dictionary.

Here, the general dictionary-based analysis unit 101 uses a general dictionary in which individual words have identification numbers that are distinguished according to morphological or syntactic characteristics.
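
One possible reading of such a general dictionary is sketched below. The text does not specify the entry layout, so the class `DictionaryEntry`, the numeric IDs, and the category labels are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DictionaryEntry:
    entry_id: int   # hypothetical identification number
    surface: str    # dictionary form of the word
    category: str   # hypothetical morphological/syntactic label

# Two entries share a surface form but differ in category, so they get different IDs.
GENERAL_DICTIONARY = [
    DictionaryEntry(1001, "나", "pronoun"),
    DictionaryEntry(1002, "나", "verb-stem"),
    DictionaryEntry(2001, "는", "particle"),
]

def lookup(surface, dictionary=GENERAL_DICTIONARY):
    """Return every entry whose surface form matches the given string."""
    return [entry for entry in dictionary if entry.surface == surface]

ids_for_na = [entry.entry_id for entry in lookup("나")]
print(ids_for_na)
```

Keeping a separate ID per category would let an analyzer tell apart homographs that behave differently in a sentence, although the actual scheme used by the invention may differ.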

The user dictionary creation unit 102 automatically extracts user information from the database and compiles it into a dictionary.

Figure 2 is an example of a user database containing personal names, job titles, and organization names.

The user dictionary creation unit 102 reads the relevant table, analyzes the value in each cell using the general dictionary-based analysis unit 101, and adds every word that comes out as an unregistered word to the user dictionary.
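
The table-driven generation described above might look like the following sketch. The rows, column names, and dictionary contents are invented stand-ins (the actual contents of Figure 2 are not reproduced here), and the general dictionary-based analysis is reduced to a set membership test.

```python
# Hypothetical rows standing in for a user database table.
USER_TABLE = [
    {"name": "김민호", "job_title": "선임연구원", "organization": "가나다연구소"},
    {"name": "이서연", "job_title": "교수", "organization": "가나다연구소"},
]

# Hypothetical general dictionary; "교수" is an ordinary noun it already knows.
GENERAL_DICTIONARY = {"교수"}

def is_registered(word, general_dictionary):
    """Simplified stand-in for analysis by the general dictionary-based unit."""
    return word in general_dictionary

def build_user_dictionary(rows, general_dictionary):
    """Collect every cell value that the general dictionary cannot account for."""
    user_dictionary = set()
    for row in rows:
        for value in row.values():
            for word in value.split():
                if not is_registered(word, general_dictionary):
                    user_dictionary.add(word)
    return user_dictionary

user_dictionary = build_user_dictionary(USER_TABLE, GENERAL_DICTIONARY)
print(sorted(user_dictionary))
```

Filtering through the general dictionary first keeps ordinary words such as "교수" out of the user dictionary, so only genuinely unregistered words are added.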

The user dictionary-based analysis unit 103 uses both the user dictionary manually created by the user and the user dictionary automatically generated from the database.
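
A minimal sketch of combining the two sources follows, assuming both are simple sets of words. The function name and sample entries are hypothetical, and the text does not say whether the dictionaries are merged or consulted separately.

```python
def merge_user_dictionaries(manual_entries, automatic_entries):
    """Combine manually registered and automatically generated entries."""
    return set(manual_entries) | set(automatic_entries)

manual_entries = {"맞춤법검사기"}                 # hypothetical word added by hand
automatic_entries = {"김민호", "가나다연구소"}    # hypothetical words taken from a database
user_dictionary = merge_user_dictionaries(manual_entries, automatic_entries)
print(sorted(user_dictionary))
```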

The spelling check method using user information according to the present invention is described in detail as follows.

Figure 3 is a flow chart showing a spelling check method using user information according to the present invention.

First, when a sentence to be checked is input, the tokenizing unit 100 tokenizes it into eojeols, which are spacing units (S301).
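
Step S301 can be sketched as a plain whitespace split, since an eojeol is the unit delimited by spaces. The function name and sample sentence are hypothetical, and punctuation handling is left out.

```python
def tokenize_into_eojeols(sentence):
    """Split a sentence on whitespace; each piece is one eojeol."""
    return sentence.split()

eojeols = tokenize_into_eojeols("김민호는 오늘 회의에 참석했다")
print(eojeols)
```

Calling `split()` without arguments also collapses repeated spaces, so stray double spaces do not produce empty tokens.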

Next, the general dictionary-based analysis unit 101 analyzes each eojeol into morphemes using the general dictionary, and judges the eojeol to be an error word if the analysis fails (S302).

For example, the eojeol '나는' can be analyzed as the morpheme combinations '나 + 는', '나다 + 는', or '날다 + 는', so it is not an error.

In contrast, the eojeol '김민호는' cannot be analyzed unless the general dictionary contains the morpheme '김민호', so it is judged to be an error.
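
The two examples above can be reproduced with a deliberately simplified analyzer that only tries "stem + particle" splits. The stem and particle sets are hypothetical, and the sketch does not model conjugation, so it finds '나 + 는' but not the verb readings '나다 + 는' or '날다 + 는'.

```python
GENERAL_STEMS = {"나", "학교"}          # hypothetical stems in the general dictionary
PARTICLES = {"는", "은", "가", "를"}    # hypothetical particle list

def analyze(eojeol, stems=GENERAL_STEMS, particles=PARTICLES):
    """Return every (stem, particle) split supported by the dictionary."""
    analyses = []
    if eojeol in stems:
        analyses.append((eojeol,))
    for cut in range(1, len(eojeol)):
        stem, suffix = eojeol[:cut], eojeol[cut:]
        if stem in stems and suffix in particles:
            analyses.append((stem, suffix))
    return analyses

print(analyze("나는"))       # at least one analysis, so not an error
print(analyze("김민호는"))   # no analysis, so judged to be an error
```

An empty result corresponds to a failed analysis in step S302.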

Then, the user dictionary-based analysis unit 103 analyzes the words judged to be errors by the general dictionary-based analysis unit 101, based on the user dictionary (S303).

For example, '김민호는' was judged to be an error by the general dictionary-based analysis unit 101, but if the user dictionary contains the morpheme '김민호', it can be analyzed as '김민호 + 는' on that basis.
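
The fallback from the general dictionary to the user dictionary can be sketched as a two-stage check on a single eojeol. The particle list, the stem sets, and the return labels are assumptions for illustration, and the analysis is again limited to "stem + particle" splits.

```python
PARTICLES = {"는", "은", "이", "가"}  # hypothetical particle list

def can_analyze(eojeol, stems):
    """True if the eojeol is a known stem, or a known stem followed by a particle."""
    if eojeol in stems:
        return True
    return any(
        eojeol[:cut] in stems and eojeol[cut:] in PARTICLES
        for cut in range(1, len(eojeol))
    )

def check_eojeol(eojeol, general_stems, user_stems):
    """Try the general dictionary first, then fall back to the user dictionary."""
    if can_analyze(eojeol, general_stems):
        return "general"
    if can_analyze(eojeol, user_stems):
        return "user"
    return "error"

general_stems = {"나"}       # hypothetical general dictionary
user_stems = {"김민호"}      # hypothetical user dictionary

print(check_eojeol("나는", general_stems, user_stems))
print(check_eojeol("김민호는", general_stems, user_stems))
print(check_eojeol("김민후는", general_stems, user_stems))
```

Only eojeols that fail the first stage reach the user dictionary, so the second lookup adds cost only for words the general dictionary could not analyze.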

The user dictionary creation unit 102 automatically generates the user dictionary using the user database (S304).

The spelling check system and method using user information according to the present invention described above perform a spelling check using proper nouns extracted from user information, so that proper nouns such as personal names, job titles, and organization names are not judged to be error words. Unregistered words that are absent from the morphological analysis dictionary can thereby be checked efficiently using user information.

As described above, it will be understood that the present invention may be implemented in modified forms without departing from its essential characteristics.

Therefore, the disclosed embodiments should be considered in an illustrative rather than a restrictive sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

101. General dictionary-based analysis unit
102. User dictionary creation unit
103. User dictionary-based analysis unit

Claims

A spelling check system using user information, comprising:
a general dictionary-based analysis unit that determines whether there is an error based on general vocabulary such as nouns, verbs, and adjectives;
a user dictionary creation unit that acquires user information such as personal names, job titles, and organization names from a database and compiles it into a dictionary; and
a user dictionary-based analysis unit that determines whether there is an error based on the user dictionary,
wherein the user dictionary creation unit automatically extracts the user information from the database and compiles it into a dictionary by reading the relevant table, analyzing the value in each cell using the general dictionary-based analysis unit, and adding every word that comes out as an unregistered word to the user dictionary.

The spelling check system using user information of claim 1, wherein the general dictionary-based analysis unit
uses a general dictionary in which individual words have identification numbers that are distinguished according to morphological or syntactic characteristics.

delete

The spelling check system using user information of claim 1, wherein the user dictionary-based analysis unit
uses both a user dictionary manually created by the user and a user dictionary automatically generated from a database.

delete

A spelling check method using user information, comprising:
tokenizing, in a tokenizing unit, a sentence to be checked into eojeols, which are spacing units, when the sentence is input;
analyzing, in a general dictionary-based analysis unit, each eojeol into morphemes using a general dictionary, and judging the eojeol to be an error word if the analysis fails;
analyzing, in a user dictionary-based analysis unit, the words judged to be errors by the general dictionary-based analysis unit, based on a user dictionary; and
automatically generating, in a user dictionary creation unit, the user dictionary using a user database,
wherein the user dictionary creation unit reads the relevant table, analyzes the value in each cell using the general dictionary-based analysis unit, and adds every word that comes out as an unregistered word to the user dictionary.

delete