KR20210055175A

KR20210055175A - Apparatus and method for measuring difficulty level of Chinese characters using regression analysis

Info

Publication number: KR20210055175A
Application number: KR1020190141339A
Authority: KR
Inventors: 김순태; 최정환; 노지우
Original assignee: 전북대학교산학협력단; (주)빅스톤하우스
Priority date: 2019-11-07
Filing date: 2019-11-07
Publication date: 2021-05-17
Also published as: KR102321045B1

Abstract

Disclosed are an apparatus and a method for measuring Chinese character difficulty using regression analysis. According to an embodiment of the present invention, the method for measuring Chinese character difficulty using regression analysis comprises the steps of: preparing, for each of predetermined Chinese characters, a first data set on Chinese character usage frequency from sample text data; preparing, for each of the predetermined Chinese characters, a second data set on Hangul meaning frequency from the sample text data; and measuring a difficulty level of each of the predetermined Chinese characters through multiple regression analysis based on the first data set, the second data set, and stroke count information of each of the predetermined Chinese characters.

Description

A device and method for measuring Chinese character difficulty using regression analysis {APPARATUS AND METHOD FOR MEASURING DIFFICULTY LEVEL OF CHINESE CHARACTER USING REGRESSION ANALYSIS}

The present application relates to an apparatus and method for measuring the difficulty of Chinese characters using regression analysis.

Currently, nine institutions host nationally accredited Chinese character proficiency tests, including 한국어문회 (the Korean Language and Literature Society), 한자교육진흥회 (the Hanja Education Promotion Association), and 대한검정회 (the Korea Examination Association). However, the characters assigned to each grade differ from one hosting institution to another, and in some cases the assignment criteria are unclear or not disclosed at all. These grades usually run from grade 8 to grade 1, with more difficult characters included toward grade 1.

Specifically, among the characters assigned to each grade by 한국어문회, the assignment level for grade 8, the lowest-difficulty grade intended for people first learning Chinese characters, is stated only as 'a grade for motivating Chinese character learning (50 common characters)', and the assigned characters can easily be found to include some that are rather difficult for beginners to learn.

In addition, according to the Chinese character test criteria table provided by the Korea Chamber of Commerce and Industry, grades 9 through 5 are assigned characters aimed at elementary school students. Grade 9 is described as 'a level of Chinese character ability sufficient to read and understand, to some extent, mixed Hangul and Chinese character sentences that use everyday Sino-Korean words at the first-grade elementary school level', but it is difficult to verify how often those characters are actually used in the first grade of elementary school.

Chinese characters are logograms: each individual character carries its own meaning and sound. Because of this property, reading a Sino-Korean word and understanding its meaning accurately requires recognizing the meaning of each individual character. In addition, there is a gap between the expressions used as the meaning gloss of a Chinese character and the Hangul words commonly used in everyday life. For example, among the grade 8 characters selected by the Korea Chamber of Commerce and Industry, '쇠' (metal), the meaning gloss of '쇠 금(金)', almost never appears in elementary school textbooks. Likewise, for '저자 시(市)', the gloss '저자' means a market or marketplace street, but it too rarely appears in elementary school textbooks.

As such, the difficulty of the grade-specific characters provided by institutions hosting conventional nationally accredited Chinese character tests can hardly be regarded as having been evaluated by objective criteria, and no technology exists that can evaluate the difficulty of Chinese characters on the basis of objective indicators.

The background technology of the present application is disclosed in Korean Registered Patent Publication No. 10-0779022.

The present application is intended to solve the above problems of the prior art. Its object is to provide an apparatus and method for measuring Chinese character difficulty using regression analysis that can objectively provide the difficulty of each of predetermined Chinese characters by applying multiple linear regression analysis to big data (data sets) collected and extracted from sample text data that can reflect the learning level of the target learners, such as elementary school textbooks.

However, the technical problems to be solved by the embodiments of the present application are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical object, a method for measuring Chinese character difficulty using regression analysis according to an embodiment of the present application may include: preparing, for each of predetermined Chinese characters, a first data set on Chinese character usage frequency from sample text data; preparing, for each of the predetermined Chinese characters, a second data set on Hangul meaning frequency from the sample text data; and measuring the difficulty of each of the predetermined Chinese characters through multiple regression analysis based on the first data set, the second data set, and stroke count information of each of the predetermined Chinese characters.
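The measuring step can be illustrated with a minimal sketch of multiple linear regression over the three kinds of features named above. Everything concrete here is an assumption for illustration: the function names `fit_multiple_regression` and `predict`, the ordinary least-squares fit via the normal equations, the toy feature values, and the target difficulty labels (the text above does not state what the regression is fitted against).

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_regression(features, targets):
    """Least-squares fit of targets ~ weights . features + bias."""
    rows = [list(f) + [1.0] for f in features]  # extra column for the bias
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    return coef[:-1], coef[-1]


def predict(feature_row, weights, bias):
    """Predicted difficulty for one character's feature row."""
    return sum(w * x for w, x in zip(weights, feature_row)) + bias


# Hypothetical toy rows: [usage frequency, Hangul meaning frequency, stroke count]
toy_features = [
    [120.0, 300.0, 3.0],
    [80.0, 10.0, 8.0],
    [5.0, 2.0, 16.0],
    [40.0, 50.0, 10.0],
    [0.0, 0.0, 22.0],
]
toy_targets = [1.0, 3.0, 6.0, 4.0, 8.0]  # hypothetical difficulty labels

weights, bias = fit_multiple_regression(toy_features, toy_targets)
estimated = [predict(row, weights, bias) for row in toy_features]
```

The fitted `weights` and `bias` play the role of the per-feature weight values and the bias value of a difficulty prediction equation; in practice a numerical library would normally replace the hand-written solver.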

In addition, the preparing of the first data set may include: collecting sentences appearing in the sample text data; annotating, in each of the sentences, the Hangul part corresponding to a Sino-Korean word with the corresponding Chinese character notation; and counting the Chinese character usage frequency by mapping the sentences annotated with the Chinese character notation to each of the predetermined Chinese characters.

In addition, the preparing of the second data set may include: tokenizing the collected sentences; stemming the words segmented through the tokenization; and counting the Hangul meaning frequency by mapping the roots obtained through the stemming to each of the predetermined Chinese characters.

In addition, in the counting of the Hangul meaning frequency, the Hangul meaning frequency may be counted based on the main predicates (본용언) included in the converted roots.

In addition, in the counting of the Hangul meaning frequency, the mapping may be performed in consideration of synonyms of each of the predetermined Chinese characters.
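The counting of the Hangul meaning frequency can be sketched as follows, assuming the sentences have already been tokenized and stemmed into root forms by some morphological analyzer. The function name `count_meaning_frequency`, the shape of the `meaning_words` mapping (each character mapped to its meaning gloss plus synonyms), and the sample roots are hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_meaning_frequency(roots, meaning_words):
    """Count how often each character's meaning (or a synonym) occurs.

    roots: list of root forms obtained after tokenization and stemming.
    meaning_words: dict mapping a Chinese character to the set of Hangul
        roots treated as its meaning, including synonyms (hypothetical).
    """
    root_counts = Counter(roots)
    return {
        hanja: sum(root_counts[word] for word in words)
        for hanja, words in meaning_words.items()
    }


# Hypothetical inputs: '저자' is the gloss of 市, with '시장' as a synonym.
meaning_words = {
    "人": {"사람"},
    "學": {"배우다"},
    "市": {"저자", "시장"},
}
roots = ["학교", "만나다", "사람", "인사", "하다", "보다", "시장", "사람"]

meaning_frequency = count_meaning_frequency(roots, meaning_words)
# meaning_frequency == {"人": 2, "學": 0, "市": 1}
```

Including synonyms in the mapping lets a character such as 市 be credited when the textbook uses '시장' rather than the rarely used gloss '저자'.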

In addition, the sample text data may include elementary school textbooks for each grade, and the first data set and the second data set may be prepared separately for each grade.

In addition, in the measuring of the difficulty of each of the predetermined Chinese characters, when appropriate learning time information for each of the predetermined Chinese characters is received, the difficulty may be measured by further considering the appropriate learning time information.

In addition, the method for measuring Chinese character difficulty using regression analysis according to an embodiment of the present application may include measuring the difficulty of a word composed of Chinese characters based on the difficulty of each of the predetermined Chinese characters.

In addition, in the measuring of the difficulty of the word, the average of the difficulty values measured for the individual Chinese characters constituting the word may be determined as the difficulty of the word.
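The word-level rule above amounts to an arithmetic mean over the characters of the word. A minimal sketch, in which the function name `word_difficulty` and the per-character difficulty values are hypothetical:

```python
from statistics import mean


def word_difficulty(word, char_difficulty):
    """Average the measured difficulty of each character in the word."""
    return mean(char_difficulty[ch] for ch in word)


# Hypothetical per-character difficulty values.
char_difficulty = {"學": 2.0, "校": 4.0}
difficulty_of_school = word_difficulty("學校", char_difficulty)  # 3.0
```

This sketch raises `KeyError` for a character with no measured difficulty; how such characters should be handled is not specified above.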

In addition, the method for measuring Chinese character difficulty using regression analysis according to an embodiment of the present application may include classifying some of the predetermined Chinese characters as learning-target Chinese characters corresponding to a predetermined grade, based on the difficulty of each of the predetermined Chinese characters and the Chinese character usage frequency.

In addition, the classifying as learning-target Chinese characters may include: filtering, based on the first data set, those of the predetermined Chinese characters whose usage frequency across all grades is equal to or greater than a preset threshold frequency; and selecting, from the filtered characters and based on the first data set prepared for each grade, the characters whose usage frequency in a given grade is equal to or greater than a preset per-grade threshold frequency and whose difficulty falls within a difficulty range preset for that grade, as the learning-target Chinese characters for that grade.
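The two-stage selection can be sketched as a filter followed by a per-grade selection. The function name `select_learning_targets`, the parameter names, the inclusive treatment of the difficulty range, and all sample values are assumptions made for illustration.

```python
def select_learning_targets(total_freq, grade_freq, difficulty,
                            total_threshold, grade_threshold, difficulty_range):
    """Select learning-target characters for one grade.

    total_freq: character -> usage frequency across all grades.
    grade_freq: character -> usage frequency in the grade of interest.
    difficulty: character -> measured difficulty.
    difficulty_range: (low, high) preset for the grade; treated as
        inclusive here, which is an assumption.
    """
    low, high = difficulty_range
    # Stage 1: keep characters used often enough across all grades.
    filtered = [ch for ch, freq in total_freq.items() if freq >= total_threshold]
    # Stage 2: keep characters used often enough in this grade and whose
    # difficulty falls within the range preset for this grade.
    return [
        ch for ch in filtered
        if grade_freq.get(ch, 0) >= grade_threshold
        and low <= difficulty[ch] <= high
    ]


# Hypothetical inputs for a single grade.
targets = select_learning_targets(
    total_freq={"人": 10, "金": 1, "學": 20},
    grade_freq={"人": 5, "學": 0},
    difficulty={"人": 1.5, "金": 1.0, "學": 1.2},
    total_threshold=5,
    grade_threshold=2,
    difficulty_range=(1.0, 2.0),
)
# targets == ["人"]
```

Here 金 is dropped in the first stage for low overall frequency, and 學 is dropped in the second stage because it does not appear in the grade of interest.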

Meanwhile, an apparatus for measuring Chinese character difficulty using regression analysis according to an embodiment of the present application may include: a Chinese character usage frequency acquisition unit that prepares, for each of predetermined Chinese characters, a first data set on Chinese character usage frequency from sample text data; a Hangul meaning frequency acquisition unit that prepares, for each of the predetermined Chinese characters, a second data set on Hangul meaning frequency from the sample text data; and a difficulty analysis unit that measures the difficulty of each of the predetermined Chinese characters through multiple regression analysis based on the first data set, the second data set, and stroke count information of each of the predetermined Chinese characters.

In addition, the sample text data may include elementary school textbooks for each grade, the Chinese character usage frequency acquisition unit may prepare the first data set divided by grade, and the Hangul meaning frequency acquisition unit may prepare the second data set divided by grade.

In addition, when appropriate learning time information for each of the predetermined Chinese characters is received, the difficulty analysis unit may measure the difficulty by further considering the appropriate learning time information.

In addition, the apparatus for measuring Chinese character difficulty using regression analysis according to an embodiment of the present application may include a learning-target Chinese character determination unit that classifies some of the predetermined Chinese characters as learning-target Chinese characters corresponding to a predetermined grade, based on the difficulty of each of the predetermined Chinese characters and the Chinese character usage frequency.

The above-described means for solving the problem are merely exemplary and should not be construed as limiting the present application. In addition to the exemplary embodiments described above, additional embodiments may exist in the drawings and the detailed description of the invention.

According to the above-described means for solving the problem, it is possible to provide an apparatus and method for measuring Chinese character difficulty using regression analysis that can objectively provide the difficulty of each of predetermined Chinese characters by applying multiple linear regression analysis to big data collected and extracted from sample text data that can reflect the learning level of the target learners, such as elementary school textbooks.

According to the above-described means for solving the problem, an objective basis for determining Chinese character difficulty can be provided.

According to the above-described means for solving the problem, the difficulty of each Chinese character can be measured objectively, so that the predetermined Chinese characters can be classified into learning-target characters for each grade, or a list of Chinese characters by test grade that takes the difficulty level into account can be provided.

According to the above-described means for solving the problem, learning-target Chinese characters are provided in consideration of the vocabulary and words actually used in textbooks and the like, which can give learners greater motivation to learn Chinese characters.

According to the above-described means for solving the problem, a customized learning service that takes the learner's level into account can be provided.

However, the effects obtainable from the present application are not limited to those described above, and other effects may exist.

FIG. 1 is a schematic configuration diagram of a Chinese character difficulty measuring system including an apparatus for measuring Chinese character difficulty using regression analysis according to an embodiment of the present application.
FIG. 2 is a conceptual diagram illustrating a process by which the apparatus for measuring Chinese character difficulty using regression analysis according to an embodiment of the present application obtains the Chinese character usage frequency.
FIG. 3 is a table showing a collected sentence and the same sentence in which the Hangul parts corresponding to Sino-Korean words are annotated with the corresponding Chinese character notation.
FIG. 4 is a conceptual diagram illustrating a process by which the apparatus for measuring Chinese character difficulty using regression analysis according to an embodiment of the present application obtains the Hangul meaning frequency.
FIG. 5 is a table showing collected sentences together with the tokenization and stemming results for those sentences.
FIG. 6 is a table that combines the meaning, the sound, the first data set, the second data set, and the stroke count information of each of the predetermined Chinese characters.
FIG. 7 is a table showing, by way of example, some features selected according to a predetermined criterion for multiple regression from among the features including the first data set, the second data set, and the stroke count information.
FIG. 8 is a table showing, by way of example, the weight value for each feature and the bias value in the Chinese character difficulty prediction equation obtained through multiple regression analysis.
FIG. 9 is a table showing the result of measuring the difficulty of each of the predetermined Chinese characters with the apparatus for measuring Chinese character difficulty using regression analysis according to an embodiment of the present application.
FIG. 10 is a table showing the preset difficulty range for each grade used to classify learning-target Chinese characters by grade, and the number of learning-target Chinese characters selected for each grade.
FIG. 11 is a table showing, as an experimental example associated with the apparatus for measuring Chinese character difficulty using regression analysis according to an embodiment of the present application, the result of classifying learning-target Chinese characters for the first grade of elementary school.
FIG. 12 is a schematic configuration diagram of the apparatus for measuring Chinese character difficulty using regression analysis according to an embodiment of the present application.
FIG. 13 is an operation flowchart of a method for measuring Chinese character difficulty using regression analysis according to an embodiment of the present application.

Hereinafter, embodiments of the present application will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present application belongs can easily practice them. However, the present application may be implemented in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted in order to describe the present application clearly, and similar reference numerals are assigned to similar parts throughout the specification.

Throughout this specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" or "indirectly connected" with another element interposed therebetween.

Throughout this specification, when a member is said to be located "on", "above", "at the top of", "under", "below", or "at the bottom of" another member, this includes not only the case where the member is in contact with the other member but also the case where yet another member exists between the two members.

Throughout this specification, when a part is said to "include" a certain component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

FIG. 1 is a schematic configuration diagram of a Chinese character difficulty measuring system including an apparatus for measuring Chinese character difficulty using regression analysis according to an embodiment of the present application.

Referring to FIG. 1, the Chinese character difficulty measuring system (10) according to an embodiment of the present application may refer to a system that analyzes sample text data (1) and, in consideration of the Chinese character usage frequency and the Hangul meaning frequency extracted from the sample text data (1), measures (evaluates) the difficulty of each character included in the predetermined Chinese characters (2) or selects some of the predetermined Chinese characters (2) as learning-target Chinese characters.

In the description of the embodiments of the present application, the predetermined Chinese characters (2) may refer to the 1,800 common-use Chinese characters designated by the Ministry of Education, but are not limited thereto. For example, the predetermined Chinese characters (2) may be the grade-specific characters of a Chinese character proficiency test provided by an institution such as 한국어문회, 한자교육진흥회, or 대한검정회. As another example, the predetermined Chinese characters (2) may be determined as any set of Chinese characters that includes a plurality of characters.

To aid understanding, when the predetermined Chinese characters (2) are the 1,800 common-use Chinese characters designated by the Ministry of Education, the apparatus (100) for measuring Chinese character difficulty using regression analysis according to an embodiment of the present application (hereinafter, the 'Chinese character difficulty measuring apparatus (100)') may be understood as an apparatus that evaluates and provides the difficulty of each of those 1,800 characters based on big data obtained by analyzing the sample text data (1).

In addition, in the description of the embodiments of the present application, the sample text data (1) may refer to elementary school textbooks for each grade, but is not limited thereto. For example, depending on the purpose, type, and the like of the Chinese character difficulty to be calculated by the Chinese character difficulty measuring apparatus (100), the sample text data (1) may include arbitrary books or may be determined as middle school textbooks, high school textbooks, or the like.

To aid understanding, when the sample text data (1) consists of elementary school textbooks for each grade, the difficulty of each of the predetermined Chinese characters (2) may be measured in consideration of the typical educational curriculum of elementary school students. In contrast, when the sample text data (1) consists of middle school or high school textbooks, the difficulty of each of the predetermined Chinese characters (2) may be measured in consideration of the typical educational curriculum of middle school or high school students.

In other words, the predetermined Chinese characters (2) are the evaluation targets whose difficulty is to be measured, and the sample text data (1) may refer to the learning data that serves as the reference for measuring the difficulty of the predetermined Chinese characters (2). For convenience of explanation, however, the following description of the embodiments assumes that the predetermined Chinese characters (2) are the 1,800 common-use Chinese characters designated by the Ministry of Education and that the sample text data (1) consists of elementary school textbooks for each grade.

The Chinese character difficulty measuring apparatus (100) may prepare, for each of the predetermined Chinese characters (2), a first data set (A) on Chinese character usage frequency from the sample text data (1).

FIG. 2 is a conceptual diagram illustrating a process by which the apparatus for measuring Chinese character difficulty using regression analysis according to an embodiment of the present application obtains the Chinese character usage frequency.

Referring to FIG. 2, the Chinese character difficulty measuring apparatus (100) may collect sentences appearing in the sample text data (1) and, in each collected sentence, annotate the Hangul part corresponding to a Sino-Korean word with the corresponding Chinese character notation, producing a marked sentence.

FIG. 3 is a table showing a collected sentence and the same sentence in which the Hangul parts corresponding to Sino-Korean words are annotated with the corresponding Chinese character notation.

Referring to FIG. 3, the Chinese character difficulty measuring apparatus (100) may collect, by way of example, the sentence '학교에서 만난 사람들과 인사해 봅시다.' ('Let's greet the people we met at school.') from among the sentences appearing in the sample text data (1), and may annotate the Hangul parts of the collected sentence that correspond to Sino-Korean words (학교, 인사) with the corresponding Chinese character notation (學校, 人事). Regarding the annotation method, referring to FIG. 3, the Chinese character difficulty measuring apparatus (100) may, by way of example, place parentheses immediately after the Hangul part and write the corresponding Chinese character notation inside the parentheses, but the method is not limited thereto.
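The parenthetical annotation described for FIG. 3 can be sketched with a plain dictionary lookup. This is a deliberately naive stand-in and not an automatic Hangul-to-Chinese-character conversion system: the function name `annotate_hanja` and the `hanja_dict` mapping are hypothetical, and simple substring replacement cannot disambiguate homonyms or overlapping entries.

```python
def annotate_hanja(sentence, hanja_dict):
    """Append the Chinese character notation in parentheses after each
    Hangul word found in hanja_dict (naive substring replacement)."""
    for hangul, hanja in hanja_dict.items():
        sentence = sentence.replace(hangul, f"{hangul}({hanja})")
    return sentence


# The example sentence and word pairs from FIG. 3.
hanja_dict = {"학교": "學校", "인사": "人事"}
marked = annotate_hanja("학교에서 만난 사람들과 인사해 봅시다.", hanja_dict)
# marked == "학교(學校)에서 만난 사람들과 인사(人事)해 봅시다."
```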

According to an embodiment of the present application, the Chinese character notation corresponding to the Hangul part may be annotated using an automatic Hangul-to-Chinese-character conversion system such as Utagger.

In addition, the Chinese character difficulty measuring apparatus (100) may count the Chinese character usage frequency by mapping the sentences annotated with the Chinese character notation (marked sentences) to each of the predetermined Chinese characters (2).

According to an embodiment of the present application, the Chinese character usage frequency for each of the predetermined Chinese characters (2) may be the number of times the character appears in the set of collected sentences annotated with the Chinese character notation.
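Counting the usage frequency then reduces to counting occurrences of each target character in the marked sentences. A minimal sketch, in which the function name `count_usage_frequency` and the small target list are hypothetical:

```python
from collections import Counter


def count_usage_frequency(marked_sentences, target_chars):
    """Count how many times each target character appears in the
    sentences annotated with Chinese character notation."""
    targets = set(target_chars)
    counts = Counter()
    for sentence in marked_sentences:
        counts.update(ch for ch in sentence if ch in targets)
    # Characters that never appear are reported with a count of 0.
    return {ch: counts[ch] for ch in target_chars}


marked_sentences = ["학교(學校)에서 만난 사람들과 인사(人事)해 봅시다."]
usage = count_usage_frequency(marked_sentences, ["學", "校", "人", "事", "金"])
# usage == {"學": 1, "校": 1, "人": 1, "事": 1, "金": 0}
```

Running the same function separately over the sentences of each grade would yield per-grade counts.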

또한, 본원의 일 실시예에 따르면, 한자 난이도 측정 장치(100)는 획득된 소정의 한자(2) 각각에 대한 한자 활용 빈도가 올바르게 카운트되었는지 검토하고 수정하는 과정을 통해 제1 데이터 세트(A)를 확정할 수 있다.In addition, according to an embodiment of the present application, the Chinese character difficulty measuring apparatus 100 examines and corrects whether the Chinese character utilization frequency for each of the obtained predetermined Chinese characters (2) is correctly counted, so that the first data set (A) Can be confirmed.

For example, the Chinese character difficulty measuring apparatus 100 may search the marked sentences to check whether each Chinese character notation was chosen consistently with the meaning of the corresponding Hangul portion, and may correct the annotation where an inappropriate Chinese character notation was added.

In addition, the Chinese character difficulty measuring apparatus 100 may prepare, for each of the predetermined Chinese characters 2, a second data set B on the Hangul meaning frequency from the sample text data 1.

FIG. 4 is a conceptual diagram illustrating a process by which a Chinese character difficulty measuring apparatus using regression analysis according to an embodiment of the present application obtains the Hangul meaning frequency.

Referring to FIG. 4, the Chinese character difficulty measuring apparatus 100 may collect sentences appearing in the sample text data and tokenize each collected sentence. The apparatus 100 may then stem each word of the sentences segmented by tokenization.

Tokenizing refers to dividing a given corpus into units called tokens. It may refer to operations such as segmenting a sentence with words as the token unit, or removing punctuation marks and segmenting a sentence on whitespace.

Stemming, which may also be called stem extraction, may refer to normalizing a word by removing or changing its ending around the root.

FIG. 5 is a table showing a collected sentence together with its tokenization result and stemming result.

Referring to FIG. 5, the Chinese character difficulty measuring apparatus 100 may collect, as an example, the sentence '학교에서 만난 사람들과 인사해 봅시다.' from among the sentences appearing in the sample text data 1, and may segment the collected sentence into words by tokenizing it. As shown in FIG. 5, the segments produced by tokenization may be expressed as '학교/Noun', '에서/Josa', '만난/Noun', '사람/Noun', '들/Suffix', '과/Josa', '인사/Noun', '해/Verb', '봅시다/Verb', and './Punctuation'.

According to an embodiment of the present application, the Chinese character difficulty measuring apparatus 100 may obtain part-of-speech information for each word segmented by tokenization. For example, referring to FIG. 5, the part of speech of each segmented word may be appended to the word in the form '/part of speech', where Noun denotes a noun, Josa a postposition (particle), Suffix a suffix, Verb a verb, and Punctuation a punctuation mark. However, the part-of-speech notation is not limited thereto.

In addition, referring to FIG. 5, the result of stemming the words segmented by tokenization (the converted roots) may be expressed as '학교/Noun', '에서/Josa', '만난/Noun', '사람/Noun', '들/Suffix', '과/Josa', '인사/Noun', '하다/Verb', '보다/Verb', and './Punctuation'.

Specifically, comparing the tokenization result with the stemming result, stemming converts '봅시다' to '보다' and '해' to '하다'.

In addition, according to an embodiment of the present application, the Chinese character difficulty measuring apparatus 100 may perform stemming using a Korean morphological analyzer such as OKT (Open-Korean-Text).

The Chinese character difficulty measuring apparatus 100 may count the Hangul meaning frequency by mapping the roots converted through stemming to each of the predetermined Chinese characters 2. According to an embodiment of the present application, when counting the Hangul meaning frequency, the apparatus 100 may count it on the basis of the main predicate contained in the converted roots. For example, when stemming converts the original word '살펴보다' ("to look over") into the main predicate '살피다' and the auxiliary predicate '보다', the apparatus 100 may count the Hangul meaning frequency for at least one of the predetermined Chinese characters 2 corresponding to '살피다', the main predicate that carries the substantive meaning, while not counting the Hangul meaning frequency of Chinese characters corresponding to '보다', the auxiliary predicate that carries only grammatical meaning.

In addition, according to an embodiment of the present application, the Chinese character difficulty measuring apparatus 100 may count the Hangul meaning frequency by performing the mapping in consideration of synonyms among the predetermined Chinese characters 2. For example, for the converted root '보다/Verb' ("to see"; assuming here that '보다' is used as a main predicate), the Hangul meaning frequency may be counted for each of the Chinese characters 見 (볼 견), 觀 (볼 관), and 視 (볼 시).
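The two counting rules above (skip auxiliary predicates, credit every synonymous character) can be sketched as follows. This is an illustrative sketch only: the lookup table `MEANING_TO_HANJA`, its entries, and the `is_auxiliary` flag are assumptions, and the stemmed tokens are taken as already produced by a morphological analyzer.

```python
from collections import Counter

# Hypothetical lookup from a stemmed Korean word to the Chinese characters
# whose meaning it expresses; synonymous characters share one entry.
MEANING_TO_HANJA = {
    "보다": ["見", "觀", "視"],
    "살피다": ["察"],
}

def count_meaning_frequency(stemmed_tokens, mapping=MEANING_TO_HANJA):
    """Count Hangul meaning frequency from (stem, is_auxiliary) pairs."""
    counts = Counter()
    for stem, is_auxiliary in stemmed_tokens:
        if is_auxiliary:
            # Auxiliary predicates carry only grammatical meaning: skip them.
            continue
        for hanja in mapping.get(stem, []):
            counts[hanja] += 1
    return counts

# '살펴보다' split into main '살피다' + auxiliary '보다', then a main-use '보다'.
tokens = [("살피다", False), ("보다", True), ("보다", False)]
freq = count_meaning_frequency(tokens)
```

Only the final main-predicate use of '보다' contributes to 見, 觀, and 視, which matches the exclusion of the auxiliary use described above.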

In addition, according to an embodiment of the present application, the Chinese character difficulty measuring apparatus 100 may count the Hangul meaning frequency in consideration of the context of the sentence in which a converted root appears. For example, for the converted root '눈/Noun', the apparatus 100 may use the context of the sentence to determine whether the word means '눈' as in "eye", the sensory organ that perceives objects in response to light, or '눈' as in "snow", the ice crystals that form when atmospheric water vapor freezes and fall to the ground, and may count the Hangul meaning frequency for the predetermined Chinese character whose meaning matches the result of that analysis. In other words, the apparatus 100 may count the Hangul meaning frequency in consideration of homonyms.

In addition, according to an embodiment of the present application, the Chinese character difficulty measuring apparatus 100 may finalize the second data set B by reviewing whether the Hangul meaning frequency obtained for each of the predetermined Chinese characters 2 was counted correctly and correcting it where necessary.

FIG. 6 is a table that combines the meaning and sound of the predetermined Chinese characters with the first data set, the second data set, and the stroke count information of each character.

Referring to FIG. 6, the first data set A, the second data set B, and the information on each of the predetermined Chinese characters 2 may be obtained in table form.

Specifically, an identification number (idhanja) may be assigned to each of the predetermined Chinese characters 2 (for example, 0 to 1799 for the 1,800 common-use Chinese characters designated by the Ministry of Education), and the written form (hanja), sound (sound), meaning (mean), and stroke count (St, C) of each character may be obtained together.

In addition, according to an embodiment of the present application, when the sample text data 1 consists of elementary school textbooks for each grade, the first data set A and the second data set B may be prepared (obtained) separately for each grade. According to an embodiment of the present application, the sample text data 1 may include the Korean language textbooks for grades 1 through 6, the 바른 생활 (Right Living), 슬기로운 생활 (Wise Living), and 즐거운 생활 (Pleasant Living) textbooks for grades 1 and 2, and the social studies textbooks for grades 3 through 6. However, the sample text data is not limited thereto, and different sample text data 1 may be used as the elementary curriculum changes.

For example, referring to FIG. 6, FHU may be the Chinese character usage frequency of each of the predetermined Chinese characters 2 across the elementary textbooks of all grades (grades 1 through 6), and FHU_1 through FHU_6 may be the Chinese character usage frequency of each of the predetermined Chinese characters 2 in the elementary textbooks of the respective grade. That is, FHU_1 may be the Chinese character usage frequency in the grade 1 textbooks, and FHU_6 that in the grade 6 textbooks.

Likewise, referring to FIG. 6, FWS may be the Hangul meaning frequency of each of the predetermined Chinese characters 2 across the elementary textbooks of all grades (grades 1 through 6), and FWS_1 through FWS_6 may be the Hangul meaning frequency of each of the predetermined Chinese characters 2 in the elementary textbooks of the respective grade. That is, FWS_1 may be the Hangul meaning frequency in the grade 1 textbooks, and FWS_6 that in the grade 6 textbooks.

In addition, the Chinese character difficulty measuring apparatus 100 may measure the difficulty of each of the predetermined Chinese characters 2 through multiple regression analysis (Multiple Linear Regression) based on the first data set A, the second data set B, and the stroke count information C of each of the predetermined Chinese characters 2. Referring to FIG. 6, the features for the multiple regression analysis may total 15: one for the stroke count information (St, C), six for the first data set collected separately by grade (FHU_n), one for the first data set over all grades (FHU), six for the second data set collected separately by grade (FWS_n), and one for the second data set over all grades (FWS). However, the features are not limited thereto. As another example, as described below, the apparatus 100 may apply multiple regression analysis after selecting only some of the obtained features (for example, of the 15) according to a predetermined criterion.

Hereinafter, with reference to FIGS. 7 and 8, the process by which the Chinese character difficulty measuring apparatus 100 measures the difficulty of each of the predetermined Chinese characters 2 through multiple regression analysis is described in detail.

According to an embodiment of the present application, the Chinese character difficulty measuring apparatus 100 may select some of the features comprising the first data set A, the second data set B, and the stroke count information C. For example, the apparatus 100 may select some of the obtained features using stepwise regression analysis. Specifically, the apparatus 100 may select features by repeatedly validating a multiple regression model trained with some of the features included or excluded.

This process of selecting some of the features comprising the first data set A, the second data set B, and the stroke count information C may, for example, be repeated while excluding those of the predetermined Chinese characters 2 whose total Chinese character usage frequency (FHU) is at or below a threshold level. According to an embodiment of the present application, the threshold level used as the criterion for excluding characters may be set differently for each iteration.

According to an embodiment of the present application, the multiple regression model trained with some of the features included or excluded may be validated by calculating, for each model, the coefficient of determination, a measure of how well the estimated linear model fits the given data.

In addition, according to an embodiment of the present application, the multiple regression model trained with some of the features included or excluded may be validated by calculating the RMSE (Root Mean Squared Error) of each model. Specifically, the RMSE value of each model may be calculated by Equation 1 below.

[Equation 1]
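The image for Equation 1 is not reproduced in this text. Assuming it is the standard RMSE definition, the square root of the mean squared difference between measured and predicted values, the two validation metrics named above could be computed as in the sketch below. The function names are illustrative and are not taken from the application.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between measured and predicted values."""
    n = len(actual)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot
```

A lower RMSE and a coefficient of determination closer to 1 both indicate a better fit, so either can rank the candidate feature subsets.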

FIG. 7 is a table showing an example selection, according to a predetermined criterion, of the features to be used in multiple regression from among the features comprising the first data set, the second data set, and the stroke count information.

Referring to FIG. 7, after repeatedly validating the multiple regression model with some of the features included or excluded, and in view of the coefficient of determination of each trial, the following may be selected as features: the stroke count information (St, C), the Hangul meaning frequency in the grade 3 textbooks (FWS_3), the Hangul meaning frequency in the grade 5 textbooks (FWS_5), the Chinese character usage frequency in the grade 1 textbooks (FHU_1), the Chinese character usage frequency in the grade 2 textbooks (FHU_2), and the Chinese character usage frequency in the grade 4 textbooks (FHU_4). As FIG. 7 shows, the multiple regression model with these features has a coefficient of determination of 0.852, the best result among the trials.

In addition, FIG. 7 shows that the coefficient of determination is best when only those of the predetermined Chinese characters 2 with a total Chinese character usage frequency (FHU) of 2 or more are used. Accordingly, from the first data set A, the second data set B, and the stroke count information C prepared for each of the predetermined Chinese characters 2, the Chinese character difficulty measuring apparatus 100 may perform multiple regression using the selected features (St, FWS_3, FWS_5, FHU_1, FHU_2, FHU_4) of the characters whose total Chinese character usage frequency is at or above the threshold level (FHU ≥ 2).

However, the features selected by the above process and the threshold level for the total Chinese character usage frequency are examples that may vary depending on what is chosen as the sample text data 1 or the predetermined Chinese characters 2, and should not be construed as limiting the technical idea of the present application.

According to an embodiment of the present application, the Chinese character difficulty measuring apparatus 100 may complete a Chinese character difficulty prediction equation by applying multiple regression analysis, based on the selected features, to those of the predetermined Chinese characters 2 that pass the threshold level determined for the Chinese character usage frequency (for example, characters with an FHU of 2 or more), thereby calculating a weight for each feature and a bias value.

FIG. 8 is a table showing example per-feature weights and the bias value of the Chinese character difficulty prediction equation completed through multiple regression analysis.

Referring to FIG. 8, the Chinese character difficulty measuring apparatus 100 may complete the Chinese character difficulty prediction equation by setting the weight of the normalized stroke count information (NSt, C) to 0.6164, the weight of the normalized Hangul meaning frequency in the grade 3 textbooks (NFWS_3) to -1.1085, the weight of the normalized Hangul meaning frequency in the grade 5 textbooks (NFWS_5) to -0.2659, the weight of the normalized Chinese character usage frequency in the grade 1 textbooks (NFHU_1) to -0.2659, the weight of the normalized Chinese character usage frequency in the grade 2 textbooks (NFHU_2) to 0.2542, the weight of the normalized Chinese character usage frequency in the grade 4 textbooks (NFHU_4) to -0.3491, and the bias value to 0.4096. The completed prediction equation is expressed as Equation 2 below.

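[Equation 2]

D(C) = 0.6164·NSt(C) − 1.1085·NFWS_3(C) − 0.2659·NFWS_5(C) − 0.2659·NFHU_1(C) + 0.2542·NFHU_2(C) − 0.3491·NFHU_4(C) + 0.4096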

Here, C is a variable denoting an individual Chinese character among the predetermined Chinese characters 2, and D denotes the difficulty prediction function. That is, once the Chinese character difficulty measuring apparatus 100 has completed a prediction equation in the form of Equation 2, it may obtain the difficulty value of any of the predetermined Chinese characters 2 by substituting that character's six feature values into the equation.
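As a sketch of how the completed equation might be evaluated, the snippet below applies the weights and bias listed for FIG. 8 to a dictionary of normalized feature values. The names `WEIGHTS`, `BIAS`, and `predict_difficulty` are illustrative; only the numeric values come from the description.

```python
# Weights and bias as listed for FIG. 8; keys name the normalized features.
WEIGHTS = {
    "NSt": 0.6164,
    "NFWS_3": -1.1085,
    "NFWS_5": -0.2659,
    "NFHU_1": -0.2659,
    "NFHU_2": 0.2542,
    "NFHU_4": -0.3491,
}
BIAS = 0.4096

def predict_difficulty(features):
    """Evaluate the linear prediction: bias plus the weighted feature sum."""
    return BIAS + sum(w * features[name] for name, w in WEIGHTS.items())

# A character with the maximum normalized stroke count and no occurrences
# in the sample text (all frequency features zero).
rare_complex = {name: 0.0 for name in WEIGHTS}
rare_complex["NSt"] = 1.0
difficulty = predict_difficulty(rare_complex)
```

Because most frequency weights are negative, characters that appear often in the textbooks score as easier, while a higher stroke count raises the predicted difficulty.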

In addition, according to an embodiment of the present application, the normalized data for each selected feature may be obtained by splitting the data for that feature into training data (train set) and test data (test set) and normalizing it using MinMaxScaler. For example, 80% of the data may be selected as training data and 20% as test data.
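The split and normalization could look roughly like the following dependency-free sketch. It reproduces the default min-max formula, (x − min) / (max − min), rather than calling a library scaler. Fitting the range on the training portion only, the random shuffle, and the fixed seed are assumptions; the description does not state these details.

```python
import random

def train_test_split_80_20(rows, seed=0):
    """Shuffle a copy of the rows and split it 80% / 20%."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

def fit_min_max(values):
    """Return the (min, max) range used for scaling."""
    return min(values), max(values)

def min_max_scale(values, lo, hi):
    """Scale values linearly so that lo maps to 0.0 and hi maps to 1.0."""
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical stroke counts for ten characters.
strokes = [3, 4, 8, 10, 12, 15, 16, 20, 7, 9]
train, test = train_test_split_80_20(strokes)
lo, hi = fit_min_max(train)
train_scaled = min_max_scale(train, lo, hi)
test_scaled = min_max_scale(test, lo, hi)
```

With the range fitted on the training portion, test values can fall slightly outside [0, 1]; that is expected under this assumption.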

In addition, according to an embodiment of the present application, when the Chinese character difficulty measuring apparatus 100 receives appropriate learning period information for each of the predetermined Chinese characters 2, it may measure the difficulty while further taking the received information into account.

According to an embodiment of the present application, the appropriate learning period information may be supplied to the Chinese character difficulty measuring apparatus 100 from a separate database or server that stores the results of a survey on the appropriate learning period for each of the predetermined Chinese characters 2.

According to an embodiment of the present application, the appropriate learning period information may express the appropriate learning period of each of the predetermined Chinese characters 2 as one of 12 values, from the first semester of grade 1 through the second semester of grade 6. For example, when the appropriate learning period for a character is judged to be 'grade 3, semester 1', the appropriate learning period information for that character may have the value 5.

From the plurality of appropriate learning period values received for each of the predetermined Chinese characters 2, the Chinese character difficulty measuring apparatus 100 may fix a single value for that character by taking, in view of the standard deviation, the mean of the values that fall within the 95% (2σ) portion of the distribution. When the apparatus 100 holds appropriate learning period information fixed in this way for the predetermined Chinese characters, it may be implemented to revise the weights and the bias value of the Chinese character difficulty prediction equation based on that information.
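The encoding and the 2σ-based consolidation can be sketched as follows. The formula `(grade - 1) * 2 + semester` is inferred from the single example given (grade 3, semester 1 maps to 5). Reading "within 95% (2σ)" as "within two population standard deviations of the mean" is an assumption, as are the function names and the survey values.

```python
import statistics

def encode_learning_period(grade, semester):
    """Map (grade 1-6, semester 1-2) to a value from 1 to 12."""
    return (grade - 1) * 2 + semester

def consolidate_responses(values):
    """Average the responses that lie within two standard deviations of the mean."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    kept = [v for v in values if abs(v - mean) <= 2 * sigma]
    return statistics.fmean(kept)

# Hypothetical survey responses for one character: nine say 'grade 3,
# semester 1' (value 5) and one outlier says 'grade 6, semester 2' (value 12).
responses = [5] * 9 + [12]
fixed_value = consolidate_responses(responses)
```

In this example the single outlying response is discarded, so the fixed value equals the consensus of the remaining nine responses.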

FIG. 9 is a table showing the result of measuring the difficulty of each of the predetermined Chinese characters with a Chinese character difficulty measuring apparatus using regression analysis according to an embodiment of the present application.

Referring to FIG. 9, the difficulty of each Chinese character provided by the Chinese character difficulty measuring apparatus 100 may be expressed as DL, where a larger DL value indicates a higher difficulty.

In addition, the Chinese character difficulty measuring apparatus 100 may measure the difficulty of a word composed of Chinese characters based on the measured difficulty of each of the predetermined Chinese characters 2. According to an embodiment of the present application, the apparatus 100 may take the mean of the difficulties measured for the individual Chinese characters making up the word as the difficulty of the word. For example, for the Sino-Korean word '소방관(消防官)' ("firefighter"), the apparatus 100 may measure (evaluate) the difficulty of the word as the mean of the difficulties it measured through multiple regression analysis for the characters '소(消)', '방(防)', and '관(官)'.
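The word-level averaging can be illustrated with a short sketch. The per-character difficulty values below are hypothetical placeholders and are not results from the application.

```python
def word_difficulty(word, char_difficulty):
    """Mean of the measured difficulties of the characters in the word."""
    return sum(char_difficulty[ch] for ch in word) / len(word)

# Hypothetical per-character difficulty values, for illustration only.
example_difficulty = {"消": 0.50, "防": 0.60, "官": 0.40}
firefighter = word_difficulty("消防官", example_difficulty)
```

A character missing from the table raises a `KeyError` in this sketch; how unmeasured characters should be handled is not specified in the description.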

Furthermore, the Chinese character difficulty measuring apparatus 100 may measure the overall difficulty of a text containing Chinese characters (an original classical Chinese text, a document mixing Hangul and Chinese characters, and the like) based on the measured difficulty of each of the predetermined Chinese characters 2.

In addition, according to an embodiment of the present application, the Chinese character difficulty measuring apparatus 100 may evaluate the readability of a text containing Hangul (for example, a text written entirely in Hangul that includes Hangul portions corresponding to Sino-Korean words) by aggregating, based on the measured difficulty of each of the predetermined Chinese characters 2, the difficulties of the Chinese characters underlying those portions. For example, the readability of the text may be evaluated by selecting one of a plurality of preset grades.

In addition, the Chinese character difficulty measuring apparatus 100 may classify some of the predetermined Chinese characters 2 as target Chinese characters for a given grade based on the measured difficulty and the Chinese character usage frequency of each of the predetermined Chinese characters 2.

Specifically, based on the first data set A, the Chinese character difficulty measuring apparatus 100 may filter the predetermined Chinese characters 2 to those whose Chinese character usage frequency over all grades (FHU) is at or above a preset threshold frequency. From the filtered characters, based on the first data set prepared by grade (A, FHU_1 through FHU_6), it may then select as the target Chinese characters for a grade those whose usage frequency in that grade is at or above a preset per-grade threshold frequency and whose difficulty falls within the difficulty range preset for that grade.

According to an embodiment of the present application, the preset threshold frequency for the Chinese character usage frequency over all grades (FHU) may be 5. That is, the Chinese character difficulty measuring apparatus 100 may first select (filter) the characters with a usage frequency of 5 or more across the textbooks of all grades.

In addition, according to an embodiment of the present application, the preset per-grade threshold frequency for the Chinese character utilization frequency in a given grade (FHU_n) may be 1. That is, the Chinese character difficulty measuring apparatus 100 may secondarily select (filter) the Chinese characters that have a utilization frequency of 1 or more in the textbooks of that grade.

Subsequently, from among the Chinese characters that passed the first and second selection (filtering), the Chinese character difficulty measuring apparatus 100 may finally select, as learning target Chinese characters for a given grade, those whose difficulty falls within the difficulty range preset for that grade. For example, the difficulty (d) range may be 0 ≤ d < 0.35 for grade 1, 0.35 ≤ d < 0.45 for grade 2, 0.45 ≤ d < 0.455 for grade 3, 0.455 ≤ d < 0.53 for grade 4, 0.53 ≤ d < 0.57 for grade 5, and 0.57 ≤ d < 0.65 for grade 6, but is not limited thereto.
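The two-stage filtering and the final range check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `HanjaStats` and `select_targets` are hypothetical, the total frequency FHU is assumed to be the sum of the per-grade frequencies FHU_1 to FHU_6, and the difficulty values in the sample data are made up. Only the thresholds (5 and 1) and the grade ranges come from the text.

```python
from dataclasses import dataclass

# Difficulty ranges per grade as quoted in the text: lower <= d < upper.
GRADE_RANGES = {
    1: (0.0, 0.35),
    2: (0.35, 0.45),
    3: (0.45, 0.455),
    4: (0.455, 0.53),
    5: (0.53, 0.57),
    6: (0.57, 0.65),
}
TOTAL_THRESHOLD = 5  # threshold on FHU across all grades (from the text)
GRADE_THRESHOLD = 1  # threshold on FHU_n within one grade (from the text)


@dataclass
class HanjaStats:
    """Hypothetical record holding what the selection step needs."""
    char: str
    difficulty: float
    freq_by_grade: dict  # grade -> FHU_n


def select_targets(stats):
    """Return {grade: [chars]} after both filters and the range check."""
    selected = {grade: [] for grade in GRADE_RANGES}
    for s in stats:
        # First filter. Assumption: FHU is the sum of the per-grade counts.
        if sum(s.freq_by_grade.values()) < TOTAL_THRESHOLD:
            continue
        for grade, (low, high) in GRADE_RANGES.items():
            # Second filter: the character must appear in this grade.
            if s.freq_by_grade.get(grade, 0) < GRADE_THRESHOLD:
                continue
            # Final selection: difficulty inside the grade's preset range.
            if low <= s.difficulty < high:
                selected[grade].append(s.char)
    return selected


# Made-up sample data for illustration only.
sample = [
    HanjaStats("山", 0.30, {1: 4, 2: 3}),   # kept for grade 1
    HanjaStats("學", 0.50, {4: 1, 5: 6}),   # kept for grade 4
    HanjaStats("龍", 0.40, {2: 2}),         # dropped: total frequency below 5
    HanjaStats("海", 0.60, {1: 10}),        # dropped: never appears in grade 6
]
selected = select_targets(sample)
```

Because the ranges are half-open and do not overlap, a character can be selected for at most one grade in this sketch.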

FIG. 10 is a table showing the preset difficulty range for each grade used to classify learning target Chinese characters by grade, and the number of learning target Chinese characters selected for each grade.

Referring to FIG. 10, according to an embodiment of the present application, the Chinese character difficulty measuring apparatus 100 may select, from the predetermined Chinese characters 2, 35 learning target Chinese characters for grade 1, 32 for grade 2, 108 for grade 3, 122 for grade 4, 148 for grade 5, and 127 for grade 6.

FIG. 11 is a table showing, as an experimental example associated with the Chinese character difficulty measuring apparatus using regression analysis according to an embodiment of the present application, the result of classifying learning target Chinese characters for the first grade of elementary school.

Referring to FIG. 11, it can be confirmed that learning target Chinese characters matching the general learning level of the corresponding grade were selected through difficulty classification that takes into account the Chinese character utilization frequency and the Hangul meaning frequency in the sample text data 1, together with the stroke count information of the Chinese characters.

FIG. 12 is a schematic configuration diagram of a Chinese character difficulty measuring apparatus using regression analysis according to an embodiment of the present application.

Referring to FIG. 12, the Chinese character difficulty measuring apparatus 100 may include a Chinese character utilization frequency acquisition unit 110, a Hangul meaning frequency acquisition unit 120, a difficulty analysis unit 130, and a learning target Chinese character determination unit 140.

The Chinese character utilization frequency acquisition unit 110 may prepare, for each of the predetermined Chinese characters 2, a first data set A on the Chinese character utilization frequency from the sample text data 1.

Here, the sample text data 1 may include elementary school textbooks for each grade, and the Chinese character utilization frequency acquisition unit 110 may prepare the first data set A separately for each grade.

The Hangul meaning frequency acquisition unit 120 may prepare, for each of the predetermined Chinese characters 2, a second data set B on the Hangul meaning frequency from the sample text data 1.

Here, the sample text data 1 may include elementary school textbooks for each grade, and the Hangul meaning frequency acquisition unit 120 may prepare the second data set B separately for each grade.

The difficulty analysis unit 130 may measure the difficulty of each of the predetermined Chinese characters 2 through multiple regression analysis based on the first data set A, the second data set B, and the stroke count information C of each of the predetermined Chinese characters 2.

In addition, when the difficulty analysis unit 130 receives appropriate learning time information for each of the predetermined Chinese characters 2, it may measure the difficulty by further taking the appropriate learning time information into account.

The learning target Chinese character determination unit 140 may classify some of the predetermined Chinese characters 2 as learning target Chinese characters corresponding to a given grade, based on the difficulty of each of the predetermined Chinese characters 2 and the Chinese character utilization frequency.

Hereinafter, the operation flow of the present application is briefly described based on the details given above.

FIG. 13 is an operation flowchart of a method for measuring Chinese character difficulty using regression analysis according to an embodiment of the present application.

The method for measuring Chinese character difficulty using regression analysis illustrated in FIG. 13 may be performed by the Chinese character difficulty measuring apparatus 100 described above. Accordingly, even where details are omitted below, the description given for the Chinese character difficulty measuring apparatus 100 applies equally to the description of the method for measuring Chinese character difficulty using regression analysis according to an embodiment of the present application.

Referring to FIG. 13, in step S1310, the Chinese character utilization frequency acquisition unit 110 may prepare, for each of the predetermined Chinese characters 2, a first data set A on the Chinese character utilization frequency from the sample text data 1.

Specifically, in step S1310, the Chinese character utilization frequency acquisition unit 110 may collect the sentences appearing in the sample text data 1. It may then annotate, in each collected sentence, the Hangul portions that correspond to Chinese character words with the corresponding Chinese character notation, and prepare the first data set A by mapping the annotated sentences to each of the predetermined Chinese characters 2 and counting the Chinese character utilization frequency.
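The counting part of step S1310 can be sketched as follows. This is a minimal sketch under stated assumptions: the sentences are assumed to be already annotated in the form `한글(漢字)`, with the Chinese character notation in parentheses after the Hangul word, which is one possible annotation format and is not specified in the text. The function name `count_usage` is hypothetical.

```python
import re
from collections import Counter

# Assumed annotation format: Chinese characters in parentheses after the
# Hangul word, e.g. "학교(學校)". The text does not fix a format.
ANNOTATION = re.compile(r"\(([^()]+)\)")


def count_usage(annotated_sentences, target_hanja):
    """Count how often each target Chinese character occurs in annotations."""
    counts = Counter({h: 0 for h in target_hanja})
    for sentence in annotated_sentences:
        for annotation in ANNOTATION.findall(sentence):
            for ch in annotation:
                if ch in counts:  # ignore characters outside the target set
                    counts[ch] += 1
    return counts


sentences = [
    "학교(學校)에 간다.",
    "학생(學生)이 학교(學校)에서 공부(工夫)한다.",
]
usage = count_usage(sentences, {"學", "校", "生", "山"})
```

Running the same function once per grade's textbooks would give per-grade counts corresponding to FHU_1 to FHU_6.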

Next, in step S1320, the Hangul meaning frequency acquisition unit 120 may prepare, for each of the predetermined Chinese characters 2, a second data set B on the Hangul meaning frequency from the sample text data 1.

Specifically, in step S1320, the Hangul meaning frequency acquisition unit 120 may tokenize the collected sentences and stem the words segmented by the tokenization. It may then prepare the second data set B by mapping the stems obtained through stemming to each of the predetermined Chinese characters 2 and counting the Hangul meaning frequency. Here, the Hangul meaning frequency acquisition unit 120 may count the Hangul meaning frequency based on the main predicate included in the converted stem. In addition, the Hangul meaning frequency acquisition unit 120 may perform the mapping in consideration of the synonyms of each of the predetermined Chinese characters 2.
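The tokenize, stem, and map sequence of step S1320 can be sketched as follows. This is only a sketch: a real system would need a Korean morphological analyzer for stemming, which is replaced here by a tiny hypothetical lookup table (`TOY_STEMS`). The meaning table `hanja_meanings`, including the choice of synonyms, is also made up for illustration, and the main-predicate handling mentioned above is not modeled.

```python
import re
from collections import Counter

# Stand-in for a real Korean morphological analyzer: a tiny lookup table
# from inflected tokens to stems. Purely illustrative.
TOY_STEMS = {"배운다": "배우", "배웠다": "배우", "익힌다": "익히", "먹는다": "먹"}


def toy_stem(token):
    """Return the stem of a token, or the token itself if unknown."""
    return TOY_STEMS.get(token, token)


def count_meaning_frequency(sentences, hanja_meanings, stem=toy_stem):
    """Count stems that match the Hangul meaning (or a synonym) of each character."""
    # Reverse index: stem -> characters whose meaning or synonym it expresses.
    index = {}
    for hanja, stems in hanja_meanings.items():
        for s in stems:
            index.setdefault(s, set()).add(hanja)
    counts = Counter({h: 0 for h in hanja_meanings})
    for sentence in sentences:
        # Tokenize: keep runs of Hangul syllables, dropping punctuation.
        for token in re.findall(r"[가-힣]+", sentence):
            for hanja in index.get(stem(token), ()):
                counts[hanja] += 1
    return counts


sentences = ["나는 한글을 배운다.", "어제 수학을 배웠다.", "글씨를 익힌다.", "밥을 먹는다."]
# Hypothetical meaning table; "익히" is treated as a synonym for the meaning of 學.
hanja_meanings = {"學": {"배우", "익히"}, "食": {"먹"}, "大": {"크"}}
meaning_counts = count_meaning_frequency(sentences, hanja_meanings)
```

Passing the stemmer as a parameter keeps the counting logic independent of whichever analyzer is actually used.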

Next, in step S1330, the difficulty analysis unit 130 may measure the difficulty of each of the predetermined Chinese characters 2 through multiple regression analysis based on the first data set A, the second data set B, and the stroke count information C of each of the predetermined Chinese characters 2.

Specifically, in step S1330, when the difficulty analysis unit 130 receives appropriate learning time information for each of the predetermined Chinese characters 2, it may measure the difficulty by further taking the received appropriate learning time information into account.
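A multiple regression over the three predictors named in step S1330 can be sketched as follows. This passage does not state the dependent variable or the fitted coefficients, so the sketch assumes that a reference difficulty score (for instance, one derived from appropriate learning time information) is available for a set of training characters. The functions `fit_ols` and `predict` are hypothetical names, and all numbers are synthetic. Ordinary least squares is solved here with the normal equations in plain Python.

```python
def predict(coef, row):
    """Evaluate intercept + sum(coefficient * predictor)."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def fit_ols(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    X = [[1.0, *r] for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * t for x, t in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic rows: (utilization frequency, Hangul meaning frequency, stroke count).
rows = [(10, 4, 3), (2, 1, 12), (7, 6, 8), (1, 0, 16), (5, 3, 5)]
# Synthetic reference difficulties generated from a known linear rule,
# so the fit should recover the same coefficients.
true_coef = [0.2, -0.01, 0.005, 0.02]
targets = [predict(true_coef, r) for r in rows]
coef = fit_ols(rows, targets)
estimated = predict(coef, (3, 2, 9))  # difficulty estimate for an unseen character
```

With real data the fit would not be exact, and a numerical library routine would usually be preferable to hand-written elimination.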

Next, in step S1340, the difficulty analysis unit 130 may measure the difficulty of a word composed of Chinese characters based on the difficulty of each of the predetermined Chinese characters. According to an embodiment of the present application, in step S1340, the difficulty analysis unit 130 may determine the average of the difficulties measured for the Chinese characters constituting the word as the difficulty of the word.
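The averaging rule of step S1340 is simple enough to state directly in code. In this sketch the function name `word_difficulty` is hypothetical and the per-character difficulty values are made up.

```python
from statistics import mean


def word_difficulty(word, char_difficulty):
    """Average the measured difficulties of the characters in a word.

    Raises KeyError if a character has no measured difficulty.
    """
    return mean(char_difficulty[ch] for ch in word)


# Made-up difficulty values for illustration.
char_difficulty = {"學": 0.25, "校": 0.5}
difficulty = word_difficulty("學校", char_difficulty)
```

A word containing a character outside the predetermined set has no defined difficulty under this rule, which is why the sketch raises an error instead of guessing a value.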

Next, in step S1350, the learning target Chinese character determination unit 140 may classify some of the predetermined Chinese characters 2 as learning target Chinese characters corresponding to a given grade, based on the difficulty of each of the predetermined Chinese characters 2 and the Chinese character utilization frequency.

Specifically, in step S1350, the learning target Chinese character determination unit 140 may filter, from the predetermined Chinese characters 2 and based on the first data set A, the Chinese characters whose utilization frequency across all grades is greater than or equal to a preset threshold frequency. In addition, among the filtered Chinese characters and based on the first data set A prepared for each grade, the learning target Chinese character determination unit 140 may select as a learning target Chinese character for a given grade each Chinese character whose utilization frequency in that grade is greater than or equal to a preset per-grade threshold frequency and whose difficulty falls within the difficulty range preset for that grade.

In the above description, steps S1310 to S1350 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present application. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.

The method for measuring Chinese character difficulty using regression analysis according to an embodiment of the present application may be implemented in the form of program instructions executable by various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

In addition, the above-described method for measuring Chinese character difficulty using regression analysis may also be implemented in the form of a computer program or application that is stored in a recording medium and executed by a computer.

The foregoing description of the present application is provided for illustration, and those of ordinary skill in the art to which the present application pertains will understand that it can easily be modified into other specific forms without changing the technical spirit or essential features of the present application. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present application is defined by the claims set out below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present application.

10: Chinese character difficulty measuring system
100: Chinese character difficulty measuring apparatus using regression analysis
110: Chinese character utilization frequency acquisition unit
120: Hangul meaning frequency acquisition unit
130: difficulty analysis unit
140: learning target Chinese character determination unit
1: sample text data
2: predetermined Chinese characters
A: first data set
B: second data set
C: stroke count information

Claims

A method for measuring the difficulty of Chinese characters using regression analysis, the method comprising:
preparing, for each of predetermined Chinese characters, a first data set on a Chinese character utilization frequency from sample text data;
preparing, for each of the predetermined Chinese characters, a second data set on a Hangul meaning frequency from the sample text data; and
measuring a difficulty of each of the predetermined Chinese characters through multiple regression analysis based on the first data set, the second data set, and stroke count information of each of the predetermined Chinese characters.

The method of claim 1,
wherein preparing the first data set comprises:
collecting sentences appearing in the sample text data;
annotating, in each of the sentences, a Hangul portion corresponding to a Chinese character word with the corresponding Chinese character notation; and
counting the Chinese character utilization frequency by mapping the sentences annotated with the Chinese character notation to each of the predetermined Chinese characters.

The method of claim 2,
wherein preparing the second data set comprises:
tokenizing the collected sentences;
stemming the words segmented by the tokenizing; and
counting the Hangul meaning frequency by mapping the stems obtained through the stemming to each of the predetermined Chinese characters.

The method of claim 3,
wherein counting the Hangul meaning frequency comprises
counting the Hangul meaning frequency based on a main predicate included in the converted stem.

The method of claim 4,
wherein counting the Hangul meaning frequency comprises
performing the mapping in consideration of synonyms of each of the predetermined Chinese characters.

The method of claim 3,
wherein the sample text data includes elementary school textbooks for each grade, and
the first data set and the second data set are prepared separately for each grade.

The method of claim 6,
wherein measuring the difficulty of each of the predetermined Chinese characters comprises,
when appropriate learning time information is received for each of the predetermined Chinese characters, measuring the difficulty by further taking the appropriate learning time information into account.

The method of claim 6,
further comprising measuring a difficulty of a word composed of Chinese characters based on the difficulty of each of the predetermined Chinese characters.

The method of claim 8,
wherein measuring the difficulty of the word comprises
determining, as the difficulty of the word, the average of the difficulties measured for the Chinese characters constituting the word.

The method of claim 6,
further comprising classifying some of the predetermined Chinese characters as learning target Chinese characters corresponding to a given grade, based on the difficulty of each of the predetermined Chinese characters and the Chinese character utilization frequency.

The method of claim 10,
wherein classifying the learning target Chinese characters comprises:
filtering, from the predetermined Chinese characters and based on the first data set, Chinese characters whose utilization frequency across all grades is greater than or equal to a preset threshold frequency; and
selecting, among the filtered Chinese characters and based on the first data set prepared for each grade, as a learning target Chinese character for a given grade, a Chinese character whose utilization frequency in that grade is greater than or equal to a preset per-grade threshold frequency and whose difficulty falls within a difficulty range preset for that grade.

An apparatus for measuring the difficulty of Chinese characters using regression analysis, the apparatus comprising:
a Chinese character utilization frequency acquisition unit that prepares, for each of predetermined Chinese characters, a first data set on a Chinese character utilization frequency from sample text data;
a Hangul meaning frequency acquisition unit that prepares, for each of the predetermined Chinese characters, a second data set on a Hangul meaning frequency from the sample text data; and
a difficulty analysis unit that measures a difficulty of each of the predetermined Chinese characters through multiple regression analysis based on the first data set, the second data set, and stroke count information of each of the predetermined Chinese characters.

The apparatus of claim 12,
wherein the sample text data includes elementary school textbooks for each grade,
the Chinese character utilization frequency acquisition unit prepares the first data set separately for each grade, and
the Hangul meaning frequency acquisition unit prepares the second data set separately for each grade.

The apparatus of claim 12,
wherein the difficulty analysis unit,
when appropriate learning time information is received for each of the predetermined Chinese characters, measures the difficulty by further taking the appropriate learning time information into account.

The apparatus of claim 13,
further comprising a learning target Chinese character determination unit that classifies some of the predetermined Chinese characters as learning target Chinese characters corresponding to a given grade, based on the difficulty of each of the predetermined Chinese characters and the Chinese character utilization frequency.