KR102490752B1

KR102490752B1 - Deep context-based grammatical error correction using artificial neural networks

Info

Publication number: KR102490752B1
Application number: KR1020207005087A
Authority: KR
Inventors: Hui Lin; Chuan Wang; Ruobing Li
Original assignee: LingoChamp Information Technology (Shanghai) Co., Ltd.
Priority date: 2017-08-03
Filing date: 2017-08-03
Publication date: 2023-01-20
Also published as: JP7031101B2; JP2020529666A; CN111226222A; CN111226222B; KR20200031154A; MX2020001279A; WO2019024050A1

Abstract

Methods and systems for grammatical error detection are disclosed herein. In one example, a sentence is received. At least one target word in the sentence is identified based at least in part on one or more grammatical error types. Each of the one or more target words corresponds to at least one of the one or more grammatical error types. For at least one of the one or more target words, a classification of the target word with respect to the corresponding grammatical error type is estimated using an artificial neural network model trained for the grammatical error type. A grammatical error in the sentence is detected based at least in part on the target word and the estimated classification of the target word.

Description

Deep context-based grammatical error correction using artificial neural networks

The present disclosure relates generally to artificial intelligence and, more particularly, to grammatical error correction using artificial neural networks.

Automated grammatical error correction (GEC) is an essential and useful tool for the millions of people who learn English as a second language. Writers make a variety of grammar and usage mistakes that standard proofreading tools do not address. Developing automated systems with high precision and recall for grammatical error detection and/or correction has become a rapidly growing area of natural language processing (NLP).

Although such automated systems hold much promise, known systems have faced problems such as limited coverage of the various grammatical error patterns and the need for sophisticated linguistic feature engineering or human-annotated training samples.

In one example, a method for grammatical error detection is disclosed. A sentence is received. One or more target words in the sentence are identified based at least in part on one or more grammatical error types. Each of the one or more target words corresponds to at least one of the one or more grammatical error types. For at least one of the one or more target words, a classification of the target word with respect to the corresponding grammatical error type is estimated using an artificial neural network model trained for the grammatical error type. The model includes two recurrent neural networks configured to output a context vector of the target word based at least in part on at least one word before the target word and at least one word after the target word in the sentence. The model further includes a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based at least in part on the context vector of the target word. A grammatical error in the sentence is detected based at least in part on the target word and the estimated classification of the target word.
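The structure summarized above (two recurrent networks producing a context vector, followed by a feedforward classifier) can be illustrated with a minimal, untrained sketch. Everything concrete here is an assumption made for illustration and is not taken from the disclosure: the plain tanh recurrent cell, the single linear layer with softmax standing in for the feedforward network, the random weights, and the names `run_rnn`, `init_params`, and `classify_target`.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def run_rnn(inputs, w_in, w_rec):
    """Plain tanh recurrent network (an assumption); returns the final hidden state."""
    hidden = [0.0] * len(w_rec)
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
    return hidden


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def init_params(embed_dim, hidden_dim, num_classes, seed=0):
    """Random, untrained parameters for the two RNNs and the output layer."""
    rng = random.Random(seed)
    return {
        "fwd_in": random_matrix(hidden_dim, embed_dim, rng),
        "fwd_rec": random_matrix(hidden_dim, hidden_dim, rng),
        "bwd_in": random_matrix(hidden_dim, embed_dim, rng),
        "bwd_rec": random_matrix(hidden_dim, hidden_dim, rng),
        "out": random_matrix(num_classes, 2 * hidden_dim, rng),
    }


def classify_target(embeddings, target_index, params):
    """Return class probabilities for the word at target_index."""
    # First RNN reads the words before the target, left to right.
    before = embeddings[:target_index]
    # Second RNN reads the words after the target, right to left.
    after = embeddings[target_index + 1:][::-1]
    # The context vector is the concatenation of both final hidden states.
    context = (run_rnn(before, params["fwd_in"], params["fwd_rec"])
               + run_rnn(after, params["bwd_in"], params["bwd_rec"]))
    # Feedforward part: one linear layer followed by softmax (an assumption).
    return softmax(matvec(params["out"], context))
```

In this sketch the target word itself is left out of both inputs, so the probabilities depend only on its surrounding context; whether the disclosed model does the same is not stated in this passage.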

In another example, a method for training an artificial neural network model is provided. An artificial neural network model for estimating a classification of a target word in a sentence with respect to a grammatical error type is provided. The model includes two recurrent neural networks configured to output a context vector of the target word based at least in part on at least one word before the target word and at least one word after the target word in the sentence. The model further includes a feedforward neural network configured to output a classification value of the target word based at least in part on the context vector of the target word. A set of training samples is obtained. Each training sample in the set includes a sentence containing a target word with respect to the grammatical error type and an actual classification of the target word with respect to the grammatical error type. A first set of parameters associated with the recurrent neural networks and a second set of parameters associated with the feedforward neural network are jointly trained based at least in part on differences between the actual classification and the estimated classification of the target word in each training sample.

In another example, a system for grammatical error detection includes a memory and at least one processor coupled to the memory. The at least one processor is configured to receive a sentence and identify one or more target words in the sentence based at least in part on one or more grammatical error types. Each of the one or more target words corresponds to at least one of the one or more grammatical error types. The at least one processor is further configured to, for at least one of the one or more target words, estimate a classification of the target word with respect to the corresponding grammatical error type using an artificial neural network model trained for the grammatical error type. The model includes two recurrent neural networks configured to generate a context vector of the target word based at least in part on at least one word before the target word and at least one word after the target word in the sentence. The model further includes a feedforward neural network configured to generate a classification value of the target word with respect to the grammatical error type based at least in part on the context vector of the target word. The at least one processor is further configured to detect a grammatical error in the sentence based at least in part on the target word and the estimated classification of the target word.

In another example, a system for grammatical error detection includes a memory and at least one processor coupled to the memory. The at least one processor is configured to provide an artificial neural network model for estimating a classification of a target word in a sentence with respect to a grammatical error type. The model includes two recurrent neural networks configured to output a context vector of the target word based at least in part on at least one word before the target word and at least one word after the target word in the sentence. The model further includes a feedforward neural network configured to output a classification value of the target word based at least in part on the context vector of the target word. The at least one processor is further configured to obtain a set of training samples. Each training sample in the set includes a sentence containing a target word with respect to the grammatical error type and an actual classification of the target word with respect to the grammatical error type. The at least one processor is further configured to jointly adjust a first set of parameters associated with the recurrent neural networks and a second set of parameters associated with the feedforward neural network based at least in part on differences between the actual classification and the estimated classification of the target word in each training sample.

Other concepts relate to software for grammatical error detection and artificial neural network model training. A software product, in accordance with this concept, includes at least one computer-readable, non-transitory device and information carried by the device. The information carried by the device may be executable instructions regarding parameters in association with a request or operational parameters.

In one example, a tangible, computer-readable, non-transitory device has instructions recorded thereon for grammatical error detection, wherein the instructions, when executed by a computer, cause the computer to perform a series of operations. A sentence is received. One or more target words in the sentence are identified based at least in part on one or more grammatical error types. Each of the one or more target words corresponds to at least one of the one or more grammatical error types. For at least one of the one or more target words, a classification of the target word with respect to the corresponding grammatical error type is estimated using an artificial neural network model trained for the grammatical error type. The model includes two recurrent neural networks configured to output a context vector of the target word based at least in part on at least one word before the target word and at least one word after the target word in the sentence. The model further includes a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based at least in part on the context vector of the target word. A grammatical error in the sentence is detected based at least in part on the target word and the estimated classification of the target word.

In another example, a tangible, computer-readable, non-transitory device has instructions recorded thereon for training an artificial neural network model, wherein the instructions, when executed by a computer, cause the computer to perform a series of operations. An artificial neural network model for estimating a classification of a target word in a sentence with respect to a grammatical error type is provided. The model includes two recurrent neural networks configured to output a context vector of the target word based at least in part on at least one word before the target word and at least one word after the target word in the sentence. The model further includes a feedforward neural network configured to output a classification value of the target word based at least in part on the context vector of the target word. A set of training samples is obtained. Each training sample in the set includes a sentence containing a target word with respect to the grammatical error type and an actual classification of the target word with respect to the grammatical error type. A first set of parameters associated with the recurrent neural networks and a second set of parameters associated with the feedforward neural network are jointly trained based at least in part on differences between the actual classification and the estimated classification of the target word in each training sample.

This Summary is provided merely for purposes of illustrating some embodiments in order to provide an understanding of the subject matter described herein. Accordingly, the above-described features are merely examples and should not be construed to narrow the scope or spirit of the subject matter in this disclosure. Other features, aspects, and advantages of this disclosure will become apparent from the following Detailed Description, Figures, and Claims.

The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate the present disclosure and, together with the description, further serve to explain the principles of the disclosure and to enable a person skilled in the relevant art to make and use the disclosure.
FIG. 1 is a block diagram illustrating a grammatical error correction (GEC) system in accordance with an embodiment;
FIG. 2 is a depiction of an example of automated grammatical error correction performed by the system of FIG. 1;
FIG. 3 is a flow chart illustrating an example of a method for grammatical error correction in accordance with an embodiment;
FIG. 4 is a block diagram illustrating an example of a classification-based GEC module of the system of FIG. 1 in accordance with an embodiment;
FIG. 5 is a depiction of an example of providing a classification of a target word in a sentence using the system of FIG. 1 in accordance with an embodiment;
FIG. 6 is a schematic diagram illustrating an example of an artificial neural network (ANN) model for grammatical error correction in accordance with an embodiment;
FIG. 7 is a schematic diagram illustrating another example of an ANN model for grammatical error correction in accordance with an embodiment;
FIG. 8 is a detailed schematic diagram illustrating an example of the ANN model of FIG. 6 in accordance with an embodiment;
FIG. 9 is a flow chart illustrating an example of a method for grammatical error correction of a sentence in accordance with an embodiment;
FIG. 10 is a flow chart illustrating an example of a method for classifying a target word with respect to a grammatical error type in accordance with an embodiment;
FIG. 11 is a flow chart illustrating another example of a method for classifying a target word with respect to a grammatical error type in accordance with an embodiment;
FIG. 12 is a flow chart illustrating an example of a method for providing a grammar score in accordance with an embodiment;
FIG. 13 is a block diagram illustrating an ANN model training system in accordance with an embodiment;
FIG. 14 is a depiction of an example of a training sample used by the system of FIG. 13;
FIG. 15 is a flow chart illustrating an example of a method for training an ANN model for grammatical error correction in accordance with an embodiment;
FIG. 16 is a schematic diagram illustrating an example of training an ANN model for grammatical error correction in accordance with an embodiment; and
FIG. 17 is a block diagram illustrating an example of a computer system useful for implementing various embodiments set forth in the disclosure.
The present disclosure is described with reference to the accompanying drawings. In the drawings, like reference numbers generally indicate identical or functionally similar elements. Additionally, the left-most digit(s) of a reference number generally identifies the drawing in which the reference number first appears.

In the following detailed description, numerous specific details are set forth by way of example in order to provide a thorough understanding of the relevant disclosure. However, it should be apparent to those skilled in the art that the present disclosure may be practiced without such details. In other instances, well-known methods, procedures, systems, components, and/or circuitry have been described at a relatively high level, without detail, in order to avoid unnecessarily obscuring aspects of the present disclosure.

Throughout the specification and claims, terms may have nuanced meanings suggested or implied in context beyond an explicitly stated meaning. Likewise, the phrase "in one embodiment/example" as used herein does not necessarily refer to the same embodiment, and the phrase "in another embodiment/example" as used herein does not necessarily refer to a different embodiment. It is intended, for example, that claimed subject matter include combinations of example embodiments in whole or in part.

In general, terminology may be understood at least in part from usage in context. For example, terms such as "and," "or," or "and/or" as used herein may include a variety of meanings that may depend at least in part upon the context in which such terms are used. Typically, "or," if used to associate a list such as A, B, or C, is intended to mean A, B, and C, here used in the inclusive sense, as well as A, B, or C, here used in the exclusive sense. In addition, the term "one or more" as used herein, depending at least in part upon context, may be used to describe any feature, structure, or characteristic in a singular sense, or may be used to describe combinations of features, structures, or characteristics in a plural sense. Similarly, terms such as "a," "an," or "the" may be understood to convey a singular usage or a plural usage, depending at least in part upon context. In addition, the term "based on" may be understood as not necessarily intended to convey an exclusive set of factors and may instead allow for the existence of additional factors not necessarily expressly described, again depending at least in part on context.

As will be disclosed in detail below, among other novel features, the automated GEC systems and methods disclosed herein provide the ability to detect and correct grammatical errors effectively and efficiently using a deep context model that can be trained from native text data. In some embodiments, for a specific grammatical error type, the error correction task can be treated as a classification problem in which the grammatical context representation can be learned from widely available native text data. Compared with traditional classifier methods, the systems and methods disclosed herein do not require sophisticated feature engineering, which usually requires linguistic knowledge and may not cover all context patterns. In some embodiments, instead of using surface features, the systems and methods disclosed herein can directly use deep features, such as recurrent neural networks, to represent context. In some embodiments, unlike traditional NLP tasks in which a large amount of supervised data is usually needed but is available only in limited size, the systems and methods disclosed herein can leverage an abundant native plain text corpus and jointly learn context representation and classification in an end-to-end fashion to correct grammatical errors efficiently.

Additional novel features will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings, or may be learned by production or operation of the examples. The novel features of the present disclosure may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities, and combinations set forth in the detailed examples discussed below.

FIG. 1 is a block diagram illustrating a GEC system 100 in accordance with an embodiment. GEC system 100 includes an input pre-processing module 102, a parsing module 104, a target word dispatching module 106, and a plurality of classification-based GEC modules 108, each of which is configured to perform classification-based grammatical error detection and correction using deep context. In some embodiments, GEC system 100 may be implemented using a pipeline architecture that combines the classification-based approach with other GEC approaches, such as machine translation-based and predefined rule-based approaches, to further improve the performance of GEC system 100. As shown in FIG. 1, GEC system 100 may further include a machine translation-based GEC module 110, a rule-based GEC module 112, and a scoring/correction module 114.

Input pre-processing module 102 is configured to receive an input text 116 and pre-process input text 116. Input text 116 may include at least one English sentence, for example, a single sentence, a paragraph, an article, or any text corpus. Input text 116 may be received directly, for example, via handwriting, typing, or copying/pasting. Input text 116 may also be received indirectly, for example, via speech recognition or image recognition. For example, any suitable speech recognition technique may be used to convert voice input into input text 116. In another example, any suitable optical character recognition (OCR) technique may be used to convert text contained in images into input text 116.

Input pre-processing module 102 may pre-process input text 116 in various manners. In some embodiments, because a grammatical error is usually analyzed in the context of a particular sentence, input pre-processing module 102 may divide input text 116 into sentences so that each sentence can be treated as a unit for later processing. Dividing input text 116 into sentences may be performed by recognizing the beginning and/or end of a sentence. For example, input pre-processing module 102 may search for certain punctuation marks, such as a period, semicolon, question mark, or exclamation mark, as indicators of the end of a sentence. Input pre-processing module 102 may also search for a word with a capitalized first letter as an indicator of the start of a sentence. In some embodiments, input pre-processing module 102 may lowercase input text 116 to facilitate later processing, for example, by converting any uppercase letters in input text 116 to lowercase letters. In some embodiments, input pre-processing module 102 may also check the tokens (words, phrases, or text strings) in input text 116 against a vocabulary database 118 to determine any tokens that are not in vocabulary database 118. The unmatched tokens may be treated as special tokens, e.g., a single unk token (unknown token). Vocabulary database 118 includes all the words that can be processed by GEC system 100. Any words or other tokens that are not in vocabulary database 118 may be ignored or treated differently by GEC system 100.
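The pre-processing steps described above (sentence splitting on end-of-sentence punctuation, lowercasing, and replacing out-of-vocabulary tokens with a single unknown token) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the module's actual implementation: the regular expressions, the `<unk>` spelling of the special token, and the function names `split_sentences` and `preprocess` are hypothetical, and the vocabulary database is modeled as a plain Python set.

```python
import re
from typing import List, Set

UNK = "<unk>"  # hypothetical spelling of the special unknown token


def split_sentences(text: str) -> List[str]:
    """Split after end-of-sentence punctuation (. ; ? !) followed by whitespace."""
    parts = re.split(r"(?<=[.;?!])\s+", text.strip())
    return [part for part in parts if part]


def preprocess(text: str, vocabulary: Set[str]) -> List[List[str]]:
    """Return one list of lowercased tokens per sentence, mapping unknown tokens to UNK."""
    sentences = []
    for sentence in split_sentences(text):
        # Words (letters, digits, apostrophes) or single punctuation characters.
        tokens = re.findall(r"[a-z0-9']+|[^\sa-z0-9']", sentence.lower())
        sentences.append([tok if tok in vocabulary else UNK for tok in tokens])
    return sentences
```

For example, with a vocabulary containing "i", "like", "apples", "do", "you", ".", and "?", the text "I like apples. Do you like kiwis?" yields two token lists, and "kiwis" is replaced by the unknown token. A punctuation-only splitter like this one will also split after abbreviations such as "Dr.", which a production system would need to handle.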

Parsing module 104 is configured to parse input text 116 to identify one or more target words in each sentence of input text 116. Unlike known systems, which treat all grammatical errors uniformly and attempt to translate incorrect text into correct text, GEC system 100 uses a model trained for each specific grammatical error type, as described in detail below. Accordingly, in some embodiments, parsing module 104 may identify the target words from the text tokens in each sentence based on predefined grammatical error types, such that each target word corresponds to at least one of the grammatical error types. The grammatical error types include, but are not limited to, the article error, subjective agreement error, verb form error, preposition error, and noun number error. It should be understood that the grammatical error types are not limited to the above examples and may include any other types. In some embodiments, parsing module 104 may tokenize each sentence and identify the target words from the tokens in conjunction with vocabulary database 118, which includes vocabulary information and knowledge known to GEC system 100.

For example, for subjective agreement errors, parsing module 104 may extract in advance a mapping between non-third-person singular present words and third-person singular present words. Parsing module 104 may then locate the verbs as the target words. For article errors, parsing module 104 may locate the nouns and noun phrases (combinations of noun words and adjective words) as the target words. For verb form errors, parsing module 104 may locate the verbs that are in the base form, gerund or present participle, or past participle as the target words. For preposition errors, parsing module 104 may locate the prepositions as the target words. For noun number errors, parsing module 104 may locate the nouns as the target words. It is to be appreciated that one word may be identified by parsing module 104 as corresponding to multiple grammatical error types. For example, a verb may be identified as a target word for both subjective agreement errors and verb form errors, and a noun or noun phrase may be identified as a target word for both article errors and noun number errors. It is also to be appreciated that a target word may include a phrase that is a combination of multiple words, such as a noun phrase.
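As a rough illustration of how target words might be located per error type, the sketch below assumes that each token already carries a Penn Treebank part-of-speech tag. The mapping `TARGET_TAGS` and the function name are hypothetical, and noun phrases are simplified to single nouns.

```python
# Hypothetical mapping from grammatical error type to Penn Treebank PoS tags.
TARGET_TAGS = {
    "article": {"NN", "NNS"},
    "noun_number": {"NN", "NNS"},
    "subjective_agreement": {"VB", "VBP", "VBZ"},
    "verb_form": {"VB", "VBG", "VBN"},
    "preposition": {"IN"},
}


def identify_target_words(tagged_sentence):
    """Return (index, token, error_type) for every candidate target word.

    A token is reported once per error type it may belong to, so one word
    can appear several times in the result.
    """
    targets = []
    for index, (token, tag) in enumerate(tagged_sentence):
        for error_type, tags in TARGET_TAGS.items():
            if tag in tags:
                targets.append((index, token, error_type))
    return targets


tagged = [("it", "PRP"), ("will", "MD"), ("just", "RB"), ("adding", "VBG"),
          ("on", "IN"), ("their", "PRP$"), ("misery", "NN")]
for target in identify_target_words(tagged):
    print(target)
```

In this toy sentence the noun "misery" is reported twice, once for article errors and once for noun number errors, which mirrors the statement that one word may correspond to multiple grammatical error types.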

In some embodiments, for each grammatical error type, parsing module 104 may be configured to determine the actual classification of each target word. Parsing module 104 may assign an original label to each target word with respect to the corresponding grammatical error type as the actual classification value of the target word. For example, for subjective agreement errors, the actual classification of a verb is either third-person singular present or base form. Parsing module 104 may assign the original label to the target word, e.g., "1" if the target word is in the third-person singular present form or "0" if the target word is in the base form. For article errors, the actual classification of a target word may be "a/an", "the", or "no article". Parsing module 104 may check the article in front of the target word (noun word or noun phrase) to determine the actual classification of each target word. For verb form errors, the actual classification of a target word (e.g., a verb) may be "base form", "gerund or present participle", or "past participle". For preposition errors, the most frequently used prepositions may be used by parsing module 104 as the actual classifications. In some embodiments, the actual classifications include the following 12 original labels: "about", "at", "by", "for", "from", "in", "of", "on", "to", "until", "with", and "against". For noun number errors, the actual classification of a target word (e.g., a noun) may be singular or plural. In some embodiments, parsing module 104 may determine the original label of each target word with respect to the corresponding grammatical error type based on part-of-speech (PoS) tags in conjunction with vocabulary database 118.
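The labeling rules above could be sketched as follows. The function names are hypothetical, and the use of Penn Treebank tags (`VBZ`, `VB`, `VBG`, `VBN`, `NN`, `NNS`) is an assumption; the disclosure states only that original labels may be determined from PoS tags.

```python
ARTICLE_LABELS = {"a": "a/an", "an": "a/an", "the": "the"}
VERB_FORM_LABELS = {
    "VB": "base form",
    "VBG": "gerund or present participle",
    "VBN": "past participle",
}


def article_label(tokens, phrase_start):
    """Label a noun or noun phrase by the token directly in front of it."""
    if phrase_start > 0:
        return ARTICLE_LABELS.get(tokens[phrase_start - 1], "no article")
    return "no article"


def agreement_label(pos_tag):
    """Return 1 for third-person singular present (VBZ), 0 for base form."""
    return 1 if pos_tag == "VBZ" else 0


def noun_number_label(pos_tag):
    """Return "plural" for plural noun tags, "singular" otherwise."""
    return "plural" if pos_tag in {"NNS", "NNPS"} else "singular"


tokens = ["she", "bought", "a", "red", "car"]
print(article_label(tokens, 3))   # noun phrase "red car" starts at index 3
print(agreement_label("VBZ"))
print(VERB_FORM_LABELS["VBG"])
print(noun_number_label("NNS"))
```

The original label only records what the writer actually wrote; whether that form is correct is decided later by comparing it with the classification estimated by the model.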

Target word dispatching module 106 is configured to dispatch each target word to the classification-based GEC module 108 for the corresponding grammatical error type. In some embodiments, for each grammatical error type, an ANN model 120 is independently trained and used by the corresponding classification-based GEC module 108. Accordingly, each classification-based GEC module 108 is associated with one specific grammatical error type and is configured to handle the target words with respect to that grammatical error type. For example, for a target word that is a preposition (with respect to the preposition error type), target word dispatching module 106 may send the preposition to the classification-based GEC module 108 that handles preposition errors. It is to be appreciated that because one word may be determined as a target word with respect to multiple grammatical error types, target word dispatching module 106 may send the same target word to multiple classification-based GEC modules 108. It is also to be appreciated that, in some embodiments, the resources assigned by GEC system 100 to each classification-based GEC module 108 may not be equal. For example, depending on the frequency with which each grammatical error type occurs within a certain user cohort or for a specific user, target word dispatching module 106 may dispatch the target words with respect to the most frequently occurring grammatical error type with the highest priority. For input text 116 of large size, e.g., containing a large number of sentences and/or a large number of target words in each sentence, target word dispatching module 106 may schedule the processing of each target word in each sentence in an optimal way, in view of the workload of each classification-based GEC module 108, to reduce latency.
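A minimal sketch of frequency-prioritized dispatching follows. The function name `dispatch`, the tuple layout of the targets, and the frequency values are assumptions for illustration; the disclosure does not specify a scheduling algorithm.

```python
from collections import defaultdict


def dispatch(targets, error_frequency):
    """Group target words by error type, most frequent error type first.

    targets: iterable of (index, token, error_type) tuples.
    error_frequency: hypothetical per-type error rates for a user or cohort.
    """
    queues = defaultdict(list)
    for index, token, error_type in targets:
        queues[error_type].append((index, token))
    ordered = sorted(queues, key=lambda t: error_frequency.get(t, 0.0),
                     reverse=True)
    return [(error_type, queues[error_type]) for error_type in ordered]


targets = [(3, "adding", "verb_form"), (4, "on", "preposition"),
           (6, "misery", "article"), (6, "misery", "noun_number")]
frequency = {"preposition": 0.4, "verb_form": 0.3,
             "article": 0.2, "noun_number": 0.1}
for error_type, queue in dispatch(targets, frequency):
    print(error_type, queue)
```

The same word ("misery") lands in two queues, matching the statement that one target word may be sent to multiple classification-based GEC modules.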

Each classification-based GEC module 108 includes the corresponding ANN model 120 trained for the corresponding grammatical error type. Classification-based GEC module 108 is configured to estimate the classification of a target word with respect to the corresponding grammatical error type using the corresponding ANN model 120. As described below in detail, in some embodiments, ANN model 120 includes two recurrent neural networks configured to output a context vector of the target word based on at least one word before the target word and at least one word after the target word in the sentence. ANN model 120 further includes a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based on the context vector of the target word.
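The two-RNN-plus-feedforward structure can be sketched in plain Python under strong simplifying assumptions: plain tanh recurrent cells, a single linear output layer followed by softmax, random untrained weights, and random stand-in word vectors. None of these choices is taken from the disclosure; they only show how the forward and backward contexts are combined into one context vector that is then classified.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_final_state(vectors, w_in, w_rec):
    """Run a plain tanh RNN over the vectors; return the last hidden state."""
    hidden = [0.0] * len(w_rec)
    for vec in vectors:
        pre = [a + b for a, b in zip(matvec(w_in, vec), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
    return hidden


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def classify(left_context, right_context, params):
    """Estimate class probabilities from the words before and after a target."""
    forward = rnn_final_state(left_context, params["fw_in"], params["fw_rec"])
    # The backward RNN reads the words after the target from right to left.
    backward = rnn_final_state(right_context[::-1],
                               params["bw_in"], params["bw_rec"])
    context_vector = forward + backward  # concatenation of the two states
    return softmax(matvec(params["out"], context_vector))


rng = random.Random(0)
EMBED, HIDDEN, CLASSES = 4, 3, 3  # toy sizes chosen for illustration
params = {
    "fw_in": random_matrix(HIDDEN, EMBED, rng),
    "fw_rec": random_matrix(HIDDEN, HIDDEN, rng),
    "bw_in": random_matrix(HIDDEN, EMBED, rng),
    "bw_rec": random_matrix(HIDDEN, HIDDEN, rng),
    "out": random_matrix(CLASSES, 2 * HIDDEN, rng),
}
left = [[rng.uniform(-1, 1) for _ in range(EMBED)] for _ in range(3)]
right = [[rng.uniform(-1, 1) for _ in range(EMBED)] for _ in range(2)]
probs = classify(left, right, params)
print(probs)
```

Because the weights are random, the printed probabilities carry no meaning; in a trained model each output position would correspond to one classification label of the error type, such as the three verb form labels.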

Classification-based GEC module 108 is further configured to detect a grammatical error in the sentence based on the target word and the estimated classification of the target word. As described above, in some embodiments, the actual classification of each target word may be determined by parsing module 104. Classification-based GEC module 108 may then compare the estimated classification of the target word with the actual classification of the target word, and detect a grammatical error in the sentence when the actual classification does not match the estimated classification. For example, for a certain grammatical error type, the corresponding ANN model 120 may learn an embedding function of the variable-length context surrounding a target word, and the corresponding classification-based GEC module 108 may predict the classification of the target word using the context embedding. If the predicted classification label differs from the original label of the target word, the target word may be flagged as an error, and the prediction may be used as the correction.

As shown in FIG. 1, in some embodiments, multiple classification-based GEC modules 108 may be applied in parallel in GEC system 100 to concurrently detect grammatical errors for various grammatical error types. As described above, the resources of GEC system 100 may be assigned to different grammatical error types based on the occurrence frequency of each grammatical error type. For example, more computational resources may be allocated by GEC system 100 to handle grammatical error types that occur more frequently than others. The allocation of resources may be dynamically adjusted in view of changes in the occurrence frequency and/or the workload of each classification-based GEC module 108.

Machine translation-based GEC module 110 is configured to detect one or more grammatical errors in each sentence based on statistical machine translation, such as phrase-based machine translation, neural network-based machine translation, etc. In some embodiments, machine translation-based GEC module 110 includes a module having a language sub-module that assigns a probability to a sentence and a translation sub-module that assigns a conditional probability. The language sub-module may be trained using a monolingual training data set in the target language. The parameters of the translation sub-module may be estimated from a parallel training data set, i.e., a set of foreign-language sentences and their corresponding translations into the target language. It is to be appreciated that in the pipeline architecture of GEC system 100, machine translation-based GEC module 110 may be applied to the output of classification-based GEC modules 108, or classification-based GEC modules 108 may be applied to the output of machine translation-based GEC module 110. Also, in some embodiments, when machine translation-based GEC module 110 is added to the pipeline, certain classification-based GEC modules 108 that can be outperformed by machine translation-based GEC module 110 may not be included in the pipeline.

Rule-based GEC module 112 is configured to detect one or more grammatical errors in each sentence based on predefined rules. It is to be appreciated that the position of rule-based GEC module 112 in the pipeline is not limited to the end as shown in FIG. 1; rule-based GEC module 112 may instead be at the beginning of the pipeline as the first detection module, or between classification-based GEC modules 108 and machine translation-based GEC module 110. In some embodiments, other mechanical errors, such as punctuation, spelling, and capitalization errors, may also be detected and fixed by rule-based GEC module 112 using predefined rules.

Scoring/correction module 114 is configured to provide a corrected text and/or grammar score 122 of input text 116 based on the grammatical error results received from the pipeline. Taking classification-based GEC modules 108 as an example, for each target word that is detected as having a grammatical error because the estimated classification does not match the actual classification, the grammatical error correction of the target word may be provided by scoring/correction module 114 based on the estimated classification of the target word. To evaluate input text 116, scoring/correction module 114 may also provide grammar score 122 based on the grammatical error results received from the pipeline using a scoring function. In some embodiments, the scoring function may assign a weight to each grammatical error type so that different types of grammatical errors have different levels of impact on grammar score 122. Weights may also be assigned to precision and recall as weighting factors in evaluating the grammatical error results. In some embodiments, the user who provides input text 116 may be considered by the scoring function as well. For example, the weights may be different for different users, or information about the user (e.g., native language, residence, education level, historical scores, age, etc.) may be factored into the scoring function.
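One possible shape of a type-weighted scoring function is sketched below. The formula (one minus the weighted share of erroneous target words), the weight values, and the function name are all assumptions for illustration; the disclosure does not give a concrete scoring formula.

```python
def grammar_score(results, weights):
    """Return a score in [0, 1] from per-target-word error results.

    results: list of (error_type, is_error) pairs, one per target word.
    weights: hypothetical per-type weights; unknown types default to 1.0.
    """
    total = sum(weights.get(error_type, 1.0) for error_type, _ in results)
    if total == 0:
        return 1.0  # no target words, nothing to penalize
    penalty = sum(weights.get(error_type, 1.0)
                  for error_type, is_error in results if is_error)
    return 1.0 - penalty / total


results = [("verb_form", True), ("preposition", False), ("article", False)]
weights = {"verb_form": 2.0, "preposition": 1.0, "article": 1.0}
print(grammar_score(results, weights))  # 1 - 2/4 = 0.5
```

Per-user weighting could be modeled by passing a different `weights` dictionary for each user or cohort.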

FIG. 2 is a depiction of an example of automated grammatical error correction performed by GEC system 100 of FIG. 1. As shown in FIG. 2, input text 202 includes a plurality of sentences and is received from a user identified by user ID 1234. After passing through GEC system 100, which has a plurality of ANN models 120 each individually trained for a corresponding grammatical error type, corrected text 204 with a grammar score is provided to the user. For example, in the sentence "it will just adding on their misery" in input text 202, the verb "adding" is identified by GEC system 100 as a target word with respect to verb form errors. The actual classification of the target word "adding" is gerund or present participle. GEC system 100 applies the ANN model 120 trained for verb form errors and estimates that the classification of the target word "adding" is the base form, "add". Because the estimated classification does not match the actual classification of the target word "adding", a verb form error is detected by GEC system 100, which affects the grammar score in view of the weight applied to the verb form error type and/or the personal information of the user. The estimated classification of the target word "adding" is also used by GEC system 100 to provide the correction "add", which replaces "adding" in corrected text 204. The same ANN model 120 for verb form errors is used by GEC system 100 to detect and correct other verb form errors in input text 202, such as "dishearten" being corrected to "disheartening". ANN models 120 for other grammatical error types are used by GEC system 100 to detect other types of grammatical errors. For example, the ANN model 120 for preposition errors is used by GEC system 100 to detect and correct preposition errors in input text 202, such as "for" and "to" being corrected to "in" and "on".

FIG. 3 is a flowchart illustrating an example of a grammatical error correction method 300 in accordance with an embodiment. Method 300 may be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, as will be understood by a person of ordinary skill in the art, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 3.

Method 300 shall be described with reference to FIG. 1. However, method 300 is not limited to that example embodiment. At 302, an input text is received. The input text includes at least one sentence. The input text may be received directly, for example, from handwriting, typing, or copy/paste, or indirectly, for example, from speech recognition or image recognition. At 304, the received input text is pre-processed, for example, divided into sentences and tokenized. In some embodiments, the pre-processing may include converting upper-case letters into lower-case letters so that the input text is in lower case. In some embodiments, the pre-processing may include identifying any tokens in the input text that are not in vocabulary database 118 and representing them as special tokens. 302 and 304 may be performed by input pre-processing module 102 of GEC system 100.
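Steps 302 and 304 could be approximated with the standard library alone, as in the sketch below. The regular expressions and the function name `preprocess` are assumptions; a production system would more likely rely on a dedicated sentence splitter and tokenizer.

```python
import re


def preprocess(text):
    """Split text into sentences, lowercase them, and tokenize each one.

    Sentences are split at whitespace that follows ".", "!", or "?".
    Tokens are runs of letters, digits, and apostrophes, or single
    punctuation characters.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [re.findall(r"[a-z0-9']+|[^\sa-z0-9']", s.lower())
            for s in sentences]


for sentence in preprocess("It will just add. They are sad!"):
    print(sentence)
```

Lowercasing before tokenization keeps the vocabulary smaller, at the cost of discarding capitalization cues.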

At 306, the pre-processed input text is parsed to identify one or more target words in each sentence. The target words may be identified from the text tokens based on the grammatical error types so that each target word corresponds to at least one of the grammatical error types. The grammatical error types include, but are not limited to, article errors, subjective agreement errors, verb form errors, preposition errors, and noun number errors. In some embodiments, the actual classification of each target word with respect to the corresponding grammatical error type is determined. The determination may be made automatically, for example, based on PoS tags and the text tokens in the sentence. In some embodiments, the target word identification and actual classification determination may be performed by an NLP tool, such as the Stanford CoreNLP tools. 306 may be performed by parsing module 104 of GEC system 100.

At 308, each target word is dispatched to the corresponding classification-based GEC module 108. Each classification-based GEC module 108 includes an ANN model 120 trained for the corresponding grammatical error type, for example, on native training samples. 308 may be performed by target word dispatching module 106 of GEC system 100. At 310, one or more grammatical errors in each sentence are detected using ANN models 120. In some embodiments, for each target word, the classification of the target word with respect to the corresponding grammatical error type may be estimated using the corresponding ANN model 120. A grammatical error may then be detected based on the target word and the estimated classification of the target word. For example, if the estimation differs from the original label and its probability is larger than a predefined threshold, then a grammatical error is deemed to be found. 310 may be performed by classification-based GEC modules 108 of GEC system 100.
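The detection rule at 310 (the estimate differs from the original label and its probability exceeds a threshold) can be written as a short sketch. The function name, the dictionary layout of the probabilities, and the default threshold of 0.5 are assumptions; the disclosure states only that the threshold is predefined.

```python
def detect_error(original_label, class_probabilities, threshold=0.5):
    """Return the suggested correction label, or None if no error is flagged.

    class_probabilities: mapping from classification label to the estimated
    probability produced by the model for one target word.
    """
    predicted = max(class_probabilities, key=class_probabilities.get)
    if predicted != original_label and class_probabilities[predicted] > threshold:
        return predicted
    return None


# Hypothetical model output for "adding" in "it will just adding on their misery".
probabilities = {"base form": 0.90,
                 "gerund or present participle": 0.08,
                 "past participle": 0.02}
print(detect_error("gerund or present participle", probabilities))
```

Raising the threshold trades recall for precision: fewer target words are flagged, but those that are flagged carry higher model confidence.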

At 312, one or more grammatical errors in each sentence may be detected using machine translation. 312 may be performed by machine translation-based GEC module 110 of GEC system 100. At 314, one or more grammatical errors in each sentence may be detected based on predefined rules. 314 may be performed by rule-based GEC module 112 of GEC system 100. In some embodiments, a pipeline architecture may be used to combine any suitable machine translation and/or predefined rule-based methods with the classification-based methods described herein to further improve the performance of GEC system 100.

At 316, corrections to the detected grammatical errors and/or a grammar score of the input text are provided. In some embodiments, a weight may be applied to each grammatical error result of the target words based on the corresponding grammatical error type. The grammar score of each sentence may be determined based on the grammatical error results and the target words in the sentence, as well as the weights applied to each grammatical error result. In some embodiments, the grammar score may also be provided based on information associated with the user from whom the sentence is received. As to the corrections to the detected grammatical errors, in some embodiments, the estimated classification of the target word with respect to the corresponding grammatical error type may be used to generate the correction. It is to be appreciated that the corrections and the grammar score are not necessarily provided together. 316 may be performed by scoring/correction module 114 of GEC system 100.

FIG. 4 is a block diagram illustrating an example of classification-based GEC module 108 of GEC system 100 in FIG. 1 in accordance with an embodiment. As described above, classification-based GEC module 108 is configured to receive a target word in a sentence 402 and estimate the classification of the target word using the ANN model 120 for the corresponding grammatical error type of the target word. The target word in sentence 402 is also received by a target word labelling unit 404 (e.g., in parsing module 104). Target word labelling unit 404 is configured to determine the actual classification (e.g., the original label) of the target word based on, for example, the PoS tags and the text tokens in sentence 402. Classification-based GEC module 108 is further configured to provide the grammatical error result based on the estimated classification and the actual classification of the target word. As shown in FIG. 4, classification-based GEC module 108 includes an initial context generation unit 406, a deep context representation unit 408, a classification unit 410, an attention unit 412, and a classification comparison unit 414.

Initial context generation unit 406 is configured to generate a plurality of sets of initial context vectors (initial context matrices) of the target word based on the words surrounding the target word (context words) in sentence 402. In some embodiments, the sets of initial context vectors include a set of forward initial context vectors (forward initial context matrix) generated based on at least one word before the target word in sentence 402 (forward context words) and a set of backward initial context vectors (backward initial context matrix) generated based on at least one word after the target word in sentence 402 (backward context words). Each initial context vector represents one context word in the sentence. In some embodiments, an initial context vector may be a one-hot vector that represents a word based on one-hot encoding, so that the size (dimension) of the one-hot vector is the same as the vocabulary size (e.g., in vocabulary database 118). In some embodiments, an initial context vector may be a low-dimensional vector with a dimension smaller than the vocabulary size, such as the word embedding vector of a context word. For example, a word embedding vector may be generated by any suitable generic word embedding approach, such as, but not limited to, word2vec or GloVe. In some embodiments, initial context generation unit 406 may use one or more recurrent neural networks configured to output one or more sets of initial context vectors. The recurrent neural network(s) used by initial context generation unit 406 may be part of ANN model 120.
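The one-hot variant of the forward and backward initial context vectors can be illustrated with a small sketch. The function names and the toy vocabulary are hypothetical, and both sets are kept in sentence order here; an embedding-based variant would replace `one_hot` with a lookup into a trained embedding table.

```python
def one_hot(word, vocabulary):
    """Return a vector of vocabulary size with a single 1 at the word's index."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector


def initial_context(tokens, target_index, vocabulary):
    """Build the forward and backward sets of initial context vectors.

    forward: one vector per word before the target word.
    backward: one vector per word after the target word.
    """
    forward = [one_hot(t, vocabulary) for t in tokens[:target_index]]
    backward = [one_hot(t, vocabulary) for t in tokens[target_index + 1:]]
    return forward, backward


vocabulary = ["it", "will", "just", "adding", "on", "their", "misery"]
tokens = ["it", "will", "just", "adding", "on", "their", "misery"]
forward, backward = initial_context(tokens, 3, vocabulary)  # target: "adding"
print(len(forward), len(backward))
print(forward[0])
```

Each set can be viewed as a matrix with one row per context word, which is why the text also refers to the sets as initial context matrices.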

It is to be appreciated that the number of context words used for generating the set of forward or backward initial context vectors is not limited. In some embodiments, the set of forward initial context vectors is generated based on all the words before the target word in sentence 402, and the set of backward initial context vectors is generated based on all the words after the target word in sentence 402. Because each classification-based GEC module 108 and the corresponding ANN model 120 handle a specific grammatical error type, and the correction of different types of grammatical errors may need dependencies at different word distances (e.g., a preposition is determined by the words near the target word, while the state of a verb can be affected by a subject that is far away from the verb), in some embodiments, the number of context words (i.e., the window size) used for generating the set of forward or backward initial context vectors may be determined based on the grammatical error type associated with classification-based GEC module 108 and the corresponding ANN model 120.
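An error-type-dependent window could be applied as in the following sketch. The window sizes in `WINDOW_SIZE` are invented for illustration (the disclosure gives no concrete values), with `None` standing for "use every word on that side of the target".

```python
# Hypothetical window sizes per error type; None means the whole sentence.
WINDOW_SIZE = {
    "preposition": 3,
    "article": 5,
    "noun_number": 5,
    "verb_form": 10,
    "subjective_agreement": None,
}


def context_words(tokens, target_index, error_type):
    """Return the forward and backward context words for one target word."""
    window = WINDOW_SIZE.get(error_type)
    before = tokens[:target_index]
    after = tokens[target_index + 1:]
    if window is not None:
        before = before[max(0, len(before) - window):]  # nearest words first kept
        after = after[:window]
    return before, after


tokens = "the man who lives next to us walk to work".split()
print(context_words(tokens, 5, "preposition"))           # target: first "to"
print(context_words(tokens, 7, "subjective_agreement"))  # target: "walk"
```

In the second call the full left context is kept so that the distant subject "man" remains visible to the agreement model, while the preposition model only sees three words on each side.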

In some embodiments, an initial context vector may be generated based on the lemma of the target word itself. A lemma is the base form of a word (e.g., the words "walk", "walks", "walked", and "walking" all have the same lemma "walk"). For example, for classification-based GEC module 108 and corresponding ANN model 120 associated with the noun number error, the lemma of the target noun may be introduced, in addition to the context words (i.e., the words surrounding the target word in sentence 402), as extra context information in the form of an initial lemma context vector, because whether the target word should be singular or plural is closely related to the word itself. In some embodiments, the initial context vector of the lemma of the target word may be part of the set of forward initial context vectors or part of the set of backward initial context vectors.

In some known GEC systems, semantic features need to be designed and manually extracted from the sentence in order to generate feature vectors, and it is difficult for such features to cover all situations because of the complexity of language. In contrast, complex feature engineering is not required by classification-based GEC module 108 disclosed herein, because the context words of the target word in sentence 402 are used directly as the initial context information (e.g., in the form of initial context vectors), and the deep context feature representation can be learned jointly with the classification in an end-to-end fashion, as described below in detail.

Referring to FIG. 5, in this example, a sentence consists of n words 1 to n, including target word i. For each word before target word i, i.e., word 1, word 2, ..., or word i-1, a corresponding initial context vector 1, 2, ..., or i-1 is generated. Initial context vectors 1, 2, ..., and i-1 are "forward" vectors because they are generated from the words before target word i and are fed into the later stage in the forward direction (i.e., starting from the beginning of the sentence, word 1). For each word after target word i, i.e., word i+1, word i+2, ..., or word n, a corresponding initial context vector i+1, i+2, ..., or n is generated. Initial context vectors n, ..., i+2, and i+1 are "backward" vectors because they are generated from the words after target word i and are fed into the later stage in the backward direction (i.e., starting from the end of the sentence, word n).

In this example, the set of forward initial context vectors may be represented as a forward initial context matrix having a number of columns equal to the dimension of the word embedding and a number of rows equal to the number of words before target word i. The first row of the forward initial context matrix may be the word embedding vector of the first word 1, and the last row of the forward initial context matrix may be the word embedding vector of word i-1 immediately before target word i. The set of backward initial context vectors may be represented as a backward initial context matrix having a number of columns equal to the dimension of the word embedding and a number of rows equal to the number of words after target word i. The first row of the backward initial context matrix may be the word embedding vector of the last word n, and the last row of the backward initial context matrix may be the word embedding vector of word i+1 immediately after target word i. The number of dimensions of each word embedding vector may be at least 100, for example 300. In this example, a lemma initial context vector lem (e.g., a word embedding vector) may also be generated based on the lemma of target word i.
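The row ordering of the two matrices can be illustrated with a minimal Python sketch. The `toy_embedding` function is a hypothetical stand-in for a pretrained word2vec or GloVe lookup, and the 4-dimensional vectors are chosen only to keep the example small; neither comes from the disclosure.

```python
import random

EMBED_DIM = 4  # illustrative only; the text suggests at least 100, e.g. 300


def toy_embedding(word, dim=EMBED_DIM):
    """Hypothetical stand-in for a pretrained embedding lookup."""
    rng = random.Random(word)  # deterministic vector per word
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def initial_context_matrices(words, i):
    """Build (forward, backward) matrices for the target at 0-based index i.

    Forward rows run from the first word to the word just before the target.
    Backward rows run from the last word to the word just after the target.
    """
    forward = [toy_embedding(w) for w in words[:i]]
    backward = [toy_embedding(w) for w in reversed(words[i + 1:])]
    return forward, backward


words = "I go to school everyday".split()
fwd, bwd = initial_context_matrices(words, 1)  # target word "go"
print(len(fwd), len(bwd))  # 1 3
```

Each row has `EMBED_DIM` columns, so the forward matrix here is 1 x 4 and the backward matrix is 3 x 4, with "everyday" in the first backward row and "to" in the last.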

Referring back to FIG. 4, deep context representation unit 408 is configured to provide, using ANN model 120, a context vector of the target word based on the context words in sentence 402, for example, the sets of forward and backward initial context vectors generated by initial context generation unit 406. Classification unit 410 is configured to provide, using ANN model 120, a classification value of the target word with respect to the grammatical error type based on the deep context representation of the target word in sentence 402, for example, the context vector generated by deep context representation unit 408.

Referring to FIG. 6, a schematic diagram of an example of ANN model 120 is shown according to an embodiment. In this example, ANN model 120 includes a deep context representation sub-model 602 that can be used by deep context representation unit 408 and a classification sub-model 604 that can be used by classification unit 410. Deep context representation sub-model 602 and classification sub-model 604 may be jointly trained in an end-to-end fashion. Deep context representation sub-model 602 includes two recurrent neural networks: a forward recurrent neural network 606 and a backward recurrent neural network 608. Each of recurrent neural networks 606 and 608 may be a long short-term memory (LSTM) neural network, a gated recurrent unit (GRU) neural network, or any other suitable recurrent neural network in which the connections between hidden units form a directed cycle.

Recurrent neural networks 606 and 608 are configured to output the context vector of the target word based on the initial context vectors generated from the context words of the target word in sentence 402. In some embodiments, forward recurrent neural network 606 is configured to receive the set of forward initial context vectors and to provide a forward context vector of the target word based on that set. Forward recurrent neural network 606 may be fed the set of forward initial context vectors in the forward direction. Backward recurrent neural network 608 is configured to receive the set of backward initial context vectors and to provide a backward context vector of the target word based on that set. Backward recurrent neural network 608 may be fed the set of backward initial context vectors in the backward direction. In some embodiments, the sets of forward and backward initial context vectors may be word embedding vectors as described above. It should be understood that, in some embodiments, the lemma initial context vector of the target word may be fed into forward recurrent neural network 606 and/or backward recurrent neural network 608 to generate the forward context vector and/or the backward context vector.

Referring now to FIG. 5, in this example, the forward recurrent neural network is fed the set of forward initial context vectors (e.g., in the form of the forward initial context matrix) in the forward direction and generates a forward context vector for. The backward recurrent neural network is fed the set of backward initial context vectors (e.g., in the form of the backward initial context matrix) in the backward direction and generates a backward context vector back. It should be understood that, in some embodiments, the lemma initial context vector lem may be fed into the forward recurrent neural network and/or the backward recurrent neural network. The number of hidden units in each of the forward and backward recurrent neural networks is at least 300, for example 600. In this example, a deep context vector i of target word i is then generated by concatenating the forward context vector for and the backward context vector back. Deep context vector i represents the deep context information of target word i based on context words 1 to i-1 and context words i+1 to n surrounding target word i (and, in some embodiments, the lemma of target word i). In other words, deep context vector i may be considered an embedding of the joint sentential context around target word i. As described above, deep context vector i is a general representation that can handle a variety of situations, because no complex feature engineering is needed to manually design and extract semantic features for representing the context of target word i.
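The two-direction encoding and the concatenation step can be sketched as follows. This is only an illustration under stated assumptions: a plain tanh recurrent cell with random, untrained weights stands in for the LSTM or GRU networks named in the text, and the function names `make_rnn`, `run_rnn`, and `deep_context_vector` are hypothetical.

```python
import math
import random


def make_rnn(input_dim, hidden_dim, seed):
    """Random weights for a plain tanh recurrent cell (untrained stand-in)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)]
            for _ in range(hidden_dim)]
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
           for _ in range(hidden_dim)]
    return w_in, w_h


def run_rnn(params, vectors):
    """Feed the vectors in the given order and return the final hidden state."""
    w_in, w_h = params
    h = [0.0] * len(w_h)
    for x in vectors:
        h = [math.tanh(sum(a * b for a, b in zip(w_in[k], x)) +
                       sum(a * b for a, b in zip(w_h[k], h)))
             for k in range(len(w_h))]
    return h


def deep_context_vector(forward_vecs, backward_vecs, fwd_rnn, bwd_rnn):
    """Concatenate the forward and backward context vectors."""
    return run_rnn(fwd_rnn, forward_vecs) + run_rnn(bwd_rnn, backward_vecs)


fwd_rnn = make_rnn(input_dim=3, hidden_dim=2, seed=0)
bwd_rnn = make_rnn(input_dim=3, hidden_dim=2, seed=1)
forward_vecs = [[0.1, 0.2, 0.3]]                    # words 1 .. i-1
backward_vecs = [[0.3, 0.1, 0.0], [0.2, 0.2, 0.2]]  # words n .. i+1
ctx = deep_context_vector(forward_vecs, backward_vecs, fwd_rnn, bwd_rnn)
print(len(ctx))  # 4
```

With two hidden units per direction, the concatenated vector has four components; with 600 hidden units per direction, as in the example figure given in the text, it would have 1200.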

Referring back to FIG. 6, classification sub-model 604 includes a feedforward neural network 610 configured to output the classification value of the target word with respect to the grammatical error type based on the context vector of the target word. Feedforward neural network 610 may include a multi-layer perceptron (MLP) neural network or any other suitable feedforward neural network in which the connections between hidden units do not form a cycle. For example, as shown in FIG. 5, deep context vector i is fed into the feedforward neural network to generate the classification value y of target word i. For different grammatical error types, the classification value y may be defined in different ways, as shown in Table 1. It should be understood that the grammatical error types are not limited to the five examples in Table 1, and the definition of the classification value y is not limited to the examples shown in Table 1. It should also be understood that, in some embodiments, the classification value y may be expressed as a probability distribution of the target word over the classes (labels) associated with the grammatical error type.

Table 1
Error type | Classification value (y)
Article | 0 = a/an, 1 = the, 2 = none
Preposition | label = preposition index
Verb form | 0 = base form, 1 = gerund or present participle, 2 = past participle
Subject-verb agreement | 0 = non-third-person singular present, 1 = third-person singular present
Noun number | 0 = singular, 1 = plural
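The label definitions in Table 1 could be encoded as follows. This is a sketch only: the enum and function names are hypothetical, and the `PREPOSITIONS` list is an assumed inventory, since the table states only that the label is a preposition index.

```python
from enum import IntEnum


class Article(IntEnum):
    A_AN = 0
    THE = 1
    NONE = 2


class VerbForm(IntEnum):
    BASE = 0
    GERUND_OR_PRESENT_PARTICIPLE = 1
    PAST_PARTICIPLE = 2


class SubjectVerbAgreement(IntEnum):
    NON_THIRD_PERSON_SINGULAR_PRESENT = 0
    THIRD_PERSON_SINGULAR_PRESENT = 1


class NounNumber(IntEnum):
    SINGULAR = 0
    PLURAL = 1


# Hypothetical inventory; Table 1 only says "label = preposition index".
PREPOSITIONS = ["in", "on", "at", "for", "to", "of"]


def describe(error_type, y):
    """Map a classification value y back to a readable class name."""
    if error_type == "preposition":
        return PREPOSITIONS[y]
    enums = {
        "article": Article,
        "verb_form": VerbForm,
        "subject_verb_agreement": SubjectVerbAgreement,
        "noun_number": NounNumber,
    }
    return enums[error_type](y).name


print(describe("verb_form", 1))    # GERUND_OR_PRESENT_PARTICIPLE
print(describe("preposition", 2))  # at
```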

In some embodiments, feedforward neural network 610 may include a first layer having a first activation function applied to a fully connected linear operation on the context vector. The first activation function in the first layer may be, for example, a rectified linear unit activation function or any other suitable activation function that is a function of the one-fold output from the previous layer(s). Feedforward neural network 610 may also include a second layer connected to the first layer and having a second activation function for generating the classification value. The second activation function in the second layer may be, for example, a softmax function or any other suitable activation function used for multiclass classification.

Referring back to FIG. 4, in some embodiments, attention unit 412 is configured to provide, using ANN model 120, a context weight vector of the target word based on at least one word before the target word and at least one word after the target word in sentence 402. FIG. 7 is a schematic diagram illustrating another example of ANN model 120 for grammatical error correction according to an embodiment. Compared with the example shown in FIG. 6, ANN model 120 of FIG. 7 further includes an attention mechanism sub-model 702 that can be used by attention unit 412. A weighted context vector is then computed by applying the context weight vector to the context vector. Deep context representation sub-model 602, classification sub-model 604, and attention mechanism sub-model 702 may be jointly trained in an end-to-end fashion. In some embodiments, attention mechanism sub-model 702 includes a feedforward neural network 704 configured to generate the context weight vector of the target word based on the context words of the target word. Feedforward neural network 704 may be trained based on the distance between each context word and the target word in the sentence. In some embodiments, because the context weight vector can adjust the weights of context words at different distances from the target word, the sets of initial context vectors may be generated based on all the surrounding words in the sentence, and the context weight vector can tune the weighted context vector to focus on the context words that affect grammatical usage.

Referring back to FIG. 4, classification comparison unit 414 compares the estimated classification value provided by classification unit 410 with the actual classification value provided by target word labeling unit 404 to detect the presence of any error of the grammatical error type. If the actual classification value is the same as the estimated classification value, no error of the grammatical error type is detected for the target word. Otherwise, an error of the grammatical error type is detected, and the estimated classification value is used to provide the correction. For example, in the example described above with respect to FIG. 2, the estimated classification value of the target word "adding" with respect to the verb form error is "0" (base form), whereas the actual classification value of the target word "adding" is "1" (gerund or present participle). Thus, a verb form error is detected, and the correction is the base form of the target word "adding".
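The comparison logic can be sketched in a few lines of Python. The function name `compare_classification` is hypothetical; the labels follow the verb form row of Table 1, and the values reproduce the "adding" example.

```python
VERB_FORM = {0: "base form", 1: "gerund or present participle", 2: "past participle"}


def compare_classification(actual, estimated):
    """Return (error_detected, correction_label)."""
    if actual == estimated:
        return False, None
    # The estimated class is what the model expects in this context,
    # so it serves as the suggested correction.
    return True, estimated


detected, correction = compare_classification(actual=1, estimated=0)
if detected:
    print("verb form error; suggested form:", VERB_FORM[correction])
# prints: verb form error; suggested form: base form
```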

FIG. 8 is a detailed schematic diagram illustrating an example of ANN model 120 of FIG. 6 according to an embodiment. In this example, ANN model 120 includes a forward GRU neural network, a backward GRU neural network, and an MLP neural network that are jointly trained. For the target word "go" in the sentence "I go to school everyday", the forward context word "I" is fed into the forward GRU neural network from left to right (the forward direction), and the backward context words "to school everyday" are fed into the backward GRU neural network from right to left (the backward direction). Given the context w_1:n, the context vector for the target word w_i can be defined as Equation 1:

Here, lGRU is a GRU that reads the words of the given context from left to right (the forward direction), and rGRU is a GRU that reads the words of the given context from right to left (the backward direction). l/r denote the distinct left-to-right/right-to-left word embeddings of the context words. The concatenated vector is then fed into the MLP neural network to capture the inter-dependencies of the two sides. In the second layer of the MLP neural network, a softmax layer may be used to predict the classification of the target word (e.g., the target word itself or the status of the target word, such as singular or plural):

Here, ReLU is the rectified linear unit activation function, ReLU(x) = max(0, x), and L(x) = Wx + b is a fully connected linear operation. The final output of ANN model 120 in this example is:

Here, y is the classification value as described above.
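The three equations referred to in this passage are not reproduced in this text. A form consistent with the definitions given (lGRU, rGRU, the embeddings l and r, ReLU, L, and softmax) might read as follows. The function names biGRU and MLP and the concatenation symbol ⊕ are assumptions introduced here for illustration, not notation confirmed by the source.

```latex
\begin{align}
\mathrm{biGRU}(w_{1:n}, i) &= \mathrm{lGRU}\big(l(w_{1:i-1})\big) \oplus \mathrm{rGRU}\big(r(w_{n:i+1})\big) \tag{1} \\
\mathrm{MLP}(x) &= \mathrm{softmax}\big(\mathrm{ReLU}(L(x))\big) \tag{2} \\
y &= \mathrm{MLP}\big(\mathrm{biGRU}(w_{1:n}, i)\big) \tag{3}
\end{align}
```

In this reading, the index range n:i+1 reflects that the backward GRU consumes the words after the target starting from the end of the sentence.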

FIG. 9 is a flowchart illustrating an example of a method 900 for grammatical error correction of a sentence according to an embodiment. Method 900 may be performed by processing logic that may include hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof. It should be understood that not all steps may be needed to perform the disclosure provided herein. Further, as will be understood by a person of ordinary skill in the art, some of the steps may be performed simultaneously or in a different order than shown in FIG. 9.

Method 900 will be described with reference to FIGS. 1 and 4. However, method 900 is not limited to that example embodiment. At 902, a sentence is received. The sentence may be part of an input text. 902 may be performed by input pre-processing module 102 of GEC system 100. At 904, one or more target words in the sentence are identified based on one or more grammatical error types. Each target word corresponds to one or more of the grammatical error types. 904 may be performed by parsing module 104 of GEC system 100. At 906, a classification of one of the target words with respect to the corresponding grammatical error type is estimated using ANN model 120 trained for that grammatical error type. At 908, a grammatical error is detected based on the target word and the estimated classification of the target word. The detection may be made by comparing the actual classification of the target word with the estimated classification of the target word. 906 and 908 may be performed by classification-based GEC module 108 of GEC system 100.

At 910, it is determined whether there are more target words in the sentence that have not yet been processed. If the answer is yes, method 900 moves back to 904 to process the next target word in the sentence. Once all the target words in the sentence have been processed, at 912, grammatical error corrections for the sentence are provided based on the grammatical error results. The estimated classification of each target word may be used to generate the grammatical error corrections. A grammar score may also be provided based on the grammatical error results. 912 may be performed by scoring/correction module 114 of GEC system 100.
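The control flow of steps 904 through 912 can be sketched as a loop over target words. All function names here are hypothetical, and the three `toy_*` helpers are trivial stand-ins for the parsing module, the target word labeling, and the trained ANN model; they exist only so that the loop can be run.

```python
def correct_sentence(words, identify_targets, estimate_label, actual_label):
    """Loop of steps 904-910; returns the error results consumed at 912."""
    errors = []
    for index, error_type in identify_targets(words):          # 904
        estimated = estimate_label(words, index, error_type)   # 906
        actual = actual_label(words, index, error_type)
        if estimated != actual:                                 # 908
            errors.append((index, error_type, actual, estimated))
    return errors


# Toy stand-ins (hypothetical). Labels follow Table 1:
# 0 = non-third-person singular present, 1 = third-person singular present.
def toy_targets(words):
    return [(1, "subject_verb_agreement")]


def toy_actual(words, index, error_type):
    return 1 if words[index].endswith("s") else 0


def toy_estimate(words, index, error_type):
    return 1 if words[index - 1].lower() in {"he", "she", "it"} else 0


errors = correct_sentence("He go to school".split(),
                          toy_targets, toy_estimate, toy_actual)
print(errors)  # [(1, 'subject_verb_agreement', 0, 1)]
```

Each result tuple carries the estimated label, which is the information step 912 would use to generate the correction.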

FIG. 10 is a flowchart illustrating an example of a method 1000 for classifying a target word with respect to a grammatical error type according to an embodiment. Method 1000 may be performed by processing logic that may include hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof. It should be understood that not all steps may be needed to perform the disclosure provided herein. Further, as will be understood by a person of ordinary skill in the art, some of the steps may be performed simultaneously or in a different order than shown in FIG. 10.

Method 1000 will be described with reference to FIGS. 1 and 4. However, method 1000 is not limited to that example embodiment. At 1002, a context vector of the target word is provided based on the context words in the sentence. The context words may be any number of words surrounding the target word in the sentence. In some embodiments, the context words include all the words in the sentence except the target word. In some embodiments, the context words also include the lemma of the target word. The context vector does not include semantic features extracted from the sentence. 1002 may be performed by deep context representation unit 408 of classification-based GEC module 108.

At 1004, a context weight vector is provided based on the context words in the sentence. At 1006, the context weight vector is applied to the context vector to generate a weighted context vector. The context weight vector may apply a respective weight to each context word in the sentence based on the distance of that context word from the target word. 1004 and 1006 may be performed by attention unit 412 of classification-based GEC module 108.
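One way to picture steps 1004 and 1006 is a normalized weight per context word, applied as a weighted sum of per-word vectors. The sketch below makes two assumptions not stated in the text: the weights are normalized with a softmax, and the scores are a simple negative-distance placeholder for whatever the trained attention network would output.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_context(vectors, weights):
    """Apply one weight per context word and sum the weighted vectors."""
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


# Distances of three context words from the target word.
distances = [1, 2, 3]
# Placeholder scores: nearer words score higher. A trained network would
# produce these values instead.
scores = [-float(d) for d in distances]
weights = softmax(scores)                        # 1004: context weight vector
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one vector per context word
out = weighted_context(vectors, weights)         # 1006: weighted context vector
print(len(out))  # 2
```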

At 1008, a classification value of the target word with respect to the grammatical error type is provided based on the weighted context vector of the target word. The classification value represents one of multiple classes associated with the grammatical error type. The classification value may be a probability distribution of the target word over the classes associated with the grammatical error type. 1008 may be performed by classification unit 410 of classification-based GEC module 108.

FIG. 11 is a flowchart illustrating another example of a method 1100 for classifying a target word with respect to a grammatical error type according to an embodiment. Method 1100 may be performed by processing logic that may include hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof. It should be understood that not all steps may be needed to perform the disclosure provided herein. Further, as will be understood by a person of ordinary skill in the art, some of the steps may be performed simultaneously or in a different order than shown in FIG. 11.

Method 1100 will be described with reference to FIGS. 1 and 4. However, method 1100 is not limited to that example embodiment. At 1102, the grammatical error type of the target word is determined, for example, from a plurality of predefined grammatical error types. At 1104, the window size of the context words is determined based on the grammatical error type. The window size indicates the maximum number of words before the target word and the maximum number of words after the target word in the sentence to be considered as context words. The window size may differ for different grammatical error types. For example, for subject-verb agreement errors and verb form errors, the whole sentence may be considered as context, because these two error types usually require dependencies on context words that are far from the target word. For article errors, preposition errors, and noun number errors, the window size may be smaller than the whole sentence, such as 3, 5, or 10 for article errors, 3, 5, or 10 for preposition errors, and 10, 15, or 20 for noun number errors.
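Steps 1102 and 1104 amount to a lookup from error type to window size followed by slicing the sentence. In the sketch below, the specific sizes are picked from the options listed above purely for illustration, `None` is a hypothetical convention meaning "whole sentence", and the function name is an assumption.

```python
# One illustrative choice per error type, taken from the ranges in the text.
WINDOW_SIZE = {
    "article": 5,
    "preposition": 5,
    "noun_number": 15,
    "subject_verb_agreement": None,  # None = whole sentence
    "verb_form": None,
}


def context_words(words, i, error_type):
    """Return (words before, words after) the target at 0-based index i."""
    k = WINDOW_SIZE[error_type]
    if k is None:
        return words[:i], words[i + 1:]
    return words[max(0, i - k):i], words[i + 1:i + 1 + k]


words = list("abcdefghij")  # ten one-letter tokens; the target is "g" at index 6
print(context_words(words, 6, "article"))
# (['b', 'c', 'd', 'e', 'f'], ['h', 'i', 'j'])
print(len(context_words(words, 6, "verb_form")[0]))  # 6
```

The window caps each side independently, so a target near the end of the sentence simply gets fewer following words.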

At 1106, a set of forward word embedding vectors is generated based on the context words before the target word. The number of dimensions of each forward word embedding vector may be at least 100, for example 300. The set of forward word embedding vectors may be generated in order from the first word within the window size to the word immediately before the target word (the forward direction). At 1108, in parallel, a set of backward word embedding vectors is generated based on the context words after the target word. The number of dimensions of each backward word embedding vector may be at least 100, for example 300. The set of backward word embedding vectors may be generated in order from the last word within the window size to the word immediately after the target word (the backward direction). 1102, 1104, 1106, and 1108 may be performed by initial context generation unit 406 of classification-based GEC module 108.

At 1110, a forward context vector is provided based on the set of forward word embedding vectors. The set of forward word embedding vectors may be fed into a recurrent neural network in order from the forward word embedding vector of the first word within the window size to the forward word embedding vector of the word immediately before the target word (forward direction). At 1112, in parallel, a backward context vector is provided based on the set of backward word embedding vectors. The set of backward word embedding vectors may be fed into another recurrent neural network in order from the backward word embedding vector of the last word within the window size to the backward word embedding vector of the word immediately after the target word (backward direction). At 1114, a context vector is provided by concatenating the forward context vector and the backward context vector. 1110, 1112, and 1114 may be performed by deep context representation unit 408 of classification-based GEC module 108.
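Steps 1110 to 1114 can be sketched with two independent recurrent networks whose final hidden states are concatenated. The SimpleRNN class below is an assumption made for brevity (a plain tanh recurrence with random, untrained weights); it stands in for whatever recurrent cell an embodiment actually uses.

```python
import math
import random
from typing import List

Vector = List[float]


class SimpleRNN:
    """Minimal tanh recurrence, used here only as a stand-in recurrent network."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)]
                     for _ in range(hidden_dim)]
        self.w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
                      for _ in range(hidden_dim)]

    def run(self, inputs: List[Vector]) -> Vector:
        """Consume the vectors in the given order and return the last hidden state."""
        h = [0.0] * self.hidden_dim
        for x in inputs:
            h = [math.tanh(sum(w * xi for w, xi in zip(self.w_in[j], x)) +
                           sum(w * hi for w, hi in zip(self.w_rec[j], h)))
                 for j in range(self.hidden_dim)]
        return h


def context_vector(forward_set: List[Vector], backward_set: List[Vector],
                   fwd_rnn: SimpleRNN, bwd_rnn: SimpleRNN) -> Vector:
    # 1110 and 1112 are independent, so they could run in parallel; 1114 concatenates.
    return fwd_rnn.run(forward_set) + bwd_rnn.run(backward_set)
```

Because the two networks hold separate weights, the forward and backward passes do not share parameters, and the resulting context vector has twice the hidden size.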

At 1116, a fully connected linear operation is applied to the context vector. At 1118, an activation function of a first layer, for example, of an MLP neural network, is applied to the output of the fully connected linear operation. The activation function may be a rectified linear unit (ReLU) activation function. At 1120, another activation function of a second layer, for example, of the MLP neural network, is applied to the output of the activation function of the first layer to generate a classification value of the target word with respect to the grammatical error type. Multi-class classification of the target word with respect to the grammatical error type may thus be performed based on the context vector by the MLP neural network at 1116, 1118, and 1120. 1116, 1118, and 1120 may be performed by classification unit 410 of classification-based GEC module 108.
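The three steps 1116, 1118, and 1120 can be sketched as a linear map followed by ReLU and a second activation. The sketch assumes that the second activation is a softmax, which this passage does not state, and that the linear operation outputs one value per class; the function names are hypothetical.

```python
import math
from typing import List, Tuple


def linear(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """1116: fully connected linear operation, one output per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: List[float]) -> List[float]:
    """1118: rectified linear unit activation."""
    return [max(0.0, a) for a in v]


def softmax(v: List[float]) -> List[float]:
    """1120: assumed second activation, turning scores into class probabilities."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def classify(context_vector: List[float], weights: List[List[float]],
             bias: List[float]) -> Tuple[int, List[float]]:
    probs = softmax(relu(linear(context_vector, weights, bias)))
    return probs.index(max(probs)), probs
```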

FIG. 12 is a flowchart illustrating an example of a method 1200 for providing a grammar score in accordance with an embodiment. Method 1200 may be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 12, as will be understood by a person of ordinary skill in the art.

Method 1200 will be described with reference to FIGS. 1 and 4. However, method 1200 is not limited to that exemplary embodiment. At 1202, a user factor is determined based on information about the user. The information includes, for example, native language, place of residence, education level, age, and historical scores. At 1204, weights of precision and recall are determined. Precision and recall are commonly used in combination as the main evaluation measures for GEC. Precision (P) and recall (R) are defined as follows:
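The two defining equations are not reproduced in this text. A reconstruction consistent with the description of g and e, offered as an assumption based on the standard edit-based definitions rather than as the original equations, is:

```latex
P = \frac{|g \cap e|}{|e|}, \qquad R = \frac{|g \cap e|}{|g|}
```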

where g is the gold standard from two human annotators for a specific grammatical error type, and e is the corresponding set of system edits. The verb form error type may overlap with many other grammatical error types, so g may be based on the annotations of all grammatical error types when calculating verb form error performance. The weighting between precision and recall may be adjusted when combining them into a single evaluation measure. For example, F_0.5, defined in Equation 5, combines precision and recall while assigning twice as much weight to precision, which suits embodiments in which accurate feedback is more important than coverage:
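Equation 5 is likewise not reproduced in this text. Assuming the standard F-measure with beta set to 0.5, it would read:

```latex
F_{0.5} = \frac{(1 + 0.5^{2}) \cdot P \cdot R}{0.5^{2} \cdot P + R}
```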

It is to be understood that F_n, where n is between 0 and 1, may be applied in other examples. In some embodiments, the weights may also differ for different grammatical error types.
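The evaluation measures can be computed with a short sketch, assuming the standard set-based definitions of precision, recall, and F-measure. Representing each edit as a hashable tuple and returning 1.0 for an empty set are illustrative conventions, not taken from the source.

```python
from typing import Hashable, Set, Tuple


def precision_recall(gold: Set[Hashable], system: Set[Hashable]) -> Tuple[float, float]:
    """Precision and recall of system edits e against gold-standard edits g."""
    matched = len(gold & system)
    precision = matched / len(system) if system else 1.0  # convention for empty e
    recall = matched / len(gold) if gold else 1.0          # convention for empty g
    return precision, recall


def f_measure(precision: float, recall: float, n: float = 0.5) -> float:
    """F_n; n below 1 weights precision more heavily than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    n2 = n * n
    return (1.0 + n2) * precision * recall / (n2 * precision + recall)
```

With three gold edits and two system edits of which one matches, precision is 1/2, recall is 1/3, and F_0.5 is 5/11, which is pulled toward the precision value.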

At 1206, a scoring function is obtained based on the user factor and the weights. The scoring function may use the user factor and the weights (which may be the same or different for different grammatical error types) as parameters. At 1208, a grammatical error result for each target word in the sentence is received. At 1210, a grammar score is provided based on the grammatical error results and the scoring function. The grammatical error results may be the variables of the scoring function, and the user factor and the weights may be its parameters. 1202, 1204, 1206, 1208, and 1210 may be performed by scoring/correction module 114 of GEC system 100.

FIG. 13 is a block diagram illustrating an ANN model training system 1300 in accordance with an embodiment. ANN model training system 1300 includes a model training module 1302 configured to train each ANN model 120 for a specific grammatical error type over a set of training samples 1304 based on an objective function 1306 using a training algorithm 1308. In some embodiments, each training sample 1304 may be a native training sample. A native training sample, as disclosed herein, includes a sentence without grammatical errors, as opposed to a learner training sample, which includes a sentence with one or more grammatical errors. Some known GEC systems require tailored training, that is, they use supervised data (e.g., learner training samples) as training samples and are therefore limited by the size and availability of supervised training data. By contrast, ANN model training system 1300 can leverage abundant native plain-text corpora as training samples 1304 to train ANN model 120 more effectively and efficiently. For example, training samples 1304 may be obtained from a wiki dump. It is to be understood that training samples 1304 for ANN model training system 1300 are not limited to native training samples. In some embodiments, for certain grammatical error types, ANN model training system 1300 may train ANN model 120 using learner training samples or using a combination of native training samples and learner training samples.

FIG. 14 is a depiction of an example of a training sample 1304 used by ANN model training system 1300 of FIG. 13. A training sample includes a sentence associated with one or more grammatical error types 1, ..., n. Although the training sample may be a native training sample without grammatical errors, the sentence can still be associated with grammatical error types because, as described above, a particular word is associated with one or more grammatical error types based on, for example, its PoS tag. For example, as long as the sentence includes a verb, the sentence may be associated with, for example, the verb form error and the subject-verb agreement error. One or more target words 1, ..., m may be associated with each grammatical error type. For example, every verb in the sentence is a target word for the verb form error or the subject-verb agreement error in the training sample. Each target word is further associated with two pieces of information: a set of word embedding vectors (matrix) (x) and an actual classification value (y). The set of word embedding vectors (x) may be generated based on the context words of the target word in the sentence. It is to be understood that, in some embodiments, the set of word embedding vectors (x) may be any other set of initial context vectors, such as a set of one-hot vectors. As described above, the actual classification value (y) may be one of the class labels for the specific grammatical error type, such as "0" for singular and "1" for plural for the noun number error. The training sample thus includes pairs of a set of word embedding vectors (x) and an actual classification value (y), each pair corresponding to a target word for a grammatical error type in the sentence.
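How an error-free sentence still yields labeled pairs can be sketched for the noun number error. The sketch assumes the sentence is already tagged with Penn Treebank PoS tags (NN for a singular noun, NNS for a plural noun) and keeps x as raw context words rather than embedding vectors; the function name is hypothetical.

```python
from typing import List, Tuple

# Class labels from the text: "0" for singular, "1" for plural.
NOUN_NUMBER_LABELS = {"NN": 0, "NNS": 1}

Context = Tuple[List[str], List[str]]  # (words before target, words after target)


def noun_number_samples(tagged: List[Tuple[str, str]]) -> List[Tuple[Context, int]]:
    """Build one (x, y) pair per noun in a tagged, error-free sentence."""
    tokens = [word for word, _ in tagged]
    samples = []
    for i, (_, tag) in enumerate(tagged):
        if tag in NOUN_NUMBER_LABELS:        # every noun is a target word
            x = (tokens[:i], tokens[i + 1:])  # context stands in for embeddings
            samples.append((x, NOUN_NUMBER_LABELS[tag]))
    return samples
```

The label comes for free from the form the native writer actually used, which is why no human error annotation is needed for such samples.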

Referring back to FIG. 13, ANN model 120 includes a plurality of parameters that can be jointly adjusted by model training module 1302 when it is fed with training samples 1304. Model training module 1302 jointly adjusts the parameters of ANN model 120 to minimize objective function 1306 over training samples 1304 using training algorithm 1308. In the example described above with respect to FIG. 8, the objective function for training ANN model 120 is:

where n is the number of training samples 1304. Training algorithm 1308 may be any suitable iterative optimization algorithm for finding the minimum of objective function 1306, including gradient descent algorithms (e.g., the stochastic gradient descent algorithm).

FIG. 15 is a flowchart illustrating an example of a method 1500 for training an ANN model for grammatical error correction in accordance with an embodiment. Method 1500 may be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination thereof. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 15, as will be understood by a person of ordinary skill in the art.

Method 1500 will be described with reference to FIG. 13. However, method 1500 is not limited to that exemplary embodiment. At 1502, an ANN model for a grammatical error type is provided. The ANN model is for estimating a classification of a target word in a sentence with respect to the grammatical error type. The ANN model may be any ANN model disclosed herein, for example, those shown in FIGS. 6 and 7. In some embodiments, the ANN model may include two recurrent neural networks configured to output a context vector of the target word based on at least one word before the target word and at least one word after the target word in the sentence. In some embodiments, the context vector does not include semantic features of the sentence in the training sample. As described above, the ANN model may include deep context representation sub-model 602, which can be parameterized as forward recurrent neural network 606 and backward recurrent neural network 608. The ANN model may also include a feedforward neural network configured to output a classification value of the target word based on the context vector of the target word. As described above, the ANN model may include classification sub-model 604, which can be parameterized as feedforward neural network 610.

At 1504, training samples are obtained. Each training sample includes a sentence having a target word and the actual classification of the target word with respect to the grammatical error type. In some embodiments, a training sample may include a word embedding matrix of the target word that includes a set of forward word embedding vectors and a set of backward word embedding vectors. Each forward word embedding vector is generated based on a respective context word before the target word, and each backward word embedding vector is generated based on a respective context word after the target word. The number of dimensions of each word embedding vector may be at least 100, for example 300.

At 1506, the parameters of the ANN model are jointly adjusted, for example, in an end-to-end manner. In some embodiments, a first set of parameters of deep context representation sub-model 602, associated with recurrent neural networks 606 and 608, is jointly adjusted with a second set of parameters of classification sub-model 604, associated with feedforward neural network 610, based on the difference between the actual classification and the estimated classification of the target word in each training sample. In some embodiments, the parameters associated with forward recurrent neural network 606 are separate from the parameters associated with backward recurrent neural network 608. In some embodiments, the ANN model may also include attention mechanism sub-model 702, which can be parameterized as feedforward neural network 610. The parameters of attention mechanism sub-model 702 associated with feedforward neural network 610 may also be jointly adjusted with the other parameters of the ANN model. In some embodiments, the parameters of the ANN model are jointly adjusted using training algorithm 1308 to minimize, according to objective function 1306, the difference between the actual classification and the estimated classification of the target word in each training sample. 1502, 1504, and 1506 may be performed by model training module 1302 of ANN model training system 1300.

FIG. 16 is a schematic diagram illustrating an example of training ANN model 120 for grammatical error correction in accordance with an embodiment. In this example, ANN model 120 is trained over training samples 1304 with respect to a specific grammatical error type. Training samples 1304 may come from native text and may be preprocessed and parsed as described above with respect to FIG. 1. Each training sample 1304 includes a sentence having a target word with respect to the grammatical error type and the actual classification of the target word with respect to the grammatical error type. In some embodiments, a pair including the word embedding matrix (x) of the target word and the actual classification value (y) of the target word may be obtained from each training sample 1304. The word embedding matrix (x) may include a set of forward word embedding vectors generated based on the context words before the target word and a set of backward word embedding vectors generated based on the context words after the target word. Training samples 1304 may thus include a plurality of (x, y) pairs.

In some embodiments, ANN model 120 may include a plurality of recurrent neural networks 1 to n 1602 and a plurality of feedforward neural networks 1 to m 1604. Each of neural networks 1602 and 1604 is associated with a set of parameters to be trained over training samples 1304 based on objective function 1306 using training algorithm 1308. Recurrent neural networks 1602 may include a forward recurrent neural network and a backward recurrent neural network configured to output a context vector of the target word based on the context words of the target word. In some embodiments, recurrent neural networks 1602 may further include one or more other recurrent neural networks configured to generate the word embedding matrix of the target word based on the context words of the target word. Feedforward neural networks 1604 may include a feedforward neural network configured to output a classification value (y') of the target word based on the context vector of the target word. In some embodiments, feedforward neural networks 1604 may also include another feedforward neural network configured to output a context weight vector to be applied to the context vector. Neural networks 1602 and 1604 may be connected so that they can be jointly trained in an end-to-end manner. In some embodiments, the context vector does not include semantic features of the sentence in training sample 1304.

In some embodiments, for each iteration, the word embedding matrix (x) of the target word in the corresponding training sample 1304 may be fed into ANN model 120 and passed through neural networks 1602 and 1604. The estimated classification value (y') may be output from the output layer of ANN model 120 (e.g., part of feedforward neural networks 1604). The estimated classification value (y') and the actual classification value (y) of the target word in the corresponding training sample 1304 may be sent to objective function 1306, and the difference between the estimated classification value (y') and the actual classification value (y) may be used by objective function 1306, together with training algorithm 1308, to jointly adjust each set of parameters associated with each of neural networks 1602 and 1604 in ANN model 120. By iteratively and jointly adjusting each set of parameters associated with each of neural networks 1602 and 1604 in ANN model 120 over each training sample 1304, the difference between the estimated classification value (y') and the actual classification value (y) becomes smaller and smaller, and objective function 1306 is optimized.
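The iterative, joint adjustment described above can be illustrated with a deliberately tiny model. Everything in the sketch is an assumption made for illustration: a single tanh unit stands in for the context-representation stage, a single sigmoid unit stands in for the classification stage, the objective is binary cross-entropy, and the training algorithm is plain stochastic gradient descent. It shows only that both parameter sets are updated from the same difference between y' and y.

```python
import math
from typing import List, Tuple

Sample = Tuple[List[float], int]           # (x, y) with y in {0, 1}
Params = Tuple[List[float], float, float]  # (stage-1 weights a, stage-2 weight c, bias d)


def forward(params: Params, x: List[float]) -> Tuple[float, float]:
    a, c, d = params
    h = math.tanh(sum(ai * xi for ai, xi in zip(a, x)))  # stand-in context stage
    p = 1.0 / (1.0 + math.exp(-(c * h + d)))             # stand-in classifier, y'
    return h, p


def mean_loss(params: Params, samples: List[Sample]) -> float:
    """Assumed objective: mean binary cross-entropy over the samples."""
    total = 0.0
    for x, y in samples:
        _, p = forward(params, x)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(samples)


def train(params: Params, samples: List[Sample],
          epochs: int = 200, lr: float = 0.5) -> Params:
    a, c, d = list(params[0]), params[1], params[2]
    for _ in range(epochs):
        for x, y in samples:
            h, p = forward((a, c, d), x)
            err = p - y  # difference between estimated and actual classification
            # Gradients for both stages come from the same error signal.
            grad_a = [err * c * (1.0 - h * h) * xi for xi in x]
            grad_c = err * h
            grad_d = err
            a = [ai - lr * g for ai, g in zip(a, grad_a)]
            c -= lr * grad_c
            d -= lr * grad_d
    return a, c, d
```

On a small separable data set, the mean loss after training is lower than before training, mirroring the statement that the difference between y' and y shrinks as the parameters are adjusted.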

Various embodiments can be implemented, for example, using one or more computer systems, such as computer system 1700 shown in FIG. 17. One or more computer systems 1700 can be used, for example, to implement method 300 of FIG. 3, method 900 of FIG. 9, method 1000 of FIG. 10, method 1100 of FIG. 11, method 1200 of FIG. 12, and method 1500 of FIG. 15. For example, computer system 1700 can detect and correct grammatical errors and/or train an artificial neural network model for detecting and correcting grammatical errors, according to various embodiments.

Computer system 1700 can be any well-known computer capable of performing the functions described herein. Computer system 1700 includes one or more processors (also called central processing units, or CPUs), such as processor 1704. Processor 1704 is connected to a communication infrastructure or bus 1706. One or more processors 1704 may each be a graphics processing unit (GPU). In an embodiment, a GPU is a processor that is a specialized electronic circuit designed to process mathematically intensive applications. The GPU may have a parallel structure that is efficient for parallel processing of large blocks of data, such as the mathematically intensive data common to computer graphics applications, images, videos, and the like.

Computer system 1700 also includes user input/output device(s) 1703, such as monitors, keyboards, and pointing devices, which communicate with communication infrastructure 1706 through user input/output interface(s) 1702.

Computer system 1700 also includes a main or primary memory 1708, such as random access memory (RAM). Main memory 1708 may include one or more levels of cache. Main memory 1708 has stored therein control logic (i.e., computer software) and/or data. Computer system 1700 may also include one or more secondary storage devices or memory 1710. Secondary memory 1710 may include, for example, a hard disk drive 1712 and/or a removable storage device or drive 1714. Removable storage drive 1714 may be a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, a tape backup device, and/or any other storage device or drive. Removable storage drive 1714 may interact with a removable storage unit 1718. Removable storage unit 1718 includes a computer usable or readable storage device having stored thereon computer software (control logic) and/or data. Removable storage unit 1718 may be a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, and/or any other computer data storage device. Removable storage drive 1714 reads from and/or writes to removable storage unit 1718 in a well-known manner.

According to an exemplary embodiment, secondary memory 1710 may include other means, instrumentalities, or other approaches for allowing computer programs and/or other instructions and/or data to be accessed by computer system 1700. Such means, instrumentalities, or other approaches may include, for example, a removable storage unit 1722 and an interface 1720. Examples of removable storage unit 1722 and interface 1720 may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, a memory stick and USB port, a memory card and associated memory card slot, and/or any other removable storage unit and associated interface.

Computer system 1700 may further include a communication or network interface 1724. Communication interface 1724 enables computer system 1700 to communicate and interact with any combination of remote devices, remote networks, remote entities, and the like (individually and collectively referenced by reference number 1728). For example, communication interface 1724 may allow computer system 1700 to communicate with remote devices 1728 over communication path 1726, which may be wired and/or wireless and which may include any combination of LANs, WANs, the Internet, and the like. Control logic and/or data may be transmitted to and from computer system 1700 via communication path 1726.

In an embodiment, a tangible apparatus or article of manufacture comprising a tangible computer usable or readable medium having control logic (software) stored thereon is also referred to herein as a computer program product or program storage device. This includes, but is not limited to, computer system 1700, main memory 1708, secondary memory 1710, and removable storage units 1718 and 1722, as well as tangible articles of manufacture embodying any combination of the foregoing. Such control logic, when executed by one or more data processing devices (such as computer system 1700), causes such data processing devices to operate as described herein.

Based on the teachings contained in this disclosure, it will be apparent to persons skilled in the relevant art(s) how to make and use embodiments of the present disclosure using data processing devices, computer systems, and/or computer architectures other than that shown in FIG. 17. In particular, embodiments may operate with software, hardware, and/or operating system implementations other than those described herein.

The Detailed Description section, and not the Summary and Abstract sections, is intended to be used to interpret the claims. The Summary and Abstract sections may set forth one or more, but not all, exemplary embodiments of the present disclosure as contemplated by the inventor(s), and thus are not intended to limit the present disclosure or the appended claims in any way.

Although the present disclosure has been described herein with reference to exemplary embodiments for exemplary fields and applications, it should be understood that the present disclosure is not limited thereto. Other embodiments and modifications thereto are possible and are within the scope and spirit of the present disclosure. For example, and without limiting the generality of this paragraph, embodiments are not limited to the software, hardware, firmware, and/or entities illustrated in the figures or described herein. Further, embodiments (whether or not explicitly described herein) have significant utility in fields and applications beyond the examples described herein.

Embodiments have been described herein with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for convenience of description. Alternate boundaries may be defined as long as the specified functions and relationships (or equivalents thereof) are appropriately performed. Also, alternative embodiments may perform functional blocks, steps, operations, methods, etc. in an order different from that described herein.

The breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

A method for grammatical error detection, comprising:
receiving, by at least one processor, a sentence;
identifying, by the at least one processor, one or more target words in the sentence based at least in part on one or more grammatical error types, wherein each of the one or more target words corresponds to at least one of the one or more grammatical error types;
for at least one of the one or more target words, estimating, by the at least one processor, a classification of the target word with respect to the corresponding grammatical error type using an artificial neural network model trained for the grammatical error type, wherein the model includes (i) two recurrent neural networks configured to output a context vector of the target word based at least in part on at least one word before the target word and at least one word after the target word in the sentence, and (ii) a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based at least in part on the context vector of the target word; and
detecting, by the at least one processor, a grammatical error in the sentence based at least in part on the target word and the estimated classification of the target word.

The method of claim 1, wherein the estimating further comprises:
providing, using the two recurrent neural networks, the context vector of the target word based at least in part on the at least one word before the target word and the at least one word after the target word in the sentence; and
providing, using the feedforward neural network, the classification value of the target word with respect to the grammatical error type based at least in part on the context vector of the target word.

The method of claim 2, wherein the context vector of the target word is provided based at least in part on a lemma of the target word.

The method of claim 2, wherein the estimating further comprises:
generating a first set of word embedding vectors, wherein each word embedding vector in the first set is generated based on a respective one of the at least one word before the target word in the sentence; and
generating a second set of word embedding vectors, wherein each word embedding vector in the second set is generated based on a respective one of the at least one word after the target word in the sentence.

The method of claim 4, wherein the number of dimensions of each word embedding vector is at least 100.

The method of claim 1, wherein:
the at least one word before the target word includes all words before the target word in the sentence; and
the at least one word after the target word includes all words after the target word in the sentence.

The method of claim 1, wherein the number of the at least one word before the target word and/or the number of the at least one word after the target word is determined based at least in part on the grammatical error type.

The method of claim 2, wherein the estimating further comprises:
providing a context weight vector of the target word based at least in part on the at least one word before the target word and the at least one word after the target word in the sentence; and
applying the context weight vector to the context vector.
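
One way to picture the context weighting recited above is as attention-style averaging, in which each context word receives a weight and the weights scale per-word vectors before they are combined. The following Python sketch is illustrative only. The softmax normalization, the use of per-word vectors, and the names `softmax` and `apply_context_weights` are assumptions and are not recited in the claims.

```python
import math

def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def apply_context_weights(word_vectors, scores):
    """Weight each per-word vector and sum them into one context vector.

    word_vectors: one vector per context word (hypothetical inputs).
    scores: one raw relevance score per context word (hypothetical).
    """
    weights = softmax(scores)
    dim = len(word_vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, word_vectors))
            for k in range(dim)]

# Two context words with equal scores contribute equally.
weighted = apply_context_weights([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

Under this reading, a word with a higher score pulls the combined context vector toward its own vector, so words closer to or more relevant for the target word can dominate the estimate.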

The method of claim 4, wherein providing the context vector further comprises:
providing a first context vector of the target word based at least in part on the first set of word embedding vectors using a first one of the two recurrent neural networks;
providing a second context vector of the target word based at least in part on the second set of word embedding vectors using a second one of the two recurrent neural networks; and
concatenating the first and second context vectors to provide the context vector.

The method of claim 9, wherein:
the first set of word embedding vectors is provided to the first recurrent neural network starting from the word embedding vector of the word at the beginning of the sentence; and
the second set of word embedding vectors is provided to the second recurrent neural network starting from the word embedding vector of the word at the end of the sentence.
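
The two-directional reading order recited above can be sketched as follows. This is a minimal illustration, not the claimed model: it substitutes a plain tanh recurrent cell for whatever recurrent unit an embodiment uses, it uses toy dimensions far below the sizes recited in the claims, and the names `make_rnn`, `rnn_final_state`, and `context_vector` are hypothetical.

```python
import math
import random

def make_rnn(input_dim, hidden_dim, rng):
    """Randomly initialised parameters for a plain tanh recurrent cell."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"W": matrix(hidden_dim, input_dim),
            "U": matrix(hidden_dim, hidden_dim),
            "b": [0.0] * hidden_dim}

def rnn_final_state(params, vectors):
    """Feed the vectors in the given order and return the last hidden state."""
    hidden_dim = len(params["b"])
    h = [0.0] * hidden_dim
    for x in vectors:
        h = [math.tanh(sum(w * xi for w, xi in zip(params["W"][j], x))
                       + sum(u * hi for u, hi in zip(params["U"][j], h))
                       + params["b"][j])
             for j in range(hidden_dim)]
    return h

def context_vector(embeddings, target_index, forward_rnn, backward_rnn):
    """Concatenate the left-to-right and right-to-left summaries of the context."""
    before = embeddings[:target_index]                 # starts at the sentence beginning
    after = embeddings[target_index + 1:][::-1]        # starts at the sentence end
    return rnn_final_state(forward_rnn, before) + rnn_final_state(backward_rnn, after)

rng = random.Random(0)
forward_rnn = make_rnn(4, 3, rng)
backward_rnn = make_rnn(4, 3, rng)
embeddings = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
ctx = context_vector(embeddings, 2, forward_rnn, backward_rnn)  # 3 + 3 values
```

Both networks finish reading at the word adjacent to the target word, so each summary is most strongly shaped by the nearest context, and the target word itself is excluded from both inputs.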

The method of claim 1, wherein the number of hidden units in each of the two recurrent neural networks is at least 300.

The method of claim 1, wherein the feedforward neural network comprises:
a first layer having a first activation function of a fully connected linear operation on the context vector; and
a second layer connected to the first layer and having a second activation function for generating the classification value.

The method of claim 1, wherein the classification value is a probability distribution of the target word over a plurality of classes associated with the grammatical error type.
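
A two-layer feedforward network that ends in a probability distribution, as recited in the two preceding claims, can be sketched as below. The choice of ReLU for the first activation and softmax for the second is an assumption made for illustration, as are the function names and the toy weights.

```python
import math

def linear(weights, bias, x):
    """Fully connected linear operation: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Normalise logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(context, w1, b1, w2, b2):
    """First layer: linear + ReLU (assumed). Second layer: linear + softmax (assumed)."""
    hidden = [max(0.0, v) for v in linear(w1, b1, context)]
    return softmax(linear(w2, b2, hidden))

# Toy example: a 2-dimensional context vector and three hypothetical classes.
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w2, b2 = [[1.0, 0.0], [0.0, 0.0], [-1.0, 0.0]], [0.0, 0.0, 0.0]
probabilities = classify([1.0, -2.0], w1, b1, w2, b2)
```

The index of the largest probability is then taken as the estimated classification of the target word.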

The method of claim 1, wherein the detecting further comprises:
determining an actual classification of the target word based on a part-of-speech (PoS) tag and a text token of the sentence;
comparing the estimated classification of the target word with the actual classification of the target word; and
detecting the grammatical error in the sentence when the actual classification does not match the estimated classification of the target word.

The method of claim 1, further comprising:
in response to detecting the grammatical error in the sentence, correcting the grammatical error using the estimated classification of the target word.
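
The compare-then-correct flow of the two preceding claims can be illustrated with a short Python sketch. The class labels, the mapping from Penn Treebank tags `VBP` and `VBZ` to those labels, the `surface_forms` lookup, and the function names are all hypothetical and serve only to make the example concrete.

```python
# Hypothetical mapping from a PoS tag to the "actual" class of a verb token.
ACTUAL_FROM_POS = {"VBP": "base", "VBZ": "third_person_singular"}

def estimated_class(probabilities, classes):
    """Return the class label with the highest estimated probability."""
    best = max(range(len(classes)), key=lambda i: probabilities[i])
    return classes[best]

def detect_and_correct(tokens, index, pos_tag, probabilities, classes, surface_forms):
    """Flag a mismatch and, if one is found, substitute the estimated form."""
    actual = ACTUAL_FROM_POS[pos_tag]
    estimated = estimated_class(probabilities, classes)
    if estimated == actual:
        return False, list(tokens)
    corrected = list(tokens)
    corrected[index] = surface_forms[estimated]
    return True, corrected

classes = ["base", "third_person_singular"]
forms = {"base": "have", "third_person_singular": "has"}
flagged, fixed = detect_and_correct(
    ["She", "have", "a", "dog"], 1, "VBP", [0.1, 0.9], classes, forms)
```

Here the tagger reports the base form while the model favors the third-person singular class, so the token is flagged and replaced with "has".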

The method of claim 1, further comprising:
for each of the one or more target words, estimating a respective classification of the target word with respect to the corresponding grammatical error type using a respective artificial neural network model trained for the grammatical error type, and comparing the estimated classification with an actual classification of the target word to generate a grammatical error result of the target word;
applying a weight to each of the grammatical error results of the one or more target words based at least in part on the corresponding grammatical error type; and
providing a grammar score of the sentence based on the grammatical error results of the one or more target words and the weights.
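
The claim above does not fix a scoring formula. As one possible reading, the sketch below subtracts the weighted share of erroneous target words from 1. The formula, the error-type names, the weight values, and the function name `grammar_score` are assumptions for illustration.

```python
def grammar_score(results, type_weights):
    """Score a sentence from per-target-word results.

    results: list of (error_type, is_error) pairs, one per target word.
    type_weights: weight per grammatical error type (hypothetical values).
    Returns a value in [0, 1]; 1.0 means no weighted errors were found.
    """
    total = sum(type_weights[error_type] for error_type, _ in results)
    if total == 0:
        return 1.0  # nothing to score
    penalty = sum(type_weights[error_type]
                  for error_type, is_error in results if is_error)
    return 1.0 - penalty / total

weights = {"article": 1.0, "verb_form": 2.0, "preposition": 1.0}
results = [("article", False), ("verb_form", True), ("preposition", False)]
score = grammar_score(results, weights)  # 1 - 2/4
```

Weighting by error type lets a scorer treat, for example, a verb-form error as more serious than an article error.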

The method of claim 16, wherein the grammar score is provided based at least in part on information associated with a user from whom the sentence is received.

The method of claim 1, wherein the model is trained using native training samples.

The method of claim 1, wherein the two recurrent neural networks and the feedforward neural network are jointly trained.

The method of claim 1, wherein the model further comprises:
another recurrent neural network configured to output a set of initial context vectors to be input to the two recurrent neural networks to generate the context vector; and
another feedforward neural network configured to output a context weight vector to be applied to the context vector.

The method of claim 20, wherein all of the recurrent neural networks and feedforward neural networks are jointly trained using native training samples.

A system for grammatical error detection, comprising:
a memory; and
at least one processor coupled to the memory and configured to:
receive a sentence;
identify one or more target words in the sentence based at least in part on one or more grammatical error types, wherein each of the one or more target words corresponds to at least one of the one or more grammatical error types;
for at least one of the one or more target words, estimate a classification of the target word with respect to the corresponding grammatical error type using an artificial neural network model trained for the grammatical error type, wherein the model includes (i) two recurrent neural networks configured to generate a context vector of the target word based at least in part on at least one word before the target word and at least one word after the target word in the sentence, and (ii) a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based at least in part on the context vector of the target word; and
detect a grammatical error in the sentence based at least in part on the target word and the estimated classification of the target word.

The system of claim 22, wherein, to estimate the classification of the target word, the at least one processor is configured to:
provide, using the two recurrent neural networks, the context vector of the target word based at least in part on the at least one word before the target word and the at least one word after the target word in the sentence; and
provide, using the feedforward neural network, the classification value of the target word with respect to the grammatical error type based at least in part on the context vector of the target word.

The system of claim 23, wherein the context vector of the target word is provided based at least in part on a lemma of the target word.

The system of claim 23, wherein, to estimate the classification of the target word, the at least one processor is configured to:
generate a first set of word embedding vectors, wherein each word embedding vector in the first set is generated based on a respective one of the at least one word before the target word in the sentence; and
generate a second set of word embedding vectors, wherein each word embedding vector in the second set is generated based on a respective one of the at least one word after the target word in the sentence.

The system of claim 25, wherein the number of dimensions of each word embedding vector is at least 100.

The system of claim 22, wherein:
the at least one word before the target word includes all words before the target word in the sentence; and
the at least one word after the target word includes all words after the target word in the sentence.

The system of claim 22, wherein the number of the at least one word before the target word and/or the number of the at least one word after the target word is determined based at least in part on the grammatical error type.

The system of claim 23, wherein, to estimate the classification of the target word, the at least one processor is configured to:
provide a context weight vector of the target word based at least in part on the at least one word before the target word and the at least one word after the target word in the sentence; and
apply the context weight vector to the context vector.

The system of claim 25, wherein, to provide the context vector of the target word, the at least one processor is configured to:
provide a first context vector of the target word based at least in part on the first set of word embedding vectors using a first one of the two recurrent neural networks;
provide a second context vector of the target word based at least in part on the second set of word embedding vectors using a second one of the two recurrent neural networks; and
concatenate the first and second context vectors to provide the context vector.

The system of claim 30, wherein:
the first set of word embedding vectors is provided to the first recurrent neural network starting from the word embedding vector of the word at the beginning of the sentence; and
the second set of word embedding vectors is provided to the second recurrent neural network starting from the word embedding vector of the word at the end of the sentence.

The system of claim 22, wherein the number of hidden units in each of the two recurrent neural networks is at least 300.

The system of claim 22, wherein the feedforward neural network comprises:
a first layer having a first activation function of a fully connected linear operation on the context vector; and
a second layer connected to the first layer and having a second activation function for generating the classification value.

The system of claim 22, wherein the classification value is a probability distribution of the target word over a plurality of classes associated with the grammatical error type.

The system of claim 22, wherein, to detect the grammatical error, the at least one processor is configured to:
determine an actual classification of the target word based on a part-of-speech (PoS) tag and a text token of the sentence;
compare the estimated classification of the target word with the actual classification of the target word; and
detect the grammatical error in the sentence when the actual classification does not match the estimated classification of the target word.

The system of claim 22, wherein the at least one processor is further configured to:
in response to detecting the grammatical error in the sentence, correct the grammatical error using the estimated classification of the target word.

The system of claim 22, wherein the at least one processor is further configured to:
for each of the one or more target words, estimate a respective classification of the target word with respect to the corresponding grammatical error type using a respective artificial neural network model trained for the grammatical error type, and compare the estimated classification with an actual classification of the target word to generate a grammatical error result of the target word;
apply a weight to each of the grammatical error results of the one or more target words based at least in part on the corresponding grammatical error type; and
provide a grammar score of the sentence based on the grammatical error results of the one or more target words and the weights.

The system of claim 37, wherein the grammar score is provided based at least in part on information associated with a user from whom the sentence is received.

The system of claim 22, wherein the model is trained using native training samples.

The system of claim 22, wherein the two recurrent neural networks and the feedforward neural network are jointly trained.

The system of claim 22, wherein the model further comprises:
another recurrent neural network configured to output a set of initial context vectors to be input to the two recurrent neural networks to generate the context vector; and
another feedforward neural network configured to output a context weight vector to be applied to the context vector.

The system of claim 41, wherein all of the recurrent neural networks and feedforward neural networks are jointly trained using native training samples.

A tangible computer-readable device having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising:
receiving a sentence;
identifying one or more target words in the sentence based at least in part on one or more grammatical error types, wherein each of the one or more target words corresponds to at least one of the one or more grammatical error types;
with respect to the one or more target words, estimating a classification of the target word with respect to the corresponding grammatical error type using an artificial neural network model trained for the grammatical error type, wherein the model includes (i) two recurrent neural networks configured to output a context vector of the target word based at least in part on at least one word before the target word and at least one word after the target word in the sentence, and (ii) a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based at least in part on the context vector of the target word; and
detecting a grammatical error in the sentence based at least in part on the target word and the estimated classification of the target word.

A method for training an artificial neural network model, comprising:
providing, by at least one processor, an artificial neural network model for estimating a classification of a target word in a sentence with respect to a grammatical error type, wherein the model includes (i) two recurrent neural networks configured to output a context vector of the target word based at least in part on at least one word before the target word and at least one word after the target word in the sentence, and (ii) a feedforward neural network configured to output a classification value of the target word based at least in part on the context vector of the target word;
obtaining, by the at least one processor, a set of training samples, wherein each training sample in the set includes a sentence containing a target word for the grammatical error type and an actual classification of the target word with respect to the grammatical error type; and
jointly adjusting, by the at least one processor, a first set of parameters associated with the recurrent neural networks and a second set of parameters associated with the feedforward neural network based at least in part on a difference between the actual classification and an estimated classification of the target word in each training sample.

The method of claim 44, wherein each training sample is a native training sample without grammatical errors.
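
Because native training samples contain no grammatical errors, the form a writer actually used can serve directly as the label. The sketch below builds (sentence, target word, actual classification) samples for a hypothetical noun-number error type from a PoS-tagged sentence. The class names, the use of Penn Treebank tags `NN` and `NNS`, and the function name `make_training_samples` are assumptions for illustration.

```python
# Hypothetical classes for a noun-number error type, keyed by PoS tag.
NOUN_NUMBER_CLASS = {"NN": "singular", "NNS": "plural"}

def make_training_samples(tagged_sentence):
    """Create one training sample per target word in an error-free sentence.

    tagged_sentence: list of (token, pos_tag) pairs from a PoS tagger.
    Each sample holds the sentence tokens, the target word position, and
    the actual classification read off the tag.
    """
    tokens = [token for token, _ in tagged_sentence]
    samples = []
    for index, (_, tag) in enumerate(tagged_sentence):
        if tag in NOUN_NUMBER_CLASS:
            samples.append({"tokens": tokens,
                            "target_index": index,
                            "label": NOUN_NUMBER_CLASS[tag]})
    return samples

tagged = [("The", "DT"), ("cats", "NNS"), ("chase", "VBP"),
          ("a", "DT"), ("mouse", "NN")]
samples = make_training_samples(tagged)
```

A single sentence can therefore yield several samples, one for each word that is a target for the error type, without any manual error annotation.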

The method of claim 44, wherein each recurrent neural network is a gated recurrent unit (GRU) neural network, and the feedforward neural network is a multi-layer perceptron (MLP) neural network.

The method of claim 44, wherein the model further comprises:
another feedforward neural network configured to output a context weight vector to be applied to the context vector.

The method of claim 47, wherein the jointly adjusting comprises:
jointly adjusting the first and second sets of parameters and a third set of parameters associated with the other feedforward neural network based at least in part on the difference between the actual classification and the estimated classification of the target word in each training sample.

The method of claim 44, further comprising, for each training sample:
generating a first set of word embedding vectors, wherein each word embedding vector in the first set is generated based at least in part on a respective one of the at least one word before the target word in the training sample; and
generating a second set of word embedding vectors, wherein each word embedding vector in the second set is generated based at least in part on a respective one of the at least one word after the target word in the training sample.

The method of claim 49, wherein the number of dimensions of each word embedding vector is at least 100.

The method of claim 49, wherein:
the at least one word before the target word includes all words before the target word in the sentence; and
the at least one word after the target word includes all words after the target word in the sentence.

The method of claim 49, further comprising, for each training sample:
providing a first context vector of the target word based at least in part on the first set of word embedding vectors using a first one of the two recurrent neural networks;
providing a second context vector of the target word based at least in part on the second set of word embedding vectors using a second one of the two recurrent neural networks; and
concatenating the first and second context vectors to provide the context vector.

The method of claim 52, wherein:
the first set of word embedding vectors is provided to the first recurrent neural network starting from the word embedding vector of the word at the beginning of the sentence; and
the second set of word embedding vectors is provided to the second recurrent neural network starting from the word embedding vector of the word at the end of the sentence.

The method of claim 52, wherein the first and second context vectors do not include semantic features of the sentence in the training sample.

The method of claim 44, wherein the number of hidden units in each of the two recurrent neural networks is at least 300.

The method of claim 44, wherein the feedforward neural network comprises:
a first layer having a first activation function of a fully connected linear operation on the context vector; and
a second layer connected to the first layer and having a second activation function for generating the classification value.

A system for training an artificial neural network model, comprising:
a memory; and
at least one processor coupled to the memory and configured to:
provide an artificial neural network model for estimating a classification of a target word in a sentence with respect to a grammatical error type, wherein the model includes (i) two recurrent neural networks configured to output a context vector of the target word based at least in part on at least one word before the target word and at least one word after the target word in the sentence, and (ii) a feedforward neural network configured to output a classification value of the target word based at least in part on the context vector of the target word;
obtain a set of training samples, wherein each training sample in the set includes a sentence containing a target word for the grammatical error type and an actual classification of the target word with respect to the grammatical error type; and
jointly adjust a first set of parameters associated with the recurrent neural networks and a second set of parameters associated with the feedforward neural network based at least in part on a difference between the actual classification and an estimated classification of the target word in each training sample.

The system of claim 57, wherein each training sample is a native training sample without grammatical errors.

The system of claim 57, wherein each recurrent neural network is a GRU neural network, and the feedforward neural network is an MLP neural network.

The system of claim 57, wherein the model further comprises:
another feedforward neural network configured to output a context weight vector to be applied to the context vector.

The system of claim 60, wherein, to jointly adjust the first set of parameters and the second set of parameters, the at least one processor is configured to:
jointly adjust the first and second sets of parameters and a third set of parameters associated with the other feedforward neural network based at least in part on the difference between the actual classification and the estimated classification of the target word in each training sample.

The system of claim 57, wherein the at least one processor is further configured to, for each training sample:
generate a first set of word embedding vectors, wherein each word embedding vector in the first set is generated based at least in part on a respective one of the at least one word before the target word in the training sample; and
generate a second set of word embedding vectors, wherein each word embedding vector in the second set is generated based at least in part on a respective one of the at least one word after the target word in the training sample.

The system of claim 62, wherein the number of dimensions of each word embedding vector is at least 100.

The system of claim 62, wherein:
the at least one word before the target word includes all words before the target word in the sentence; and
the at least one word after the target word includes all words after the target word in the sentence.

The system of claim 62, wherein the at least one processor is further configured to, for each training sample:
provide a first context vector of the target word based at least in part on the first set of word embedding vectors using a first one of the two recurrent neural networks;
provide a second context vector of the target word based at least in part on the second set of word embedding vectors using a second one of the two recurrent neural networks; and
concatenate the first and second context vectors to provide the context vector.

The system of claim 65, wherein:
the first set of word embedding vectors is provided to the first recurrent neural network starting from the word embedding vector of the word at the beginning of the sentence; and
the second set of word embedding vectors is provided to the second recurrent neural network starting from the word embedding vector of the word at the end of the sentence.

The system of claim 65, wherein the first and second context vectors do not include semantic features of the sentence in the training sample.

The system of claim 57, wherein the number of hidden units in each of the two recurrent neural networks is at least 300.

The system of claim 57, wherein the feedforward neural network comprises:
a first layer having a first activation function of a fully connected linear operation on the context vector; and
a second layer connected to the first layer and having a second activation function for generating the classification value.

A tangible computer-readable device having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising:
providing an artificial neural network model for estimating a classification of a target word in a sentence with respect to a grammatical error type, wherein the model includes (i) two recurrent neural networks configured to output a context vector of the target word based at least in part on at least one word before the target word and at least one word after the target word in the sentence, and (ii) a feedforward neural network configured to output a classification value of the target word based at least in part on the context vector of the target word;
obtaining a set of training samples, wherein each training sample in the set includes a sentence containing a target word for the grammatical error type and an actual classification of the target word with respect to the grammatical error type; and
jointly adjusting a first set of parameters associated with the recurrent neural networks and a second set of parameters associated with the feedforward neural network based at least in part on a difference between the actual classification and an estimated classification of the target word in each training sample.