JP2020529666A

JP2020529666A - Deep context-based grammatical error correction using artificial neural networks

Info

Publication number: JP2020529666A
Application number: JP2020505241A
Authority: JP
Inventors: ホェィリン，; チュアンワン，; ルオビンリ，
Original assignee: Lingochamp Information Technology Shanghai Co Ltd
Current assignee: Lingochamp Information Technology Shanghai Co Ltd
Priority date: 2017-08-03
Filing date: 2017-08-03
Publication date: 2020-10-08
Anticipated expiration: 2037-08-03
Also published as: KR20200031154A; CN111226222A; CN111226222B; KR102490752B1; JP7031101B2; WO2019024050A1; MX2020001279A

Abstract

Methods and systems for grammatical error correction are disclosed herein. In one example, a sentence is received. One or more target words in the sentence are identified based, at least in part, on one or more grammatical error types. Each of the one or more target words corresponds to at least one of the one or more grammatical error types. For at least one of the one or more target words, a classification of the target word with respect to the corresponding grammatical error type is estimated using an artificial neural network model trained for the grammatical error type. A grammatical error in the sentence is detected based, at least in part, on the target word and the estimated classification of the target word. [Representative drawing] FIG. 1

Description

Background

[0001] The present disclosure relates generally to artificial intelligence and, more particularly, to grammatical error correction using artificial neural networks.

[0002] Automatic grammatical error correction (GEC) is an essential and useful tool for the millions of people who learn English as a second language. Writers who are learning English make a variety of grammar and usage mistakes that standard proofreading tools do not address. Developing automatic systems with high precision and recall for grammatical error detection and/or correction has become a fast-growing area of natural language processing (NLP).

[0003] Although such automatic systems have great potential, known systems face problems such as limited coverage of grammatical error patterns and the need for complex linguistic feature engineering or human-annotated training samples.

Overview

[0004] The present disclosure relates generally to artificial intelligence and, more particularly, to grammatical error correction using artificial neural networks.

[0005] In one example, a method for grammatical error detection is disclosed. A sentence is received. One or more target words in the sentence are identified based, at least in part, on one or more grammatical error types. Each of the one or more target words corresponds to at least one of the one or more grammatical error types. For at least one of the one or more target words, a classification of the target word with respect to the corresponding grammatical error type is estimated using an artificial neural network model trained for the grammatical error type. The model includes two recurrent neural networks configured to output a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence. The model further includes a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based, at least in part, on the context vector of the target word. A grammatical error in the sentence is detected based, at least in part, on the target word and the estimated classification of the target word.
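The model structure described in this example can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the disclosure: the plain tanh recurrent cell, the toy dimensions, the random untrained weights, the choice to read the words after the target in reverse order, and all names (`rnn_last_state`, `classify_target`, `embed`) are hypothetical. The sketch only shows how two recurrent networks can produce a context vector that a feedforward layer turns into classification values.

```python
import math
import random

random.seed(0)

EMB, HID, N_CLASSES = 4, 3, 2  # toy sizes, assumed for illustration


def rand_matrix(rows, cols):
    """Return a rows x cols matrix of small random weights (untrained)."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_last_state(vectors, w_in, w_rec):
    """Run a plain tanh RNN over the input vectors; return the final hidden state."""
    hidden = [0.0] * HID
    for x in vectors:
        a, b = matvec(w_in, x), matvec(w_rec, hidden)
        hidden = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
    return hidden


def softmax(logits):
    """Convert logits into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


EMBEDDINGS = {}


def embed(word):
    """Look up (or lazily create) a random embedding for a word."""
    if word not in EMBEDDINGS:
        EMBEDDINGS[word] = [random.uniform(-1, 1) for _ in range(EMB)]
    return EMBEDDINGS[word]


# Hypothetical parameters: one RNN for the left context, one for the right.
FWD_IN, FWD_REC = rand_matrix(HID, EMB), rand_matrix(HID, HID)
BWD_IN, BWD_REC = rand_matrix(HID, EMB), rand_matrix(HID, HID)
FF = rand_matrix(N_CLASSES, 2 * HID)  # feedforward output layer


def classify_target(tokens, target_index):
    """Return class probabilities for the target word from its two-sided context."""
    before = [embed(w) for w in tokens[:target_index]]
    after = [embed(w) for w in reversed(tokens[target_index + 1:])]
    # Context vector: concatenation of the two final hidden states.
    context = (rnn_last_state(before, FWD_IN, FWD_REC)
               + rnn_last_state(after, BWD_IN, BWD_REC))
    return softmax(matvec(FF, context))


probs = classify_target("she walk to school".split(), 1)
print(probs)
```

Because the weights are random, the printed probabilities carry no meaning; in the disclosed approach the recurrent and feedforward parameters would be trained for one grammatical error type.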

[0006] In another example, a method for training an artificial neural network model is provided. An artificial neural network model for estimating a classification of a target word in a sentence with respect to a grammatical error type is provided. The model includes two recurrent neural networks configured to output a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence. The model further includes a feedforward neural network configured to output a classification value of the target word based, at least in part, on the context vector of the target word. A set of training samples is obtained. Each training sample in the set includes a sentence that contains a target word associated with the grammatical error type, and an actual classification of the target word with respect to the grammatical error type. A first set of parameters associated with the recurrent neural networks and a second set of parameters associated with the feedforward neural network are trained jointly based, at least in part, on differences between the actual classification and the estimated classification of the target word in each training sample.

[0007] In a different example, a system for grammatical error detection includes a memory and at least one processor coupled to the memory. The at least one processor is configured to receive a sentence and identify one or more target words in the sentence based, at least in part, on one or more grammatical error types. Each of the one or more target words corresponds to at least one of the one or more grammatical error types. The at least one processor is further configured to, for at least one of the one or more target words, estimate a classification of the target word with respect to the corresponding grammatical error type using an artificial neural network model trained for the grammatical error type. The model includes two recurrent neural networks configured to generate a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence. The model further includes a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based, at least in part, on the context vector of the target word. The at least one processor is further configured to detect a grammatical error in the sentence based, at least in part, on the target word and the estimated classification of the target word.

[0008] In another example, a system for grammatical error detection includes a memory and at least one processor coupled to the memory. The at least one processor is configured to provide an artificial neural network model for estimating a classification of a target word in a sentence with respect to a grammatical error type. The model includes two recurrent neural networks configured to output a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence. The model further includes a feedforward neural network configured to output a classification value of the target word based, at least in part, on the context vector of the target word. The at least one processor is further configured to obtain a set of training samples. Each training sample in the set includes a sentence that contains a target word associated with the grammatical error type, and an actual classification of the target word with respect to the grammatical error type. The at least one processor is further configured to jointly adjust a first set of parameters associated with the recurrent neural networks and a second set of parameters associated with the feedforward neural network based, at least in part, on differences between the actual classification and the estimated classification of the target word in each training sample.

[0009] Other concepts relate to software for grammatical error detection and artificial neural network model training. A software product in accordance with this concept includes at least one computer-readable non-transitory device and information carried by the device. The information carried by the device may be executable instructions regarding parameters associated with a request or operational parameters.

[0010] In one example, a tangible computer-readable non-transitory device has instructions recorded thereon for grammatical error detection, wherein the instructions, when executed by a computer, cause the computer to perform a series of operations. A sentence is received. One or more target words in the sentence are identified based, at least in part, on one or more grammatical error types. Each of the one or more target words corresponds to at least one of the one or more grammatical error types. For at least one of the one or more target words, a classification of the target word with respect to the corresponding grammatical error type is estimated using an artificial neural network model trained for the grammatical error type. The model includes two recurrent neural networks configured to output a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence. The model further includes a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based, at least in part, on the context vector of the target word. A grammatical error in the sentence is detected based, at least in part, on the target word and the estimated classification of the target word.

[0011] In another example, a tangible computer-readable non-transitory device has instructions recorded thereon for training an artificial neural network model, wherein the instructions, when executed by a computer, cause the computer to perform a series of operations. An artificial neural network model for estimating a classification of a target word in a sentence with respect to a grammatical error type is provided. The model includes two recurrent neural networks configured to output a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence. The model further includes a feedforward neural network configured to output a classification value of the target word based, at least in part, on the context vector of the target word. A set of training samples is obtained. Each training sample in the set includes a sentence that contains a target word associated with the grammatical error type, and an actual classification of the target word with respect to the grammatical error type. A first set of parameters associated with the recurrent neural networks and a second set of parameters associated with the feedforward neural network are trained jointly based, at least in part, on differences between the actual classification and the estimated classification of the target word in each training sample.

[0012] This overview is provided merely to illustrate some embodiments so as to give an understanding of the subject matter described herein. Accordingly, the features described above are merely examples and should not be construed as narrowing the scope or spirit of the subject matter of the present disclosure. Other features, aspects, and advantages of the present disclosure will become apparent from the following detailed description, the drawings, and the claims.

[0013] The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate the presented disclosure and, together with the description, further serve to explain the principles of the disclosure and to enable a person skilled in the art to make and use the disclosure.

[0014] FIG. 1 is a block diagram illustrating a grammatical error correction (GEC) system according to an embodiment.

[0015] FIG. 2 is a depiction of an example of automatic grammatical error correction performed by the system of FIG. 1.

[0016] FIG. 3 is a flow chart illustrating an example of a method for grammatical error correction according to an embodiment.

[0017] FIG. 4 is a block diagram illustrating an example of a classification-based GEC module of the system of FIG. 1 according to an embodiment.

[0018] FIG. 5 is a depiction of an example of providing classifications of target words in a sentence using the system of FIG. 1 according to an embodiment.

[0019] FIG. 6 is a schematic diagram illustrating an example of an artificial neural network (ANN) model for grammatical error correction according to an embodiment.

[0020] FIG. 7 is a schematic diagram illustrating another example of an ANN model for grammatical error correction according to an embodiment.

[0021] FIG. 8 is a detailed schematic diagram illustrating an example of the ANN model of FIG. 6 according to an embodiment.

[0022] FIG. 9 is a flow chart illustrating an example of a method for grammatical error correction of a sentence according to an embodiment.

[0023] FIG. 10 is a flow chart illustrating an example of a method for classifying a target word with respect to a grammatical error type according to an embodiment.

[0024] FIG. 11 is a flow chart illustrating another example of a method for classifying a target word with respect to a grammatical error type according to an embodiment.

[0025] FIG. 12 is a flow chart illustrating an example of a method for providing a grammar score according to an embodiment.

[0026] FIG. 13 is a block diagram illustrating an ANN model training system according to an embodiment.

[0027] FIG. 14 is a depiction of an example of a training sample used by the system of FIG. 13.

[0028] FIG. 15 is a flow chart illustrating an example of a method for training an ANN model for grammatical error correction according to an embodiment.

[0029] FIG. 16 is a schematic diagram illustrating an example of training an ANN model for grammatical error correction according to an embodiment.

[0030] FIG. 17 is a block diagram illustrating an example of a computer system useful for implementing the various embodiments described in the present disclosure.

[0031] The present disclosure is described with reference to the accompanying drawings. In the drawings, like reference numbers generally indicate identical or functionally similar elements. In addition, the leftmost digit(s) of a reference number generally identifies the drawing in which the reference number first appears.

[0032] In the following detailed description, numerous specific details are set forth by way of example in order to provide a thorough understanding of the relevant disclosure. However, it should be apparent to those skilled in the art that the present disclosure may be practiced without such details. In other instances, well-known methods, procedures, systems, components, and/or circuits are described at a relatively high level, without detail, in order to avoid unnecessarily obscuring aspects of the present disclosure.

[0033] Throughout the specification and claims, terms may have nuanced meanings suggested or implied in context beyond an explicitly stated meaning. Likewise, the phrase "in one embodiment/example" as used herein does not necessarily refer to the same embodiment, and the phrase "in another embodiment/example" as used herein does not necessarily refer to a different embodiment. It is intended, for example, that claimed subject matter include combinations of example embodiments in whole or in part.

[0034] In general, terminology may be understood, at least in part, from usage in context. For example, terms such as "and," "or," or "and/or" as used herein may include a variety of meanings that may depend, at least in part, on the context in which such terms are used. Typically, "or," when used to associate a list such as A, B, or C, is intended to mean A, B, and C, here used in the inclusive sense, as well as A, B, or C, here used in the exclusive sense. In addition, the term "one or more" as used herein, depending at least in part on context, may be used to describe any feature, structure, or characteristic in a singular sense, or may be used to describe combinations of features, structures, or characteristics in a plural sense. Similarly, terms such as "a," "an," or "the" may be understood to convey a singular usage or a plural usage, again depending at least in part on context. In addition, the term "based on" may be understood as not necessarily intended to convey an exclusive set of factors and may instead allow for the existence of additional factors not necessarily expressly described, again depending at least in part on context.

[0035] As disclosed in detail below, among other novel features, the automatic GEC systems and methods disclosed herein provide the ability to detect and correct grammatical errors efficiently and effectively using a deep context model that can be trained from native text data. In some embodiments, for a particular grammatical error type, the error correction task can be treated as a classification problem in which the grammatical context representation is learned from native text data, which is largely available. Compared with traditional classification methods, the systems and methods disclosed herein do not require sophisticated feature engineering, which usually demands linguistic knowledge and may not cover all context patterns. In some embodiments, instead of using surface and shallow features, the systems and methods disclosed herein can directly use deep features, such as recurrent neural networks, to represent context. In some embodiments, unlike conventional NLP tasks, in which a large amount of supervised data is usually needed but only a limited amount is available, the systems and methods disclosed herein can leverage abundant native plain-text corpora and jointly learn context representation and classification in an end-to-end fashion to correct grammatical errors effectively.

[0036] Additional novel features will be set forth in part in the description that follows, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings, or may be learned by production or operation of the examples. The novel features of the present disclosure may be realized and attained by practicing or using various aspects of the methods, instrumentalities, and combinations set forth in the detailed examples discussed below.

[0037] FIG. 1 is a block diagram illustrating a GEC system 100 according to an embodiment. GEC system 100 includes an input pre-processing module 102, a parsing module 104, a target word dispatch module 106, and a plurality of classification-based GEC modules 108, each of which is configured to perform classification-based grammatical error detection and correction using deep context. In some embodiments, GEC system 100 may be implemented using a pipeline architecture that combines other GEC methods, such as machine translation-based and predefined rule-based methods, with the classification-based method to further improve the performance of GEC system 100. As shown in FIG. 1, GEC system 100 may further include a machine translation-based GEC module 110, a rule-based GEC module 112, and a scoring/correction module 114.

[0038] Input pre-processing module 102 is configured to receive an input text 116 and pre-process it. Input text 116 may include at least one English sentence, for example, a single sentence, a paragraph, an article, or any text corpus. Input text 116 may be received directly, for example, via handwriting, typing, or copy/paste. Input text 116 may also be received indirectly, for example, via speech recognition or image recognition. For example, any suitable speech recognition technique may be used to convert voice input into input text 116. In another example, any suitable optical character recognition (OCR) technique may be used to convert text contained in an image into input text 116.

[0039] Input pre-processing module 102 may pre-process input text 116 in various ways. In some embodiments, because a grammatical error is usually analyzed in the context of a particular sentence, input pre-processing module 102 may divide input text 116 into sentences so that each sentence can be treated as a unit in later processing. Dividing input text 116 into sentences may be performed by recognizing the beginning and/or end of each sentence. For example, input pre-processing module 102 may search for certain punctuation marks, such as a period, semicolon, question mark, or exclamation mark, as indicators of the end of a sentence. Input pre-processing module 102 may also search for a word whose first letter is capitalized as an indicator of the start of a sentence. In some embodiments, input pre-processing module 102 may lowercase input text 116, for example, by converting any uppercase letters in input text 116 to lowercase, to ease later processing. In some embodiments, input pre-processing module 102 may also check the tokens (words, phrases, or any text strings) of input text 116 against a vocabulary database 118 to determine any tokens that are not in vocabulary database 118. Unmatched tokens may be treated as special tokens, for example, a single unk token (unknown token). Vocabulary database 118 includes all the words that can be processed by GEC system 100. Any word or other token that is not in vocabulary database 118 may be ignored or treated differently by GEC system 100.
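The pre-processing steps in paragraph [0039] can be sketched as follows. This is a minimal illustration under assumptions, not the module's actual implementation: the tiny `VOCAB` set stands in for vocabulary database 118, the `<unk>` spelling of the unknown token is assumed, sentence boundaries are detected from end punctuation only (the capital-letter cue is omitted), and the function names are hypothetical.

```python
import re

# Hypothetical stand-in for vocabulary database 118.
VOCAB = {"she", "walks", "to", "school", ".", "i", "like", "apples", "!"}
UNK = "<unk>"  # assumed spelling of the single unknown token


def split_sentences(text):
    """Split after a period, semicolon, question mark, or exclamation mark."""
    parts = re.split(r"(?<=[.;?!])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Split into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def preprocess(text):
    """Return one list of lowercased, vocabulary-checked tokens per sentence."""
    result = []
    for sentence in split_sentences(text):
        tokens = [t.lower() for t in tokenize(sentence)]
        # Replace any token that is not in the vocabulary with the unknown token.
        result.append([t if t in VOCAB else UNK for t in tokens])
    return result


print(preprocess("She walks to school. I like Zorblax apples!"))
# [['she', 'walks', 'to', 'school', '.'], ['i', 'like', '<unk>', 'apples', '!']]
```

Each inner list corresponds to one sentence, which is the unit the later modules operate on.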

[0040] Parsing module 104 is configured to parse input text 116 to identify one or more target words in each sentence of input text 116. Unlike known systems that consider all grammatical errors together and attempt to replace incorrect text with correct text, GEC system 100 uses a model trained for each particular grammatical error type, as described in detail below. Thus, in some embodiments, parsing module 104 may identify target words from the text tokens in each sentence based on predefined grammatical error types, such that each target word corresponds to at least one of the grammatical error types. The grammatical error types include, but are not limited to, article error, subject agreement error, verb form error, preposition error, and noun number error. It is to be appreciated that the grammatical error types are not limited to the above examples and may include any other type. In some embodiments, parsing module 104 may tokenize each sentence and identify the target words from the tokens in conjunction with vocabulary database 118, which contains vocabulary information and knowledge known to GEC system 100.

[0041] For example, for the subject agreement error, parsing module 104 may extract in advance a mapping between non-third-person singular present words and third-person singular present words. Parsing module 104 may then identify verbs as target words. For the article error, parsing module 104 may identify nouns and noun phrases (combinations of noun words and adjective words) as target words. For the verb form error, parsing module 104 may identify verbs in the base form, gerund or present participle, or past participle as target words. For the preposition error, parsing module 104 may identify prepositions as target words. For the noun number error, parsing module 104 may identify nouns as target words. It is to be appreciated that one word may be identified by parsing module 104 as corresponding to multiple grammatical error types. For example, a verb may be identified as a target word with respect to both the subject agreement error and the verb form error, and a noun or noun phrase may be identified as a target word with respect to both the article error and the noun number error. It is also to be appreciated that a target word may include a phrase that combines multiple words, such as a noun phrase.
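The mapping from words to grammatical error types in paragraph [0041] can be sketched as a lookup from part-of-speech tags. This is a hedged illustration only: the Penn Treebank-style tags, the `ERROR_TYPES_BY_TAG` table, and the `find_targets` function are assumptions, the input is assumed to be already tagged, and noun phrases are not grouped.

```python
# Assumed tag set (Penn Treebank style); the source does not name one.
ERROR_TYPES_BY_TAG = {
    "VB": ("subject_agreement", "verb_form"),   # base-form verb
    "VBP": ("subject_agreement",),              # non-third-person present verb
    "VBZ": ("subject_agreement",),              # third-person singular present verb
    "VBG": ("verb_form",),                      # gerund or present participle
    "VBN": ("verb_form",),                      # past participle
    "NN": ("article", "noun_number"),           # singular noun
    "NNS": ("article", "noun_number"),          # plural noun
    "IN": ("preposition",),                     # preposition (simplified)
}


def find_targets(tagged_sentence):
    """Return (index, word, error_type) for every target word in the sentence.

    A word is listed once per grammatical error type it corresponds to,
    so one word can appear several times.
    """
    targets = []
    for index, (word, tag) in enumerate(tagged_sentence):
        for error_type in ERROR_TYPES_BY_TAG.get(tag, ()):
            targets.append((index, word, error_type))
    return targets


tagged = [("she", "PRP"), ("walk", "VBP"), ("to", "IN"), ("school", "NN")]
targets = find_targets(tagged)
for target in targets:
    print(target)
# (1, 'walk', 'subject_agreement')
# (2, 'to', 'preposition')
# (3, 'school', 'article')
# (3, 'school', 'noun_number')
```

The noun "school" appears twice because it is a target word for both the article error and the noun number error, which matches the many-to-many relationship the paragraph describes.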

[0042] In some embodiments, for each grammatical error type, the parsing module 104 can be configured to determine the actual classification of each target word. The parsing module 104 can assign each target word an original label for the corresponding grammatical error type as the actual classification value of the target word. For example, for subject-verb agreement errors, the actual classification of a verb is either third-person singular present form or base form. The parsing module 104 can assign the original label "1", for example, if the target word is in third-person singular present form, or "0" if the target word is in base form. For article errors, the actual classification of a target word can be "a/an", "the", or "no article". The parsing module 104 can examine the article preceding the target word (noun or noun phrase) to determine the actual classification of each target word. For verb form errors, the actual classification of a target word (e.g., a verb) can be "base form", "gerund or present participle", or "past participle". For preposition errors, the most frequently used prepositions can be used by the parsing module 104 as the actual classifications. In some embodiments, the actual classifications include 12 original labels: "about", "at", "by", "for", "from", "in", "of", "on", "to", "until", "with", and "against". For noun number errors, the actual classification of a target word (e.g., a noun) can be singular or plural. In some embodiments, the parsing module 104 can determine the original label of each target word for the corresponding grammatical error type based on part-of-speech (PoS) tags in conjunction with the vocabulary database 118.
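The labeling rules in this paragraph can be illustrated with a small function. It is a minimal sketch, assuming Penn Treebank PoS tags are available for each token; the function name `original_label`, the error-type strings, and the label strings are hypothetical names chosen for illustration.

```python
ARTICLE_LABELS = {"a": "a/an", "an": "a/an", "the": "the"}
VERB_FORM_LABELS = {
    "VB": "base",
    "VBG": "gerund_or_present_participle",
    "VBN": "past_participle",
}
NOUN_NUMBER_LABELS = {"NN": "singular", "NNS": "plural"}


def original_label(error_type, tokens, tags, index):
    """Return the actual classification (original label) of tokens[index].

    For article errors, index is assumed to point at the first word of the
    noun or noun phrase, so the article (if any) is the preceding token.
    """
    token, tag = tokens[index].lower(), tags[index]
    if error_type == "subject_agreement":
        return 1 if tag == "VBZ" else 0  # 1: third-person singular present
    if error_type == "article":
        previous = tokens[index - 1].lower() if index > 0 else ""
        return ARTICLE_LABELS.get(previous, "none")
    if error_type == "verb_form":
        return VERB_FORM_LABELS[tag]
    if error_type == "preposition":
        return token  # the preposition itself is the label
    if error_type == "noun_number":
        return NOUN_NUMBER_LABELS[tag]
    raise ValueError(f"unknown error type: {error_type}")


tokens = ["she", "walks", "to", "the", "park"]
tags = ["PRP", "VBZ", "TO", "DT", "NN"]
```

For example, `original_label("article", tokens, tags, 4)` looks at the token before "park" and returns `"the"`.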

[0043] The target word dispatch module 106 is configured to dispatch each target word to the classification-based GEC module 108 for the corresponding grammatical error type. In some embodiments, for each grammatical error type, an ANN model 120 is independently trained and used by the corresponding classification-based GEC module 108. Each classification-based GEC module 108 is thus associated with one specific grammatical error type and is configured to handle the target words for that grammatical error type. For example, for a target word that is a preposition (and is therefore associated with the preposition error type), the target word dispatch module 106 can send the preposition to the classification-based GEC module 108 that handles preposition errors. It should be appreciated that, because one word may be determined to be a target word for multiple grammatical error types, the target word dispatch module 106 may send the same word to multiple classification-based GEC modules 108. It should also be appreciated that, in some embodiments, the resources that the GEC system 100 assigns to each classification-based GEC module 108 may not be equal. For example, depending on how frequently each grammatical error type has occurred within a certain user group or for a specific user, the target word dispatch module 106 can dispatch the target words for the most frequently occurring grammatical error type with the highest priority. For input text 116 of large size, for example with many sentences and/or many target words per sentence, the target word dispatch module 106 can schedule the processing of each target word in each sentence in view of the workload of each classification-based GEC module 108 in order to reduce latency.
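Frequency-based prioritization of this kind can be sketched with a priority queue. The function `dispatch_order` and the `error_frequency` table below are hypothetical; the disclosure does not specify a particular scheduling algorithm, so this is only one possible reading.

```python
import heapq


def dispatch_order(targets, error_frequency):
    """Order (error_type, token) pairs so that target words of the most
    frequently occurring error type are dispatched first.

    Ties keep their original order because the sequence number is used
    as a secondary sort key.
    """
    heap = []
    for sequence, (error_type, token) in enumerate(targets):
        priority = -error_frequency.get(error_type, 0.0)  # min-heap, so negate
        heapq.heappush(heap, (priority, sequence, error_type, token))
    ordered = []
    while heap:
        _, _, error_type, token = heapq.heappop(heap)
        ordered.append((error_type, token))
    return ordered


# Assumed per-user error frequencies (illustrative values only).
frequency = {"preposition": 0.4, "verb_form": 0.3, "article": 0.1}
queue = dispatch_order(
    [("article", "misery"), ("verb_form", "adding"),
     ("preposition", "on"), ("verb_form", "dishearten")],
    frequency,
)
```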

[0044] Each classification-based GEC module 108 includes a corresponding ANN model 120 trained for the corresponding grammatical error type. The classification-based GEC module 108 is configured to use the corresponding ANN model 120 to estimate the classification of a target word with respect to the corresponding grammatical error type. As described in detail below, in some embodiments the ANN model 120 includes two recurrent neural networks configured to output a context vector of the target word based on at least one word before the target word and at least one word after the target word in the sentence. The ANN model 120 further includes a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based on the context vector of the target word.
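The overall shape of such a model (two recurrent networks feeding one feedforward classifier) can be sketched in plain Python. This is a toy illustration, not the disclosed model: it assumes a simple tanh recurrence, a single linear output layer with softmax, and hand-picked weights in the hypothetical `params` dictionary; a real model would learn its parameters and may use a different recurrent cell.

```python
import math


def rnn_final_state(vectors, w_in, w_rec):
    """Run a plain tanh recurrence over the vectors; return the last hidden state."""
    hidden = [0.0] * len(w_rec)
    for x in vectors:
        hidden = [
            math.tanh(
                sum(w_in[j][k] * x[k] for k in range(len(x)))
                + sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
            )
            for j in range(len(hidden))
        ]
    return hidden


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(forward_vectors, backward_vectors, params):
    """Estimate class probabilities for a target word from its two contexts."""
    fwd = rnn_final_state(forward_vectors, params["fwd_in"], params["fwd_rec"])
    bwd = rnn_final_state(backward_vectors, params["bwd_in"], params["bwd_rec"])
    context = fwd + bwd  # concatenated context vector of the target word
    scores = [
        sum(w * c for w, c in zip(row, context)) + bias
        for row, bias in zip(params["out_w"], params["out_b"])
    ]
    return softmax(scores)


# Toy parameters: input dimension 2, hidden size 2, three classes.
params = {
    "fwd_in": [[0.5, -0.2], [0.1, 0.3]],
    "fwd_rec": [[0.0, 0.1], [0.2, 0.0]],
    "bwd_in": [[-0.3, 0.4], [0.2, 0.2]],
    "bwd_rec": [[0.1, 0.0], [0.0, 0.1]],
    "out_w": [[0.3, -0.1, 0.2, 0.0], [0.0, 0.4, -0.2, 0.1], [-0.3, 0.1, 0.0, 0.2]],
    "out_b": [0.0, 0.0, 0.0],
}
probs = classify([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]], params)
```

The forward network reads the words before the target word and the backward network reads the words after it, so the concatenated state summarizes both sides of the context.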

[0045] The classification-based GEC module 108 is further configured to detect a grammatical error in the sentence based on the target word and the estimated classification of the target word. As described above, in some embodiments the actual classification of each target word can be determined by the parsing module 104. The classification-based GEC module 108 can then compare the estimated classification of the target word with its actual classification and detect a grammatical error in the sentence when the two do not match. For example, for a given grammatical error type, the corresponding ANN model 120 can learn an embedding function of the variable-length context surrounding the target word, and the corresponding classification-based GEC module 108 can predict the classification of the target word from the context embedding. If the predicted classification label differs from the original label of the target word, the target word can be flagged as an error and the prediction can be used as the correction.

[0046] As shown in FIG. 1, in some embodiments multiple classification-based GEC modules 108 may be applied in parallel in the GEC system 100 to detect grammatical errors of various grammatical error types simultaneously. As described above, the resources of the GEC system 100 can be assigned to different grammatical error types based on the occurrence frequency of each grammatical error type. For example, the GEC system 100 can allocate more computational resources to handle grammatical error types that occur more frequently than others. The allocation of resources can be adjusted dynamically in view of changes in frequency and/or the workload of each classification-based GEC module 108.

[0047] The machine translation-based GEC module 110 is configured to detect one or more grammatical errors in each sentence based on statistical machine translation, such as phrase-based machine translation or neural network-based machine translation. In some embodiments, the machine translation-based GEC module 110 includes a model having a language sub-model that assigns a probability to a sentence and a translation sub-model that assigns a conditional probability. The language sub-model can be trained using a monolingual training data set in the target language. The parameters of the translation sub-model can be estimated from a parallel training data set, i.e., a set of foreign-language sentences and their corresponding translations into the target language. It should be appreciated that, in the pipeline architecture of the GEC system 100, the machine translation-based GEC module 110 can be applied to the output of the classification-based GEC modules 108, or the classification-based GEC modules 108 can be applied to the output of the machine translation-based GEC module 110. Also, in some embodiments, when the machine translation-based GEC module 110 is added to the pipeline, certain classification-based GEC modules 108 may be left out of the pipeline if the machine translation-based GEC module 110 outperforms them.

[0048] The rule-based GEC module 112 is configured to detect one or more grammatical errors in each sentence based on predefined rules. It should be appreciated that the position of the rule-based GEC module 112 in the pipeline is not limited to the end, as shown in FIG. 1; it may be at the beginning of the pipeline as the first detection module, or between the classification-based GEC modules 108 and the machine translation-based GEC module 110. In some embodiments, other mechanical errors, such as punctuation, spelling, and capitalization errors, can also be detected and fixed by the rule-based GEC module 112 using predefined rules.
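Because the order of the detection modules is configurable, the pipeline can be pictured as a list of stages applied in sequence. The sketch below assumes each stage is a function that takes a sentence and returns a possibly corrected sentence plus a list of findings. The two stages shown are hypothetical stand-ins: `verb_form_stub` hard-codes one correction in place of a trained classification-based module, and `capitalize_rule` is a single illustrative rule.

```python
def run_pipeline(sentence, stages):
    """Apply (name, stage) pairs in order, feeding each stage the output
    of the previous one, and collect the findings of every stage."""
    findings = []
    for name, stage in stages:
        sentence, stage_findings = stage(sentence)
        findings.extend((name, finding) for finding in stage_findings)
    return sentence, findings


def capitalize_rule(sentence):
    """Illustrative predefined rule: a sentence starts with a capital letter."""
    if sentence and sentence[0].islower():
        return sentence[0].upper() + sentence[1:], ["capitalization"]
    return sentence, []


def verb_form_stub(sentence):
    """Stand-in for a classification-based module (hard-coded, not a model)."""
    if "will just adding" in sentence:
        fixed = sentence.replace("will just adding", "will just add")
        return fixed, ["verb_form: adding -> add"]
    return sentence, []


corrected, found = run_pipeline(
    "it will just adding on their misery",
    [("classification", verb_form_stub), ("rules", capitalize_rule)],
)
```

Reordering the pipeline only requires reordering the list passed to `run_pipeline`, for example placing the rule stage first.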

[0049] The scoring/correction module 114 is configured to provide corrected text and/or a grammar score 122 of the input text 116 based on the grammatical error results received from the pipeline. Taking the classification-based GEC module 108 as an example, for each target word that is detected as having a grammatical error because its estimated classification does not match its actual classification, the scoring/correction module 114 can provide a grammatical error correction of the target word based on the estimated classification. To evaluate the input text 116, the scoring/correction module 114 can also use a scoring function to provide the grammar score 122 based on the grammatical error results received from the pipeline. In some embodiments, the scoring function can assign a weight to each grammatical error type, so that different types of grammatical errors can affect the grammar score 122 to different degrees. Weights can also be assigned to precision and recall as weighting factors when evaluating the grammatical error results. In some embodiments, the user who provided the input text 116 can also be taken into account by the scoring function. For example, the weights can differ between users, or information about the user (e.g., native language, place of residence, education level, historical scores, age) can be factored into the scoring function.
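The disclosure does not give a formula for the scoring function, so the following is only one hypothetical way to weight error types. It assumes a score on a 0 to 100 scale, a default weight of 1.0 for unlisted error types, and a single `user_factor` multiplier standing in for user-specific information; all of these are illustrative choices.

```python
def grammar_score(results, type_weights, user_factor=1.0):
    """Compute a weighted grammar score in the range [0, 100].

    results: list of (error_type, is_error), one entry per target word.
    type_weights: weight per error type; heavier types cost more.
    user_factor: hypothetical multiplier reflecting user information.
    """
    total = sum(type_weights.get(error_type, 1.0) for error_type, _ in results)
    if total == 0:
        return 100.0  # nothing to evaluate, so no penalty
    penalty = sum(
        type_weights.get(error_type, 1.0)
        for error_type, is_error in results
        if is_error
    )
    return max(0.0, 100.0 * (1.0 - user_factor * penalty / total))


weights = {"verb_form": 2.0, "preposition": 1.0, "article": 1.0}
score = grammar_score(
    [("verb_form", True), ("preposition", False), ("article", False)],
    weights,
)
```

With these weights, one verb form error among three target words costs half of the available score, whereas a single article error would cost a quarter.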

[0050] FIG. 2 is a depiction of an example of automatic grammatical error correction performed by the GEC system 100 of FIG. 1. As shown in FIG. 2, input text 202 includes multiple sentences and is received from a user identified by user ID-1234. After passing through the GEC system 100, with multiple ANN models 120 each individually trained for a corresponding grammatical error type, corrected text 204 with a grammar score is provided to the user. For example, in the sentence "it will just adding on their misery" in the input text 202, the verb "adding" is identified by the GEC system 100 as a target word for verb form errors. The actual classification of the target word "adding" is gerund or present participle. The GEC system 100 applies the ANN model 120 trained for verb form errors and estimates that the classification of the target word "adding" is the base form, "add". Because the estimated classification does not match the actual classification of the target word "adding", the GEC system 100 detects a verb form error, which affects the grammar score in view of the weight applied to the verb form error type and/or the user's personal information. The estimated classification of the target word "adding" is also used by the GEC system 100 to provide the correction "add", which replaces "adding" in the corrected text 204. The same ANN model 120 for verb form errors is used by the GEC system 100 to detect and correct other verb form errors in the input text 202, such as changing "dishearten" to "disheartening". ANN models 120 for other grammatical error types are used by the GEC system 100 to detect other types of grammatical errors. For example, the ANN model 120 for preposition errors is used by the GEC system 100 to detect and correct preposition errors in the input text 202, such as changing "for" to "in" and "to" to "on".

[0051] FIG. 3 is a flow chart illustrating an example of a method 300 for grammatical error correction according to an embodiment. The method 300 can be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode), software (e.g., instructions executing on a processing device), or a combination of hardware and software. It should be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, as will be understood by a person skilled in the art, some of the steps may be performed simultaneously or in a different order than shown in FIG. 3.

[0052] The method 300 is described with reference to FIG. 1. However, the method 300 is not limited to that example embodiment. At 302, input text is received. The input text includes at least one sentence. The input text may be received directly, for example from handwriting, typing, or copy/paste, or indirectly, for example from speech recognition or image recognition. At 304, the received input text is preprocessed, for example by dividing it into sentences and tokenizing the text. In some embodiments, the preprocessing can include converting uppercase letters to lowercase so that the input text is in lowercase. In some embodiments, the preprocessing can include identifying any tokens in the input text that are not in the vocabulary database 118 and representing those tokens as special tokens. Steps 302 and 304 can be performed by the input preprocessing module 102 of the GEC system 100.
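The preprocessing at 304 can be sketched as follows. This is a minimal illustration using regular expressions rather than a full NLP tokenizer; the function name `preprocess`, the special token string `<unk>`, and the tiny `vocabulary` set are assumptions made for the example.

```python
import re

UNKNOWN_TOKEN = "<unk>"  # hypothetical special token for out-of-vocabulary words


def preprocess(text, vocabulary):
    """Split text into sentences, lowercase and tokenize each sentence,
    and replace tokens missing from the vocabulary with a special token."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    processed = []
    for sentence in sentences:
        # Words (letters, digits, apostrophes) or single punctuation marks.
        tokens = re.findall(r"[a-z0-9']+|[^\sa-z0-9']", sentence.lower())
        processed.append(
            [token if token in vocabulary else UNKNOWN_TOKEN for token in tokens]
        )
    return processed


vocabulary = {"it", "will", "just", "adding", "on", "their", "misery",
              "agrees", ".", "!"}
sentences = preprocess("It will just adding on their misery. Zorblax agrees!",
                       vocabulary)
```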

[0053] At 306, the preprocessed input text is parsed to identify one or more target words in each sentence. The target words can be identified from the text tokens based on the grammatical error types, such that each target word corresponds to at least one of the grammatical error types. The grammatical error types include, but are not limited to, article errors, subject-verb agreement errors, verb form errors, preposition errors, and noun number errors. In some embodiments, the actual classification of each target word with respect to the corresponding grammatical error type is determined. The determination can be made automatically, for example based on PoS tags and text tokens in the sentence. In some embodiments, the identification of target words and the determination of actual classifications may be performed by an NLP tool, such as the Stanford CoreNLP toolkit. Step 306 can be performed by the parsing module 104 of the GEC system 100.

[0054] At 308, each target word is dispatched to the corresponding classification-based GEC module 108. Each classification-based GEC module 108 includes an ANN model 120 trained for the corresponding grammatical error type, for example on native training samples. Step 308 can be performed by the target word dispatch module 106 of the GEC system 100. At 310, one or more grammatical errors in each sentence are detected using the ANN models 120. In some embodiments, for each target word, the classification of the target word with respect to the corresponding grammatical error type can be estimated using the corresponding ANN model 120. A grammatical error can then be detected based on the target word and the estimated classification of the target word. For example, if the estimate differs from the original label and its probability is greater than a predetermined threshold, a grammatical error is deemed to have been found. Step 310 can be performed by the classification-based GEC modules 108 of the GEC system 100.
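The detection rule at 310 (the estimate differs from the original label and its probability exceeds a threshold) can be written down directly. The function name `detect_error`, the default threshold of 0.5, and the example probabilities are assumptions for illustration; the disclosure does not state a threshold value.

```python
def detect_error(original_label, class_probabilities, threshold=0.5):
    """Flag a grammatical error when the most probable class differs from
    the original label and its probability is above the threshold."""
    predicted = max(class_probabilities, key=class_probabilities.get)
    confidence = class_probabilities[predicted]
    if predicted != original_label and confidence > threshold:
        # The estimated classification doubles as the suggested correction.
        return {"error": True, "correction": predicted, "confidence": confidence}
    return {"error": False, "correction": None, "confidence": confidence}


# Illustrative model output for the target word "adding" (made-up numbers).
confident = detect_error(
    "gerund_or_present_participle",
    {"base": 0.9, "gerund_or_present_participle": 0.07, "past_participle": 0.03},
)
uncertain = detect_error(
    "gerund_or_present_participle",
    {"base": 0.4, "gerund_or_present_participle": 0.35, "past_participle": 0.25},
)
```

The threshold keeps low-confidence disagreements from being reported, trading some recall for precision.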

[0055] At 312, one or more grammatical errors in each sentence can be detected using machine translation. Step 312 can be performed by the machine translation-based GEC module 110 of the GEC system 100. At 314, one or more grammatical errors in each sentence can be detected based on predefined rules. Step 314 can be performed by the rule-based GEC module 112 of the GEC system 100. In some embodiments, a pipeline architecture can be used to combine any suitable machine translation and predefined rule-based methods with the classification-based methods described herein to further improve the performance of the GEC system 100.

[0056] At 316, corrections to the detected grammatical errors and/or a grammar score of the input text are provided. In some embodiments, a weight can be applied to each grammatical error result of a target word based on the corresponding grammatical error type. The grammar score of each sentence can be determined based on the grammatical error results, the target words in the sentence, and the weights applied to each grammatical error result. In some embodiments, the grammar score can also be provided based on information associated with the user from whom the sentence was received. As to corrections of the detected grammatical errors, in some embodiments the estimated classification of a target word with respect to the corresponding grammatical error type can be used to generate the correction. It should be appreciated that corrections and a grammar score are not necessarily provided together. Step 316 can be performed by the scoring/correction module 114 of the GEC system 100.

[0057] FIG. 4 is a block diagram illustrating an example of a classification-based GEC module 108 of the GEC system 100 of FIG. 1 according to an embodiment. As described above, the classification-based GEC module 108 is configured to receive a target word in a sentence 402 and estimate the classification of the target word using the ANN model 120 for the corresponding grammatical error type of the target word. The target word in the sentence 402 is also received by a target word labeling unit 404 (e.g., in the parsing module 104). The target word labeling unit 404 is configured to determine the actual classification (e.g., the original label) of the target word based on, for example, PoS tags and text tokens in the sentence 402. The classification-based GEC module 108 is further configured to provide a grammatical error result based on the estimated classification and the actual classification of the target word. As shown in FIG. 4, the classification-based GEC module 108 includes an initial context generation unit 406, a deep context representation unit 408, a classification unit 410, an attention unit 412, and a classification comparison unit 414.

[0058] The initial context generation unit 406 is configured to generate multiple sets of initial context vectors (initial context matrices) of the target word based on the words surrounding the target word (context words) in the sentence 402. In some embodiments, the sets of initial context vectors include a set of forward initial context vectors (a forward initial context matrix) generated based on at least one word before the target word in the sentence 402 (forward context words), and a set of backward initial context vectors (a backward initial context matrix) generated based on at least one word after the target word in the sentence 402 (backward context words). Each initial context vector represents one context word in the sentence 402. In some embodiments, an initial context vector may be a one-hot vector, which represents a word using one-hot encoding such that the size (dimension) of the one-hot vector is the same as the vocabulary size (e.g., of the vocabulary database 118). In some embodiments, an initial context vector may be a low-dimensional vector with a dimension smaller than the vocabulary size, such as a word embedding vector of the context word. For example, the word embedding vector may be generated by any suitable generic word embedding approach, such as, but not limited to, word2vec or GloVe. In some embodiments, the initial context generation unit 406 can use one or more recurrent neural networks configured to output the one or more sets of initial context vectors. The recurrent neural network(s) used by the initial context generation unit 406 may be part of the ANN model 120.
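The difference between the two representations can be shown with a toy vocabulary. The helper `one_hot`, the three-word vocabulary, and the two-dimensional `embeddings` table are made up for illustration; real embeddings would come from a trained model such as word2vec or GloVe.

```python
def one_hot(word, vocabulary):
    """Return a one-hot vector whose length equals the vocabulary size."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector


vocabulary = ["it", "will", "on"]

# Hypothetical low-dimensional embeddings (dimension 2 < vocabulary size 3).
embeddings = {"it": [0.1, -0.3], "will": [0.7, 0.2], "on": [-0.5, 0.4]}

one_hot_vector = one_hot("on", vocabulary)
embedding_vector = embeddings["on"]
```

The one-hot vector grows with the vocabulary, while the embedding keeps a fixed, smaller dimension regardless of how many words are known.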

[0059] It should be appreciated that the number of context words used to generate the set of forward or backward initial context vectors is not limited. In some embodiments, the set of forward initial context vectors is generated based on all the words before the target word in the sentence 402, and the set of backward initial context vectors is generated based on all the words after the target word in the sentence 402. Each classification-based GEC module 108 and its corresponding ANN model 120 handle a specific grammatical error type, and correcting different types of grammatical errors may require dependencies over different word distances (e.g., a preposition is determined by words near the target word, whereas the form of a verb can be affected by a subject far away from the verb). Therefore, in some embodiments, the number of context words used to generate the set of forward or backward initial context vectors (i.e., the window size) can be determined based on the grammatical error type associated with the classification-based GEC module 108 and the corresponding ANN model 120.

[0060] In some embodiments, an initial context vector can be generated based on the lemma of the target word itself. A lemma is the base form of a word (e.g., the words "walk", "walks", "walked", and "walking" all have the same lemma, "walk"). For example, for the classification-based GEC module 108 and corresponding ANN model 120 associated with noun number errors, whether the target word should be singular or plural is closely related to the target word itself. Therefore, in addition to the context words (i.e., the words surrounding the target word in the sentence 402), the lemma form of the target noun can be introduced as extracted context information in the form of an initial lemma context vector. In some embodiments, the initial context vector of the lemma of the target word may be part of the set of forward initial context vectors or part of the set of backward initial context vectors.

[0061] In some known GEC systems, semantic features need to be manually designed and extracted from the sentence into feature vectors, and it is difficult to cover all situations because of the complexity of language. In contrast, the classification-based GEC module 108 disclosed herein does not require complex feature engineering, because the context words of the target word in the sentence 402 are used directly as initial context information (e.g., in the form of initial context vectors) and, as described in detail below, the deep context feature representation can be learned jointly with the classification in an end-to-end manner.

[0062] Referring to FIG. 5, in this example a sentence consists of n words, word 1 through word n, including a target word i. For each word before the target word i, i.e., word 1, word 2, ..., or word i-1, a corresponding initial context vector 1, 2, ..., or i-1 is generated. The initial context vectors 1, 2, ..., and i-1 are "forward" vectors because they are generated from the words before the target word i and are to be fed into the later stage in the forward direction (i.e., from the beginning of the sentence, starting with the first word 1). For each word after the target word i, i.e., word i+1, word i+2, ..., or word n, a corresponding initial context vector i+1, i+2, ..., or n is generated. The initial context vectors n, ..., i+2, and i+1 are "backward" vectors because they are generated from the words after the target word i and are to be fed into the later stage in the backward direction (i.e., from the end of the sentence, starting with the last word n).
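The ordering of the forward and backward contexts can be made concrete at the token level. The sketch below is an illustration only: the function name `split_context` and its optional `window` argument are hypothetical, and the window stands for a per-error-type limit on how many context words are kept.

```python
def split_context(tokens, target_index, window=None):
    """Return (forward_context, backward_context) for tokens[target_index].

    The forward context is ordered from the start of the sentence toward the
    target word; the backward context is ordered from the end of the sentence
    toward the target word. If window is given, at most that many words
    nearest the target word are kept on each side.
    """
    before = tokens[:target_index]
    after = tokens[target_index + 1:]
    if window is not None:
        before = before[-window:] if window > 0 else []
        after = after[:window]
    return before, after[::-1]


tokens = "it will just adding on their misery".split()
forward, backward = split_context(tokens, 3)        # target word: "adding"
near_forward, near_backward = split_context(tokens, 3, window=2)
```

Both lists end with the word adjacent to the target word, so a recurrent network reading either list finishes at the target position.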

[0063] In this example, the set of forward initial context vectors can be represented as a forward initial context matrix having as many columns as the word embedding has dimensions and as many rows as there are words before the target word i. The first row of the forward initial context matrix may be the word embedding vector of the first word 1, and the last row may be the word embedding vector of the word i-1 immediately before the target word i. The set of backward initial context vectors can be represented as a backward initial context matrix having as many columns as the word embedding has dimensions and as many rows as there are words after the target word i. The first row of the backward initial context matrix may be the word embedding vector of the last word n, and the last row may be the word embedding vector of the word i+1 immediately after the target word i. The number of dimensions of each word embedding vector may be at least 100, for example 300. In this example, a lemma initial context vector lem (e.g., a word embedding vector) may also be generated based on the lemma of the target word i.
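
As an illustration of the forward and backward initial context matrices described above, the following minimal Python sketch splits a tokenized sentence around a target word and stacks one embedding row per context word. The names `split_context`, `context_matrix`, and `toy_embed` are hypothetical, and `toy_embed` is only a deterministic placeholder standing in for a trained word embedding of 100 or more dimensions.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def split_context(words: List[str], target_index: int) -> Tuple[List[str], List[str]]:
    """Split a sentence into forward and backward context words.

    target_index is 0-based. The forward list runs from the first word to the
    word just before the target; the backward list runs from the last word to
    the word just after the target.
    """
    if not 0 <= target_index < len(words):
        raise IndexError("target_index out of range")
    forward = words[:target_index]
    backward = words[target_index + 1:][::-1]
    return forward, backward


def toy_embed(word: str, dim: int = 4) -> Vector:
    """Placeholder embedding (assumption): NOT a trained word embedding."""
    seed = sum(ord(c) for c in word)
    return [((seed * (k + 1)) % 7) / 7.0 for k in range(dim)]


def context_matrix(context_words: List[str], embed: Callable[[str], Vector]) -> List[Vector]:
    """One row per context word, one column per embedding dimension."""
    return [embed(w) for w in context_words]


sentence = "I go to school everyday".split()
forward_words, backward_words = split_context(sentence, 1)  # target word: "go"
forward_matrix = context_matrix(forward_words, toy_embed)
backward_matrix = context_matrix(backward_words, toy_embed)
```

With the target word "go", the forward matrix has a single row (for "I") and the backward matrix has three rows ordered "everyday", "school", "to", matching the row order described for the two matrices.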

[0064] Referring back to FIG. 4, the deep context representation unit 408 is configured to provide, using the ANN model 120, a context vector of the target word based on the context words in sentence 402, for example, the sets of forward and backward initial context vectors generated by the initial context generation unit 406. The classification unit 410 is configured to provide, using the ANN model 120, a classification value of the target word with respect to the grammatical error type based on the deep context representation of the target word in sentence 402, for example, the context vector generated by the deep context representation unit 408.

[0065] Referring to FIG. 6, a schematic diagram of an example of the ANN model 120 according to an embodiment is shown. In this example, the ANN model 120 includes a deep context representation submodel 602 that can be used by the deep context representation unit 408 and a classification submodel 604 that can be used by the classification unit 410. The deep context representation submodel 602 and the classification submodel 604 can be trained jointly in an end-to-end fashion. The deep context representation submodel 602 includes two recurrent neural networks: a forward recurrent neural network 606 and a backward recurrent neural network 608. Each recurrent neural network 606 or 608 may be a long short-term memory (LSTM) neural network, a gated recurrent unit (GRU) neural network, or any other suitable recurrent neural network in which the connections between hidden units form a directed cycle.

[0066] The recurrent neural networks 606 and 608 are configured to output a context vector of the target word based on the initial context vectors generated from the context words of the target word in sentence 402. In some embodiments, the forward recurrent neural network 606 is configured to receive the set of forward initial context vectors and to provide a forward context vector of the target word based on that set. The set of forward initial context vectors may be fed into the forward recurrent neural network 606 in the forward direction. The backward recurrent neural network 608 is configured to receive the set of backward initial context vectors and to provide a backward context vector of the target word based on that set. The set of backward initial context vectors may be fed into the backward recurrent neural network 608 in the backward direction. In some embodiments, the sets of forward and backward initial context vectors may be word embedding vectors as described above. It is to be appreciated that, in some embodiments, the lemma initial context vector of the target word may be fed into the forward recurrent neural network 606 and/or the backward recurrent neural network 608 to generate the forward context vector and/or the backward context vector.

[0067] Referring now to FIG. 5, in this example the forward recurrent neural network is fed with the set of forward initial context vectors (e.g., in the form of the forward initial context matrix) in the forward direction and generates a forward context vector for. The backward recurrent neural network is fed with the set of backward initial context vectors (e.g., in the form of the backward initial context matrix) in the backward direction and generates a backward context vector back. It is to be appreciated that, in some embodiments, the lemma initial context vector lem may be fed into the forward recurrent neural network and/or the backward recurrent neural network. The number of hidden units in each of the forward and backward recurrent neural networks is at least 300, for example 600. In this example, a deep context vector i of the target word i is then generated by concatenating the forward context vector for and the backward context vector back. The deep context vector i represents the deep context information of the target word i based on the context words 1 to i-1 and i+1 to n around the target word i (and, in some embodiments, the lemma of the target word i). In other words, the deep context vector i may be considered an embedding of the joint context around the target word i. As described above, the deep context vector i is a generic representation that can handle a variety of situations, because no complex feature engineering is needed to manually design and extract semantic features to represent the context of the target word i.

[0068] Referring back to FIG. 6, the classification submodel 604 includes a feedforward neural network 610 configured to output a classification value of the target word with respect to the grammatical error type based on the context vector of the target word. The feedforward neural network 610 may include a multilayer perceptron (MLP) neural network or any other suitable feedforward neural network in which the connections between hidden units do not form a cycle. For example, as shown in FIG. 5, the deep context vector i is fed into the feedforward neural network to generate a classification value y of the target word i. For different grammatical error types, the classification value y may be defined in different ways, as shown in Table I. It is to be appreciated that the grammatical error types are not limited to the five examples in Table I, and that the definition of the classification value y is likewise not limited to the examples shown in Table I. It is also to be appreciated that, in some embodiments, the classification value y may be expressed as a probability distribution of the target word over the classes (labels) associated with the grammatical error type.

[0069] In some embodiments, the feedforward neural network 610 may include a first layer having a first activation function applied to a fully connected linear operation on the context vector. The first activation function of the first layer may be, for example, a rectified linear unit (ReLU) activation function, or any other suitable activation function that is a function of the one-fold output from the preceding layer(s). The feedforward neural network 610 may also include a second layer that is connected to the first layer and has a second activation function for generating the classification value. The second activation function of the second layer may be, for example, a softmax activation function, or any other suitable activation function used for multiclass classification.
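
The two-layer arrangement described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names (`linear`, `relu`, `softmax`, `classify`) and the tiny weight matrix are hypothetical, and the weights are hand-picked rather than trained.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected linear operation: W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    """Rectified linear unit: max(0, x) applied element-wise."""
    return [max(0.0, v) for v in x]


def softmax(x: Vector) -> Vector:
    """Softmax with the usual max-subtraction for numerical stability."""
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def classify(context_vector: Vector, weights: Matrix, bias: Vector) -> Vector:
    """First layer: ReLU over a linear operation; second layer: softmax."""
    return softmax(relu(linear(context_vector, weights, bias)))


# Hypothetical 2-class example with hand-picked (untrained) parameters.
probabilities = classify([1.0, -2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
```

The output is a probability distribution over the classes of one grammatical error type; taking the index of the largest probability gives a single classification value.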

[0070] Returning to FIG. 4, in some embodiments the attention unit 412 is configured to provide, using the ANN model 120, a context weight vector of the target word based on at least one word before the target word and at least one word after the target word in sentence 402. FIG. 7 is a schematic diagram illustrating another example of the ANN model 120 for grammatical error correction according to an embodiment. Compared with the example shown in FIG. 6, the ANN model 120 of FIG. 7 further includes an attention mechanism submodel 702 that can be used by the attention unit 412. A weighted context vector is then calculated by applying the context weight vector to the context vector. The deep context representation submodel 602, the classification submodel 604, and the attention mechanism submodel 702 can be trained jointly in an end-to-end fashion. In some embodiments, the attention mechanism submodel 702 includes a feedforward neural network 704 configured to generate the context weight vector of the target word based on the context words of the target word. The feedforward neural network 704 can be trained based on the distance between each context word and the target word in the sentence. In some embodiments, because the context weight vector can adjust the weights of the context words according to their different distances from the target word, the sets of initial context vectors can be generated from all of the surrounding words in the sentence, and the context weight vector can tune the weighted context vector to focus on the context words that affect grammatical usage.

[0071] Referring back to FIG. 4, the classification comparison unit 414 is configured to compare the estimated classification value provided by the classification unit 410 with the actual classification value provided by the target word labeling unit 404 in order to detect the presence of any error of the grammatical error type. If the actual classification value is the same as the estimated classification value, no error of the grammatical error type is detected for the target word. If they differ, an error of the grammatical error type is detected, and the estimated classification value is used to provide the correction. For instance, in the example described above with respect to FIG. 2, the estimated classification value of the target word "adding" with respect to the verb form error is "0" (base form), whereas the actual classification value of the target word "adding" is "1" (gerund or present participle). A verb form error is therefore detected, and the correction is the base form of the target word "adding".
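
The comparison described above reduces to a small check, sketched here in Python. The function name `detect_error` and the dictionary `VERB_FORM_LABELS` are hypothetical; only the two class values mentioned in the example ("0" for base form, "1" for gerund or present participle) are taken from the text, and any further classes are left out rather than guessed.

```python
from typing import Dict, Optional

# Only the two labels named in the "adding" example; other classes are omitted.
VERB_FORM_LABELS: Dict[int, str] = {
    0: "base form",
    1: "gerund or present participle",
}


def detect_error(actual: int, estimated: int) -> Optional[int]:
    """Return the class to correct to, or None when no error is detected.

    actual    -- class of the word as written (from the labeling step)
    estimated -- class predicted by the model from the surrounding context
    """
    if actual == estimated:
        return None
    return estimated


# The "adding" example: written as class 1, estimated as class 0.
correction = detect_error(actual=1, estimated=0)
if correction is not None:
    print("verb form error; suggested form:", VERB_FORM_LABELS[correction])
```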

[0072] FIG. 8 is a detailed schematic diagram illustrating an example of the ANN model 120 of FIG. 6 according to an embodiment. In this example, the ANN model 120 includes a forward GRU neural network, a backward GRU neural network, and an MLP neural network that are trained jointly. For the target word "go" in the sentence "I go to school everyday", the forward context word "I" is fed into the forward GRU neural network from left to right (the forward direction), and the backward context words "to school everyday" are fed into the backward GRU neural network from right to left (the backward direction). Given the context w_{1:n}, the context vector of the target word w_i can be defined as in Equation 1:

biGRU(w_{1:n}, i) = [lGRU(l_{1:i-1}); rGRU(f_{n:i+1})], (1)

where lGRU is a GRU that reads the words of a given context from left to right (the forward direction), rGRU is a GRU that reads the words in reverse, from right to left (the backward direction), l/f denote the distinct left-to-right/right-to-left word embeddings of the context words, and [ · ; · ] denotes concatenation. The concatenated vector is then fed into the MLP neural network to capture the interdependence between the two sides. In the second layer of the MLP neural network, a softmax layer can be used to predict the classification of the target word (e.g., the target word itself, or a state of the target word such as singular or plural):

MLP(x) = softmax(ReLU(L(x))), (2)

where ReLU is the rectified linear unit activation function, ReLU(x) = max(0, x), and L(x) = W(x) + b is a fully connected linear operation. The final output of the ANN model 120 in this example is:

y = MLP(biGRU(w_{1:n}, i)), (3)

where y is the classification value as described above.
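
To make the bidirectional composition concrete, the following Python sketch runs one GRU over the words before the target (left to right) and a second GRU over the words after it (right to left), then concatenates the two final hidden states. It is an illustration under several assumptions: the class `GRUCell` and the function `bi_gru_context` are hypothetical names, the cell uses the commonly published GRU update equations with bias terms omitted for brevity, the weights are random rather than trained, the dimensions are tiny instead of the 300 to 600 mentioned in the text, and a single embedding list is shared by both directions for simplicity.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """Minimal GRU cell (no biases) with randomly initialised weights."""

    def __init__(self, input_dim: int, hidden_dim: int, rng: random.Random) -> None:
        def mat(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.Wz, self.Wr, self.Wh = (mat(hidden_dim, input_dim) for _ in range(3))
        self.Uz, self.Ur, self.Uh = (mat(hidden_dim, hidden_dim) for _ in range(3))

    def step(self, x: Vector, h: Vector) -> Vector:
        # Update gate z and reset gate r.
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        # Candidate state computed from the input and the reset-gated previous state.
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, gated))]
        # Interpolate between the previous state and the candidate.
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, inputs: List[Vector]) -> Vector:
        h = [0.0] * self.hidden_dim
        for x in inputs:
            h = self.step(x, h)
        return h


def bi_gru_context(embeddings: List[Vector], i: int,
                   left_gru: GRUCell, right_gru: GRUCell) -> Vector:
    """Concatenate the forward state (words before i) and backward state (words after i)."""
    forward = left_gru.run(embeddings[:i])             # first word ... word before target
    backward = right_gru.run(embeddings[i + 1:][::-1])  # last word ... word after target
    return forward + backward


rng = random.Random(0)
left, right = GRUCell(3, 2, rng), GRUCell(3, 2, rng)
toy_embeddings = [[0.1 * k, 0.2, -0.1 * k] for k in range(5)]  # 5 words, 3 dimensions
context_vector = bi_gru_context(toy_embeddings, 1, left, right)
```

The resulting vector has twice the hidden size and would be the input x of the MLP in Equation 2. Note that the target word's own embedding is never fed to either GRU.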

[0073] FIG. 9 is a flow chart illustrating an example of a method 900 for grammatical error correction of a sentence according to an embodiment. Method 900 can be performed by processing logic that may include hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination of hardware and software. It is to be appreciated that not all steps may be needed to practice the disclosure provided herein. Further, as will be understood by a person of ordinary skill in the art, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 9.

[0074] Method 900 is described with reference to FIGS. 1 and 4. However, method 900 is not limited to that exemplary embodiment. At 902, a sentence is received. The sentence may be part of an input text. 902 may be performed by the input preprocessing module 102 of the GEC system 100. At 904, one or more target words in the sentence are identified based on one or more grammatical error types. Each target word corresponds to one or more grammatical error types. 904 may be performed by the parsing module 104 of the GEC system 100. At 906, a classification of one target word with respect to the corresponding grammatical error type is estimated using the ANN model 120 trained for that grammatical error type. At 908, a grammatical error is detected based on the target word and the estimated classification of the target word. The detection can be made by comparing the actual classification of the target word with its estimated classification. 906 and 908 may be performed by the classification-based GEC module 108 of the GEC system 100.

[0075] At 910, it is determined whether there are more target words in the sentence that have not yet been processed. If the answer is yes, method 900 returns to 904 to process the next target word in the sentence. Once all of the target words in the sentence have been processed, at 912, grammatical error corrections for the sentence are provided based on the grammatical error results. The estimated classification of each target word may be used to generate the grammatical error corrections. A grammar score may also be provided based on the grammatical error results. 912 may be performed by the scoring/correction module 114 of the GEC system 100.

[0076] FIG. 10 is a flow chart illustrating an example of a method 1000 for classifying a target word with respect to a grammatical error type according to an embodiment. Method 1000 can be performed by processing logic that may include hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination of hardware and software. It is to be appreciated that not all steps may be needed to practice the disclosure provided herein. Further, as will be understood by a person of ordinary skill in the art, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 10.

[0077] Method 1000 is described with reference to FIGS. 1 and 4. However, method 1000 is not limited to that exemplary embodiment. At 1002, a context vector of the target word is provided based on the context words in the sentence. The context words may be any number of words surrounding the target word in the sentence. In some embodiments, the context words include all of the words in the sentence except the target word. In some embodiments, the context words also include the lemma of the target word. The context vector does not include semantic features extracted from the sentence. 1002 may be performed by the deep context representation unit 408 of the classification-based GEC module 108.

[0078] At 1004, a context weight vector is provided based on the context words in the sentence. At 1006, the context weight vector is applied to the context vector to generate a weighted context vector. The context weight vector may apply a respective weight to each context word in the sentence based on the distance of that context word from the target word. 1004 and 1006 may be performed by the attention unit 412 of the classification-based GEC module 108.

[0079] At 1008, a classification value of the target word with respect to the grammatical error type is provided based on the weighted context vector of the target word. The classification value represents one of a plurality of classes associated with the grammatical error type. The classification value may be a probability distribution of the target word over the classes associated with the grammatical error type. 1008 may be performed by the classification unit 410 of the classification-based GEC module 108.

[0080] FIG. 11 is a flow chart illustrating another example of a method 1100 for classifying a target word with respect to a grammatical error type according to an embodiment. Method 1100 can be performed by processing logic that may include hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination of hardware and software. It is to be appreciated that not all steps may be needed to practice the disclosure provided herein. Further, as will be understood by a person of ordinary skill in the art, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 11.

[0081] Method 1100 is described with reference to FIGS. 1 and 4. However, method 1100 is not limited to that exemplary embodiment. At 1102, a grammatical error type of the target word is determined, for example, from a plurality of predefined grammatical error types. At 1104, a window size of the context words is determined based on the grammatical error type. The window size indicates the maximum number of words before the target word, and the maximum number of words after the target word, in the sentence that are to be considered as context words. The window size may vary for different grammatical error types. For example, for subject agreement errors and verb form errors, the whole sentence may be considered as the context, because these two error types usually require dependencies on context words far from the target word. For article errors, preposition errors, and noun number errors, the window size may be smaller than the whole sentence, such as 3, 5, or 10 for article errors, 3, 5, or 10 for preposition errors, and 10, 15, or 20 for noun number errors.
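
A per-error-type window could be applied as in the following Python sketch. The table `WINDOW_SIZE`, its key names, and the function `windowed_context` are hypothetical; the specific sizes (5, 5, and 15) are arbitrary picks from the example ranges given above, and `None` is used here to mean "use the whole sentence".

```python
from typing import Dict, List, Optional, Tuple

# None = whole sentence. The numeric values are one choice from the ranges
# mentioned in the text (assumption), not prescribed settings.
WINDOW_SIZE: Dict[str, Optional[int]] = {
    "subject_agreement": None,
    "verb_form": None,
    "article": 5,
    "preposition": 5,
    "noun_number": 15,
}


def windowed_context(words: List[str], target_index: int,
                     error_type: str) -> Tuple[List[str], List[str]]:
    """Return (words before, words after) the target, trimmed to the window."""
    window = WINDOW_SIZE[error_type]
    before = words[:target_index]
    after = words[target_index + 1:]
    if window is not None:
        before = before[max(0, len(before) - window):]  # keep the nearest words
        after = after[:window]
    return before, after


tokens = list("abcdefghijklmnop")  # 16 single-letter stand-in "words"
near_before, near_after = windowed_context(tokens, 8, "article")
all_before, all_after = windowed_context(tokens, 8, "verb_form")
```

Trimming keeps the words closest to the target on each side, so a window of 5 around the ninth word retains the five words immediately before it and the five immediately after it.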

[0082] At 1106, a set of forward word embedding vectors is generated based on the context words before the target word. The number of dimensions of each forward word embedding vector may be at least 100, such as 300. The order in which the set of forward word embedding vectors is generated may be from the first word within the window size to the word immediately before the target word (the forward direction). At 1108, in parallel, a set of backward word embedding vectors is generated based on the context words after the target word. The number of dimensions of each backward word embedding vector may be at least 100, such as 300. The order in which the set of backward word embedding vectors is generated may be from the last word within the window size to the word immediately after the target word (the backward direction). 1102, 1104, 1106, and 1108 may be performed by the initial context generation unit 406 of the classification-based GEC module 108.

[0083] At 1110, a forward context vector is provided based on the set of forward word embedding vectors. The set of forward word embedding vectors may be fed into a recurrent neural network in order from the forward word embedding vector of the first word within the window size to the forward word embedding vector of the word immediately before the target word (the forward direction). At 1112, a backward context vector is provided based on the set of backward word embedding vectors. The set of backward word embedding vectors may be fed into another recurrent neural network in order from the backward word embedding vector of the last word within the window size to the backward word embedding vector of the word immediately after the target word (the backward direction). At 1114, a context vector is provided by concatenating the forward context vector and the backward context vector. 1110, 1112, and 1114 may be performed by the deep context representation unit 408 of the classification-based GEC module 108.

[0084] At 1116, a fully connected linear operation is applied to the context vector. At 1118, an activation function, for example, of a first layer of an MLP neural network, is applied to the output of the fully connected linear operation. The activation function may be a rectified linear unit activation function. At 1120, another activation function, for example, of a second layer of the MLP neural network, is applied to the output of the activation function of the first layer to generate a classification value of the target word with respect to the grammatical error type. Multiclass classification of the target word with respect to the grammatical error type may thus be performed on the context vector by the MLP neural network at 1116, 1118, and 1120. 1116, 1118, and 1120 may be performed by the classification unit 410 of the classification-based GEC module 108.

[0085] FIG. 12 is a flow chart illustrating an example of a method 1200 for providing a grammar score according to an embodiment. Method 1200 can be performed by processing logic that may include hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination of hardware and software. It is to be appreciated that not all steps may be needed to practice the disclosure provided herein. Further, as will be understood by a person of ordinary skill in the art, some of the steps may be performed simultaneously, or in a different order than shown in FIG. 12.

[0086] Method 1200 is described with reference to FIGS. 1 and 4; however, method 1200 is not limited to that example embodiment. At 1202, a user factor is determined based on information about the user. The information includes, for example, native language, place of residence, education level, age, and historical scores. At 1204, weights for precision and recall are determined. Precision and recall are commonly used in combination as the main evaluation measure for GEC. Precision P and recall R are defined as follows:

$$ P = \frac{|g \cap e|}{|e|}, \qquad R = \frac{|g \cap e|}{|g|} $$

where g is the gold standard of two human annotators for a particular grammatical error type and e is the corresponding set of system edits. Many other grammatical error types may overlap with the verb form error type, so g may be based on the annotations of all grammatical error types when verb form error performance is calculated. The weight between precision and recall can be adjusted when the two are combined into a single evaluation measure. For example, F_0.5, defined in Equation 5, combines both precision and recall while assigning twice as much weight to precision; it is used in some embodiments when accurate feedback matters more than coverage.

$$ F_{0.5} = \frac{(1 + 0.5^2)\, P\, R}{0.5^2\, P + R} \tag{5} $$

It is to be appreciated that F_n, where n is between 0 and 1, may be applied in other examples. In some embodiments, the weights for different grammatical error types can also vary.
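As an illustration of how precision, recall, and a weighted F-measure interact, the following is a minimal sketch that assumes edits can be compared as hashable items such as (token index, replacement) pairs. The function name, the edit representation, and the convention of returning 1.0 for an empty denominator are assumptions made for this example, not details taken from the disclosure.

```python
def precision_recall_f(gold_edits, system_edits, beta=0.5):
    """Return (P, R, F_beta) for a set of gold edits g and system edits e."""
    gold, system = set(gold_edits), set(system_edits)
    matched = len(gold & system)  # |g ∩ e|
    # Assumed convention: an empty denominator yields a score of 1.0.
    precision = matched / len(system) if system else 1.0
    recall = matched / len(gold) if gold else 1.0
    denom = beta ** 2 * precision + recall
    f_beta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_beta


# Hypothetical edits: (token index, proposed replacement).
gold = {(1, "goes"), (4, "books"), (7, "an")}
system = {(1, "goes"), (4, "book")}
p, r, f = precision_recall_f(gold, system)
print(p, r, f)
```

With beta below 1, a system that proposes few but correct edits scores higher than one that proposes many edits with the same number of hits, which matches the preference for accurate feedback over coverage.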

[0087] At 1206, a scoring function is obtained based on the user factor and the weights. The scoring function can use the user factor and the weights (which may be the same or different for different grammatical error types) as parameters. At 1208, the grammatical error result for each target word in the sentence is received. At 1210, a grammar score is provided based on the grammatical error results and the scoring function. The grammatical error results can be the variables of the scoring function, and the user factor and the weights can be its parameters. 1202, 1204, 1206, 1208, and 1210 can be performed by scoring/correction module 114 of GEC system 100.
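The disclosure does not give a concrete form for the scoring function. The sketch below is one hypothetical shape it could take, with the user factor and per-type weights fixed as parameters and the grammatical error results passed in as variables. The names `make_scoring_function`, `user_factor`, and `type_weights`, the 0 to 100 scale, and the penalty formula are all assumptions made for illustration.

```python
def make_scoring_function(user_factor, type_weights):
    """Build a scoring function parameterized by a user factor and type weights."""

    def score(results):
        # results: list of (error_type, is_error) pairs, one per target word.
        total = sum(type_weights[t] for t, _ in results)
        if total == 0:
            return 100.0  # nothing to penalize
        penalty = sum(type_weights[t] for t, is_error in results if is_error)
        # Hypothetical formula: weighted error share, scaled by the user factor.
        return max(0.0, 100.0 * (1.0 - user_factor * penalty / total))

    return score


weights = {"verb_form": 2.0, "article": 1.0}
score = make_scoring_function(user_factor=1.0, type_weights=weights)
results = [("verb_form", True), ("article", False), ("article", False)]
grammar_score = score(results)
print(grammar_score)
```

Returning a closure keeps the parameters (user factor, weights) separate from the variables (error results), mirroring the split described in the paragraph above.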

[0088] FIG. 13 is a block diagram illustrating an ANN model training system 1300, according to an embodiment. ANN model training system 1300 includes a model training module 1302 configured to train each ANN model 120 for a particular grammatical error type over a set of training samples 1304, based on an objective function 1306 and using a training algorithm 1308. In some embodiments, each training sample 1304 may be a native training sample. A native training sample, as disclosed herein, contains a sentence without grammatical errors, as opposed to a learner training sample, which contains a sentence with one or more grammatical errors. Some known GEC systems require tailored training, that is, they use supervised data as training samples (e.g., learner training samples), and are therefore limited by the size and availability of supervised training data. In contrast, ANN model training system 1300 can use abundant native plain-text corpora as training samples 1304 to train ANN model 120 more effectively and efficiently. For example, training samples 1304 may be obtained from a wiki dump. It is to be appreciated that training samples 1304 of ANN model training system 1300 are not limited to native training samples. In some embodiments, for certain grammatical error types, ANN model training system 1300 may train ANN model 120 using learner training samples, or a combination of native training samples and learner training samples.

[0089] FIG. 14 is a depiction of an example of a training sample 1304 used by ANN model training system 1300 of FIG. 13. A training sample includes a sentence associated with one or more grammatical error types 1, ..., n. Although the training sample may be a native training sample without grammatical errors, the sentence can still be associated with grammatical error types because, as described above, particular words are associated with one or more grammatical error types based on, for example, their PoS tags. For example, as long as a sentence contains a verb, the sentence may be associated with, e.g., verb form errors and subject agreement errors. One or more target words 1, ..., m may be associated with each grammatical error type. For example, every verb in a sentence is a target word for the verb form error or the subject agreement error in the training sample. Each target word is further associated with two pieces of information: a word embedding vector set (matrix) x and an actual classification value y. The word embedding vector set x can be generated from the context words of the target word in the sentence. It is to be appreciated that, in some embodiments, x may be any other initial context vector set, such as a one-hot vector set. As described above, the actual classification value y can be one of the class labels for the particular grammatical error type, such as "0" for singular and "1" for plural for the noun number (singular/plural) error. A training sample thus includes pairs of a word embedding vector set x and an actual classification value y, each pair corresponding to a target word associated with a grammatical error type in the sentence.
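To make the pairing of target words and labels concrete, here is a minimal sketch that derives noun number samples from an already PoS-tagged, error-free sentence. It assumes Penn Treebank style tags (`NN` singular, `NNS` plural) and keeps the context as raw words rather than embedding vectors; the function name and the tuple layout are hypothetical.

```python
def noun_number_samples(tagged_sentence):
    """Yield (target, words_before, words_after, label) for each common noun."""
    samples = []
    for i, (word, tag) in enumerate(tagged_sentence):
        if tag in ("NN", "NNS"):  # PoS tag decides which words are targets
            before = [w for w, _ in tagged_sentence[:i]]
            after = [w for w, _ in tagged_sentence[i + 1:]]
            label = 1 if tag == "NNS" else 0  # 0 = singular, 1 = plural
            samples.append((word, before, after, label))
    return samples


# Hypothetical pre-tagged native sentence.
tagged = [("The", "DT"), ("dogs", "NNS"), ("chase", "VBP"),
          ("a", "DT"), ("ball", "NN")]
samples = noun_number_samples(tagged)
for sample in samples:
    print(sample)
```

Because the labels come from the tags of well-formed text, no human error annotation is needed, which is what allows native corpora to serve as training data.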

[0090] Referring back to FIG. 13, ANN model 120 includes a plurality of parameters that can be jointly adjusted by model training module 1302 as it is fed training samples 1304. Model training module 1302 jointly adjusts the parameters of ANN model 120 to minimize objective function 1306 over training samples 1304, using training algorithm 1308. In the example described above with respect to FIG. 8, the objective function for training ANN model 120 is:

where n is the number of training samples 1304. Training algorithm 1308 may be any suitable iterative optimization algorithm for finding the minimum of objective function 1306, including gradient descent algorithms (e.g., the stochastic gradient descent algorithm).
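The objective function itself is not reproduced in this text. As a minimal sketch, assuming a standard choice for classification, the example below computes the mean negative log-likelihood (cross-entropy) over n samples. Both the function name and the choice of cross-entropy are assumptions; the disclosed objective may differ.

```python
import math


def cross_entropy_objective(predicted_probs, labels):
    """Mean negative log-probability assigned to the actual class labels."""
    n = len(labels)  # number of training samples
    return -sum(math.log(p[y]) for p, y in zip(predicted_probs, labels)) / n


# Hypothetical model outputs for two samples with two classes each.
probs = [[0.9, 0.1], [0.2, 0.8]]
labels = [0, 1]
loss = cross_entropy_objective(probs, labels)
print(loss)
```

The value approaches zero as the model puts more probability on the actual class of every sample, so minimizing it with gradient descent pushes the estimated classifications toward the actual ones.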

[0091] FIG. 15 is a flow chart illustrating an example of a method 1500 for training an ANN model for grammatical error correction, according to an embodiment. Method 1500 can be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (e.g., instructions executing on a processing device), or a combination of hardware and software. It is to be appreciated that not all steps may be needed to perform the disclosure provided herein. Further, as will be understood by a person of ordinary skill in the art, some of the steps may be performed simultaneously or in a different order than shown in FIG. 15.

[0092] Method 1500 is described with reference to FIG. 13; however, method 1500 is not limited to that example embodiment. At 1502, an ANN model for a grammatical error type is provided. The ANN model is for estimating the classification of a target word in a sentence with respect to the grammatical error type. The ANN model may be any ANN model disclosed herein, for example, those shown in FIGS. 6 and 7. In some embodiments, the ANN model includes two recurrent neural networks configured to output a context vector of the target word based on at least one word before the target word and at least one word after the target word in the sentence. In some embodiments, the context vector does not include semantic features of the sentence in the training sample. As described above, the ANN model can include deep context representation sub-model 602, which can be parameterized as forward recurrent neural network 606 and backward recurrent neural network 608. The ANN model can also include a feedforward neural network configured to output a classification value of the target word based on the context vector of the target word. As described above, the ANN model can include classification sub-model 604, which can be parameterized as feedforward neural network 610.
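The architecture described above (two recurrent networks producing a context vector, followed by a feedforward classifier) can be sketched as a forward pass in plain Python. This is a simplified illustration, not the disclosed model: it assumes plain tanh recurrent cells, concatenation of the two final hidden states as the context vector, a single linear layer with softmax as the classifier, and tiny random weights. All names and sizes are hypothetical.

```python
import math
import random


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_last_state(inputs, w_in, w_rec):
    """Plain tanh RNN; returns the final hidden state (zeros if no inputs)."""
    h = [0.0] * len(w_rec)
    for x in inputs:
        h = [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]
    return h


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classify(forward_vecs, backward_vecs, params):
    # The forward and backward networks use separate parameter sets.
    h_f = rnn_last_state(forward_vecs, params["f_in"], params["f_rec"])
    h_b = rnn_last_state(backward_vecs, params["b_in"], params["b_rec"])
    context = h_f + h_b  # context vector: concatenation (an assumption)
    return softmax(matvec(params["out"], context))


rng = random.Random(0)
dim, hidden, classes = 4, 3, 2  # toy sizes for readability
params = {
    "f_in": rand_matrix(hidden, dim, rng),
    "f_rec": rand_matrix(hidden, hidden, rng),
    "b_in": rand_matrix(hidden, dim, rng),
    "b_rec": rand_matrix(hidden, hidden, rng),
    "out": rand_matrix(classes, 2 * hidden, rng),
}
forward_vecs = [[0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.2, 0.1]]  # words before target
backward_vecs = [[0.3, 0.1, -0.2, 0.0]]                       # words after target
probs = classify(forward_vecs, backward_vecs, params)
print(probs)
```

The output is a probability per class label for the target word; in training, every matrix in `params` would be adjusted together rather than stage by stage.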

[0093] At 1504, a set of training samples is obtained. Each training sample includes a sentence having a target word and the actual classification of the target word with respect to the grammatical error type. In some embodiments, a training sample can include a word embedding matrix of the target word, which includes a set of forward word embedding vectors and a set of backward word embedding vectors. Each forward word embedding vector is generated based on a respective context word before the target word, and each backward word embedding vector is generated based on a respective context word after the target word. The number of dimensions of each word embedding vector may be at least 100, such as 300.
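The split into forward and backward embedding sets can be sketched as follows. The example assumes a toy embedding lookup that derives a deterministic pseudo-random vector from each word, uses 4 dimensions instead of the 100 or more mentioned above, and orders the backward set from the end of the sentence toward the target word. The lookup, the optional window, and the ordering are all assumptions made for illustration.

```python
import random


def toy_embedding(word, dim=4):
    """Deterministic stand-in for a trained embedding table (hypothetical)."""
    rng = random.Random(word)  # seeding with the word makes the vector repeatable
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def embedding_matrix(tokens, target_index, window=None, dim=4):
    """Return (forward_vectors, backward_vectors) for the target word."""
    before = tokens[:target_index]
    after = tokens[target_index + 1:]
    if window is not None:  # optionally keep only the nearest context words
        before = before[max(0, len(before) - window):]
        after = after[:window]
    forward = [toy_embedding(w, dim) for w in before]
    # Assumed ordering: the backward set runs from the far end toward the target.
    backward = [toy_embedding(w, dim) for w in reversed(after)]
    return forward, backward


tokens = "The dogs chase a red ball".split()
forward, backward = embedding_matrix(tokens, target_index=2, window=2)
print(len(forward), len(backward))
```

Passing `window=None` uses every word before and after the target, while a small window limits the context to the nearest neighbors.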

[0094] At 1506, the parameters of the ANN model are jointly adjusted, for example, in an end-to-end manner. In some embodiments, a first set of parameters of deep context representation sub-model 602, associated with recurrent neural networks 606 and 608, is jointly adjusted with a second set of parameters of classification sub-model 604, associated with feedforward neural network 610, based on the difference between the actual classification and the estimated classification of the target word in each training sample. In some embodiments, the parameters associated with forward recurrent neural network 606 are separate from the parameters associated with backward recurrent neural network 608. In some embodiments, the ANN model can also include attention mechanism sub-model 702, which can be parameterized as feedforward neural network 610. The parameters of attention mechanism sub-model 702, associated with feedforward neural network 610, can also be jointly adjusted with the other parameters of the ANN model. In some embodiments, the parameters of the ANN model are jointly adjusted using training algorithm 1308 to minimize the difference, as measured by objective function 1306, between the actual classification and the estimated classification of the target word in each training sample. 1502, 1504, and 1506 can be performed by model training module 1302 of ANN model training system 1300.

[0095] FIG. 16 is a schematic diagram illustrating an example of training ANN model 120 for grammatical error correction, according to an embodiment. In this example, ANN model 120 is trained over training samples 1304 with respect to a particular grammatical error type. Training samples 1304 may come from native text and may be preprocessed and parsed as described above with respect to FIG. 1. Each training sample 1304 includes a sentence having a target word associated with the grammatical error type and the actual classification of the target word with respect to that grammatical error type. In some embodiments, a pair consisting of the word embedding matrix x of the target word and the actual classification value y of the target word can be obtained for each training sample 1304. The word embedding matrix x can include a set of forward word embedding vectors generated from the context words before the target word and a set of backward word embedding vectors generated from the context words after the target word. Training samples 1304 can thus include a plurality of (x, y) pairs.

[0096] In some embodiments, ANN model 120 can include a plurality of recurrent neural networks 1 to n 1602 and a plurality of feedforward neural networks 1 to m 1604. Each of neural networks 1602 and 1604 is associated with a set of parameters to be trained over training samples 1304, based on objective function 1306 and using training algorithm 1308. Recurrent neural networks 1602 can include a forward recurrent neural network and a backward recurrent neural network configured to output the context vector of the target word based on the context words of the target word. In some embodiments, recurrent neural networks 1602 can further include one or more additional recurrent neural networks configured to generate the word embedding matrix of the target word based on its context words. Feedforward neural networks 1604 can include a feedforward neural network configured to output the classification value y' of the target word based on the context vector of the target word. In some embodiments, feedforward neural networks 1604 can also include another feedforward neural network configured to output a context weight vector to be applied to the context vector. Neural networks 1602 and 1604 can be connected so that they can be jointly trained in an end-to-end manner. In some embodiments, the context vector does not include semantic features of the sentence in training sample 1304.

[0097] In some embodiments, for each iteration, the word embedding matrix x of the target word in the corresponding training sample 1304 can be fed into ANN model 120 and passed through neural networks 1602 and 1604. The estimated classification value y' can be output from the output layer of ANN model 120 (e.g., part of feedforward neural networks 1604). The estimated classification value y' and the actual classification value y of the target word in the corresponding training sample 1304 can be sent to objective function 1306, and the difference between y' and y can be used by objective function 1306, with training algorithm 1308, to jointly adjust each set of parameters associated with each of neural networks 1602 and 1604 in ANN model 120. By iteratively and jointly adjusting these parameter sets over the training samples 1304, the difference between the estimated classification value y' and the actual classification value y becomes smaller, and objective function 1306 is optimized.
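The iterative adjustment described above can be illustrated with a deliberately small stand-in: a single logistic unit trained by stochastic gradient descent on fixed input vectors. This is not the disclosed network, which jointly updates recurrent and feedforward parameters; the sketch only shows how repeated updates driven by the gap between y' and y reduce the objective. The data, learning rate, and function names are hypothetical.

```python
import math


def predict(w, b, x):
    """Estimated probability y' that the label is 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=200, lr=0.5):
    w, b = [0.0] * len(samples[0][0]), 0.0
    history = []  # mean objective value per epoch
    for _ in range(epochs):
        total = 0.0
        for x, y in samples:
            y_hat = predict(w, b, x)
            total += -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
            err = y_hat - y  # difference between estimated and actual value
            # All parameters are updated together from the same error signal.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
        history.append(total / len(samples))
    return w, b, history


# Hypothetical (x, y) pairs standing in for context vectors and labels.
samples = [([1.0, 0.0], 1), ([0.9, 0.2], 1), ([0.0, 1.0], 0), ([0.1, 0.8], 0)]
w, b, history = train(samples)
print(history[0], history[-1])
```

The recorded objective falls from the first epoch to the last, which is the behavior the paragraph describes: the gap between estimated and actual classification values shrinks as the parameters are adjusted.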

[0098] Various embodiments can be implemented, for example, using one or more computer systems, such as computer system 1700 shown in FIG. 17. One or more computer systems 1700 can be used, for example, to implement method 300 of FIG. 3, method 900 of FIG. 9, method 1000 of FIG. 10, method 1100 of FIG. 11, method 1200 of FIG. 12, and method 1500 of FIG. 15. For example, computer system 1700 can detect and correct grammatical errors and/or train an artificial neural network for detecting and correcting grammatical errors, according to various embodiments. Computer system 1700 can be any computer capable of performing the functions described herein.

[0099] Computer system 1700 can be any well-known computer capable of performing the functions described herein. Computer system 1700 includes one or more processors (also called central processing units, or CPUs), such as a processor 1704. Processor 1704 is connected to a communication infrastructure or bus 1706. One or more processors 1704 may each be a graphics processing unit (GPU). In an embodiment, a GPU is a processor that is a specialized electronic circuit designed to process mathematically intensive applications. The GPU may have a parallel structure that is efficient for parallel processing of large blocks of data, such as the mathematically intensive data common to computer graphics applications, images, videos, etc.

[00100] Computer system 1700 also includes user input/output device(s) 1703, such as monitors, keyboards, pointing devices, etc., which communicate with communication infrastructure 1706 through user input/output interface(s) 1702.

[00101] Computer system 1700 also includes a main or primary memory 1708, such as random access memory (RAM). Main memory 1708 may include one or more levels of cache. Main memory 1708 has stored therein control logic (i.e., computer software) and/or data. Computer system 1700 may also include one or more secondary storage devices or memory 1710. Secondary memory 1710 may include, for example, a hard disk drive 1712 and/or a removable storage device or drive 1714. Removable storage drive 1714 may be a floppy disk drive, a magnetic tape drive, a compact disk drive, an optical storage device, a tape backup device, and/or any other storage device/drive. Removable storage drive 1714 may interact with a removable storage unit 1718. Removable storage unit 1718 includes a computer usable or readable storage device having stored thereon computer software (control logic) and/or data. Removable storage unit 1718 may be a floppy disk, magnetic tape, compact disk, DVD, optical storage disk, and/or any other computer data storage device. Removable storage drive 1714 reads from and/or writes to removable storage unit 1718 in a well-known manner.

[00102] According to an exemplary embodiment, secondary memory 1710 may include other means, instrumentalities, or other approaches for allowing computer programs and/or other instructions and/or data to be accessed by computer system 1700. Such means, instrumentalities, or other approaches may include, for example, a removable storage unit 1722 and an interface 1720. Examples of removable storage unit 1722 and interface 1720 may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, a memory stick and USB port, a memory card and associated memory card slot, and/or any other removable storage unit and associated interface.

[00103] Computer system 1700 may further include a communication or network interface 1724. Communication interface 1724 enables computer system 1700 to communicate and interact with any combination of remote devices, remote networks, remote entities, etc. (individually and collectively referenced by reference number 1728). For example, communication interface 1724 may allow computer system 1700 to communicate with remote devices 1728 over communication path 1726, which may be wired and/or wireless and may include any combination of LANs, WANs, the Internet, etc. Control logic and/or data may be transmitted to and from computer system 1700 via communication path 1726.

[00104] In an embodiment, a tangible apparatus or article of manufacture comprising a tangible computer usable or readable medium having control logic (software) stored thereon is also referred to herein as a computer program product or program storage device. This includes, but is not limited to, computer system 1700, main memory 1708, secondary memory 1710, and removable storage units 1718 and 1722, as well as tangible articles of manufacture embodying any combination of the foregoing. Such control logic, when executed by one or more data processing devices (such as computer system 1700), causes such data processing devices to operate as described herein.

[00105] Based on the teachings contained in this disclosure, it will be apparent to persons skilled in the relevant art(s) how to make and use embodiments of the present disclosure using data processing devices, computer systems, and/or computer architectures other than that shown in FIG. 17. In particular, embodiments may operate with software, hardware, and/or operating system implementations other than those described herein.

[00106] It is to be appreciated that the Detailed Description section, and not the Summary and Abstract sections, is intended to be used to interpret the claims. The Summary and Abstract sections may set forth one or more, but not all, exemplary embodiments of the present disclosure as contemplated by the inventor(s), and thus are not intended to limit the present disclosure or the appended claims in any way.

[00107] While the present disclosure has been described herein with reference to exemplary embodiments for exemplary fields and applications, it should be understood that the present disclosure is not limited thereto. Other embodiments and modifications thereto are possible and are within the scope and spirit of the present disclosure. For example, and without limiting the generality of this paragraph, embodiments are not limited to the software, hardware, firmware, and/or entities illustrated in the figures and/or described herein. Further, embodiments (whether or not explicitly described herein) have significant utility to fields and applications beyond the examples described herein.

[00108] Embodiments have been described herein with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined as long as the specified functions and relationships (or equivalents thereof) are appropriately performed. Also, alternative embodiments may perform functional blocks, steps, operations, methods, etc. using orderings different from those described herein.

[00109] The breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the appended claims and their equivalents.

Claims

A method for detecting grammatical errors, comprising:
receiving, by at least one processor, a sentence;
identifying, by the at least one processor, one or more target words in the sentence based, at least in part, on one or more grammatical error types, wherein each of the one or more target words corresponds to at least one of the one or more grammatical error types;
estimating, by the at least one processor, for at least one of the one or more target words, a classification of the target word with respect to the corresponding grammatical error type using an artificial neural network model trained for the grammatical error type, wherein the model includes (i) two recurrent neural networks configured to output a context vector of the target word based, at least in part, on at least one word before the target word and at least one word after the target word in the sentence, and (ii) a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based, at least in part, on the context vector of the target word; and
detecting, by the at least one processor, a grammatical error in the sentence based, at least in part, on the target word and the estimated classification of the target word.

The method for detecting grammatical errors according to claim 1, wherein the estimating further comprises:
providing, using the two recurrent neural networks, the context vector of the target word based, at least in part, on the at least one word before the target word and the at least one word after the target word in the sentence; and
providing, using the feedforward neural network, the classification value of the target word with respect to the grammatical error type based, at least in part, on the context vector of the target word.

The method for detecting grammatical errors according to claim 2, wherein the context vector of the target word is provided based, at least in part, on a lemma of the target word.

The method for grammatical error detection according to claim 2, wherein the estimating step further comprises:
generating a first set of word embedding vectors, each word embedding vector in the first set being generated based at least in part on a respective one of the at least one word before the target word in the sentence; and
generating a second set of word embedding vectors, each word embedding vector in the second set being generated based at least in part on a respective one of the at least one word after the target word in the sentence.
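
The two sets of word embedding vectors recited above can be illustrated with a minimal sketch. The helper names `split_context` and `make_embedder` and the random embedding table are assumptions made for illustration only; the claims do not specify how the embeddings are obtained.

```python
import random


def split_context(tokens, target_index):
    """Return (words before the target word, words after the target word)."""
    return tokens[:target_index], tokens[target_index + 1:]


def make_embedder(dim=100, seed=0):
    """Hypothetical embedding table: each new word gets a fixed random vector."""
    rng = random.Random(seed)
    table = {}

    def embed(word):
        key = word.lower()
        if key not in table:
            table[key] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        return table[key]

    return embed


tokens = "I go to school every day".split()
before, after = split_context(tokens, 1)    # target word: "go"
embed = make_embedder(dim=100)
first_set = [embed(w) for w in before]      # embeddings of the preceding words
second_set = [embed(w) for w in after]      # embeddings of the following words
```

In a trained model the table would hold learned vectors rather than random ones; the sketch only shows how the sentence is partitioned around the target word.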

The method for detecting grammatical errors according to claim 4, wherein the number of dimensions of each word embedding vector is at least 100.

The method for grammatical error detection according to claim 1, wherein the at least one word before the target word includes all the words before the target word in the sentence, and
the at least one word after the target word includes all the words after the target word in the sentence.

The method for grammatical error detection according to claim 1, wherein the number of the at least one word before the target word and/or the number of the at least one word after the target word is determined based at least in part on the grammatical error type.

The method for grammatical error detection according to claim 2, wherein the estimating step further comprises:
providing a context weight vector of the target word based at least in part on the at least one word before the target word and the at least one word after the target word in the sentence; and
applying the context weight vector to the context vector.

The method for grammatical error detection according to claim 4, wherein the step of providing the context vector further comprises:
providing, using a first recurrent neural network of the two recurrent neural networks, a first context vector of the target word based at least in part on the first set of word embedding vectors;
providing, using a second recurrent neural network of the two recurrent neural networks, a second context vector of the target word based at least in part on the second set of word embedding vectors; and
providing the context vector by concatenating the first context vector and the second context vector.

The method for grammatical error detection according to claim 9, wherein the first set of word embedding vectors is provided to the first recurrent neural network starting from the word embedding vector of the word at the beginning of the sentence, and
the second set of word embedding vectors is provided to the second recurrent neural network starting from the word embedding vector of the word at the end of the sentence.
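
The forward and backward passes and the concatenation described in the two claims above can be sketched as follows. This is a simplified illustration, not the claimed implementation: `make_rnn` is a hypothetical helper that builds a plain tanh recurrent cell with random weights, and the tiny dimensions are chosen only for readability.

```python
import math
import random


def make_rnn(input_dim, hidden_dim, seed):
    """Build a simple tanh recurrent cell with random weights (illustrative only)."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(hidden_dim)]
    U = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(hidden_dim)]

    def run(vectors):
        h = [0.0] * hidden_dim
        for x in vectors:  # consume the vectors in the order given
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(W[j], x))
                    + sum(u * hi for u, hi in zip(U[j], h))
                )
                for j in range(hidden_dim)
            ]
        return h  # final hidden state, used as the context vector

    return run


forward_rnn = make_rnn(input_dim=3, hidden_dim=4, seed=1)
backward_rnn = make_rnn(input_dim=3, hidden_dim=4, seed=2)

# Toy embeddings, listed in sentence order.
first_set = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]]    # words before the target word
second_set = [[0.3, 0.1, -0.2], [0.2, 0.2, 0.1]]   # words after the target word

first_context = forward_rnn(first_set)                       # starts at the sentence start
second_context = backward_rnn(list(reversed(second_set)))    # starts at the sentence end
context_vector = first_context + second_context              # list concatenation
```

Both networks therefore finish reading at the word adjacent to the target word, which is one plausible reason for feeding the second set in reverse order.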

The method for detecting grammatical errors according to claim 1, wherein the number of hidden units in each of the two recurrent neural networks is at least 300.

The method for grammatical error detection according to claim 1, wherein the feedforward neural network comprises:
a first layer having a first activation function of a fully connected linear operation on the context vector; and
a second layer connected to the first layer and having a second activation function for generating the classification value.

The method for detecting a grammatical error according to claim 1, wherein the classification value is a probability distribution of the target word over a plurality of classes associated with the grammatical error type.
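
A two-layer feedforward network that turns a context vector into a probability distribution over classes can be sketched as below. The choice of ReLU for the first activation and softmax for the second, and all weight values, are assumptions for illustration; the claims only require a first and a second activation function.

```python
import math


def linear(weights, bias, x):
    """Fully connected linear operation: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def classify(context_vector, W1, b1, W2, b2):
    hidden = relu(linear(W1, b1, context_vector))   # first layer
    return softmax(linear(W2, b2, hidden))          # second layer: class probabilities


# Toy parameters: 2-dimensional context vector, 2 hidden units, 3 classes.
W1 = [[0.5, -0.2], [0.1, 0.3]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0], [0.2, 0.4], [-0.5, 0.5]]
b2 = [0.0, 0.0, 0.0]
probs = classify([0.6, -0.4], W1, b1, W2, b2)
```

The softmax output sums to one, so each entry can be read as the estimated probability of one class of the grammatical error type.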

The method for grammatical error detection according to claim 1, wherein the detecting step further comprises:
comparing the estimated classification of the target word with an actual classification of the target word; and
detecting the grammatical error in the sentence when the actual classification of the target word does not match the estimated classification.
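
The comparison described above can be sketched in a few lines. The class labels and probabilities are hypothetical values for an article error type; `detect_error` is an illustrative helper, not a function named in the source.

```python
def detect_error(actual_class, class_probs):
    """Flag an error when the most probable class differs from the class actually used."""
    estimated = max(class_probs, key=class_probs.get)
    return estimated != actual_class, estimated


# Hypothetical model output for the article slot in "I saw the cat yesterday".
probs = {"a/an": 0.85, "the": 0.10, "none": 0.05}
is_error, suggestion = detect_error("the", probs)
```

Here the writer used "the" while the model prefers "a/an", so an error is flagged and the estimated class doubles as a candidate correction.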

The method for grammatical error detection according to claim 1, further comprising a step of providing, in response to detection of the grammatical error in the sentence, a grammatical error correction of the target word based at least in part on the estimated classification of the target word.

The method for grammatical error detection according to claim 1, further comprising:
a step of, for each of the one or more target words, estimating a respective classification of the target word with respect to the corresponding grammatical error type using a respective artificial neural network model trained for the grammatical error type, and comparing the estimated classification of the target word with the actual classification of the target word to generate a grammatical error result for the target word;
a step of applying a weight to each of the grammatical error results of the one or more target words based at least in part on the corresponding grammatical error type; and
a step of providing a grammar score of the sentence based on the grammatical error results and the weights of the one or more target words.
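
One way to combine per-word error results and per-type weights into a sentence-level score is sketched below. The scoring formula (one minus the weighted share of erroneous target words), the weight values, and the name `grammar_score` are all assumptions; the claim does not fix a particular formula.

```python
def grammar_score(results, type_weights):
    """results: list of (error_type, is_error) pairs, one per target word."""
    total = sum(type_weights[t] for t, _ in results)
    if total == 0:
        return 1.0  # no target words: nothing to penalise
    penalty = sum(type_weights[t] for t, is_error in results if is_error)
    return 1.0 - penalty / total


# Hypothetical weights: verb form errors count twice as much as the others.
weights = {"article": 1.0, "preposition": 1.0, "verb_form": 2.0}
results = [("article", True), ("preposition", False), ("verb_form", False)]
score = grammar_score(results, weights)   # 1 - 1.0 / 4.0 = 0.75
```

Weights could also be adjusted per user, for example by proficiency level, which is in the spirit of the dependent claim on user-associated information.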

The method for detecting grammatical errors according to claim 16, wherein the grammar score is given at least in part based on information associated with the user from whom the sentence is received.

The method for detecting grammatical errors according to claim 1, wherein the model is trained by a native training sample.

The method for detecting grammatical errors according to claim 1, wherein the two recurrent neural networks and the feedforward neural network are trained together.

The method for grammatical error detection according to claim 1, wherein the model further comprises:
another recurrent neural network configured to output a set of initial context vectors to be input to the two recurrent neural networks to generate the context vector; and
another feedforward neural network configured to output a context weight vector to be applied to the context vector.

The method for detecting grammatical errors according to claim 20, wherein all of the recurrent neural networks and the feedforward neural networks are trained together using native training samples.

A system for grammatical error detection, comprising:
a memory; and
at least one processor coupled to the memory, the at least one processor being configured to:
receive a sentence;
identify one or more target words in the sentence based at least in part on one or more grammatical error types, each of the one or more target words corresponding to at least one of the one or more grammatical error types;
estimate, for at least one of the one or more target words, a classification of the target word with respect to the corresponding grammatical error type using an artificial neural network model trained for the grammatical error type, the model including (i) two recurrent neural networks configured to generate a context vector of the target word based at least in part on at least one word before the target word and at least one word after the target word in the sentence, and (ii) a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based at least in part on the context vector of the target word; and
detect a grammatical error in the sentence based at least in part on the target word and the estimated classification of the target word.

The system for grammatical error detection according to claim 22, wherein, to estimate the classification of the target word, the at least one processor is configured to:
provide, using the two recurrent neural networks, the context vector of the target word based at least in part on the at least one word before the target word and the at least one word after the target word in the sentence; and
provide, using the feedforward neural network, the classification value of the target word with respect to the grammatical error type based at least in part on the context vector of the target word.

The system for grammatical error detection according to claim 23, wherein the context vector of the target word is provided based at least in part on a lemma of the target word.

The system for grammatical error detection according to claim 23, wherein, to estimate the classification of the target word, the at least one processor is configured to:
generate a first set of word embedding vectors, each word embedding vector in the first set being generated based at least in part on a respective one of the at least one word before the target word in the sentence; and
generate a second set of word embedding vectors, each word embedding vector in the second set being generated based at least in part on a respective one of the at least one word after the target word in the sentence.

The system for detecting grammatical errors according to claim 25, wherein the number of dimensions of each word embedding vector is at least 100.

The system for grammatical error detection according to claim 22, wherein the at least one word before the target word includes all the words before the target word in the sentence, and
the at least one word after the target word includes all the words after the target word in the sentence.

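The system for grammatical error detection according to claim 22, wherein the number of the at least one word before the target word and/or the number of the at least one word after the target word is determined based at least in part on the grammatical error type.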

The system for grammatical error detection according to claim 23, wherein, to estimate the classification of the target word, the at least one processor is configured to:
provide a context weight vector of the target word based at least in part on the at least one word before the target word and the at least one word after the target word in the sentence; and
apply the context weight vector to the context vector.

The system for grammatical error detection according to claim 25, wherein, to provide the context vector of the target word, the at least one processor is configured to:
provide, using a first recurrent neural network of the two recurrent neural networks, a first context vector of the target word based at least in part on the first set of word embedding vectors;
provide, using a second recurrent neural network of the two recurrent neural networks, a second context vector of the target word based at least in part on the second set of word embedding vectors; and
provide the context vector by concatenating the first context vector and the second context vector.

The system for grammatical error detection according to claim 30, wherein the first set of word embedding vectors is provided to the first recurrent neural network starting from the word embedding vector of the word at the beginning of the sentence, and
the second set of word embedding vectors is provided to the second recurrent neural network starting from the word embedding vector of the word at the end of the sentence.

The system for detecting grammatical errors according to claim 22, wherein the number of hidden units in each of the two recurrent neural networks is at least 300.

The system for grammatical error detection according to claim 22, wherein the feedforward neural network comprises:
a first layer having a first activation function of a fully connected linear operation on the context vector; and
a second layer connected to the first layer and having a second activation function for generating the classification value.

The system for detecting grammatical errors according to claim 22, wherein the classification value is a probability distribution of the target word over a plurality of classes associated with the grammatical error type.

The system for grammatical error detection according to claim 22, wherein, to detect the grammatical error, the at least one processor is configured to:
compare the estimated classification of the target word with an actual classification of the target word; and
detect the grammatical error in the sentence when the actual classification of the target word does not match the estimated classification.

The system for grammatical error detection according to claim 22, wherein the at least one processor is further configured to provide, in response to detection of the grammatical error in the sentence, a grammatical error correction of the target word based at least in part on the estimated classification of the target word.

The system for grammatical error detection according to claim 22, wherein the at least one processor is further configured to:
for each of the one or more target words, estimate a respective classification of the target word with respect to the corresponding grammatical error type using a respective artificial neural network model trained for the grammatical error type, and compare the estimated classification of the target word with the actual classification of the target word to generate a grammatical error result for the target word;
apply a weight to each of the grammatical error results of the one or more target words based at least in part on the corresponding grammatical error type; and
provide a grammar score of the sentence based on the grammatical error results and the weights of the one or more target words.

The system for detecting grammatical errors according to claim 37, wherein the grammar score is given at least in part based on information associated with the user from whom the sentence is received.

The system for grammatical error detection according to claim 22, wherein the model is trained by a native training sample.

The system for detecting grammatical errors according to claim 22, wherein the two recurrent neural networks and the feedforward neural network are trained together.

The system for grammatical error detection according to claim 22, wherein the model further comprises:
another recurrent neural network configured to output a set of initial context vectors to be input to the two recurrent neural networks to generate the context vector; and
another feedforward neural network configured to output a context weight vector to be applied to the context vector.

The system for grammatical error detection according to claim 41, wherein all of the recurrent neural networks and the feedforward neural networks are trained together using native training samples.

A tangible computer-readable device storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising:
receiving a sentence;
identifying one or more target words in the sentence based at least in part on one or more grammatical error types, each of the one or more target words corresponding to at least one of the one or more grammatical error types;
estimating, for at least one of the one or more target words, a classification of the target word with respect to the corresponding grammatical error type using an artificial neural network model trained for the grammatical error type, the model including (i) two recurrent neural networks configured to output a context vector of the target word based at least in part on at least one word before the target word and at least one word after the target word in the sentence, and (ii) a feedforward neural network configured to output a classification value of the target word with respect to the grammatical error type based at least in part on the context vector of the target word; and
detecting a grammatical error in the sentence based at least in part on the target word and the estimated classification of the target word.

A method for training an artificial neural network model, comprising:
a step of providing, by at least one processor, an artificial neural network model for estimating a classification of a target word in a sentence with respect to a grammatical error type, the model including (i) two recurrent neural networks configured to output a context vector of the target word based at least in part on at least one word before the target word and at least one word after the target word in the sentence, and (ii) a feedforward neural network configured to output a classification value of the target word based at least in part on the context vector of the target word;
a step of obtaining, by the at least one processor, a set of training samples, each training sample in the set of training samples including a sentence containing a target word related to the grammatical error type and an actual classification of the target word with respect to the grammatical error type; and
a step of adjusting together, by the at least one processor, a first set of parameters associated with the recurrent neural networks and a second set of parameters associated with the feedforward neural network based at least in part on the difference between the actual classification and the estimated classification of the target word in each training sample.
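
The "difference between the actual classification and the estimated classification" that drives the joint adjustment can be made concrete with a loss function. The sketch below assumes a softmax output with cross-entropy loss, which is a common choice but is not stated in the claim; `cross_entropy_and_grad` is a hypothetical helper.

```python
import math


def cross_entropy_and_grad(logits, actual_index):
    """Return the cross-entropy loss and its gradient with respect to the logits."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[actual_index])
    # For softmax with cross-entropy, dL/dz_k = p_k - 1[k == actual_index].
    grad = [p - (1.0 if k == actual_index else 0.0) for k, p in enumerate(probs)]
    return loss, grad


# Toy logits for three classes; the actual class is index 0.
loss, grad = cross_entropy_and_grad([2.0, 0.5, -1.0], actual_index=0)
```

In a full model this gradient would be propagated back through the feedforward network and then through both recurrent networks, so that both parameter sets are updated in the same optimisation step.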

The method for training an artificial neural network model according to claim 44, wherein each training sample is a native training sample without grammatical errors.
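
Because native text is assumed to be free of grammatical errors, the word actually used at each target position can serve as its own label. The sketch below builds such samples for a preposition error type; the preposition list and the `make_samples` helper are hypothetical and purely illustrative.

```python
# Hypothetical class set for a preposition error type.
PREPOSITIONS = {"in", "on", "at", "to", "for", "of", "with"}


def make_samples(sentence):
    """Return (tokens, target_index, actual_class) for each preposition found."""
    tokens = sentence.lower().split()
    return [(tokens, i, tok) for i, tok in enumerate(tokens) if tok in PREPOSITIONS]


samples = make_samples("She went to school with her brother")
labels = [(index, label) for _, index, label in samples]
```

Each sample pairs a sentence and target position with the class a native writer actually chose, so no manually annotated learner errors are needed for this step.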

The method for training an artificial neural network model according to claim 44, wherein the recurrent neural networks are gated recurrent unit (GRU) neural networks and the feedforward neural network is a multi-layer perceptron (MLP) neural network.
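
For reference, a GRU updates its hidden state with an update gate and a reset gate. The equations below give one widely used textbook formulation; the symbols are assumptions introduced here, and this section of the source does not state which variant is used (some formulations swap the roles of $z_t$ and $1 - z_t$).

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

Here $x_t$ is the word embedding vector at step $t$, $h_t$ is the hidden state, $\sigma$ is the logistic sigmoid, and $\odot$ is element-wise multiplication. The final hidden state of each network plays the role of the first or second context vector.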

The method for training an artificial neural network model according to claim 44, wherein the model further comprises another feedforward neural network configured to output a context weight vector to be applied to the context vector.

The method for training an artificial neural network model according to claim 47, wherein the step of adjusting together comprises a step of adjusting together the first set of parameters, the second set of parameters, and a third set of parameters associated with the other feedforward neural network based at least in part on the difference between the actual classification and the estimated classification of the target word in each training sample.

The method for training an artificial neural network model according to claim 44, further comprising, for each training sample:
a step of generating a first set of word embedding vectors, each word embedding vector in the first set being generated based at least in part on a respective one of at least one word before the target word in the training sample; and
a step of generating a second set of word embedding vectors, each word embedding vector in the second set being generated based at least in part on a respective one of at least one word after the target word in the training sample.

The method for training an artificial neural network model according to claim 49, wherein each word embedding vector has at least 100 dimensions.

The method for training an artificial neural network model according to claim 49, wherein the at least one word before the target word includes all the words before the target word in the sentence, and
the at least one word after the target word includes all the words after the target word in the sentence.

The method for training an artificial neural network model according to claim 49, further comprising, for each training sample:
a step of providing, using a first recurrent neural network of the two recurrent neural networks, a first context vector of the target word based at least in part on the first set of word embedding vectors;
a step of providing, using a second recurrent neural network of the two recurrent neural networks, a second context vector of the target word based at least in part on the second set of word embedding vectors; and
a step of providing the context vector by concatenating the first context vector and the second context vector.

The method for training an artificial neural network model according to claim 52, wherein the first set of word embedding vectors is provided to the first recurrent neural network starting from the word embedding vector of the word at the beginning of the sentence, and
the second set of word embedding vectors is provided to the second recurrent neural network starting from the word embedding vector of the word at the end of the sentence.

The method for training an artificial neural network model according to claim 52, wherein the first context vector and the second context vector do not include the semantic features of the sentence in the training sample.

The method for training an artificial neural network model according to claim 44, wherein the number of hidden units in each of the two recurrent neural networks is at least 300.

The method for training an artificial neural network model according to claim 44, wherein the feedforward neural network comprises:
a first layer having a first activation function of a fully connected linear operation on the context vector; and
a second layer connected to the first layer and having a second activation function for generating the classification value.

A system for training an artificial neural network model, comprising:
a memory; and
at least one processor coupled to the memory, the at least one processor being configured to:
provide an artificial neural network model for estimating a classification of a target word in a sentence with respect to a grammatical error type, the model including (i) two recurrent neural networks configured to output a context vector of the target word based at least in part on at least one word before the target word and at least one word after the target word in the sentence, and (ii) a feedforward neural network configured to output a classification value of the target word based at least in part on the context vector of the target word;
obtain a set of training samples, each training sample in the set of training samples including a sentence containing a target word related to the grammatical error type and an actual classification of the target word with respect to the grammatical error type; and
adjust together a first set of parameters associated with the recurrent neural networks and a second set of parameters associated with the feedforward neural network based at least in part on the difference between the actual classification and the estimated classification of the target word in each training sample.

The system for training an artificial neural network model according to claim 57, wherein each training sample is a native training sample without grammatical errors.

The system for training an artificial neural network model according to claim 57, wherein the recurrent neural networks are gated recurrent unit (GRU) neural networks and the feedforward neural network is a multi-layer perceptron (MLP) neural network.

The system for training an artificial neural network model according to claim 57, wherein the model further comprises another feedforward neural network configured to output a context weight vector to be applied to the context vector.

The system for training an artificial neural network model according to claim 60, wherein, to adjust the first set of parameters and the second set of parameters together, the at least one processor is configured to
adjust together the first set of parameters, the second set of parameters, and a third set of parameters associated with the other feedforward neural network based at least in part on the difference between the actual classification and the estimated classification of the target word in each training sample.

The system for training an artificial neural network model according to claim 57, wherein the at least one processor is further configured to, for each training sample:
generate a first set of word embedding vectors, each word embedding vector in the first set being generated based at least in part on a respective one of at least one word before the target word in the training sample; and
generate a second set of word embedding vectors, each word embedding vector in the second set being generated based at least in part on a respective one of at least one word after the target word in the training sample.

The system for training an artificial neural network model according to claim 62, wherein each word embedding vector has at least 100 dimensions.

The system for training an artificial neural network model according to claim 62, wherein the at least one word before the target word includes all the words before the target word in the sentence, and
the at least one word after the target word includes all the words after the target word in the sentence.

The system for training an artificial neural network model according to claim 62, wherein the at least one processor is further configured to, for each training sample:
provide, using a first recurrent neural network of the two recurrent neural networks, a first context vector of the target word based at least in part on the first set of word embedding vectors;
provide, using a second recurrent neural network of the two recurrent neural networks, a second context vector of the target word based at least in part on the second set of word embedding vectors; and
provide the context vector by concatenating the first context vector and the second context vector.

The system for training an artificial neural network model according to claim 65, wherein the first set of word embedding vectors is provided to the first recurrent neural network starting from the word embedding vector of the word at the beginning of the sentence, and
the second set of word embedding vectors is provided to the second recurrent neural network starting from the word embedding vector of the word at the end of the sentence.

The system for training an artificial neural network model according to claim 65, wherein the first context vector and the second context vector do not include the semantic features of the sentence in the training sample.

The system for training an artificial neural network model according to claim 57, wherein the number of hidden units in each of the two recurrent neural networks is at least 300.

The system for training an artificial neural network model according to claim 57, wherein the feedforward neural network comprises:
a first layer having a first activation function of a fully connected linear operation on the context vector; and
a second layer connected to the first layer and having a second activation function for generating the classification value.

A tangible computer-readable device storing instructions that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising:
providing an artificial neural network model for estimating a classification of a target word in a sentence with respect to a grammatical error type, the model including (i) two recurrent neural networks configured to output a context vector of the target word based at least in part on at least one word before the target word and at least one word after the target word in the sentence, and (ii) a feedforward neural network configured to output a classification value of the target word based at least in part on the context vector of the target word;
obtaining a set of training samples, each training sample in the set of training samples including a sentence containing a target word related to the grammatical error type and an actual classification of the target word with respect to the grammatical error type; and
adjusting together a first set of parameters associated with the recurrent neural networks and a second set of parameters associated with the feedforward neural network based at least in part on the difference between the actual classification and the estimated classification of the target word in each training sample.