CN117973372A - Chinese grammar error correction method based on pinyin constraint - Google Patents
- Publication number: CN117973372A (application CN202410144119.XA)
- Authority
- CN
- China
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to a Chinese grammar error correction method based on pinyin constraint, belonging to the technical field of natural language processing. First, an end-to-end grammar error correction basic model is constructed from the original BART model; this model makes full use of the strong characterization capability of the pre-trained language model to improve error correction performance. Next, a detection layer is added after the BART encoder to alleviate the problem of overcorrection through effective error detection. Then, a sound-like confusion matrix is constructed from the sound-like confusion set of the characters and fused with the output of the detection layer to obtain the sound-like information of the erroneous characters in the input sentence. Finally, this sound-like information constrains the output probability at the decoding end, yielding a more accurate error correction result.
Description
Technical Field
The invention relates to a Chinese grammar error correction method based on pinyin constraint, belonging to the technical field of natural language processing.
Background
Chinese grammar error correction is a critical task in the field of natural language processing; its goal is to identify and correct grammatical errors in Chinese text. These errors include, but are not limited to, word-order errors, part-of-speech mismatches, and inappropriate word choices or structures, which may impair the clarity and understandability of the text. In view of this, the demand for Chinese grammar error correction techniques is growing.
To enhance the accuracy and processing speed of Chinese grammar error correction, developing a high-performance error correction model is critical. Such models can automatically detect and correct grammatical problems in text. Further, with the wide application of Chinese text across various fields and situations, a Chinese grammar error correction model must adapt to different professional domains and usage environments to serve diverse users and applications.
The objective of Chinese grammar error detection is to automatically detect grammatical errors in natural Chinese sentences, such as missing or redundant components and improper word order. The Chinese grammar detection task generally covers whether an error exists, the type of error, and the location where the error occurs. Reasonable use of grammar detection can effectively improve error correction performance.
In summary, Chinese grammar error correction techniques play a crucial role in improving text quality, enhancing user experience, and meeting diverse application requirements. Grammar detection is one of the key technical means to this end; it helps ensure the correctness and professionalism of texts and reduces misunderstanding and communication barriers. In addition, it can improve the writing quality of non-native speakers, promote language learning, and, in the field of natural language processing, enhance the accuracy of technologies such as machine translation and speech recognition. With continued technological progress, grammar detection will keep promoting the development of the field of Chinese grammar error correction.
Disclosure of Invention
The invention provides a Chinese grammar error correction method based on pinyin constraint, which aims to solve the problem of low Chinese grammar error correction accuracy and obtains better experimental results on the MAGICDATA task.
The technical scheme of the invention is as follows: a Chinese grammar error correction method based on pinyin constraint comprises the following specific steps:
Step1, constructing a sequence-to-sequence grammar error correction base model based on a pre-training model BART. The grammar error correction basic model adopts a multi-layer multi-head attention mechanism as an encoder and a decoder to effectively capture the context information, and can fully utilize the strong characterization capability of the BART pre-training language model to enhance the error correction effect;
Step2, adding a detection layer at the coding end of the syntax error correction basic model based on the BART, and attempting to filter out correct sentences through a detection module without correction so as to relieve the problem of excessive correction;
Step3, constructing a sound-like confusion matrix by utilizing a sound-like confusion set of the characters, and fusing the sound-like confusion matrix with the output of the detection layer to obtain sound-like information of the characters containing errors in the input sentence;
Step4, constraint is carried out on the output probability of the decoding end by utilizing the sound-like information, so that a more accurate error correction result is obtained.
As a further scheme of the invention, the specific steps of Step1 are as follows:
Step1.1, obtaining a Chinese BART model which is open in source and is pre-trained as a pre-training language model;
Step1.2, because the BART pre-training language model uses a multi-layer multi-head attention mechanism to construct an encoder and a decoder, the BART model downloaded in the last step is modified to adapt to the task of grammar correction, so that the strong characterization capability of the pre-training language model is effectively utilized to enhance the grammar correction model;
Step1.3, acquiring a MAGICDATA voice recognition data set of open access, and downloading a voice recognition model from the internet;
Step1.4, automatically generating sentences containing the sound-like errors by using the speech recognition model, and combining the sentences containing the sound-like errors with the correct sentences in the original dataset to form correct-error sentence pairs, thereby constructing the basic dataset MAGICDATA of the grammar error correction task.
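For illustration, the pairing and filtering in Step1.4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `references` and `asr_hypotheses` stand in for the MAGICDATA transcripts and the output of a real speech recognition model, and all names and thresholds are illustrative, not part of the invention.

```python
def build_sentence_pairs(references, asr_hypotheses, min_len=2, max_len=128):
    """Pair each ASR hypothesis (which may contain sound-alike errors)
    with its reference transcript, dropping over-long or over-short
    sentences and de-duplicating, as in the preprocessing step."""
    seen, pairs = set(), []
    for ref, hyp in zip(references, asr_hypotheses):
        if not (min_len <= len(ref) <= max_len):
            continue  # delete sentence pairs that are too long or too short
        key = (hyp, ref)
        if key in seen:
            continue  # de-duplicate identical pairs
        seen.add(key)
        pairs.append({"source": hyp, "target": ref})
    return pairs
```

A pair whose source already equals the target is kept on purpose: the detection layer needs correct sentences as negative examples.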
As a further aspect of the present invention, step2 includes the following:
Process the MAGICDATA text dataset, construct the classification labels for the current dataset, and add a detection layer at the encoding end of the basic model to alleviate the problem of overcorrection.
The method comprises the following specific steps:
Preprocess the labeled sentences in the training set and the verification set and classify them into two categories according to replacement errors versus non-replacement errors; a replacement error is one of the four error types in the grammar error correction task, the four being replacement errors, word-order errors, redundancy errors, and missing errors. The two class labels are 0 and 1, respectively: a position is labeled 1 only when a replacement error occurs, and 0 in all other cases;
The specific steps of Step2 are as follows:
Step2.1, adding a detection layer after the encoder of the BART-based grammar error correction basic model. Grammar error detection is treated as a simple binary classification task, outputting 1 if there is a replacement error at the current position and 0 otherwise;
Step2.2, preprocessing the labeled sentences in the training set and the verification set, marking a position as 1 if the current word has a replacement error and as 0 if it does not;
Step2.3, performing a weighted summation of the losses of the two tasks, grammar error detection and grammar error correction, as the final loss, and updating all model parameters by minimizing this final loss.
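The label construction and loss combination of Step2.2 and Step2.3 can be sketched as follows. This is an illustrative sketch: it assumes character-aligned sentence pairs (which holds for pure sound-alike substitutions), and the weight 0.3 is a hypothetical hyperparameter, not a value from the patent.

```python
def substitution_labels(source, target):
    """Step2.2 sketch: label a position 1 where the source character
    differs from the target character (a replacement error), else 0.
    Assumes the two sentences are aligned character by character."""
    return [1 if s != t else 0 for s, t in zip(source, target)]

def total_loss(detection_loss, correction_loss, det_weight=0.3):
    """Step2.3 sketch: weighted sum of the detection loss and the
    correction loss used as the final training loss."""
    return det_weight * detection_loss + (1.0 - det_weight) * correction_loss
```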
As a further scheme of the invention, the specific steps of Step3 are as follows:
Step3.1, downloading the sound-like character confusion set and the shape-like character confusion set from open-source websites, then de-duplicating and merging all downloaded confusion sets to obtain a final merged confusion set covering both sound-alike and shape-alike characters;
Step3.2, preprocessing the confusion set: the characters in the dictionary are converted to ids with the tokenizer of the BART-large model, the id of the first character in each row is taken as the Key, the ids of its sound-alike characters are taken as the Value, and the result is stored as a dictionary-format file;
Step3.3, reading this dictionary during model training; when the model converts an input sentence to token ids, each token id is looked up in the dictionary to obtain its sound-like character information, from which a sound-like confusion matrix is constructed;
Step3.4, multiplying the sound-like confusion matrix obtained in the previous step by the result of the model's detection layer to obtain the confusion matrix of sound-like information. Since the detection result is binary and is 1 only where a replacement error occurs, this ensures that the sound-like information is preserved only where the current character has a replacement error.
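Steps 3.2 through 3.4 can be sketched as follows under simplified assumptions: a toy `char_to_id` dictionary stands in for the BART tokenizer's vocabulary lookup, and the "matrix" is a plain list of 0/1 rows over the vocabulary rather than a tensor. All function names are illustrative.

```python
def build_confusion_dict(rows, char_to_id):
    """Step3.2 sketch: map the id of the head character of each
    confusion-set row to the ids of its sound-alike characters."""
    d = {}
    for head, *alikes in rows:
        d[char_to_id[head]] = [char_to_id[c] for c in alikes if c in char_to_id]
    return d

def confusion_matrix_for(token_ids, confusion, vocab_size):
    """Step3.3 sketch: one 0/1 row per input token, marking the
    vocabulary positions of that token's sound-alike characters."""
    matrix = [[0] * vocab_size for _ in token_ids]
    for row, tid in zip(matrix, token_ids):
        for cid in confusion.get(tid, []):
            row[cid] = 1
    return matrix

def mask_by_detection(matrix, detection):
    """Step3.4 sketch: keep a token's sound-like row only where the
    detection layer outputs 1 (a replacement error); zero it elsewhere."""
    return [row if flag == 1 else [0] * len(row)
            for row, flag in zip(matrix, detection)]
```

Multiplying by the binary detection output is what guarantees that only detected replacement positions carry sound-like information into the decoder.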
As a further scheme of the invention, the specific steps of Step4 are as follows:
Step4.1, mapping the sound-like information confusion matrix obtained in Step3.4 into a matrix with the same dimension as the output probability, setting the positions corresponding to the sound-like information in the current confusion matrix to 1 and the remaining positions to 0;
Step4.2, adding the matrix obtained in Step4.1, which has the same dimension as the output probability, to the final output probability, explicitly increasing the probability of characters that sound similar to the current character.
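The constraint in Step4.2 can be sketched as a simple additive boost. This is a sketch, not the patented implementation: the patent only states that the 0/1 matrix is added to the output probabilities, so the `boost` scale factor here is a hypothetical knob for illustration, and the rows are treated as raw per-token score vectors.

```python
def apply_pinyin_constraint(scores, constraint, boost=1.0):
    """Add the (0/1) sound-like constraint matrix, scaled by `boost`,
    to the per-token output scores so that sound-alike candidates of a
    detected erroneous character receive a higher decoding score."""
    return [[s + boost * c for s, c in zip(srow, crow)]
            for srow, crow in zip(scores, constraint)]
```

Because the constraint matrix was already masked by the detection output in Step3.4, positions without a detected replacement error are left untouched.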
The beneficial effects of the invention are as follows:
1. The invention integrates common sound-like and shape-like confusion sets, providing a basic confusion set for future research;
2. The invention uses the detection model to alleviate the problems of over-correction and under-correction of the model;
3. The invention uses the sound-like information to effectively improve the model's accuracy in correcting replacement errors;
4. The invention constructs an edit sequence from the erroneous sentence and the correct sentence, and uses the edit sequence to construct the labels of the detection model.
Drawings
FIG. 1 is a flow chart of the present invention;
Detailed Description
Example 1: as shown in Fig. 1, a Chinese grammar error correction method based on pinyin constraint comprises the following specific steps:
Step1, constructing a sequence-to-sequence grammar error correction base model based on a pre-training model BART. The grammar error correction basic model adopts a multi-layer multi-head attention mechanism as an encoder and a decoder to effectively capture the context information, and can fully utilize the strong characterization capability of the BART pre-training language model to enhance the error correction effect;
First, download the speech recognition model, the training set for preprocessing, and a Chinese BART pre-trained model.
The specific steps of Step1 are as follows:
Step1.1, obtaining an open-source, already pre-trained Chinese BART model as the basic pre-trained language model;
step1.2, because the BART pre-training language model uses a multi-layer multi-head attention mechanism to construct an encoder and a decoder, the BART model downloaded in the last step is modified to adapt to the task of grammar correction, so that the strong characterization capability of the pre-training language model is effectively utilized to enhance the performance of the grammar correction model.
Step1.3, downloading a speech recognition model and the MAGICDATA speech recognition dataset.
Step1.4, automatically generating sentences containing the sound-like errors by using a voice recognition model, and combining the sentences containing the sound-like errors with correct sentences in the original dataset to form correct-error sentence pairs, thereby constructing a basic dataset MAGICDATA of a grammar error correction task.
Specifically, the MAGICDATA text grammar error correction dataset is generated from the downloaded MAGICDATA speech recognition dataset and the speech recognition model; the generated MAGICDATA data is then preprocessed, mainly by de-duplication and deletion of sentence pairs that are too long or too short. The result is 363,658 standard grammar error correction sentence pairs; the dataset sizes are shown in Table 1.
Table 1. Size of the MagicData datasets

| Dataset | Sentences |
| --- | --- |
| MagicData-train | 342,758 |
| MagicData-dev | 6,896 |
| MagicData-test | 14,004 |
Step2, adding a detection layer at the coding end of the syntax error correction basic model based on the BART, and attempting to filter out correct sentences through a detection module without correction so as to relieve the problem of excessive correction;
First, the MAGICDATA dataset is processed and the classification labels for the current dataset are constructed; finally, a detection layer is added to the encoder of the basic model to alleviate the problem of overcorrection.
The specific steps of Step2 are as follows:
Step2.1, adding a detection layer after the encoder of the BART-based grammar error correction basic model. Grammar error detection is treated as a simple binary classification task, outputting 1 if there is a replacement error at the current position and 0 otherwise;
Step2.2, preprocessing the labeled sentences in the training set and the verification set, marking a position as 1 if the current word has a replacement error and as 0 if it does not;
Step2.3, performing a weighted summation of the losses of the two tasks, grammar error detection and grammar error correction, as the final loss, and updating all model parameters by minimizing this final loss.
The MAGICDATA dataset is processed to generate, from each incorrect sentence and its correct counterpart, text annotated with an edit sequence of per-character tags. In this edit sequence, a "KEEP" tag indicates that the character at the current position remains unchanged, and a "REPLACE_x" tag indicates that the current character is replaced with the character "x". The dataset's two-class labels are then derived from this edit sequence.
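Deriving the two-class detection labels from such an edit sequence can be sketched as follows. The exact tag spelling ("$KEEP", "$REPLACE_x") is an assumption for illustration; only replacement tags map to label 1, matching the binary labeling described above.

```python
def labels_from_edit_sequence(tags):
    """Map a per-character edit sequence to binary detection labels:
    1 for replacement tags, 0 for "KEEP" and any other tag."""
    return [1 if tag.lstrip("$").startswith("REPLACE") else 0 for tag in tags]
```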
This basic model contains both encoder and decoder components, which work together to achieve the goal of grammar error correction. The encoder employs a multi-head self-attention mechanism that builds a rich contextual representation by modeling each word of the source sentence in context. The decoder has a similar structure and additionally introduces a masked multi-head self-attention module to better capture the information of the words generated so far, ensuring that the output sentences are grammatically and semantically accurate. A pre-trained language model is used at the same time, which strengthens the model's contextual representation capability.
In the training process, the objective is to minimize the cross-entropy loss function:

$$\mathcal{L}_{cor}(\theta) = -\sum_{t=1}^{n} \log P\left(y_t \mid y_{<t}, x; \theta\right)$$

where $\theta$ denotes the trainable model parameters, $x$ is the source sentence, $y=\{y_1, y_2, \ldots, y_n\}$ is the correct sentence with $n$ words, and $y_{<t}=\{y_1, y_2, \ldots, y_{t-1}\}$ are the words visible at the $t$-th time step;
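The cross-entropy objective described above can be sketched numerically. This illustrative function assumes teacher forcing: `token_probs[t]` is the probability the model assigns to the correct token given the source sentence and the previous correct tokens.

```python
import math

def correction_loss(token_probs):
    """Summed negative log-likelihood of the reference sentence:
    each entry of token_probs is P(y_t | y_<t, x; theta)."""
    return -sum(math.log(p) for p in token_probs)
```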
Modification of the basic model involves introducing a detection layer after the encoder. The detection layer has its own loss function, which is added to the loss function of the original task with a certain weight to form a total loss function. After the total loss function is computed, the gradient is obtained by back-propagation and the model parameters are updated with an optimization algorithm such as gradient descent so as to minimize the total loss. The overall loss function of the model is

$$\mathcal{L} = \lambda\,\mathcal{L}_{det} + (1-\lambda)\,\mathcal{L}_{cor}$$

where $\mathcal{L}_{det}$ is the detection-layer loss function, $\mathcal{L}_{cor}$ is the error correction loss function, and $\lambda$ is the weighting coefficient.
In the prediction stage, the optimal sequence $y^*$ is found with beam-search decoding by maximizing the conditional probability $P(y^* \mid x; \theta)$;
Step3, constructing a sound-like confusion matrix by utilizing a sound-like confusion set of the characters, and fusing the sound-like confusion matrix with the output of the detection layer to obtain sound-like information of the characters containing errors in the input sentence;
the specific steps of Step3 are as follows:
Step3.1, downloading sound-like character confusion sets from open-source websites, then de-duplicating and merging all downloaded confusion sets and arranging the characters so that sound-alike characters are placed in the same row;
Step3.2, preprocessing the confusion set: the characters in the dictionary are converted to ids with the tokenizer of the BART model, the id of the first character in each row is taken as the Key, the ids of the other characters that sound like it are taken as the Value, and the result is stored as a dictionary-format file;
Step3.3, reading this dictionary during model training; when the model converts an input sentence to token ids, each token id is looked up in the dictionary to obtain its sound-like character information, from which a sound-like confusion matrix is constructed;
Step3.4, multiplying the sound-like confusion matrix obtained in the previous step by the result of the model's detection layer to obtain the confusion matrix of sound-like information. Since the detection result is binary and is 1 only where a replacement error occurs, this ensures that the sound-like information is preserved only where the current character has a replacement error.
Step4, constraint is carried out on the output probability of the decoding end by utilizing the sound-like information, so that a more accurate error correction result is obtained.
The final results are shown in Table 2. On top of the basic model trained on the MAGICDATA data, the detection module and the pinyin-constraint module are introduced, and the score differences are significant. Compared with the basic model, introducing the detection module improves the F value by 1 point, and introducing the pinyin-constraint module improves the F value by about 3 points. This shows both that there is room to improve the accuracy on replacement error types in current grammar error correction models and that the pinyin-constraint method adopted here is effective.
A character-based Chinese grammar evaluation index is used. The evaluation metrics for Chinese grammar error correction are the precision $P$, the recall $R$, and the $F_{0.5}$ score, computed as:

$$P = \frac{TP}{TP+FP}, \qquad R = \frac{TP}{TP+FN}, \qquad F_{0.5} = \frac{(1+0.5^2)\cdot P \cdot R}{0.5^2\cdot P + R}$$

where:
TP (True Positive): the model corrects the erroneous word.
FP (False Positive): the model changes the correct word to the wrong word.
FN (False Negative): the model does not correct the wrong word.
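The three metrics follow directly from the TP/FP/FN counts defined above; a small sketch (with guards against empty denominators added for illustration):

```python
def char_level_scores(tp, fp, fn, beta=0.5):
    """Precision, recall, and F_beta from correction counts.
    beta=0.5 weights precision more heavily than recall, the usual
    choice for grammatical error correction evaluation."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    f = (1 + b2) * p * r / (b2 * p + r) if (p + r) else 0.0
    return p, r, f
```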
The detection model is trained with the constructed MAGICDATA text dataset as the training set, and the hyperparameters are tuned to obtain the best-performing detection model.
Table 2. Test results of the basic model and the detection models
While the present invention has been described in detail with reference to the drawings, it is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the invention.
Claims (5)
1. A Chinese grammar error correction method based on pinyin constraint is characterized in that: the method comprises the following specific steps:
Step1, constructing a sequence-to-sequence grammar error correction basic model based on a pre-training model BART; the grammar error correction basic model adopts a multi-layer multi-head attention mechanism as an encoder and a decoder to effectively capture the context information, and simultaneously, the strong characterization capability of the BART pre-training language model is fully utilized to enhance the error correction effect;
Step2, adding a detection layer at the coding end of the syntax error correction basic model based on the BART, and attempting to filter out correct sentences through a detection module without correction so as to relieve the problem of excessive correction;
Step3, constructing a sound-like confusion matrix by utilizing a sound-like confusion set of the characters, and fusing the sound-like confusion matrix with the output of the detection layer to obtain sound-like information of the characters containing errors in the input sentence;
Step4, constraint is carried out on the output probability of the decoding end by utilizing the sound-like information, so that a more accurate error correction result is obtained.
2. The pinyin constraint-based chinese grammar error correction method of claim 1, wherein: the specific steps of Step1 are as follows:
Step1.1, acquiring an open-source, already pre-trained Chinese BART model as the basic pre-training language model;
Step1.2, because the BART pre-training language model uses a multi-layer multi-head attention mechanism to construct an encoder and a decoder, the BART model downloaded in the last step is modified to adapt to the task of grammar error correction, so that the strong characterization capability of the pre-training language model is effectively utilized to enhance the performance of the grammar error correction model;
Step1.3, acquiring a MAGICDATA voice recognition data set of open access, and downloading a voice recognition model from the internet;
Step1.4, automatically generating sentences containing the sound-like errors by using the speech recognition model, and combining the sentences containing the sound-like errors with the correct sentences in the original dataset to form correct-error sentence pairs, thereby constructing the basic dataset MAGICDATA of the grammar error correction task.
3. The pinyin constraint-based chinese grammar error correction method of claim 1, wherein: the specific steps of Step2 are as follows:
Step2.1, adding a detection layer after the encoder of the BART-based grammar error correction basic model; grammar error detection is treated as a simple binary classification task, outputting 1 if there is a replacement error at the current position and 0 otherwise;
Step2.2, preprocessing the labeled sentences in the training set and the verification set, marking a position as 1 if the current word has a replacement error and as 0 if it does not;
Step2.3, performing a weighted summation of the losses of the two tasks, grammar error detection and grammar error correction, as the final loss, and updating all model parameters by minimizing this final loss.
4. The pinyin constraint-based chinese grammar error correction method of claim 1, wherein: the specific steps of Step3 are as follows:
Step3.1, downloading sound-like character confusion sets from open-source websites, then de-duplicating and merging all downloaded confusion sets and arranging the characters so that sound-alike characters are placed in the same row;
Step3.2, preprocessing the confusion set: the characters in the dictionary are converted to ids with the tokenizer of the BART model, the id of the first character in each row is taken as the Key, the ids of the other characters that sound like it are taken as the Value, and the result is stored as a dictionary-format file;
Step3.3, reading this dictionary during model training; when the model converts an input sentence to token ids, each token id is looked up in the dictionary to obtain its sound-like character information, from which a sound-like confusion matrix is constructed;
Step3.4, multiplying the sound-like confusion matrix obtained in the previous step by the result of the model's detection layer to obtain the confusion matrix of sound-like information; since the detection result is binary, it is 1 only where a replacement error occurs; this ensures that the sound-like information is preserved only where the current character has a replacement error.
5. The pinyin constraint-based chinese grammar error correction method of claim 4, wherein: the specific steps of Step4 are as follows:
Step4.1, mapping the sound-like information confusion matrix obtained in Step3.4 into a matrix with the same dimension as the output probability, setting the positions corresponding to the sound-like information in the current confusion matrix to 1 and the remaining positions to 0;
Step4.2, adding the matrix obtained in Step4.1, which has the same dimension as the output probability, to the final output probability, explicitly increasing the probability of characters that sound similar to the current character.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410144119.XA CN117973372A (en) | 2024-02-01 | 2024-02-01 | Chinese grammar error correction method based on pinyin constraint |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410144119.XA CN117973372A (en) | 2024-02-01 | 2024-02-01 | Chinese grammar error correction method based on pinyin constraint |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117973372A true CN117973372A (en) | 2024-05-03 |
Family
ID=90864448
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410144119.XA Pending CN117973372A (en) | 2024-02-01 | 2024-02-01 | Chinese grammar error correction method based on pinyin constraint |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117973372A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118278394A (en) * | 2024-05-28 | 2024-07-02 | 华东交通大学 | Chinese spelling error correction method |
- 2024-02-01: application CN202410144119.XA filed; patent CN117973372A (en), status Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110489760B (en) | Text automatic correction method and device based on deep neural network | |
CN108052499B (en) | Text error correction method and device based on artificial intelligence and computer readable medium | |
CN110276069B (en) | Method, system and storage medium for automatically detecting Chinese braille error | |
CN112183094B (en) | Chinese grammar debugging method and system based on multiple text features | |
CN111651589B (en) | Two-stage text abstract generation method for long document | |
CN114818668B (en) | Name correction method and device for voice transcription text and computer equipment | |
CN112199945A (en) | Text error correction method and device | |
CN111553159B (en) | Question generation method and system | |
CN112905736B (en) | Quantum theory-based unsupervised text emotion analysis method | |
CN117973372A (en) | Chinese grammar error correction method based on pinyin constraint | |
CN113268576B (en) | Deep learning-based department semantic information extraction method and device | |
CN115034218A (en) | Chinese grammar error diagnosis method based on multi-stage training and editing level voting | |
CN111611791A (en) | Text processing method and related device | |
CN114818669B (en) | Method for constructing name error correction model and computer equipment | |
CN115293138A (en) | Text error correction method and computer equipment | |
CN113449514A (en) | Text error correction method and device suitable for specific vertical field | |
CN116306600A (en) | MacBert-based Chinese text error correction method | |
CN114781651A (en) | Small sample learning robustness improving method based on contrast learning | |
CN113961706A (en) | Accurate text representation method based on neural network self-attention mechanism | |
CN116611428A (en) | Non-autoregressive decoding Vietnam text regularization method based on editing alignment algorithm | |
CN115310433A (en) | Data enhancement method for Chinese text proofreading | |
CN112966501B (en) | New word discovery method, system, terminal and medium | |
CN111090720B (en) | Hot word adding method and device | |
CN114925175A (en) | Abstract generation method and device based on artificial intelligence, computer equipment and medium | |
CN111428475B (en) | Construction method of word segmentation word stock, word segmentation method, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |