CN117973372A - Chinese grammar error correction method based on pinyin constraint - Google Patents

Chinese grammar error correction method based on pinyin constraint Download PDF

Info

Publication number
CN117973372A
CN117973372A (application CN202410144119.XA)
Authority
CN
China
Prior art keywords
sound
model
error correction
confusion
bart
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410144119.XA
Other languages
Chinese (zh)
Inventor
李英
朱世昌
余正涛
黄于欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202410144119.XA priority Critical patent/CN117973372A/en
Publication of CN117973372A publication Critical patent/CN117973372A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G06F40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a Chinese grammar error correction method based on pinyin constraints, belonging to the technical field of natural language processing. First, an end-to-end grammar error correction base model is built on the original BART model, making full use of the strong representation capability of the pre-trained language model to improve correction performance. Next, a detection layer is added after the BART encoder to alleviate over-correction through effective error detection. A sound-alike confusion matrix is then constructed from a sound-alike character confusion set and fused with the output of the detection layer to obtain sound-alike information for the erroneous characters in the input sentence. Finally, this sound-alike information constrains the output probabilities at the decoding end, yielding more accurate correction results.

Description

Chinese grammar error correction method based on pinyin constraint
Technical field
The invention relates to a Chinese grammar error correction method based on pinyin constraint, belonging to the technical field of natural language processing.
Background
Chinese grammar correction is a critical task in natural language processing whose goal is to identify and correct grammatical errors in Chinese text. These errors include, but are not limited to, word-order errors, part-of-speech mismatches, and inappropriate word choices, all of which can impair the clarity and intelligibility of the text. In view of this, the demand for Chinese grammar correction techniques is growing.
To enhance the accuracy and processing speed of Chinese grammar error correction, developing a high-performance correction model is critical. Such models can automatically detect and correct grammatical problems in text. Furthermore, with the wide application of Chinese text across fields and situations, a Chinese grammar correction model must adapt to different professional domains and usage environments in order to serve diverse users and applications.
The objective of Chinese grammar error detection is to automatically detect grammatical errors in natural Chinese sentences, such as missing or redundant components and improper word order. The detection task generally covers whether an error exists, the type of error, and the location at which it occurs. Used judiciously, grammar detection can effectively improve error correction performance.
In summary, Chinese grammar error correction plays a crucial role in improving text quality, enhancing user experience, and meeting diverse application requirements. Grammar detection is one of its key technical means: it helps ensure the correctness and professionalism of texts and reduces misunderstanding and communication barriers. In addition, it can improve the writing quality of non-native speakers, promote language learning, and, within natural language processing, enhance the accuracy of techniques such as machine translation and speech recognition. As technology advances, grammar detection will continue to drive the development of Chinese grammar error correction.
Disclosure of Invention
The invention provides a Chinese grammar error correction method based on pinyin constraints, which aims to address the low accuracy of Chinese grammar error correction and achieves improved experimental results on the MAGICDATA task.
The technical scheme of the invention is as follows: a Chinese grammar error correction method based on pinyin constraint comprises the following specific steps:
Step1, construct a sequence-to-sequence grammar error correction base model from the pre-trained BART model. The base model uses multi-layer multi-head attention in its encoder and decoder to effectively capture context, and fully exploits the strong representation capability of the pre-trained BART language model to enhance the correction effect;
Step2, add a detection layer at the encoding end of the BART-based grammar error correction base model, letting the detection module filter out correct sentences that need no correction so as to alleviate over-correction;
Step3, construct a sound-alike confusion matrix from a sound-alike character confusion set and fuse it with the output of the detection layer to obtain sound-alike information for the erroneous characters in the input sentence;
Step4, use the sound-alike information to constrain the output probabilities at the decoding end, obtaining more accurate correction results.
As a further scheme of the invention, the specific steps of Step1 are as follows:
Step1.1, obtain an open-source, already pre-trained Chinese BART model as the pre-trained language model;
Step1.2, because the BART pre-trained language model builds its encoder and decoder with multi-layer multi-head attention, adapt the downloaded BART model to the grammar correction task so that the strong representation capability of the pre-trained language model effectively enhances the grammar correction model;
Step1.3, obtain the openly accessible MAGICDATA speech recognition dataset and download a speech recognition model from the internet;
Step1.4, automatically generate sentences containing sound-alike errors with the speech recognition model and pair them with the correct sentences in the original dataset to form correct-erroneous sentence pairs, thereby constructing the basic MAGICDATA dataset for the grammar correction task.
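The pairing and filtering in Step1.4 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the ASR hypotheses are assumed to have been produced beforehand as text, and the length bounds and the function name are assumptions for the example.

```python
# Sketch of Step1.4: pair each ASR hypothesis (which may contain sound-alike
# errors) with its reference transcript, dropping pairs whose reference is
# too short or too long and de-duplicating repeated pairs.

def build_gec_pairs(hypotheses, references, min_len=2, max_len=128):
    """Return (erroneous, correct) sentence pairs for grammar-correction training."""
    seen = set()
    pairs = []
    for hyp, ref in zip(hypotheses, references):
        if not (min_len <= len(ref) <= max_len):
            continue  # drop over-long / over-short sentences
        key = (hyp, ref)
        if key in seen:
            continue  # de-duplicate exact repeats
        seen.add(key)
        pairs.append((hyp, ref))
    return pairs

hyps = ["我想去公元", "我想去公元", "今天天气很好"]
refs = ["我想去公园", "我想去公园", "今天天气很好"]
pairs = build_gec_pairs(hyps, refs)
# the duplicate pair is removed; an identical hyp == ref pair is kept
# as a "no error" training example
```

Identical hypothesis-reference pairs are deliberately kept here, since sentences without errors are also useful for teaching the model not to over-correct.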
As a further aspect of the present invention, step2 includes the following:
Process the MAGICDATA text dataset, construct the classification labels for the dataset, and add a detection layer at the encoding end of the base model to alleviate over-correction.
The method comprises the following specific steps:
Preprocess the labeled sentences in the training and validation sets and classify each position into two categories, replacement error versus non-replacement error; replacement errors are one of the four error types in the grammar correction task (replacement, word order, redundancy, and missing errors). The two class labels are 0 and 1 respectively: a position is labeled 1 only when a replacement error occurs, and 0 in all other cases;
The specific steps of Step2 are as follows:
Step2.1, add a detection layer after the encoder of the BART-based grammar error correction base model. Grammar error detection is treated as a simple binary classification task: output 1 if the current position contains a replacement error, otherwise 0;
Step2.2, preprocess the labeled sentences in the training and validation sets, marking a position 1 if the current word contains a replacement error and 0 if it does not;
Step2.3, take a weighted sum of the losses of the two tasks, grammar error detection and grammar error correction, as the final loss, and update all model parameters by minimizing it.
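The weighted loss combination of Step2.3 can be sketched in a few lines. This is an illustrative sketch only: the weighting coefficient `lam` and the function names are assumptions, since the patent only states that the two losses are combined by weighted summation.

```python
import math

# Sketch of Step2.3: detection uses token-level binary cross-entropy,
# correction uses sequence cross-entropy over the gold tokens, and the
# final training loss is a weighted sum of the two.

def binary_ce(probs, labels, eps=1e-9):
    """Binary cross-entropy over per-position detection probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip for numerical stability
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

def seq_ce(token_probs, eps=1e-9):
    """Cross-entropy from the probabilities assigned to the gold tokens."""
    return -sum(math.log(min(max(p, eps), 1.0)) for p in token_probs) / len(token_probs)

def total_loss(det_probs, det_labels, cor_token_probs, lam=0.3):
    # weighted sum; minimizing this updates all model parameters jointly
    return seq_ce(cor_token_probs) + lam * binary_ce(det_probs, det_labels)
```

With perfect predictions both terms vanish, so the total loss is near zero; any detection or correction mistake raises it.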
As a further scheme of the invention, the specific steps of Step3 are as follows:
Step3.1, download sound-alike and shape-alike character confusion sets from open-source websites, then de-duplicate and merge all of them into a final confusion set covering both similar sounds and similar shapes;
Step3.2, preprocess the confusion set with the tokenizer of the BART-large model, converting the characters in the dictionary to token ids; use the id of the first character as the key and the ids of the characters that sound like it as the value, and save the result as a dictionary-format file;
Step3.3, read the dictionary during model training; when the model converts an input sentence to token ids, look up each token id in the dictionary to obtain its sound-alike character ids and build the sound-alike confusion matrix from this information;
Step3.4, multiply the sound-alike confusion matrix obtained in the previous step by the model's detection result to obtain the final sound-alike confusion matrix. Because detection is a binary classification that outputs 1 only where a replacement error occurs, this ensures the sound-alike information is preserved only for characters with replacement errors.
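Steps 3.2 through 3.4 can be sketched with a toy confusion dictionary. Everything concrete here is an assumption for illustration: the token ids, the tiny vocabulary size, and the function name are made up; a real implementation would use the BART tokenizer's actual ids.

```python
# Sketch of Steps 3.2-3.4: a sound-alike confusion dictionary keyed by token
# id, fused with the binary detection result so that confusion candidates
# survive only at positions flagged as replacement errors.

confusion = {101: [102, 103], 205: [206]}  # token id -> sound-alike ids (toy values)

def fuse_with_detection(input_ids, detect_labels, confusion, vocab_size=300):
    """Row i is a 0/1 vector over the vocabulary of allowed sound-alike ids."""
    matrix = []
    for tok, flag in zip(input_ids, detect_labels):
        row = [0] * vocab_size
        if flag == 1:  # keep sound-alike info only at detected replacement errors
            for cid in confusion.get(tok, []):
                row[cid] = 1
        matrix.append(row)
    return matrix

m = fuse_with_detection([101, 205, 42], [1, 0, 1], confusion)
# position 0: flagged and in the dictionary, so its sound-alike ids survive
# position 1: not flagged, so its row is all zeros
# position 2: flagged but not in the dictionary, so its row is also all zeros
```

Multiplying the confusion rows by the 0/1 detection output, as the patent describes, is exactly the masking performed by the `if flag == 1` branch above.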
As a further scheme of the invention, the specific steps of Step4 are as follows:
Step4.1, map the sound-alike confusion matrix obtained in Step3.4 to a matrix with the same dimension as the output probability, setting the positions corresponding to sound-alike characters in the dictionary to 1 and all other positions to 0;
Step4.2, add the matrix obtained in Step4.1, which matches the output probability dimension, to the final output probability, which explicitly increases the probability of characters that sound like the current character.
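The constraint of Step4 can be sketched as adding the 0/1 sound-alike mask to the decoder's output scores before normalization. The boost weight `alpha` is an assumption; the patent simply adds the mask to the output probability, and the exact scaling is not specified.

```python
import math

# Sketch of Step4: boost the scores of sound-alike vocabulary entries
# before the softmax, so that sound-alike candidates gain probability mass.

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def constrain(logits, pinyin_mask, alpha=2.0):
    """Add the (scaled) sound-alike mask to the logits, then normalize."""
    return softmax([l + alpha * m for l, m in zip(logits, pinyin_mask)])

base = softmax([1.0, 1.0, 1.0, 1.0])          # uniform without the constraint
boosted = constrain([1.0, 1.0, 1.0, 1.0], [0, 1, 0, 0])
# entry 1 (the sound-alike candidate) now carries more probability mass
```

The result remains a valid probability distribution, but the sound-alike candidate is preferred whenever the base scores are close, which is what steers the decoder toward phonetically plausible corrections.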
The beneficial effects of the invention are as follows:
1. The invention integrates common sound-alike and shape-alike confusion sets, providing a basic confusion set for future research;
2. The invention uses a detection model to alleviate both over-correction and under-correction;
3. The invention uses sound-alike information to effectively improve the model's accuracy in correcting replacement errors;
4. The invention constructs an edit sequence from each erroneous sentence and its correct counterpart, and uses the edit sequence to build the labels of the detection model.
Drawings
FIG. 1 is a flow chart of the present invention;
Detailed Description
Example 1: as shown in FIG. 1, a Chinese grammar error correction method based on pinyin constraints comprises the following specific steps:
Step1, construct a sequence-to-sequence grammar error correction base model from the pre-trained BART model. The base model uses multi-layer multi-head attention in its encoder and decoder to effectively capture context, and fully exploits the strong representation capability of the pre-trained BART language model to enhance the correction effect;
Download the speech recognition model, the training set for preprocessing, and a Chinese BART pre-trained model.
The specific steps of Step1 are as follows:
Step1.1, obtain an open-source, already pre-trained Chinese BART model as the basic pre-trained language model;
Step1.2, because the BART pre-trained language model builds its encoder and decoder with multi-layer multi-head attention, adapt the downloaded BART model to the grammar correction task so that the strong representation capability of the pre-trained language model effectively enhances the performance of the correction model;
Step1.3, download a speech recognition model and the MAGICDATA speech recognition dataset;
Step1.4, automatically generate sentences containing sound-alike errors with the speech recognition model and pair them with the correct sentences in the original dataset to form correct-erroneous sentence pairs, thereby constructing the basic MAGICDATA dataset for the grammar correction task.
Specifically, the MAGICDATA text grammar correction dataset is generated with the downloaded MAGICDATA speech recognition dataset and the speech recognition model. The generated data is then preprocessed, mainly by de-duplicating and deleting sentence pairs that are too long or too short, yielding 363,658 standard grammar error correction sentence pairs; the dataset sizes are shown in Table 1.
Table 1. Sizes of the MagicData datasets

Dataset          Sentence pairs
MagicData-train  342,758
MagicData-dev    6,896
MagicData-test   14,004
Step2, add a detection layer at the encoding end of the BART-based grammar error correction base model, letting the detection module filter out correct sentences that need no correction so as to alleviate over-correction.
First, the MAGICDATA dataset is processed and the classification labels for the current dataset are constructed; a detection layer is then added after the base model's encoder, alleviating the over-correction problem.
The specific steps of Step2 are as follows:
Step2.1, add a detection layer after the encoder of the BART-based grammar error correction base model. Grammar error detection is treated as a simple binary classification task: output 1 if the current position contains a replacement error, otherwise 0;
Step2.2, preprocess the labeled sentences in the training and validation sets, marking a position 1 if the current word contains a replacement error and 0 if it does not;
Step2.3, take a weighted sum of the losses of the two tasks, grammar error detection and grammar error correction, as the final loss, and update all model parameters by minimizing it.
The MAGICDATA dataset is processed by generating, from each erroneous sentence and its correct counterpart, text annotated with an edit sequence. The edit sequence assigns one tag per character: "$KEEP" indicates that the character at the current position remains unchanged, and "$REPLACE_x" indicates that the current character is replaced with the character x (for example, "$REPLACE_宫" replaces the current character with "宫"). The binary classification labels for the dataset are then derived from this edit sequence.
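The edit-sequence construction described above can be sketched with a character-level diff. This is an illustrative sketch only: equal-length sentence pairs are assumed (the typical case for ASR substitution errors), insertions and deletions are ignored, and `difflib` is used purely for demonstration.

```python
import difflib

# Sketch: derive per-character edit tags ($KEEP / $REPLACE_x) from an
# erroneous sentence and its correct counterpart, then turn the tags into
# binary labels for the detection layer (1 only at replacement errors).

def edit_tags(wrong, right):
    tags = []
    sm = difflib.SequenceMatcher(a=wrong, b=right)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            tags += ["$KEEP"] * (i2 - i1)
        elif op == "replace":
            tags += [f"$REPLACE_{c}" for c in right[j1:j2]]
        # insert/delete ops are ignored in this equal-length sketch
    return tags

def detect_labels(tags):
    """Binary detection labels: 1 only for replacement errors."""
    return [1 if t.startswith("$REPLACE") else 0 for t in tags]

tags = edit_tags("我想去公元", "我想去公园")    # 元 is a sound-alike error for 园
labels = detect_labels(tags)
```

The first four characters match and receive "$KEEP", while the final sound-alike substitution receives a "$REPLACE" tag and therefore a detection label of 1.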
This base model contains encoder and decoder components that cooperate to achieve grammar error correction. The encoder employs multi-head self-attention, generating rich contextual representations by modeling each word of the source sentence in context. The decoder has a similar structure and additionally introduces a masked multi-head self-attention module to better capture the words already generated and ensure that the output sentence is grammatically and semantically accurate. A pre-trained language model is used throughout, strengthening the model's contextual representation capability.
During training, the objective is to minimize the cross-entropy loss:

L_cor(θ) = - Σ_{t=1..n} log P(y_t | y_{<t}, x; θ)

where θ denotes the trainable model parameters, x is the source sentence, y = {y_1, y_2, …, y_n} is the correct sentence of n words, and y_{<t} = {y_1, y_2, …, y_{t-1}} are the words visible at time step t;
Modifying the base model involves introducing a detection layer after the encoder. The detection layer has its own loss function, which is added to the loss of the original task with a weighting coefficient to form the total loss. The total loss of the model takes the form

L = L_cor + λ · L_det

where L_det is the detection-layer loss function, L_cor is the error correction loss function, and λ is the weighting coefficient. After computing the total loss, gradients are obtained by back-propagation and the model parameters are updated with an optimization algorithm such as gradient descent so as to minimize the total loss.
In the prediction stage, beam-search decoding finds the optimal sequence y* by maximizing the conditional probability P(y* | x; θ);
Step3, constructing a sound-like confusion matrix by utilizing a sound-like confusion set of the characters, and fusing the sound-like confusion matrix with the output of the detection layer to obtain sound-like information of the characters containing errors in the input sentence;
the specific steps of Step3 are as follows:
Step3.1, download sound-alike character confusion sets from open-source websites; de-duplicate and merge all of them, arranging the characters so that sound-alike characters share the same row;
Step3.2, preprocess the confusion set with the BART model's tokenizer, converting the characters in the dictionary to token ids; use the id of the first character as the key and the ids of the other characters that sound like it as the value, and save the result as a dictionary-format file;
Step3.3, read the dictionary during model training; when the model converts an input sentence to token ids, look up each token id in the dictionary to obtain its sound-alike character ids and build the sound-alike confusion matrix from this information;
Step3.4, multiply the sound-alike confusion matrix obtained in the previous step by the model's detection result to obtain the final sound-alike confusion matrix. Because detection is a binary classification that outputs 1 only where a replacement error occurs, this ensures the sound-alike information is preserved only for characters with replacement errors.
Step4, constraint is carried out on the output probability of the decoding end by utilizing the sound-like information, so that a more accurate error correction result is obtained.
The final results are shown in Table 2. Training the base model on the MAGICDATA data and then introducing the detection module and the pinyin constraint yields clearly different scores. Compared with the base model, introducing the detection module improves the F value by about 1 point, and introducing the pinyin constraint module improves it by about 3 points. This shows both that there is room to improve the accuracy with which current grammar error correction models fix replacement errors and that the pinyin constraint method adopted here is effective.
A character-based Chinese grammar evaluation metric is used. The evaluation indices for Chinese grammar error correction are precision P, recall R, and the F0.5 score, computed as:

P = TP / (TP + FP)
R = TP / (TP + FN)
F0.5 = (1 + 0.5²) · P · R / (0.5² · P + R)
wherein:
TP (True Positive): the model corrects an erroneous word.
FP (False Positive): the model changes a correct word into a wrong one.
FN (False Negative): the model fails to correct an erroneous word.
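The three metrics can be computed directly from the TP, FP, and FN counts defined above. The counts in the usage example are made up for illustration; F0.5 weights precision more heavily than recall, the usual choice for grammar correction, where a wrong "correction" is worse than a missed one.

```python
# Character-level precision, recall, and F0.5 from TP/FP/FN counts.

def prf(tp, fp, fn, beta=0.5):
    """Return (precision, recall, F_beta); beta=0.5 favors precision."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    f = (1 + b2) * p * r / (b2 * p + r) if p and r else 0.0
    return p, r, f

p, r, f = prf(tp=8, fp=2, fn=4)  # toy counts for illustration
```

With these toy counts, precision is 0.8, recall is 2/3, and F0.5 lands between the two but closer to precision.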
The detection model is trained with the constructed MAGICDATA text dataset as the training set, and the hyperparameters are tuned to obtain the best-performing detection model.
Table 2. Results of the base model and the detection models
While the invention has been described in detail with reference to the drawings, it is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from its spirit.

Claims (5)

1. A Chinese grammar error correction method based on pinyin constraints, characterized in that the method comprises the following specific steps:
Step1, constructing a sequence-to-sequence grammar error correction base model from the pre-trained BART model; the base model uses multi-layer multi-head attention in its encoder and decoder to effectively capture context while fully exploiting the strong representation capability of the pre-trained BART language model to enhance the correction effect;
Step2, adding a detection layer at the encoding end of the BART-based grammar error correction base model, letting the detection module filter out correct sentences that need no correction so as to alleviate over-correction;
Step3, constructing a sound-alike confusion matrix from a sound-alike character confusion set and fusing it with the output of the detection layer to obtain sound-alike information for the erroneous characters in the input sentence;
Step4, using the sound-alike information to constrain the output probabilities at the decoding end, obtaining more accurate correction results.
2. The pinyin-constraint-based Chinese grammar error correction method of claim 1, characterized in that the specific steps of Step1 are as follows:
Step1.1, obtaining an open-source, already pre-trained Chinese BART model as the basic pre-trained language model;
Step1.2, because the BART pre-trained language model builds its encoder and decoder with multi-layer multi-head attention, adapting the downloaded BART model to the grammar correction task so that the strong representation capability of the pre-trained language model effectively enhances the performance of the correction model;
Step1.3, obtaining the openly accessible MAGICDATA speech recognition dataset and downloading a speech recognition model from the internet;
Step1.4, automatically generating sentences containing sound-alike errors with the speech recognition model and pairing them with the correct sentences in the original dataset to form correct-erroneous sentence pairs, thereby constructing the basic MAGICDATA dataset for the grammar correction task.
3. The pinyin-constraint-based Chinese grammar error correction method of claim 1, characterized in that the specific steps of Step2 are as follows:
Step2.1, adding a detection layer after the encoder of the BART-based grammar error correction base model; grammar error detection is treated as a simple binary classification task: output 1 if the current position contains a replacement error, otherwise 0;
Step2.2, preprocessing the labeled sentences in the training and validation sets, marking a position 1 if the current word contains a replacement error and 0 if it does not;
Step2.3, taking a weighted sum of the losses of the two tasks, grammar error detection and grammar error correction, as the final loss, and updating all model parameters by minimizing it.
4. The pinyin-constraint-based Chinese grammar error correction method of claim 1, characterized in that the specific steps of Step3 are as follows:
Step3.1, downloading sound-alike character confusion sets from open-source websites; de-duplicating and merging all of them, arranging the characters so that sound-alike characters share the same row;
Step3.2, preprocessing the confusion set with the BART model's tokenizer, converting the characters in the dictionary to token ids; using the id of the first character as the key and the ids of the other characters that sound like it as the value, and saving the result as a dictionary-format file;
Step3.3, reading the dictionary during model training; when the model converts an input sentence to token ids, looking up each token id in the dictionary to obtain its sound-alike character ids and building the sound-alike confusion matrix from this information;
Step3.4, multiplying the sound-alike confusion matrix obtained in the previous step by the model's detection result to obtain the final sound-alike confusion matrix; because detection is a binary classification that outputs 1 only where a replacement error occurs, this ensures the sound-alike information is preserved only for characters with replacement errors.
5. The pinyin-constraint-based Chinese grammar error correction method of claim 4, characterized in that the specific steps of Step4 are as follows:
Step4.1, mapping the sound-alike confusion matrix obtained in Step3.4 to a matrix with the same dimension as the output probability, setting the positions corresponding to sound-alike characters in the dictionary to 1 and all other positions to 0;
Step4.2, adding the matrix obtained in Step4.1, which matches the output probability dimension, to the final output probability, explicitly increasing the probability of characters that sound like the current character.
CN202410144119.XA 2024-02-01 2024-02-01 Chinese grammar error correction method based on pinyin constraint Pending CN117973372A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410144119.XA CN117973372A (en) 2024-02-01 2024-02-01 Chinese grammar error correction method based on pinyin constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410144119.XA CN117973372A (en) 2024-02-01 2024-02-01 Chinese grammar error correction method based on pinyin constraint

Publications (1)

Publication Number Publication Date
CN117973372A true CN117973372A (en) 2024-05-03

Family

ID=90864448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410144119.XA Pending CN117973372A (en) 2024-02-01 2024-02-01 Chinese grammar error correction method based on pinyin constraint

Country Status (1)

Country Link
CN (1) CN117973372A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118278394A (en) * 2024-05-28 2024-07-02 华东交通大学 Chinese spelling error correction method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination