CN115630635B - Chinese text proofreading method, system and equipment based on retrieval and multiple stages - Google Patents
- Publication number
- CN115630635B
- Authority
- CN
- China
- Prior art keywords
- text
- sequence
- correction
- sentence
- modification result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3335—Syntactic pre-processing, e.g. stopword elimination, stemming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Abstract
An embodiment of the invention provides a Chinese text proofreading method, system and equipment based on retrieval and multiple stages. The method comprises: inputting a text to be corrected, searching a database for the text most similar to it, and splicing the most similar text with the text to be corrected to obtain a spliced text; performing spelling correction on the spliced text; performing sequence-to-edit grammar correction on the spelling-corrected text to obtain a first modification result; comparing the perplexity of the first modification result with that of a second modification result, which is obtained by applying sequence-to-sequence grammar correction, controlled by a set threshold range, to the text to be corrected; and taking the modification result with the lower perplexity as the final modification result. The invention can effectively improve the robustness of the system and, while handling multiple types of text errors, improve the accuracy of error detection and error correction.
Description
Technical Field
The invention relates to the technical field of automatic Chinese text proofreading, in particular to a Chinese text proofreading method, system and equipment based on retrieval and multiple stages.
Background
Chinese text proofreading detects and corrects errors in Chinese text to obtain correct sentences that preserve the original meaning. Common errors fall into four categories: redundant words, missing words, misspellings and word-order errors, of which misspellings are the most frequent. Because Chinese text proofreading can effectively correct text errors, many researchers work on it.
The Chinese text proofreading methods in common use today include the Chinese spelling error correction method, which handles only spelling errors, the sequence-to-edit method and the sequence-to-sequence method. Each has its own strengths with respect to the two key metrics of Chinese proofreading, namely the accuracy of error detection and the accuracy of error correction. The Chinese spelling error correction method performs well at both detecting and correcting errors, but it targets only spelling errors and cannot effectively correct other error types; it is also limited by its training data, has limited correction capability when transferred from the pre-training stage to a downstream task, and has poor robustness. The sequence-to-edit method can correct all four error types, but it is weak at detecting errors, so it does not reliably find and correct the errors in a text, and it corrects spelling errors less well than a dedicated spelling error correction model. The sequence-to-sequence method is better at detecting errors, but it is weaker at correcting them and is also relatively weak at correcting spelling errors.
Therefore, a new text proofreading system is needed to solve the above problems.
Disclosure of Invention
To this end, embodiments of the invention provide a Chinese text proofreading method, system and equipment based on retrieval and multiple stages, to solve the problems of low error detection and error correction accuracy and poor robustness in prior-art Chinese text proofreading methods.
The invention provides a Chinese text proofreading method based on retrieval and multiple stages, comprising the following steps:
S1: inputting a text to be corrected, searching a database for the text most similar to the text to be corrected, and splicing the most similar text with the text to be corrected to obtain a spliced text;
S2: performing spelling correction on the spliced text;
S3: performing sequence-to-edit grammar correction on the spelling-corrected text to obtain a first modification result;
S4: comparing the perplexity of the first modification result with that of a second modification result, the second modification result being obtained by applying sequence-to-sequence grammar correction, controlled by a set threshold range, to the text to be corrected;
S5: taking the modification result with the lower perplexity as the final modification result.
Preferably, performing spelling correction on the spliced text comprises:
processing the erroneous-correct sentence pairs into a character-aligned format and feeding them into a BERT encoder to obtain the features of the original sentence, and using a Glyce encoder with a CNN structure as a visual information encoder to obtain the pronunciation and glyph features of the characters in the text;
fusing the pronunciation and glyph features with the features of the original sentence, inputting the fused features into a Transformer encoder, and obtaining the spelling-corrected sentence through a fully connected layer.
Preferably, the sequence-to-edit grammar correction of the spelling-corrected text to obtain the first modification result is performed as follows:
performing grammar correction on the spelling-corrected text with the GECToR model to obtain the first modification result.
Preferably, the basic training data of the GECToR model are erroneous-correct sentence pairs; the input is an erroneous sentence, which is converted into the corresponding transformation tags; error correction is performed iteratively by a BERT encoder and sequence tagging; and the corrected sentence is finally output.
Preferably, the second modification result is obtained from the text to be corrected by sequence-to-sequence grammar correction with a set threshold range as follows:
a seq2seq model corrects the text to be corrected, with a set threshold range controlling the correction, to obtain the second modification result.
Preferably, the basic training data of the seq2seq model are correction sentence pairs, each composed of an original sentence and a correct sentence; the input is an erroneous sentence, and the corrected sentence is output by an encoder-decoder model.
Preferably, the erroneous sentence undergoes BPE processing before being input into the encoder-decoder model, and the output of the encoder-decoder model is restored to its original form.
Preferably, the perplexity of the first modification result is compared with that of the second modification result, which is obtained from the text to be corrected by sequence-to-sequence grammar correction with a set threshold range, wherein the perplexity is expressed as follows:
$$PPL(S) = \left( \prod_{i=1}^{N} \frac{1}{p(w_i)} \right)^{\frac{1}{N}}$$
where $S$ denotes the sentence, $N$ denotes the sentence length, $w_i$ denotes the $i$-th word, and $p(w_i)$ denotes the probability of the $i$-th word.
The invention also provides a Chinese text proofreading system based on retrieval and multiple stages, comprising:
an input module for inputting the text to be corrected;
a retrieval module for searching for the text most similar to the text to be corrected and splicing the most similar text with the text to be corrected to obtain a spliced text;
a spelling correction module for performing spelling correction on the spliced text;
a sequence-to-edit module for performing sequence-to-edit grammar correction on the spelling-corrected text to obtain a first modification result;
a sequence-to-sequence module for performing sequence-to-sequence grammar correction, controlled by a set threshold range, on the text to be corrected to obtain a second modification result;
a perplexity selection module for comparing the perplexity of the first modification result with that of the second modification result;
and an output module for outputting the modification result with the lower perplexity.
The invention also provides Chinese text proofreading equipment that implements the above Chinese text proofreading method based on retrieval and multiple stages, for performing Chinese text proofreading.
Compared with the prior art, the technical solution of the invention has the following advantages:
The invention provides a Chinese text proofreading method, system and equipment based on retrieval and multiple stages. A retrieval module is added on top of the Chinese spelling error correction method to supply correct modification suggestions for erroneous fragments, which strengthens the robustness and accuracy of text correction and reduces cost without using additional manually labeled data. For the sequence-to-sequence grammar error correction model, the output is controlled by a threshold to avoid miscorrection, which strengthens the accuracy of text correction. The spelling error correction model, the sequence-to-sequence grammar error correction model and the sequence-to-edit grammar error correction model are combined under one strategy, which improves accuracy while retaining the strengths of all three models. Finally, the invention selects between modification results using the perplexity of a language model, which improves the robustness and accuracy of text correction.
Drawings
To describe the embodiments of the invention or the prior-art solutions more clearly, the drawings used in the embodiments are briefly introduced below. They illustrate the invention without limiting it, and a person skilled in the art can derive other drawings from them without inventive effort. In the drawings:
FIG. 1 is a flow chart of the Chinese text proofreading method based on retrieval and multiple stages according to an embodiment;
FIG. 2 is a flow chart of spelling error correction model inference according to an embodiment;
FIG. 3 is a flow chart of GECToR model inference according to an embodiment;
FIG. 4 is a flow chart of seq2seq model inference according to an embodiment;
FIG. 5 is a block diagram of the Chinese text proofreading system based on retrieval and multiple stages according to an embodiment.
Detailed Description
The present invention is further described below with reference to the drawings and specific examples so that those skilled in the art can better understand and practice it; the examples are not intended to limit the invention.
The database used in the embodiments of the invention is the data set provided by Wang et al., which comprises 271329 correct-incorrect sentence pairs; the correct sentences are used as the data of the retrieval module's database.
Referring to FIG. 1, an embodiment of the invention provides a Chinese text proofreading method based on retrieval and multiple stages, comprising:
S1: inputting a text to be corrected, searching a database for the text most similar to the text to be corrected, and splicing the most similar text with the text to be corrected to obtain a spliced text;
S2: performing spelling correction on the spliced text;
S3: performing sequence-to-edit grammar correction on the spelling-corrected text to obtain a first modification result;
S4: comparing the perplexity of the first modification result with that of a second modification result, the second modification result being obtained by applying sequence-to-sequence grammar correction, controlled by a set threshold range, to the text to be corrected;
S5: taking the modification result with the lower perplexity as the final modification result.
In this method, the text most similar to the text to be corrected is retrieved from a database and spliced with the text to be corrected, which supplies correct modification suggestions for the erroneous text and thereby strengthens the robustness and accuracy of text correction and reduces cost without using additional manually labeled data. The second modification result is obtained from the text to be corrected by sequence-to-sequence grammar correction with a set threshold range, which avoids miscorrection and strengthens the accuracy of text correction. Combining the spelling error correction model, the sequence-to-sequence grammar error correction model and the sequence-to-edit grammar error correction model improves accuracy while retaining the strengths of all three models. The perplexity comparison further improves the robustness and accuracy of text correction.
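The control flow of steps S1 to S5 can be sketched as follows. This is only an illustrative outline under stated assumptions: the function name `proofread`, the `[SEP]` separator, and the five callables passed in are hypothetical placeholders for the retrieval module, the three correction models and the language model, none of which is specified at code level in the source. In particular, how the spelling model separates the corrected input from the retrieved reference is left to the `spell_correct` stub.

```python
from typing import Callable


def proofread(
    text: str,
    retrieve: Callable[[str], str],
    spell_correct: Callable[[str], str],
    seq2edit_correct: Callable[[str], str],
    seq2seq_correct: Callable[[str], str],
    perplexity: Callable[[str], float],
    sep: str = "[SEP]",  # hypothetical separator between reference and input
) -> str:
    """Run steps S1-S5 and return the candidate with the lower perplexity."""
    # S1: retrieve the most similar database sentence and splice it with the input.
    spliced = retrieve(text) + sep + text
    # S2: spelling correction on the spliced text.
    spelled = spell_correct(spliced)
    # S3: sequence-to-edit grammar correction gives the first modification result.
    result_one = seq2edit_correct(spelled)
    # S4: sequence-to-sequence grammar correction of the raw input gives the second.
    result_two = seq2seq_correct(text)
    # S5: keep the result the language model finds less perplexing
    # (on a tie, min() returns the first modification result).
    return min((result_one, result_two), key=perplexity)
```

Passing the models in as callables keeps the sketch independent of any particular framework; each stage can be swapped without touching the selection logic.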
Further, in step S1:
The text to be corrected is input, the BM25 retrieval algorithm is used to find the most similar text in the database, and the most similar text is spliced with the text to be corrected to obtain the spliced text. This supplies correct modification suggestions for the erroneous text, strengthening the robustness and accuracy of text correction and reducing cost without additional manually labeled data.
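As a rough illustration of step S1, the sketch below scores each database sentence against the input with a common variant of the BM25 formula and splices the best match in front of the input. Several details are assumptions rather than statements from the source: tokens are single characters, the parameters `k1=1.5` and `b=0.75` are conventional defaults, the `[SEP]` separator and the reference-first ordering are hypothetical, and the database is assumed to be non-empty.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document, using single characters as tokens."""
    tokenized = [list(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each character occurs.
    df = Counter()
    for d in tokenized:
        df.update(set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve_and_splice(text, database, sep="[SEP]"):
    """Find the most similar database sentence and splice it with the input."""
    scores = bm25_scores(text, database)
    best = database[max(range(len(database)), key=scores.__getitem__)]
    return best + sep + text
```

For example, with a database of three correct sentences, an input containing one wrong character still shares most characters with its correct counterpart, so that sentence receives the highest score and is spliced in as the reference.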
Further, in step S2:
As shown in FIG. 2, the spliced text is input into the spelling error correction model. This model can only correct sentences that contain nothing but spelling errors, that is, the erroneous sentence and the correct sentence must have the same length. The erroneous-correct sentence pairs are therefore first processed into a character-aligned format and then fed into a BERT encoder to obtain the features of the original sentence. Because most spelling errors are either pronunciation errors or glyph errors, a Glyce encoder with a CNN structure is used as a visual information encoder to obtain the pronunciation and glyph features of the characters. Finally, the pronunciation and glyph features are fused with the original sentence features and input into a Transformer encoder, and a fully connected layer then yields the corrected sentence.
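The equal-length requirement described above can be made concrete with a small preprocessing sketch. It covers only the character-alignment step, not the BERT, Glyce or Transformer components; the function name `to_aligned_pairs` and the triple layout are hypothetical choices for illustration.

```python
def to_aligned_pairs(wrong, correct):
    """Align an erroneous sentence with its correction, character by character.

    Returns a list of (source_char, target_char, is_error) triples. A spelling
    correction model of this kind only substitutes characters, so sentence
    pairs of different lengths are rejected.
    """
    if len(wrong) != len(correct):
        raise ValueError("spelling correction requires sentences of equal length")
    return [(w, c, w != c) for w, c in zip(wrong, correct)]
```

Pairs that fail this check contain insertions or deletions and would have to be handled by the grammar correction stages instead.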
Further, in step S3:
As shown in FIG. 3, grammar correction is performed on the spelling-corrected text with the GECToR model (a sequence-to-edit grammar error correction model). The idea of GECToR is to turn the grammar correction task into a sequence tagging task in which every token is assigned a tag; the tag types are shown in Table 1. Because the training data consist only of erroneous-correct sentence pairs, the data must first be converted into the corresponding transformation tags before being fed to the model. The GECToR model structure is a BERT-like Transformer encoder with two fully connected layers and a softmax layer on top. Applying the tags realizes correction operations such as insertion, deletion and replacement, and tagging can be iterated over multiple rounds until no new error is found, at which point the final result is output.
TABLE 1
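The contents of Table 1 are not reproduced in this text, so the sketch below does not restate it. Instead it assumes the basic tag names used in the published GECToR scheme (`$KEEP`, `$DELETE`, `$APPEND_x`, `$REPLACE_x`) to show how per-token tags realize insertion, deletion and replacement, and how tagging can be repeated until no further edit is proposed. The `tagger` callable stands in for the trained model, and `max_rounds` is a hypothetical safety limit.

```python
def apply_edit_tags(tokens, tags):
    """Apply one edit tag per token and return the edited token list."""
    out = []
    for token, tag in zip(tokens, tags):
        if tag == "$KEEP":
            out.append(token)
        elif tag == "$DELETE":
            continue  # drop the token
        elif tag.startswith("$APPEND_"):
            out.append(token)
            out.append(tag[len("$APPEND_"):])  # insert after the token
        elif tag.startswith("$REPLACE_"):
            out.append(tag[len("$REPLACE_"):])  # substitute the token
        else:
            raise ValueError(f"unknown tag: {tag}")
    return out


def iterative_correct(tokens, tagger, max_rounds=5):
    """Re-tag and re-edit until the tagger proposes no change."""
    for _ in range(max_rounds):
        tags = tagger(tokens)
        if all(tag == "$KEEP" for tag in tags):
            break
        tokens = apply_edit_tags(tokens, tags)
    return tokens
```

Because each round can only attach one edit to each token, several rounds may be needed when errors are adjacent, which is why the iteration matters.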
Further, in step S4:
As shown in FIG. 4, the text to be corrected is corrected by the seq2seq model (a sequence-to-sequence grammar error correction model), with a set threshold range controlling the correction, to obtain the second modification result. For a seq2seq model applied to the grammar correction task, the basic training data are correction sentence pairs, each composed of an original sentence and a correct sentence; the input is an erroneous sentence, and an encoder-decoder model directly outputs the corrected sentence. Before a sentence is input into the model it must undergo BPE processing, and the output sentence, which is also in BPE form, must be restored to its original form by deleting the redundant spaces in the BPE result file. Because some words in a sentence are not in the vocabulary, some modified sentences contain [UNK]; in that case the output is set directly to the original sentence, without modification.
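The post-processing described above (restoring the BPE output and falling back to the original sentence when `[UNK]` appears) might look like the following sketch. It assumes that the decoder output is a space-separated string, that sub-word continuations may be marked with `@@` as in some BPE toolkits, and that removing every space is acceptable because the text is Chinese; these are illustrative assumptions. The threshold range that controls correction is not specified in detail in the source and is therefore not modeled here.

```python
def restore_from_bpe(bpe_output):
    """Undo BPE segmentation by removing continuation markers and spaces."""
    # "@@" markers are an assumed convention; the source only mentions spaces.
    return bpe_output.replace("@@ ", "").replace("@@", "").replace(" ", "")


def postprocess_seq2seq(original, bpe_output, unk="[UNK]"):
    """Return the restored model output, or the original sentence on [UNK]."""
    if unk in bpe_output:
        # An out-of-vocabulary token makes the output unreliable: keep the input.
        return original
    return restore_from_bpe(bpe_output)
```

Falling back to the unmodified input is a conservative choice: an output with an unknown token can never be better than the original at that position.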
The perplexities of the first and second modification results are compared using a language model to obtain the final correction result; the lower the perplexity, the more plausible the sentence. The perplexity is expressed as follows:
$$PPL(S) = \left( \prod_{i=1}^{N} \frac{1}{p(w_i)} \right)^{\frac{1}{N}}$$
where $S$ denotes the sentence, $N$ denotes the sentence length, $w_i$ denotes the $i$-th word, and $p(w_i)$ denotes the probability of the $i$-th word.
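A minimal sketch of the perplexity comparison follows. It assumes that a language model has already supplied one strictly positive probability per word for each candidate; the function names are hypothetical. The computation works in log space, which is numerically safer than multiplying many small probabilities and gives the same value as the N-th root of the product of inverse probabilities.

```python
import math


def perplexity(word_probs):
    """Perplexity of a sentence given the probability of each of its words."""
    if not word_probs:
        raise ValueError("sentence must contain at least one word")
    n = len(word_probs)
    log_sum = sum(math.log(p) for p in word_probs)  # requires every p > 0
    return math.exp(-log_sum / n)


def select_result(result_one, probs_one, result_two, probs_two):
    """Return the modification result whose perplexity is lower."""
    if perplexity(probs_one) <= perplexity(probs_two):
        return result_one
    return result_two
```

For instance, a four-word sentence in which every word has probability 0.25 has perplexity 4, and a candidate whose words are all assigned probability 0.5 would be preferred over one whose words are assigned 0.1.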
In order to further illustrate the technical principles of the present invention, specific examples are provided below for illustration.
Take as an example the results of correcting one text with each of three methods: the Chinese spelling error correction model, the sequence-to-sequence grammar error correction model, and the sequence-to-edit grammar error correction model:
The input text to be corrected is: "held full claims are lost by the victim's immediate economic staff and obtained I expect, which can be processed lightly as appropriate. "
Text corrected by the Chinese spelling error correction model: "reported full reimbursement victim near economic loss, and obtained I expect, can be processed lightly as appropriate. "
Text corrected based on a sequence-to-sequence grammar correction model: "reported full compensation is lost by the victim's immediate economic staff and understanding is obtained, which can be processed lightly as appropriate. "
Text corrected based on a sequence-to-edit grammar correction model: "reported full reimbursement is invaded for economic loss in close relatives and understanding is obtained, and light treatment can be considered. "
It can be seen that each of the three models, when used alone, leaves some errors uncorrected.
The scheme of the invention is as follows:
Using the retrieval algorithm, ' top of the grammar is changed into ' right '; the sequence-to-edit grammar error correction model changes ' I expect ' into ' forgiveness '; finally, perplexity selection against the result of the sequence-to-sequence grammar error correction model gives the final result: "reported full reimbursement victim is close to the economic loss and obtains understanding, can be treated from light. "
As shown in FIG. 5, an embodiment of the invention provides a Chinese text proofreading system based on retrieval and multiple stages, comprising:
an input module 10 for inputting the text to be corrected;
a retrieval module 20 for searching for the text most similar to the text to be corrected and splicing the most similar text with the text to be corrected to obtain a spliced text;
a spelling correction module 30 for performing spelling correction on the spliced text;
a sequence-to-edit module 40 for performing sequence-to-edit grammar correction on the spelling-corrected text to obtain a first modification result;
a sequence-to-sequence module 50 for performing sequence-to-sequence grammar correction, controlled by a set threshold range, on the text to be corrected to obtain a second modification result;
a perplexity selection module 60 for comparing the perplexity of the first modification result with that of the second modification result;
and an output module 70 for outputting the modification result with the lower perplexity.
The system implements the Chinese text proofreading method based on retrieval and multiple stages described above; to avoid redundancy, the details are not repeated here.
The invention also provides Chinese text proofreading equipment that implements the above Chinese text proofreading method based on retrieval and multiple stages, for performing Chinese text proofreading. The technical principle and beneficial effects of the equipment are similar to those of the method and are not repeated here.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Clearly, the above embodiments are merely examples given for clarity of description and do not limit the implementations. Those of ordinary skill in the art can make other variations or modifications on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here. Obvious variations or modifications derived from them remain within the scope of protection of the invention.
Claims (9)
1. A Chinese text proofreading method based on retrieval and multiple stages, comprising:
S1: inputting a text to be corrected, searching a database for the text most similar to the text to be corrected, and splicing the most similar text with the text to be corrected to obtain a spliced text;
S2: performing spelling correction on the spliced text;
S3: performing sequence-to-edit grammar correction on the spelling-corrected text to obtain a first modification result;
S4: comparing the perplexity of the first modification result with that of a second modification result, the second modification result being obtained by applying sequence-to-sequence grammar correction, controlled by a set threshold range, to the text to be corrected;
S5: taking the modification result with the lower perplexity as the final modification result;
wherein performing spelling correction on the spliced text comprises:
processing the erroneous-correct sentence pairs into a character-aligned format and feeding them into a BERT encoder to obtain the features of the original sentence, and using a Glyce encoder with a CNN structure as a visual information encoder to obtain the pronunciation and glyph features of the characters in the text;
fusing the pronunciation and glyph features with the features of the original sentence, inputting the fused features into a Transformer encoder, and obtaining the spelling-corrected sentence through a fully connected layer.
2. The Chinese text proofreading method based on retrieval and multiple stages of claim 1, wherein the sequence-to-edit grammar correction of the spelling-corrected text to obtain the first modification result is performed as follows:
performing grammar correction on the spelling-corrected text with the GECToR model to obtain the first modification result.
3. The Chinese text proofreading method based on retrieval and multiple stages of claim 2, wherein the basic training data of the GECToR model are erroneous-correct sentence pairs, the input is an erroneous sentence, the input sentence is converted into the corresponding transformation tags, error correction is performed iteratively by a BERT encoder and sequence tagging, and the corrected sentence is finally output.
4. The Chinese text proofreading method based on retrieval and multiple stages of claim 1, wherein the second modification result is obtained from the text to be corrected by sequence-to-sequence grammar correction with a set threshold range as follows:
a seq2seq model corrects the text to be corrected, with a set threshold range controlling the correction, to obtain the second modification result.
5. The Chinese text proofreading method based on retrieval and multiple stages of claim 4, wherein the basic training data of the seq2seq model are correction sentence pairs, each composed of an original sentence and a correct sentence, the input is an erroneous sentence, and the corrected sentence is output by an encoder-decoder model.
6. The Chinese text proofreading method based on retrieval and multiple stages of claim 5, wherein the erroneous sentence undergoes BPE processing before being input into the encoder-decoder model, and the output of the encoder-decoder model undergoes restoration processing.
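7. The Chinese text proofreading method based on retrieval and multiple stages of claim 1, wherein the perplexity of the first modification result is compared with that of the second modification result, which is obtained from the text to be corrected by sequence-to-sequence grammar correction with a set threshold range, and wherein the perplexity is expressed as follows:
$$PPL(S) = \left( \prod_{i=1}^{N} \frac{1}{p(w_i)} \right)^{\frac{1}{N}}$$
where $S$ denotes the sentence, $N$ denotes the sentence length, $w_i$ denotes the $i$-th word, and $p(w_i)$ denotes the probability of the $i$-th word.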
8. A retrieval and multi-stage based Chinese text proofreading system, comprising:
an input module, used for inputting the error correction text;
a retrieval module, used for retrieving the text most similar to the error correction text and splicing the most similar text with the error correction text to obtain a spliced text;
a spelling correction module, used for correcting the spelling of the spliced text;
a sequence-editing-based module, used for performing grammar correction based on sequence editing on the spelling-corrected text to obtain a first modification result;
a sequence-to-sequence-based module, used for performing sequence-to-sequence grammar correction on the error correction text by setting a threshold range to obtain a second modification result;
a perplexity (confusion degree) selection module, used for comparing the perplexity of the first modification result and the second modification result;
an output module, used for outputting the modification result with the lower perplexity;
wherein the spelling correction of the spliced text comprises the following steps:
processing sentence pairs of erroneous and correct sentences into a word-aligned format and feeding them into a BERT encoder to obtain the original sentence features, and using a Glyce encoder with a CNN structure as a visual information encoder to obtain the pronunciation and glyph (character shape) features of the text;
and fusing the pronunciation and glyph features of the text with the original sentence features, inputting the fused features into a Transformer encoder, and obtaining the spelling-corrected sentence through a fully connected layer.
9. A Chinese text proofreading device, which implements Chinese text proofreading using the retrieval and multi-stage based Chinese text proofreading method according to any one of claims 1 to 7.
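The selection step recited in claims 7 and 8 (keep whichever modification result has the lower perplexity) can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: it uses the standard definition of perplexity, the exponential of the negative mean per-token log-probability, which may differ from the formula in the claim. The names `perplexity`, `select_by_perplexity`, the `score` callback, and the toy log-probabilities are all hypothetical; in practice `score` would wrap a language model that returns per-token log-probabilities for a sentence.

```python
import math
from typing import Callable, Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Standard perplexity: exp of the negative mean token log-probability."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def select_by_perplexity(
    result_one: str,
    result_two: str,
    score: Callable[[str], Sequence[float]],
) -> str:
    """Return the candidate with the lower perplexity.

    `score` is a hypothetical callback mapping a sentence to its
    per-token natural-log probabilities under some language model.
    Ties are resolved in favour of `result_one` (an arbitrary choice).
    """
    ppl_one = perplexity(score(result_one))
    ppl_two = perplexity(score(result_two))
    return result_one if ppl_one <= ppl_two else result_two


# Toy scores standing in for a real language model (assumed values).
toy_scores = {
    "result one": [math.log(0.5), math.log(0.5)],
    "result two": [math.log(0.1), math.log(0.2)],
}
best = select_by_perplexity("result one", "result two", toy_scores.__getitem__)
```

With these toy values the first candidate has perplexity 2 and the second roughly 7.07, so `best` is `"result one"`. Passing the scorer as a callback keeps the selection logic independent of whichever language model supplies the probabilities.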
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211639239.4A CN115630635B (en) | 2022-12-20 | 2022-12-20 | Chinese text proofreading method, system and equipment based on retrieval and multiple stages |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211639239.4A CN115630635B (en) | 2022-12-20 | 2022-12-20 | Chinese text proofreading method, system and equipment based on retrieval and multiple stages |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115630635A CN115630635A (en) | 2023-01-20 |
CN115630635B CN115630635B (en) | 2023-04-25 |
Family
ID=84910787
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211639239.4A Active CN115630635B (en) | 2022-12-20 | 2022-12-20 | Chinese text proofreading method, system and equipment based on retrieval and multiple stages |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115630635B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118261149A (en) * | 2024-03-05 | 2024-06-28 | 北京深言科技有限责任公司 | Grammar error correction method and device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113779970B (en) * | 2021-09-24 | 2023-05-23 | 北京字跳网络技术有限公司 | Text error correction method, device, equipment and computer readable storage medium |
CN114065738B (en) * | 2022-01-11 | 2022-05-17 | 湖南达德曼宁信息技术有限公司 | Chinese spelling error correction method based on multitask learning |
- 2022-12-20: CN application CN202211639239.4A, patent CN115630635B (en), status: Active
Also Published As
Publication number | Publication date |
---|---|
CN115630635A (en) | 2023-01-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114444479B (en) | End-to-end Chinese speech text error correction method, device and storage medium | |
US9069753B2 (en) | Determining proximity measurements indicating respective intended inputs | |
JP5444308B2 (en) | System and method for spelling correction of non-Roman letters and words | |
US9342499B2 (en) | Round-trip translation for automated grammatical error correction | |
KR20080003364A (en) | Method and system for generating spelling suggestions | |
CN109977220B (en) | Method for reversely generating abstract based on key sentence and key word | |
JP2009145853A (en) | Method and system for generating and detecting confusing sound | |
CN111985234B (en) | Voice text error correction method | |
CN115630635B (en) | Chinese text proofreading method, system and equipment based on retrieval and multiple stages | |
KR20230009564A (en) | Learning data correction method and apparatus thereof using ensemble score | |
CN114218926A (en) | Chinese spelling error correction method and system based on word segmentation and knowledge graph | |
CN114707492B (en) | Vietnam grammar error correction method and device integrating multi-granularity features | |
KR20150092879A (en) | Language Correction Apparatus and Method based on n-gram data and linguistic analysis | |
KR20120052591A (en) | Apparatus and method for error correction in a continuous speech recognition system | |
JP2999768B1 (en) | Speech recognition error correction device | |
CN111462734A (en) | Semantic slot filling model training method and system | |
CN115688703A (en) | Specific field text error correction method, storage medium and device | |
US11341961B2 (en) | Multi-lingual speech recognition and theme-semanteme analysis method and device | |
KR102430918B1 (en) | Device and method for correcting Korean spelling | |
CN114912441A (en) | Text error correction model generation method, error correction method, system, device and medium | |
CN113011149A (en) | Text error correction method and system | |
JP5057916B2 (en) | Named entity extraction apparatus, method, program, and recording medium | |
CN112988955B (en) | Multilingual voice recognition and topic semantic analysis method and device | |
CN118194854B (en) | Chinese text error correction method based on whole word mask and dependency mask | |
CN117575026B (en) | Large model reasoning analysis method, system and product based on external knowledge enhancement |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 2023-12-25; Address after: 215000 Bamboo Garden Road, Suzhou high tech Zone, Jiangsu Province, No. 209; Patentee after: SUZHOU CHADA SOFTWARE RESEARCH & DEVELOPMENT Co.,Ltd.; Address before: No. 188, Shihu West Road, Wuzhong District, Suzhou City, Jiangsu Province; Patentee before: SOOCHOW University |