CN116090441B - Chinese spelling error correction method integrating local semantic features and global semantic features - Google Patents
- Publication number: CN116090441B (application CN202211740208.8A)
- Authority
- CN
- China
- Prior art keywords
- word
- error correction
- sentence
- candidate
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING; G06F40/00—Handling natural language data; G06F40/20—Natural language analysis
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation
- G06F40/216—Parsing using statistical methods
- G06F40/242—Dictionaries
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/30—Semantic analysis
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a Chinese spelling error correction method integrating local semantic features and global semantic features, comprising the following steps: for a document, a sentence-dividing module produces a collection of sentences; for each sentence, a pipeline error correction model and an end-to-end error correction model produce correction suggestions; to prevent correct words from being erroneously corrected, the suggestions are screened by an error correction filtering module; finally, a model fusion module combines the outputs of the end-to-end and pipeline error correction models to produce the final corrected sentence and corrected document. The invention offers a wide error correction range and high error correction accuracy, among other advantages.
Description
Technical Field
The invention relates to the field of the Internet, and in particular to a Chinese spelling error correction method integrating local semantic features and global semantic features.
Background
Chinese spelling correction is an important technology for automatic sentence checking and correction in text proofreading; it aims to improve word correctness and reduce the cost of manual verification. In government affairs, media, law, education and other industries, manuscript writing occupies an important position, and traditional manual proofreading involves a huge workload, so an intelligent and accurate error correction system has broad application prospects. To handle the complex and varied semantics of Chinese text, a Chinese spelling error correction system that fuses local semantic features and global semantic features is designed.
Therefore, it is necessary to provide a new technical solution.
Disclosure of Invention
To solve the above technical problems in the prior art, the invention discloses a Chinese spelling error correction method integrating local semantic features and global semantic features; the specific technical scheme is as follows:
the invention provides a Chinese spelling error correction method integrating local semantic features and global semantic features, which comprises the following steps:
for a document, a sentence-dividing module produces a collection of sentences;
for each sentence, a pipeline error correction model and an end-to-end error correction model produce correction suggestions;
to prevent correct words from being erroneously corrected, the suggestions are screened by an error correction filtering module;
finally, a model fusion module combines the outputs of the end-to-end error correction model and the pipeline error correction model to produce the final corrected sentence and corrected document.
Further, the end-to-end error correction model uses a feature encoder based on the Transformer architecture to obtain a semantic vector representation of each character in the sentence, feeds the representation into a feedforward neural network that predicts over the vocabulary, and introduces constraint rules based on phonetic similarity and glyph similarity at the output end; for each position in the original sentence, the feedforward network picks the top-k candidate characters with the highest probability in the prediction vocabulary,
the candidate characters are traversed in descending order of probability:
if the original character is a punctuation mark, maintaining the original character;
if the predicted character is the original character or is not in the vocabulary, maintaining the original character;
if the pronunciation and glyph similarity between the predicted character and the original character are within the threshold, the predicted character is taken as the answer; otherwise the traversal continues. If the traversal finishes without finding a character that satisfies the conditions, the original character is kept.
Furthermore, the pipeline error correction model adopts a three-stage error correction method of error detection, candidate recall, and candidate ranking.
Further, the error detection combines a method based on a local semantic model with a method based on a global semantic model,
the method based on the local semantic model comprises the following steps:
S1. Mine a bi-gram dictionary from a large-scale domain corpus and count word frequencies. After the sentence to be predicted is segmented, a sequence of bi-gram words is obtained; if a bi-gram word is not in the dictionary and its frequency is smaller than a set threshold, the word is considered possibly wrong and is added to the candidate wrong words;
S2. Train a 5-gram language model on a large-scale domain corpus. For a given Chinese character string, if the sentence contains an error, the erroneous characters will appear as consecutive single characters after Chinese word segmentation. In the bi-gram model the probability of a character depends only on the character immediately before it, so the probability of the string is approximated by the product of a series of conditional probabilities:

P(c_1 c_2 ... c_L) ≈ ∏_{l=2..L} P(c_l | c_{l-1})

The probability of each term in the above equation may be calculated from its maximum likelihood estimate:

P(c_l | c_{l-1}) = N(c_{l-1} c_l) / N(c_{l-1})

where N(c_{l-1} c_l) and N(c_{l-1}) respectively denote the number of occurrences of the corresponding character strings in the given corpus,
if the probability of a bi-gram word is greater than the set threshold, the word is added to the candidate wrong words.
Further, the candidate recall includes pronunciation-similar recall and glyph-similar recall.
The pronunciation-similar recall builds a similar-pinyin dictionary from the bi-gram word library mined from the large-scale corpus and, for each error candidate word, is divided into whole-word recall and per-character recall:
whole-word recall looks up all pinyin readings of the word in a word-pinyin recall library, then finds all words with each reading and adds them to the candidate set;
per-character recall obtains all pinyin readings of each character, combines them into all possible word readings, finds all words for each reading in the pinyin library, and adds them to the candidate set.
The glyph-similar recall recalls similar characters for each character of the word from a similar-glyph library, combines the characters into similar words, computes the similarity between each similar word and the error candidate word, and adds a similar word to the candidate set if its similarity exceeds a set threshold.
Further, for the recalled candidate set of each wrong word, the candidate ranking obtains correction suggestion words through coarse ranking followed by fine ranking.
In the coarse ranking, an ngram-score is computed for each word using the ngram language model;
the candidates are sorted by ngram-score from high to low, the top-k candidate words are kept, and fine ranking is then performed;
in the fine ranking, the sentence score is computed from the perplexity (ppl), which evaluates the fluency of the sentence after a correction suggestion word is substituted: the smaller the ppl value, the more fluent the sentence, and the better the correction suggestion word.
The perplexity is computed as:

ppl(S) = P(w_1 w_2 ... w_N)^(-1/N) = ( ∏_{i=1..N} p(w_i | w_1 w_2 ... w_{i-1}) )^(-1/N)

where S is the current sentence, N is the sentence length, p(w_i) is the probability of the i-th word, and p(w_i | w_1 w_2 ... w_{i-1}) is the probability of the i-th word given the previous i-1 words,
and the ppl values of the candidate words are sorted from low to high, with different thresholds set according to sentence length; the sorted results are traversed, and if the current value is smaller than the threshold, the current word is added to the error correction suggestion set. Finally, the word ranked first (lowest ppl) in the suggestion set is taken as the final error correction suggestion word.
Further, the error correction filtering module includes the following steps:
1) Calculating the ngram score and the confusion degree using the deep language model and the statistical language model respectively;
2) Averaging the ngram scores and the confusion degrees obtained by the two models to obtain a mean ngram score and a mean confusion degree;
3) Taking the maximum value of the ngram score;
4) Taking the minimum value of the confusion degree;
5) Multiplying the minimum confusion degree by the length of the word, and calculating the confusion degree difference between the two models;
6) Calculating the score of the word: score = mean confusion degree × word length − (mean ngram score × confusion degree difference / maximum ngram score).
Further, the model fusion module selects the word with the smallest final score computed by the error correction filtering module as the output.
The invention has the following beneficial effects:
1. The Chinese spelling error correction method integrating local semantic features and global semantic features improves word correctness and reduces the cost of manual verification.
2. By obtaining correction suggestions from both a pipeline error correction model and an end-to-end error correction model, the method covers a wider range of errors.
3. By screening correction suggestions through the error correction filtering module, corrections become more accurate: the error correction accuracy reaches 94.53%, the recall rate 87.22%, and the false-correction rate 2.97%. Accuracy = number of correct predictions / total number of test samples; recall = number of erroneous samples detected / number of samples containing errors; false-correction rate = number of errors predicted in error-free samples / total number of error-free samples.
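The three evaluation metrics defined above can be sketched as a short computation. The sample data below is hypothetical and only illustrates the formulas, not the patent's reported test set.

```python
# Illustrative computation of the metrics defined above; the sample
# list of (has_error, predicted_error) pairs is made up for the demo.
def evaluate(results):
    """results: list of (has_error, predicted_error) booleans, one per sample."""
    total = len(results)
    correct = sum(1 for has, pred in results if has == pred)
    with_error = [(h, p) for h, p in results if h]
    without_error = [(h, p) for h, p in results if not h]
    accuracy = correct / total
    recall = sum(1 for _, p in with_error if p) / len(with_error)
    false_correction = sum(1 for _, p in without_error if p) / len(without_error)
    return accuracy, recall, false_correction

samples = [(True, True), (True, False), (False, False), (False, True), (True, True)]
print(evaluate(samples))
```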
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
To more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings required for the embodiments are briefly described below. The drawings in the following description show only some embodiments of the invention; other drawings may be derived from them by a person skilled in the art without inventive effort.
Fig. 1 is a flow chart of a system provided in an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
In the description of the present invention, it should be understood that the directions or positional relationships indicated by the terms "upper", "lower", "top", "bottom", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured," and the like are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communicated with the inside of two elements or the interaction relationship of the two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
The invention discloses a Chinese spelling error correction method integrating local semantic features and global semantic features, which refers to FIG. 1 and comprises the following steps:
for a document, a sentence-dividing module produces a collection of sentences;
for each sentence, a pipeline error correction model and an end-to-end error correction model produce correction suggestions;
to prevent correct words from being erroneously corrected, the suggestions are screened by an error correction filtering module;
finally, a model fusion module combines the outputs of the end-to-end error correction model and the pipeline error correction model to produce the final corrected sentence and corrected document.
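The four steps above can be sketched as follows. `split_sentences` is a plausible sentence-dividing rule, and the four callbacks (`pipeline_correct`, `end2end_correct`, `filter_suggestions`, `fuse`) are hypothetical stand-ins for the modules named in the text, not the patent's actual implementation.

```python
# Minimal sketch of the four-step correction flow, with the patent's
# modules represented as caller-supplied functions.
import re

def split_sentences(document: str):
    """Sentence-dividing module: split after Chinese/Western sentence enders."""
    return [s for s in re.split(r"(?<=[。！？!?.])", document) if s.strip()]

def correct_document(document, pipeline_correct, end2end_correct,
                     filter_suggestions, fuse):
    corrected = []
    for sentence in split_sentences(document):
        # Both models propose correction suggestions for the sentence.
        suggestions = pipeline_correct(sentence) + end2end_correct(sentence)
        # The filtering module screens out corrections of already-correct words.
        suggestions = filter_suggestions(sentence, suggestions)
        # The fusion module combines the outputs into the final sentence.
        corrected.append(fuse(sentence, suggestions))
    return "".join(corrected)
```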
The end-to-end error correction model uses a feature encoder based on the Transformer architecture to obtain a semantic vector representation of each character in a sentence, feeds the representation into a feedforward neural network that predicts over the vocabulary, and introduces constraint rules based on phonetic similarity and glyph similarity at the output end; for each position in the original sentence, the feedforward network picks the top-k candidate characters with the highest probability in the prediction vocabulary,
the candidate characters are traversed in descending order of probability:
if the original character is a punctuation mark, maintaining the original character;
if the predicted character is the original character or is not in the vocabulary, maintaining the original character;
if the pronunciation and glyph similarity between the predicted character and the original character are within the threshold, the predicted character is taken as the answer; otherwise the traversal continues. If the traversal finishes without finding a character that satisfies the conditions, the original character is kept.
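The traversal rules above can be sketched as follows, assuming hypothetical `pinyin_sim` and `glyph_sim` similarity functions, a vocabulary set, and an illustrative threshold value; none of these names come from the patent.

```python
# Sketch of the constraint rules applied to the top-k candidates
# at one position of the sentence. Candidates must arrive sorted
# by descending model probability.
import string

PUNCT = set("，。！？；：、" + string.punctuation)

def pick_character(original, topk_candidates, vocab,
                   pinyin_sim, glyph_sim, threshold=0.5):
    if original in PUNCT:                  # rule 1: punctuation is kept
        return original
    for cand in topk_candidates:           # traverse by descending probability
        if cand == original:               # rule 2: prediction equals original
            return original
        if cand not in vocab:              # rule 2: prediction not in vocabulary
            return original
        # rule 3: accept if both pronunciation and glyph are similar enough
        if pinyin_sim(original, cand) >= threshold and \
           glyph_sim(original, cand) >= threshold:
            return cand
    return original                        # no candidate satisfied the rules
```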
The pipeline error correction model adopts a three-stage error correction method of error detection, candidate recall, and candidate ranking.
The error detection combines a method based on a local semantic model with a method based on a global semantic model,
the method based on the local semantic model comprises the following steps:
S1. Mine a bi-gram dictionary from a large-scale domain corpus and count word frequencies. After the sentence to be predicted is segmented, a sequence of bi-gram words is obtained; if a bi-gram word is not in the dictionary and its frequency is smaller than a set threshold, the word is considered possibly wrong and is added to the candidate wrong words;
S2. Train a 5-gram language model on a large-scale domain corpus. For a given Chinese character string, if the sentence contains an error, the erroneous characters will appear as consecutive single characters after Chinese word segmentation. In the bi-gram model the probability of a character depends only on the character immediately before it, so the probability of the string is approximated by the product of a series of conditional probabilities:

P(c_1 c_2 ... c_L) ≈ ∏_{l=2..L} P(c_l | c_{l-1})

The probability of each term in the above equation may be calculated from its maximum likelihood estimate:

P(c_l | c_{l-1}) = N(c_{l-1} c_l) / N(c_{l-1})

where N(c_{l-1} c_l) and N(c_{l-1}) respectively denote the number of occurrences of the corresponding character strings in the given corpus.
If the probability of a bi-gram word is greater than the set threshold, the word is added to the candidate wrong words.
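Step S1's bi-gram frequency check can be sketched as follows. The corpus, segmentation, and threshold are toy stand-ins for the large-scale domain resources the text assumes.

```python
# Sketch of local-semantic error detection: flag words whose adjacent
# bigrams are absent or too rare in a mined bigram frequency dictionary.
from collections import Counter

def mine_bigrams(corpus_words):
    """Count adjacent word pairs over an already-segmented corpus."""
    return Counter(zip(corpus_words, corpus_words[1:]))

def detect_suspects(words, bigram_freq, min_freq=2):
    """Return words that participate in an unseen or low-frequency bigram."""
    suspects = set()
    for pair in zip(words, words[1:]):
        if bigram_freq.get(pair, 0) < min_freq:
            suspects.update(pair)
    return suspects
```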
The candidate recall includes pronunciation-similar recall and glyph-similar recall.
The pronunciation-similar recall builds a similar-pinyin dictionary from the bi-gram word library mined from the large-scale corpus and, for each error candidate word, is divided into whole-word recall and per-character recall:
whole-word recall looks up all pinyin readings of the word in a word-pinyin recall library, then finds all words with each reading and adds them to the candidate set;
per-character recall obtains all pinyin readings of each character, combines them into all possible word readings, finds all words for each reading in the pinyin library, and adds them to the candidate set.
The glyph-similar recall recalls similar characters for each character of the word from a similar-glyph library, combines the characters into similar words, computes the similarity between each similar word and the error candidate word, and adds a similar word to the candidate set if its similarity exceeds a set threshold.
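The per-character pinyin recall can be sketched as follows. The two dictionaries are tiny hand-built stand-ins for the similar-pinyin resources that the text says are mined from a large corpus.

```python
# Sketch of per-character pronunciation-similar recall: combine the
# pinyin readings of each character, then look up words per reading.
from itertools import product

# Hypothetical toy dictionaries (real ones are corpus-mined).
CHAR_PINYIN = {"平": ["ping"], "苹": ["ping"], "果": ["guo"], "裹": ["guo"]}
PINYIN_WORDS = {("ping", "guo"): ["苹果", "平果"]}

def recall_by_pinyin(word):
    """Cartesian product of per-character readings, then dictionary lookup."""
    readings = product(*(CHAR_PINYIN.get(ch, []) for ch in word))
    candidates = set()
    for reading in readings:
        candidates.update(PINYIN_WORDS.get(tuple(reading), []))
    candidates.discard(word)  # the word itself is not a correction
    return candidates
```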
For the recalled candidate set of each wrong word, the candidate ranking obtains correction suggestion words through coarse ranking followed by fine ranking.
In the coarse ranking, an ngram-score is computed for each word using the ngram language model;
the candidates are sorted by ngram-score from high to low, the top-k candidate words are kept, and fine ranking is then performed;
in the fine ranking, the sentence score is computed from the perplexity (ppl), which evaluates the fluency of the sentence after a correction suggestion word is substituted: the smaller the ppl value, the more fluent the sentence, and the better the correction suggestion word.
The perplexity is computed as:

ppl(S) = P(w_1 w_2 ... w_N)^(-1/N) = ( ∏_{i=1..N} p(w_i | w_1 w_2 ... w_{i-1}) )^(-1/N)

where S is the current sentence, N is the sentence length, p(w_i) is the probability of the i-th word, and p(w_i | w_1 w_2 ... w_{i-1}) is the probability of the i-th word given the previous i-1 words,
and the ppl values of the candidate words are sorted from low to high, with different thresholds set according to sentence length; the sorted results are traversed, and if the current value is smaller than the threshold, the current word is added to the error correction suggestion set. Finally, the word ranked first (lowest ppl) in the suggestion set is taken as the final error correction suggestion word.
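The fine-ranking step can be sketched as follows, with a `logprob` callback standing in for the language model; in practice the threshold would depend on sentence length as the text describes, and these names are illustrative.

```python
# Sketch of fine ranking: substitute each candidate at the error
# position, score the resulting sentence by perplexity, and keep the
# lowest-ppl candidate if it beats the threshold.
import math

def perplexity(words, logprob):
    """ppl(S) = exp(-(1/N) * sum_i log p(w_i | history))."""
    n = len(words)
    total = sum(logprob(words[:i], words[i]) for i in range(n))
    return math.exp(-total / n)

def fine_rank(sentence_words, position, candidates, logprob, threshold):
    scored = []
    for cand in candidates:
        trial = sentence_words[:position] + [cand] + sentence_words[position + 1:]
        scored.append((perplexity(trial, logprob), cand))
    scored.sort()  # lower perplexity = more fluent sentence
    best_ppl, best = scored[0]
    return best if best_ppl < threshold else None
```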
The error correction filtering module comprises the following steps:
1) Calculating the ngram score and the confusion degree using the deep language model and the statistical language model respectively;
2) Averaging the ngram scores and the confusion degrees obtained by the two models to obtain a mean ngram score and a mean confusion degree;
3) Taking the maximum value of the ngram score;
4) Taking the minimum value of the confusion degree;
5) Multiplying the minimum confusion degree by the length of the word, and calculating the confusion degree difference between the two models;
6) Calculating the score of the word: score = mean confusion degree × word length − (mean ngram score × confusion degree difference / maximum ngram score).
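Steps 1 to 6 can be sketched as follows. The step-6 formula here is one reading of the expression in the text, and the input scores are placeholders for outputs of the deep and statistical language models.

```python
# Sketch of the filtering score combining a deep LM and a statistical
# LM. Lower score = more plausible correction (an interpretation).
def word_score(word, deep_ngram, stat_ngram, deep_ppl, stat_ppl):
    mean_ngram = (deep_ngram + stat_ngram) / 2   # step 2: mean ngram score
    mean_ppl = (deep_ppl + stat_ppl) / 2         # step 2: mean confusion degree
    max_ngram = max(deep_ngram, stat_ngram)      # step 3: maximum ngram score
    ppl_diff = abs(deep_ppl - stat_ppl)          # step 5: confusion difference
    # step 6 (reconstructed): mean_ppl * length - mean_ngram * diff / max_ngram
    return mean_ppl * len(word) - mean_ngram * ppl_diff / max_ngram
```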
The model fusion module selects the word with the smallest final score computed by the error correction filtering module as the output.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Further, one skilled in the art may combine and combine the different embodiments or examples described in this specification.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications and alternatives to the above embodiments may be made by those skilled in the art within the scope of the invention.
Claims (3)
1. A Chinese spelling error correction method integrating local semantic features and global semantic features is characterized by comprising the following steps:
for a document, a sentence-dividing module produces a collection of sentences;
for each sentence, a pipeline error correction model and an end-to-end error correction model produce correction suggestions;
to prevent correct words from being erroneously corrected, the suggestions are screened by an error correction filtering module;
finally, a model fusion module combines the outputs of the end-to-end error correction model and the pipeline error correction model to produce the final corrected sentence and corrected document,
the end-to-end error correction model uses a feature encoder based on the Transformer architecture to obtain a semantic vector representation of each character in a sentence, feeds the representation into a feedforward neural network that predicts over the vocabulary, and introduces constraint rules based on phonetic similarity and glyph similarity at the output end; for each position in the original sentence, the feedforward network picks the top-k candidate characters with the highest probability in the prediction vocabulary,
the candidate characters are traversed in descending order of probability:
if the original character is a punctuation mark, maintaining the original character;
if the predicted character is the original character or is not in the vocabulary, maintaining the original character;
if the pronunciation and glyph similarity between the predicted character and the original character are within the threshold, the predicted character is taken as the answer; otherwise the traversal continues; if the traversal finishes without finding a character that satisfies the conditions, the original character is kept,
the pipeline error correction model adopts a three-stage error correction method of error detection, candidate recall, and candidate ranking,
the error detection combines a method based on a local semantic model with a method based on a global semantic model,
the method based on the local semantic model comprises the following steps:
S1. Mine a bi-gram dictionary from a large-scale domain corpus and count word frequencies. After the sentence to be predicted is segmented, a sequence of bi-gram words is obtained; if a bi-gram word is not in the dictionary and its frequency is smaller than a set threshold, the word is considered possibly wrong and is added to the candidate wrong words;
S2. Train a 5-gram language model on a large-scale domain corpus. For a given Chinese character string, if the sentence contains an error, the erroneous characters will appear as consecutive single characters after Chinese word segmentation. In the bi-gram model the probability of a character depends only on the character immediately before it, so the probability of the string is approximated by the product of a series of conditional probabilities:

P(c_1 c_2 ... c_L) ≈ ∏_{l=2..L} P(c_l | c_{l-1})

The probability of each term in the above equation may be calculated from its maximum likelihood estimate:

P(c_l | c_{l-1}) = N(c_{l-1} c_l) / N(c_{l-1})

where N(c_{l-1} c_l) and N(c_{l-1}) respectively denote the number of occurrences of the corresponding character strings in the given corpus,
if the probability of a bi-gram word is greater than the set threshold, the word is added to the candidate wrong words,
the candidate recall includes pronunciation-similar recall and glyph-similar recall,
the pronunciation-similar recall builds a similar-pinyin dictionary from the bi-gram word library mined from the large-scale corpus and, for each error candidate word, is divided into whole-word recall and per-character recall:
whole-word recall looks up all pinyin readings of the word in a word-pinyin recall library, then finds all words with each reading and adds them to the candidate set;
per-character recall obtains all pinyin readings of each character, combines them into all possible word readings, finds all words for each reading in the pinyin library, and adds them to the candidate set,
the glyph-similar recall recalls similar characters for each character of the error candidate word from a similar-glyph library, combines the characters into similar words, computes the similarity between each similar word and the error candidate word, and adds a similar word to the candidate set if its similarity exceeds a set threshold,
for the recalled candidate set of each wrong word, the candidate ranking obtains correction suggestion words through coarse ranking followed by fine ranking,
in the coarse ranking, an ngram-score is computed for each word using the ngram language model,
the candidates are sorted by ngram-score from high to low, the top-k candidate words are kept, and fine ranking is then performed;
in the fine ranking, the sentence score is computed from the perplexity (ppl), which evaluates the fluency of the sentence after a correction suggestion word is substituted: the smaller the ppl value, the more fluent the sentence, and the better the correction suggestion word; the perplexity is computed as:

ppl(S) = P(w_1 w_2 ... w_N)^(-1/N) = ( ∏_{i=1..N} p(w_i | w_1 w_2 ... w_{i-1}) )^(-1/N)

where S is the current sentence, N is the sentence length, p(w_i) is the probability of the i-th word, and p(w_i | w_1 w_2 ... w_{i-1}) is the probability of the i-th word given the previous i-1 words,
and the ppl values of the candidate words are sorted from low to high, with different thresholds set according to sentence length; the sorted results are traversed, and if the current value is smaller than the threshold, the current word is added to the error correction suggestion set; finally, the word ranked first (lowest ppl) in the suggestion set is taken as the final error correction suggestion word.
2. The Chinese spelling error correction method integrating local semantic features and global semantic features according to claim 1, wherein the error correction filtering module comprises the following steps:
1) Calculating the ngram score and the confusion degree using the deep language model and the statistical language model respectively;
2) Averaging the ngram scores and the confusion degrees obtained by the two models to obtain a mean ngram score and a mean confusion degree;
3) Taking the maximum value of the ngram score;
4) Taking the minimum value of the confusion degree;
5) Multiplying the minimum confusion degree by the length of the word, and calculating the confusion degree difference between the two models;
6) Calculating the score of the word: score = mean confusion degree × word length − (mean ngram score × confusion degree difference / maximum ngram score).
3. The Chinese spelling error correction method integrating local semantic features and global semantic features according to claim 2, wherein the model fusion module selects the word with the smallest final score computed by the error correction filtering module as the output.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211740208.8A CN116090441B (en) | 2022-12-30 | 2022-12-30 | Chinese spelling error correction method integrating local semantic features and global semantic features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211740208.8A CN116090441B (en) | 2022-12-30 | 2022-12-30 | Chinese spelling error correction method integrating local semantic features and global semantic features |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116090441A CN116090441A (en) | 2023-05-09 |
CN116090441B true CN116090441B (en) | 2023-10-20 |
Family
ID=86186355
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211740208.8A Active CN116090441B (en) | 2022-12-30 | 2022-12-30 | Chinese spelling error correction method integrating local semantic features and global semantic features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116090441B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116306600B (en) * | 2023-05-25 | 2023-08-11 | 山东齐鲁壹点传媒有限公司 | MacBert-based Chinese text error correction method |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107729316A (en) * | 2017-10-12 | 2018-02-23 | 福建富士通信息软件有限公司 | The identification of wrong word and the method and device of error correction in the interactive question and answer text of Chinese |
CN110852087A (en) * | 2019-09-23 | 2020-02-28 | 腾讯科技(深圳)有限公司 | Chinese error correction method and device, storage medium and electronic device |
CN111090986A (en) * | 2019-11-29 | 2020-05-01 | 福建亿榕信息技术有限公司 | Method for correcting errors of official document |
CN112149406A (en) * | 2020-09-25 | 2020-12-29 | 中国电子科技集团公司第十五研究所 | Chinese text error correction method and system |
CN113221542A (en) * | 2021-03-31 | 2021-08-06 | 国家计算机网络与信息安全管理中心 | Chinese text automatic proofreading method based on multi-granularity fusion and Bert screening |
CN113435186A (en) * | 2021-06-18 | 2021-09-24 | 上海熙瑾信息技术有限公司 | Chinese text error correction system, method, device and computer readable storage medium |
CN114444479A (en) * | 2022-04-11 | 2022-05-06 | 南京云问网络技术有限公司 | End-to-end Chinese speech text error correction method, device and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220309360A1 (en) * | 2021-03-25 | 2022-09-29 | Oracle International Corporation | Efficient and accurate regional explanation technique for nlp models |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110489760B (en) | Text automatic correction method and device based on deep neural network | |
CN111369996B (en) | Speech recognition text error correction method in specific field | |
CN112149406B (en) | Chinese text error correction method and system | |
CN102968989B (en) | Improvement method of Ngram model for voice recognition | |
Derouault et al. | Natural language modeling for phoneme-to-text transcription | |
JP4833476B2 (en) | Language input architecture that converts one text format to the other text format with modeless input | |
CN108959250A (en) | A kind of error correction method and its system based on language model and word feature | |
CN105957518A (en) | Mongolian large vocabulary continuous speech recognition method | |
CN110276069B (en) | Method, system and storage medium for automatically detecting Chinese braille error | |
CN116090441B (en) | Chinese spelling error correction method integrating local semantic features and global semantic features | |
CN109948144B (en) | Teacher utterance intelligent processing method based on classroom teaching situation | |
CN116306600B (en) | MacBert-based Chinese text error correction method | |
CN111985234B (en) | Voice text error correction method | |
CN112380841B (en) | Chinese spelling error correction method and device, computer equipment and storage medium | |
KR20230009564A (en) | Learning data correction method and apparatus thereof using ensemble score | |
CN115034218A (en) | Chinese grammar error diagnosis method based on multi-stage training and editing level voting | |
Roy et al. | Unsupervised context-sensitive bangla spelling correction with character n-gram | |
CN112149388B (en) | Method for recognizing vocabulary deformation in password and generating guessing rule | |
Göker et al. | Neural text normalization for turkish social media | |
CN111274826A (en) | Semantic information fusion-based low-frequency word translation method | |
CN112597771A (en) | Chinese text error correction method based on prefix tree combination | |
Namboodiri et al. | On using classical poetry structure for Indian language post-processing | |
CN111428475A (en) | Word segmentation word bank construction method, word segmentation method, device and storage medium | |
Sertsi et al. | Hybrid input-type recurrent neural network language modeling for end-to-end speech recognition | |
CN115688904B (en) | Translation model construction method based on noun translation prompt |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||