CN116090441B - Chinese spelling error correction method integrating local semantic features and global semantic features - Google Patents

Chinese spelling error correction method integrating local semantic features and global semantic features

Info

Publication number
CN116090441B
CN116090441B (application CN202211740208.8A)
Authority
CN
China
Prior art keywords
word
error correction
sentence
candidate
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211740208.8A
Other languages
Chinese (zh)
Other versions
CN116090441A (en)
Inventor
夏振涛
李艳
朱立烨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yozosoft Co ltd
Original Assignee
Yozosoft Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yozosoft Co ltd filed Critical Yozosoft Co ltd
Priority to CN202211740208.8A priority Critical patent/CN116090441B/en
Publication of CN116090441A publication Critical patent/CN116090441A/en
Application granted granted Critical
Publication of CN116090441B publication Critical patent/CN116090441B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/232Orthographic correction, e.g. spell checking or vowelisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a Chinese spelling error correction method integrating local semantic features and global semantic features, which comprises the following steps: for a document, a sentence collection is obtained through a sentence dividing module; for each sentence, error correction suggestions are obtained through a pipeline error correction model and an end-to-end error correction model; to prevent miscorrection of already-correct words, the suggestions are filtered through an error correction filtering module; finally, the outputs of the end-to-end error correction model and the pipeline error correction model are fused by a model fusion module to obtain the final corrected sentence and corrected document. The invention offers a wide error correction range, high error correction accuracy, and related advantages.

Description

Chinese spelling error correction method integrating local semantic features and global semantic features
Technical Field
The invention relates to the field of Internet, in particular to a Chinese spelling error correction method integrating local semantic features and global semantic features.
Background
Chinese spelling correction is an important technology for automatic sentence checking and automatic correction in text proofreading; it aims to improve word correctness and reduce the cost of manual verification. In government affairs, media, law, education and other industries, manuscript writing occupies an important position, and traditional manual proofreading involves a huge workload, so an intelligent and accurate error correction system has broad application prospects. To address the complex and varied semantics of Chinese text, a Chinese spelling error correction system fusing local semantic features and global semantic features is designed.
Therefore, it is necessary to provide a new technical solution.
Disclosure of Invention
In order to solve the technical problems in the prior art, the invention discloses a Chinese spelling error correction method integrating local semantic features and global semantic features, which comprises the following specific technical scheme:
the invention provides a Chinese spelling error correction method integrating local semantic features and global semantic features, which comprises the following steps:
for a document, a sentence collection is obtained through a sentence dividing module;
for each sentence, error correction suggestions are obtained through a pipeline error correction model and an end-to-end error correction model;
to prevent miscorrection of already-correct words, the error correction suggestions are filtered through an error correction filtering module;
and finally, the outputs of the end-to-end error correction model and the pipeline error correction model are fused by the model fusion module to obtain the final corrected sentence and corrected document.
Further, the end-to-end error correction model obtains a semantic vector representation of each character in the sentence using a feature encoder based on the Transformer architecture, feeds the representation into a feedforward neural network that predicts over the vocabulary, and introduces constraint rules based on phonetic similarity and glyph similarity at the output end; for each position in the original sentence, the feedforward neural network selects the top k candidate characters with the highest probability from the prediction vocabulary,
traversing candidate characters in turn according to the probability size:
if the original character is a punctuation mark, maintaining the original character;
if the predicted character is the original character or is not in the vocabulary, maintaining the original character;
if the pronunciation and glyph similarity between the predicted character and the original character are within the threshold, the predicted character is used as the answer; otherwise, traversal continues to the next candidate; if traversal finishes without finding a character that satisfies the conditions, the original character is maintained, as sketched below.
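For illustration, the following is a minimal sketch of these output-side constraint rules. It assumes the encoder and feedforward head have already produced the top-k candidates per position; the helpers `pinyin_similarity` and `glyph_similarity` are hypothetical placeholders (the patent names no concrete similarity measures), and reading "within the threshold" as similarity >= threshold is an assumption.

```python
import string

# Common Chinese and ASCII punctuation; rule 1 never corrects punctuation.
PUNCTUATION = set("，。！？、；：“”‘’（）" + string.punctuation)

def apply_output_constraints(orig_chars, topk_candidates, vocab,
                             pinyin_similarity, glyph_similarity,
                             threshold=0.5):
    """Pick, per position, the first top-k candidate passing the constraint rules."""
    corrected = []
    for orig, candidates in zip(orig_chars, topk_candidates):
        if orig in PUNCTUATION:          # rule 1: punctuation is kept as-is
            corrected.append(orig)
            continue
        answer = orig                    # default: keep the original character
        for cand in candidates:          # traverse in descending probability
            if cand == orig or cand not in vocab:
                break                    # rule 2: keep the original character
            # rule 3 (assumed reading): both similarities must reach the threshold
            if (pinyin_similarity(cand, orig) >= threshold
                    and glyph_similarity(cand, orig) >= threshold):
                answer = cand
                break
        corrected.append(answer)
    return "".join(corrected)
```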
Furthermore, the pipeline error correction model adopts a three-stage error correction method: error detection, candidate recall, and candidate ranking.
Further, the error detection adopts methods based on both a local semantic model and a global semantic model,
the method based on the local semantic model comprises the following steps:
S1, a bi-gram dictionary is mined from a large-scale domain corpus and word frequencies are counted; the sentence to be predicted is segmented to obtain its bi-gram word sequence; if a bi-gram word is not in the dictionary and its frequency is below a set threshold, the word is considered possibly erroneous and is added to the candidate error words;
S2, a 5-gram language model is trained on a large-scale domain corpus. For a given Chinese character string, if the sentence contains an error, the erroneous word tends to appear as consecutive single characters after Chinese word segmentation. In the bi-gram model, the probability of a character depends only on the character immediately before it, so the probability of the string is approximated by the product of the following series of conditional probabilities:

$$p(c_1 c_2 \dots c_L) \approx \prod_{l=1}^{L} p(c_l \mid c_{l-1})$$

The probability of each term in the above equation may be calculated from the maximum likelihood estimate:

$$p(c_l \mid c_{l-1}) = \frac{N(c_{l-1}, c_l)}{N(c_{l-1})}$$

where $N(c_{l-1}, c_l)$ and $N(c_{l-1})$ respectively denote the number of occurrences of the corresponding character strings in the given corpus,
and if the probability of a bi-gram word is greater than the set threshold, the word is added to the candidate error words.
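Below is a minimal sketch of the local-semantic detection steps S1 and S2, under stated assumptions: `segment` stands in for any Chinese word segmenter (e.g. `jieba.lcut`), and the frequency/count structures are assumed to have been mined from a domain corpus beforehand.

```python
def mle_bigram_prob(prev_char, char, unigram_counts, bigram_counts):
    """S2: maximum-likelihood estimate p(c_l | c_{l-1}) = N(c_{l-1}, c_l) / N(c_{l-1})."""
    denom = unigram_counts.get(prev_char, 0)
    return bigram_counts.get((prev_char, char), 0) / denom if denom else 0.0

def detect_candidates_s1(sentence, segment, bigram_freq, freq_threshold):
    """S1: flag adjacent word pairs that are absent from the mined bi-gram
    dictionary or whose frequency falls below the set threshold."""
    words = segment(sentence)
    candidates = []
    for w1, w2 in zip(words, words[1:]):
        if bigram_freq.get(w1 + w2, 0) < freq_threshold:
            candidates.append(w1 + w2)
    return candidates
```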
Further, the candidate recall includes pronunciation-similarity recall and glyph-similarity recall,
the pronunciation-similarity recall constructs a similar-pinyin dictionary from the bi-gram word library mined from the large-scale corpus and, for each error candidate word, is divided into character recall and word recall:
character recall looks up all pinyins of the character in a character-pinyin recall library, then finds all characters for each pinyin and adds them to the candidate set;
word recall obtains all pinyins of each character in the word, combines them to obtain all pronunciations of the word, then looks up all words for each pronunciation in the pinyin library and adds them to the candidate set,
and the glyph-similarity recall recalls similar-glyph characters for each character in the word from a similar-glyph library, combines all the characters to obtain similar-glyph words, computes the similarity between each similar word and the error candidate word, and adds the similar word to the candidate set if the similarity value exceeds the set threshold, as sketched below.
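A minimal sketch of the word-level pronunciation-similarity recall, assuming the pypinyin library for grapheme-to-pinyin conversion and a pre-built `pinyin2words` index mined from the bi-gram word library (both are assumptions; the patent names no concrete libraries or index layout):

```python
from itertools import product
from pypinyin import pinyin, Style

def word_pronunciations(word):
    """All pronunciations of a word: heteronym pinyins per character,
    combined across positions."""
    per_char = pinyin(word, style=Style.NORMAL, heteronym=True)
    return {" ".join(combo) for combo in product(*per_char)}

def pronunciation_recall(word, pinyin2words):
    """Word recall: every dictionary word sharing a pronunciation with `word`."""
    candidates = set()
    for pron in word_pronunciations(word):
        candidates.update(pinyin2words.get(pron, ()))
    candidates.discard(word)
    return candidates

# Hypothetical usage: pronunciation_recall("帐户", {"zhang hu": ["账户", "帐户"]})
# would return {"账户"} as a same-pinyin candidate.
```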
Further, for the recall candidate set of each erroneous word, the candidate ranking obtains correction suggestion words through coarse ranking and fine ranking,
in the coarse ranking, the ngram-score of each word is calculated using the n-gram language model; the candidates are ranked by ngram-score from high to low, and the first k candidate words are kept and passed to the fine ranking;
in the fine ranking, the sentence score is calculated as the perplexity (ppl), which evaluates the fluency of the sentence after the correction suggestion word has been substituted: the smaller the ppl value, the more fluent the sentence and the better the correction suggestion word.
The perplexity is calculated as follows:

$$\mathrm{PPL}(S) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{p(w_i \mid w_1 w_2 \dots w_{i-1})}}$$

where S is the current sentence, N is the sentence length, $p(w_i)$ is the probability of the i-th word, and $p(w_i \mid w_1 w_2 \dots w_{i-1})$ denotes the probability of the i-th word computed from the preceding i-1 words.
The ppl values of the candidate words are sorted from low to high, different thresholds are set according to the sentence length, and the sorted results are traversed; if the current value is below the threshold, the current word is added to the error correction suggestion set; finally, the word ranked first (lowest ppl) in the error correction suggestion set is taken as the final error correction suggestion word.
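A minimal sketch of the fine-ranking computation, assuming a conditional-probability function `lm_prob(sentence, i)` as a placeholder for the trained 5-gram language model (it returns p(w_i | w_1 … w_{i-1})):

```python
import math

def perplexity(sentence, lm_prob):
    """PPL(S) = (prod_i 1/p(w_i | w_1...w_{i-1}))^(1/N), computed in log space."""
    n = len(sentence)
    if n == 0:
        return float("inf")
    log_sum = sum(math.log(max(lm_prob(sentence, i), 1e-12)) for i in range(n))
    return math.exp(-log_sum / n)

def fine_rank(candidate_sentences, lm_prob, threshold):
    """Sort candidate sentences by ascending PPL and keep those under the
    length-dependent threshold; the first survivor is the final suggestion."""
    scored = sorted((perplexity(s, lm_prob), s) for s in candidate_sentences)
    return [s for ppl, s in scored if ppl < threshold]
```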
Further, the error correction filtering module includes the following steps:
1) Calculate the ngram score and the perplexity with a deep language model and a statistical language model respectively;
2) Average the ngram scores and perplexities obtained from the two models to get the mean ngram score and mean perplexity;
3) Take the maximum of the two ngram scores;
4) Take the minimum of the two perplexities;
5) Multiply the minimum perplexity by the length of the word, and compute the perplexity difference between the two models;
6) Calculate the score of the word:

$$\text{score} = \text{mean ppl} \times \text{word length} - \frac{\text{mean ngram score} \times \text{ppl difference}}{\text{max ngram score}}$$
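A minimal sketch of the filtering score and of the fusion choice described next. The grouping of terms in step 6 is reconstructed from the garbled patent text and should be treated as an assumption; steps 4-5 also compute a min-perplexity × length term whose role in step 6 is unclear from the text, so it is omitted here. The two language models are stand-ins passed in as precomputed scores.

```python
def filter_score(word, ngram_deep, ngram_stat, ppl_deep, ppl_stat):
    """Combine scores from the deep and statistical language models (steps 1-6)."""
    mean_ngram = (ngram_deep + ngram_stat) / 2      # step 2
    mean_ppl = (ppl_deep + ppl_stat) / 2            # step 2
    max_ngram = max(ngram_deep, ngram_stat)         # step 3
    ppl_diff = abs(ppl_deep - ppl_stat)             # step 5
    # step 6 (reconstructed): mean ppl x length - mean ngram x ppl diff / max ngram
    return mean_ppl * len(word) - mean_ngram * ppl_diff / max_ngram

def fuse(scored_candidates):
    """Model fusion: output the candidate word with the smallest filter score.
    `scored_candidates` is an iterable of (word, score) pairs."""
    return min(scored_candidates, key=lambda pair: pair[1])[0]
```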
Further, the model fusion module selects the word with the smallest score computed by the error correction filtering module as the output.
The invention has the following beneficial effects:
1. The Chinese spelling error correction method integrating local semantic features and global semantic features improves the correctness of words and reduces the manual verification cost.
2. According to the Chinese spelling error correction method integrating the local semantic features and the global semantic features, the error correction suggestions are obtained through the pipeline type error correction model and the end-to-end error correction model, and the error correction range is wider.
3. According to the Chinese spelling error correction method integrating local semantic features and global semantic features, the error correction suggestions are filtered through the error correction filtering module, making corrections more accurate: the error correction accuracy reaches 94.53%, the recall rate is 87.22%, and the false correction rate is 2.97%. Here, accuracy = number of correctly predicted errors / total number of test samples; recall = number of errors detected in erroneous samples / number of erroneous samples; false correction rate = number of errors predicted in error-free samples / total number of error-free samples.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained from these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a system provided in an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
In the description of the present invention, it should be understood that the directions or positional relationships indicated by the terms "upper", "lower", "top", "bottom", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured," and the like are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communicated with the inside of two elements or the interaction relationship of the two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
The invention discloses a Chinese spelling error correction method integrating local semantic features and global semantic features, which refers to FIG. 1 and comprises the following steps:
for a document, a sentence collection is obtained through a sentence dividing module;
for each sentence, error correction suggestions are obtained through a pipeline error correction model and an end-to-end error correction model;
to prevent miscorrection of already-correct words, the error correction suggestions are filtered through an error correction filtering module;
and finally, the outputs of the end-to-end error correction model and the pipeline error correction model are fused by the model fusion module to obtain the final corrected sentence and corrected document.
The end-to-end error correction model obtains a semantic vector representation of each character in the sentence using a feature encoder based on the Transformer architecture, feeds the representation into a feedforward neural network that predicts over the vocabulary, and introduces constraint rules based on phonetic similarity and glyph similarity at the output end; for each position in the original sentence, the feedforward neural network selects the top k candidate characters with the highest probability from the prediction vocabulary,
traversing candidate characters in turn according to the probability size:
if the original character is a punctuation mark, maintaining the original character;
if the predicted character is the original character or is not in the vocabulary, maintaining the original character;
if the pronunciation and glyph similarity between the predicted character and the original character are within the threshold, the predicted character is used as the answer; otherwise, traversal continues to the next candidate; if traversal finishes without finding a character that satisfies the conditions, the original character is maintained.
The pipeline error correction model adopts a three-stage error correction method: error detection, candidate recall, and candidate ranking.
The error detection adopts methods based on both a local semantic model and a global semantic model,
the method based on the local semantic model comprises the following steps:
S1, a bi-gram dictionary is mined from a large-scale domain corpus and word frequencies are counted; the sentence to be predicted is segmented to obtain its bi-gram word sequence; if a bi-gram word is not in the dictionary and its frequency is below a set threshold, the word is considered possibly erroneous and is added to the candidate error words;
S2, a 5-gram language model is trained on a large-scale domain corpus. For a given Chinese character string, if the sentence contains an error, the erroneous word tends to appear as consecutive single characters after Chinese word segmentation. In the bi-gram model, the probability of a character depends only on the character immediately before it, so the probability of the string is approximated by the product of the following series of conditional probabilities:

$$p(c_1 c_2 \dots c_L) \approx \prod_{l=1}^{L} p(c_l \mid c_{l-1})$$

The probability of each term in the above equation may be calculated from the maximum likelihood estimate:

$$p(c_l \mid c_{l-1}) = \frac{N(c_{l-1}, c_l)}{N(c_{l-1})}$$

where $N(c_{l-1}, c_l)$ and $N(c_{l-1})$ respectively denote the number of occurrences of the corresponding character strings in the given corpus.
If the probability of a bi-gram word is greater than the set threshold, the word is added to the candidate error words.
The candidate recall includes pronunciation-similarity recall and glyph-similarity recall,
the pronunciation-similarity recall constructs a similar-pinyin dictionary from the bi-gram word library mined from the large-scale corpus and, for each error candidate word, is divided into character recall and word recall:
character recall looks up all pinyins of the character in a character-pinyin recall library, then finds all characters for each pinyin and adds them to the candidate set;
word recall obtains all pinyins of each character in the word, combines them to obtain all pronunciations of the word, then looks up all words for each pronunciation in the pinyin library and adds them to the candidate set,
and the glyph-similarity recall recalls similar-glyph characters for each character in the word from a similar-glyph library, combines all the characters to obtain similar-glyph words, computes the similarity between each similar word and the error candidate word, and adds the similar word to the candidate set if the similarity value exceeds the set threshold.
For the recall candidate set of each erroneous word, the candidate ranking obtains correction suggestion words through coarse ranking and fine ranking,
in the coarse ranking, the ngram-score of each word is calculated using the n-gram language model; the candidates are ranked by ngram-score from high to low, and the first k candidate words are kept and passed to the fine ranking;
in the fine ranking, the sentence score is calculated as the perplexity (ppl), which evaluates the fluency of the sentence after the correction suggestion word has been substituted: the smaller the ppl value, the more fluent the sentence and the better the correction suggestion word.
The perplexity is calculated as follows:

$$\mathrm{PPL}(S) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{p(w_i \mid w_1 w_2 \dots w_{i-1})}}$$

where S is the current sentence, N is the sentence length, $p(w_i)$ is the probability of the i-th word, and $p(w_i \mid w_1 w_2 \dots w_{i-1})$ denotes the probability of the i-th word computed from the preceding i-1 words.
The ppl values of the candidate words are sorted from low to high, different thresholds are set according to the sentence length, and the sorted results are traversed; if the current value is below the threshold, the current word is added to the error correction suggestion set; finally, the word ranked first (lowest ppl) in the error correction suggestion set is taken as the final error correction suggestion word.
The error correction filtering module comprises the following steps:
1) Calculate the ngram score and the perplexity with a deep language model and a statistical language model respectively;
2) Average the ngram scores and perplexities obtained from the two models to get the mean ngram score and mean perplexity;
3) Take the maximum of the two ngram scores;
4) Take the minimum of the two perplexities;
5) Multiply the minimum perplexity by the length of the word, and compute the perplexity difference between the two models;
6) Calculate the score of the word:

$$\text{score} = \text{mean ppl} \times \text{word length} - \frac{\text{mean ngram score} \times \text{ppl difference}}{\text{max ngram score}}$$
The model fusion module selects the word with the smallest score computed by the error correction filtering module as the output.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Further, one skilled in the art may combine and combine the different embodiments or examples described in this specification.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications and alternatives to the above embodiments may be made by those skilled in the art within the scope of the invention.

Claims (3)

1. A Chinese spelling error correction method integrating local semantic features and global semantic features is characterized by comprising the following steps:
for a document, a sentence collection is obtained through a sentence dividing module;
for each sentence, error correction suggestions are obtained through a pipeline error correction model and an end-to-end error correction model;
to prevent miscorrection of already-correct words, the error correction suggestions are filtered through an error correction filtering module;
and finally, the outputs of the end-to-end error correction model and the pipeline error correction model are fused by the model fusion module to obtain the final corrected sentence and corrected document,
the end-to-end error correction model obtains a semantic vector representation of each character in the sentence using a feature encoder based on the Transformer architecture, feeds the representation into a feedforward neural network that predicts over the vocabulary, and introduces constraint rules based on phonetic similarity and glyph similarity at the output end; for each position in the original sentence, the feedforward neural network selects the top k candidate characters with the highest probability from the prediction vocabulary,
traversing candidate characters in turn according to the probability size:
if the original character is a punctuation mark, maintaining the original character;
if the predicted character is the original character or is not in the vocabulary, maintaining the original character;
if the pronunciation and glyph similarity between the predicted character and the original character are within the threshold, the predicted character is used as the answer; otherwise, traversal continues to the next candidate; if traversal finishes without finding a character that satisfies the conditions, the original character is maintained,
the pipeline error correction model adopts a three-stage error correction method of error detection, candidate recall and candidate ranking,
the error detection adopts methods based on both a local semantic model and a global semantic model,
the method based on the local semantic model comprises the following steps:
S1, a bi-gram dictionary is mined from a large-scale domain corpus and word frequencies are counted; the sentence to be predicted is segmented to obtain its bi-gram word sequence; if a bi-gram word is not in the dictionary and its frequency is below a set threshold, the word is considered possibly erroneous and is added to the candidate error words;
S2, a 5-gram language model is trained on a large-scale domain corpus, wherein for a given Chinese character string, if the sentence contains an error, the erroneous word tends to appear as consecutive single characters after Chinese word segmentation, and in the bi-gram model the probability of a character depends only on the character immediately before it, so the probability of the string is approximated by the product of the following series of conditional probabilities:

$$p(c_1 c_2 \dots c_L) \approx \prod_{l=1}^{L} p(c_l \mid c_{l-1})$$

the probability of each term in the above equation may be calculated from the maximum likelihood estimate:

$$p(c_l \mid c_{l-1}) = \frac{N(c_{l-1}, c_l)}{N(c_{l-1})}$$

where $N(c_{l-1}, c_l)$ and $N(c_{l-1})$ respectively denote the number of occurrences of the corresponding character strings in the given corpus,
and if the probability of a bi-gram word is greater than the set threshold, the word is added to the candidate error words,
the candidate recall includes pronunciation-similarity recall and glyph-similarity recall,
the pronunciation-similarity recall constructs a similar-pinyin dictionary from the bi-gram word library mined from the large-scale corpus and, for each error candidate word, is divided into character recall and word recall:
character recall looks up all pinyins of the character in a character-pinyin recall library, then finds all characters for each pinyin and adds them to the candidate set;
word recall obtains all pinyins of each character in the word, combines them to obtain all pronunciations of the word, then looks up all words for each pronunciation in the pinyin library and adds them to the candidate set,
the glyph-similarity recall recalls similar-glyph characters for each error candidate word from a similar-glyph library, combines all the characters to obtain similar-glyph words, computes the similarity between each similar word and the error candidate word, and adds the similar word to the candidate set if the similarity value is greater than the set threshold,
for the recall candidate set of each erroneous word, the candidate ranking obtains correction suggestion words through coarse ranking and fine ranking,
in the coarse ranking, the ngram-score of each word is calculated using the n-gram language model; the candidates are ranked by ngram-score from high to low, and the first k candidate words are kept and passed to the fine ranking;
in the fine ranking, the sentence score is calculated as the perplexity (ppl), which evaluates the fluency of the sentence after the correction suggestion word has been substituted: the smaller the ppl value, the more fluent the sentence and the better the correction suggestion word; the perplexity is calculated as follows:

$$\mathrm{PPL}(S) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{p(w_i \mid w_1 w_2 \dots w_{i-1})}}$$

where S is the current sentence, N is the sentence length, $p(w_i)$ is the probability of the i-th word, and $p(w_i \mid w_1 w_2 \dots w_{i-1})$ denotes the probability of the i-th word computed from the preceding i-1 words,
and the ppl values of the candidate words are sorted from low to high, different thresholds are set according to the sentence length, and the sorted results are traversed; if the current value is below the threshold, the current word is added to the error correction suggestion set; finally, the word ranked first (lowest ppl) in the error correction suggestion set is taken as the final error correction suggestion word.
2. The Chinese spelling error correction method fusing local semantic features and global semantic features of claim 1, wherein the error correction filtering module comprises the following steps:
1) Calculate the ngram score and the perplexity with a deep language model and a statistical language model respectively;
2) Average the ngram scores and perplexities obtained from the two models to get the mean ngram score and mean perplexity;
3) Take the maximum of the two ngram scores;
4) Take the minimum of the two perplexities;
5) Multiply the minimum perplexity by the length of the word, and compute the perplexity difference between the two models;
6) Calculate the score of the word:

$$\text{score} = \text{mean ppl} \times \text{word length} - \frac{\text{mean ngram score} \times \text{ppl difference}}{\text{max ngram score}}$$
3. The Chinese spelling error correction method fusing local semantic features and global semantic features according to claim 2, wherein the model fusion module selects the word with the smallest score computed by the error correction filtering module as the output.
CN202211740208.8A 2022-12-30 2022-12-30 Chinese spelling error correction method integrating local semantic features and global semantic features Active CN116090441B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211740208.8A CN116090441B (en) 2022-12-30 2022-12-30 Chinese spelling error correction method integrating local semantic features and global semantic features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211740208.8A CN116090441B (en) 2022-12-30 2022-12-30 Chinese spelling error correction method integrating local semantic features and global semantic features

Publications (2)

Publication Number Publication Date
CN116090441A CN116090441A (en) 2023-05-09
CN116090441B (en) 2023-10-20

Family

ID=86186355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211740208.8A Active CN116090441B (en) 2022-12-30 2022-12-30 Chinese spelling error correction method integrating local semantic features and global semantic features

Country Status (1)

Country Link
CN (1) CN116090441B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306600B (en) * 2023-05-25 2023-08-11 山东齐鲁壹点传媒有限公司 MacBert-based Chinese text error correction method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729316A (en) * 2017-10-12 2018-02-23 福建富士通信息软件有限公司 The identification of wrong word and the method and device of error correction in the interactive question and answer text of Chinese
CN110852087A (en) * 2019-09-23 2020-02-28 腾讯科技(深圳)有限公司 Chinese error correction method and device, storage medium and electronic device
CN111090986A (en) * 2019-11-29 2020-05-01 福建亿榕信息技术有限公司 Method for correcting errors of official document
CN112149406A (en) * 2020-09-25 2020-12-29 中国电子科技集团公司第十五研究所 Chinese text error correction method and system
CN113221542A (en) * 2021-03-31 2021-08-06 国家计算机网络与信息安全管理中心 Chinese text automatic proofreading method based on multi-granularity fusion and Bert screening
CN113435186A (en) * 2021-06-18 2021-09-24 上海熙瑾信息技术有限公司 Chinese text error correction system, method, device and computer readable storage medium
CN114444479A (en) * 2022-04-11 2022-05-06 南京云问网络技术有限公司 End-to-end Chinese speech text error correction method, device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220309360A1 (en) * 2021-03-25 2022-09-29 Oracle International Corporation Efficient and accurate regional explanation technique for nlp models

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729316A (en) * 2017-10-12 2018-02-23 福建富士通信息软件有限公司 The identification of wrong word and the method and device of error correction in the interactive question and answer text of Chinese
CN110852087A (en) * 2019-09-23 2020-02-28 腾讯科技(深圳)有限公司 Chinese error correction method and device, storage medium and electronic device
CN111090986A (en) * 2019-11-29 2020-05-01 福建亿榕信息技术有限公司 Method for correcting errors of official document
CN112149406A (en) * 2020-09-25 2020-12-29 中国电子科技集团公司第十五研究所 Chinese text error correction method and system
CN113221542A (en) * 2021-03-31 2021-08-06 国家计算机网络与信息安全管理中心 Chinese text automatic proofreading method based on multi-granularity fusion and Bert screening
CN113435186A (en) * 2021-06-18 2021-09-24 上海熙瑾信息技术有限公司 Chinese text error correction system, method, device and computer readable storage medium
CN114444479A (en) * 2022-04-11 2022-05-06 南京云问网络技术有限公司 End-to-end Chinese speech text error correction method, device and storage medium

Also Published As

Publication number Publication date
CN116090441A (en) 2023-05-09

Similar Documents

Publication Publication Date Title
CN110489760B (en) Text automatic correction method and device based on deep neural network
CN111369996B (en) Speech recognition text error correction method in specific field
CN112149406B (en) Chinese text error correction method and system
CN102968989B (en) Improvement method of Ngram model for voice recognition
Derouault et al. Natural language modeling for phoneme-to-text transcription
JP4833476B2 (en) Language input architecture that converts one text format to the other text format with modeless input
CN108959250A (en) A kind of error correction method and its system based on language model and word feature
CN105957518A (en) Mongolian large vocabulary continuous speech recognition method
CN110276069B (en) Method, system and storage medium for automatically detecting Chinese braille error
CN116090441B (en) Chinese spelling error correction method integrating local semantic features and global semantic features
CN109948144B (en) Teacher utterance intelligent processing method based on classroom teaching situation
CN116306600B (en) MacBert-based Chinese text error correction method
CN111985234B (en) Voice text error correction method
CN112380841B (en) Chinese spelling error correction method and device, computer equipment and storage medium
KR20230009564A (en) Learning data correction method and apparatus thereof using ensemble score
CN115034218A (en) Chinese grammar error diagnosis method based on multi-stage training and editing level voting
Roy et al. Unsupervised context-sensitive bangla spelling correction with character n-gram
CN112149388B (en) Method for recognizing vocabulary deformation in password and generating guessing rule
Göker et al. Neural text normalization for turkish social media
CN111274826A (en) Semantic information fusion-based low-frequency word translation method
CN112597771A (en) Chinese text error correction method based on prefix tree combination
Namboodiri et al. On using classical poetry structure for Indian language post-processing
CN111428475A (en) Word segmentation word bank construction method, word segmentation method, device and storage medium
Sertsi et al. Hybrid input-type recurrent neural network language modeling for end-to-end speech recognition
CN115688904B (en) Translation model construction method based on noun translation prompt

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant