CN113449514A - Text error correction method and device suitable for specific vertical field - Google Patents


Info

Publication number
CN113449514A
Authority
CN
China
Prior art keywords
error correction
text
word
pinyin
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110687769.5A
Other languages
Chinese (zh)
Other versions
CN113449514B (en)
Inventor
励建科
陈再蝶
朱晓秋
周杰
樊伟东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kangxu Technology Co ltd
Original Assignee
Zhejiang Kangxu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Kangxu Technology Co ltd filed Critical Zhejiang Kangxu Technology Co ltd
Priority to CN202110687769.5A priority Critical patent/CN113449514B/en
Publication of CN113449514A publication Critical patent/CN113449514A/en
Application granted granted Critical
Publication of CN113449514B publication Critical patent/CN113449514B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 — Handling natural language data
    • G06F 40/20 — Natural language analysis
    • G06F 40/232 — Orthographic correction, e.g. spell checking or vowelisation
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 — Handling natural language data
    • G06F 40/20 — Natural language analysis
    • G06F 40/279 — Recognition of textual entities
    • G06F 40/284 — Lexical analysis, e.g. tokenisation or collocates
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 — Handling natural language data
    • G06F 40/30 — Semantic analysis
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a text error correction method and device suitable for a specific vertical field, comprising the following steps: S1, importing the text into a pre-trained Bert error correction model for word-sense error correction; S2, importing the text corrected by the Bert error correction model into a pinyin error correction model for secondary error correction; and S3, importing the text after secondary correction by the pinyin error correction model into a hot word replacement rule model for a third round of error correction. In the invention, text entered by the user is first fed into the Bert error correction model for semantic correction; the once-corrected text is then fed into the pinyin error correction model for secondary correction, so that after the semantics of the text are corrected, the proper nouns of the vertical field are corrected as a reinforcement step, improving the accuracy of error correction. The twice-corrected text is then fed into the hot word replacement rule model for hot word replacement, which converts dialect and other spoken expressions into the corresponding proper nouns and strengthens the error correction effect once more.

Description

Text error correction method and device suitable for specific vertical field
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a text error correction method and device suitable for a specific vertical field.
Background
Natural Language Processing (NLP) is the branch of artificial intelligence concerned with analyzing human language. Modern NLP is a hybrid discipline combining linguistics, computer science, and machine learning. For NLP systems to respond accurately to input text, the text must first be corrected to reduce noise. At present, text error correction focuses mainly on semantic analysis to find and replace wrongly written characters, and the error correction models on the market fall mainly into machine learning and deep learning approaches.
However, firstly, machine learning models often fail to fit the data, yielding low accuracy, while deep learning models require large amounts of accurate corpora and long training times; in a vertical field, corpus noise means that the accuracy of common deep models still needs improvement.
Secondly, a vertical field contains many proper nouns specific to its scenario. Wrongly written characters inside proper nouns are hard to detect by semantic error correction alone, and a model may even change correct characters into wrong ones based on its corpus.
Finally, because of dialects or personal habits, the same thing may have several names. These names introduce noise and make it difficult for NLP to extract correct information, yet they are not strictly wrong, so general error correction rarely reacts to them.
Disclosure of Invention
In order to solve the technical problems mentioned in the background art, a text error correction method and a corresponding error correction device suitable for a specific vertical field are provided.
In order to achieve the purpose, the invention adopts the following technical scheme:
a text error correction method suitable for a specific vertical field comprises the following steps:
S1, importing the text into a pre-trained Bert error correction model for word-sense error correction;
S11, segmenting the text into short sentences according to punctuation marks;
S12, masking the first character in the short sentence;
S13, predicting the masked character with the pre-trained Bert error correction model and storing all prediction results in a first list, the results being sorted by prediction score from high to low;
S131, if the masked character is in the first list, regarding the masked character as correct;
S132, if the masked character is not in the first list, obtaining, according to its pinyin, all common characters pronounced the same as the masked character and storing them in a second list;
S1321, if a character appears in both the first list and the second list, regarding the masked character as wrongly written and replacing it with the highest-scoring such character from the first list, thereby achieving error correction;
S1322, if the first list and the second list share no character, regarding the masked character as correct;
S14, after the first character of the short sentence has been judged, masking the next character in the short sentence and repeating step S13 until every Chinese character in the text has been checked and corrected;
S2, importing the text corrected by the Bert error correction model into the pinyin error correction model for secondary error correction;
S21, converting all text corrected by the Bert error correction model into pinyin;
S22, comparing the pinyin of each hot word with the pinyin of the text in order of character count, from fewest to most;
S23, when a hot word's pinyin exactly matches a span of the text's pinyin, replacing that span of the text with the hot word;
S24, repeating steps S22 and S23 until all hot words have been checked;
S3, importing the text after secondary correction by the pinyin error correction model into a hot word replacement rule model for a third round of error correction;
S31, importing the text after secondary correction by the pinyin error correction model into the hot word replacement rule model;
and S32, traversing the text with the key list; when a key (a word needing correction) is detected in the text, replacing it with the corresponding value (the correct word), and outputting the final corrected text.
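The three stages described in S1-S3 can be sketched as a simple pipeline. The stage functions below are hypothetical stand-ins (plain string replacements) for the real Bert model, pinyin model, and rule model; only the chaining order is taken from the method itself.

```python
# Hypothetical sketch of the S1 -> S2 -> S3 correction pipeline.
# Each stage function is a toy stand-in, not the patented model.

def bert_correct(text):
    # stand-in for Bert mask-and-predict semantic correction (S1)
    return text.replace("langage", "language")  # illustrative typo fix

def pinyin_correct(text):
    # stand-in for pinyin-based hot-word matching (S2)
    return text

def hotword_replace(text):
    # stand-in for dictionary-rule replacement (S3)
    return text.replace("private credit", "personal credit")  # illustrative

def correct(text):
    """Apply the three correction stages in order, as the method specifies."""
    for stage in (bert_correct, pinyin_correct, hotword_replace):
        text = stage(text)
    return text
```

A usage note: each stage receives the previous stage's output, so later rule-based stages can repair what the semantic stage missed.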
As a further description of the above technical solution:
the text error correction device comprises a pre-trained Bert error correction model, a pinyin error correction model and a hot word replacement rule model, wherein the Bert error correction model is a Multi-layer bidirectional Transformers encoder, the Embelling of the Bert error correction model is formed by summing three Embelling, the three Embelling are Token Embelling, Segment Embelling and Position Embelling respectively, the Bert error correction model uses Multi _ Head Attenttion for coding, dimension expansion is carried out on the input Embelling, three dimensions of Key, Query and Value are obtained respectively, Multi _ Head division is carried out on each dimension, each divided Head is carried out with other words by self-attribute, new vectors are obtained, the new vectors of each Head are spliced, and linear conversion is carried out through a weight matrix, and a final Multi-Head Attention Value is obtained.
As a further description of the above technical solution:
the pinyin error correction model comprises a database, wherein the database contains hot words in a certain field and corresponding hot word pinyin and word number, and the hot words in the certain field are derived from proper nouns in the field.
As a further description of the above technical solution:
the hot word replacement rule model comprises a dictionary, wherein words needing to be corrected are set as keys in the dictionary, corresponding correct words are set as values, and all the keys are stored in a key list.
As a further description of the above technical solution:
the pre-trained Bert error correction model is pre-trained through two models, wherein the two models comprise a Masked language model and a Next sense prediction;
the Masked language mode inputs randomly covered tokens in a corpus and predicts the randomly covered tokens to pre-train a Bert error correction model;
the Next sense prediction is performed by inputting a sentence a and a sentence B, wherein the sentence B is 50% likely to be the Next sentence of the sentence a and 50% likely to be a random sentence in the corpus, and the Bert error correction model is used for pre-training whether the sentence B is the Next sentence of the sentence a.
As a further description of the above technical solution:
the corpus comprises corpora of the hot words in a vertical field of a certain field.
In summary, due to the adoption of the above technical scheme, the invention has the following beneficial effects. Text entered by the user is first fed into the Bert error correction model for semantic correction; the once-corrected text is then fed into the pinyin error correction model for secondary correction, so that after the semantics are corrected, the proper nouns of the vertical field are corrected as reinforcement, improving the accuracy of error correction. The twice-corrected text is then fed into the hot word replacement rule model for hot word replacement, which converts dialect and other spoken expressions into the corresponding proper nouns and strengthens the correction effect once more. With these three error correction systems, a text not only receives basic semantic correction from context; proper nouns of the vertical field and dialect expressions of the application scenario also receive a degree of replacement correction that a single Bert error correction model can hardly achieve.
Drawings
FIG. 1 is a flowchart illustrating a text error correction method applicable to a specific vertical domain according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a Bert error correction flow of a text error correction method applicable to a specific vertical domain according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a pinyin error correction flow of a text error correction method applicable to a specific vertical domain according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a flow of hotword replacement rules of a text error correction method applicable to a specific vertical domain according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating a structure of an input part of a Bert error correction model of a text error correction apparatus suitable for a specific vertical domain according to an embodiment of the present invention;
fig. 6 is a schematic flow chart of Multi-Head Attention in the Bert error correction model of a text error correction apparatus suitable for a specific vertical domain according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1-6, the present invention provides a technical solution: a text error correction method suitable for a specific vertical field comprises the following steps:
S1, importing the text into a pre-trained Bert error correction model for word-sense error correction;
S11, segmenting the text into short sentences according to punctuation marks;
S12, masking the first character in the short sentence;
S13, predicting the masked character with the pre-trained Bert error correction model and storing all prediction results in a first list, the results being sorted by prediction score from high to low;
S131, if the masked character is in the first list, regarding the masked character as correct;
S132, if the masked character is not in the first list, obtaining, according to its pinyin, all common characters pronounced the same as the masked character and storing them in a second list;
S1321, if a character appears in both the first list and the second list, regarding the masked character as wrongly written and replacing it with the highest-scoring such character from the first list, thereby achieving error correction;
S1322, if the first list and the second list share no character, regarding the masked character as correct;
S14, after the first character of the short sentence has been judged, masking the next character in the short sentence and repeating step S13 until every Chinese character in the text has been checked and corrected;
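The decision logic of steps S11-S14 can be sketched as follows. Both `predict_top_k` (a stand-in for the real Bert masked-language-model call of S13) and the tiny `homophones` table (a stand-in for the pinyin lookup of S132) are hypothetical; only the branching rules S131-S1322 are taken from the method.

```python
# Minimal sketch of the S13/S131-S1322 per-character decision logic.

def predict_top_k(sentence, pos, k=5):
    # Stand-in: a real implementation would mask sentence[pos] and query a
    # masked language model, returning candidates sorted by score.
    return ["城", "街", "桥"]

# toy homophone table keyed by character (stand-in for a pinyin lookup)
homophones = {"诚": ["城", "成"]}

def correct_char(sentence, pos):
    """Return the corrected character for sentence[pos]."""
    char = sentence[pos]
    first_list = predict_top_k(sentence, pos)      # S13: model predictions
    if char in first_list:                         # S131: already correct
        return char
    second_list = homophones.get(char, [])         # S132: same-pinyin chars
    common = [c for c in first_list if c in second_list]
    if common:                                     # S1321: wrongly written
        return common[0]  # first_list is score-sorted, so take the best
    return char                                    # S1322: regard as correct

def correct_sentence(sentence):
    # S14: slide the mask over every character in turn
    return "".join(correct_char(sentence, i) for i in range(len(sentence)))
```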
S2, importing the text corrected by the Bert error correction model into the pinyin error correction model for secondary error correction, as reinforcement for the vertical field: because many proper nouns specific to this scenario are used, the Bert error correction model may fail to find errors in them, or may even change originally correct characters into wrong ones based on its corpus;
for example, a typing error may replace a character of a bank-card name such as "great wall credit card" with a homophone; semantic correction by the Bert error correction model alone may not perceive such an error, so the pinyin error correction model is used as reinforcement. The proper nouns of the scenario, of which the banking field has all sorts, are stored in a database together with their pinyin and character counts, e.g. ["great wall credit card", "chang+cheng+xin+yong+ka", 5];
S21, converting all text corrected by the Bert error correction model into pinyin;
S22, comparing the pinyin of each hot word with the pinyin of the text in order of character count, from fewest to most;
S23, when a hot word's pinyin exactly matches a span of the text's pinyin, replacing that span of the text with the hot word;
S24, repeating steps S22 and S23 until all hot words have been checked;
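Steps S21-S24 can be sketched as below. The `PINYIN` table is a tiny hand-made stand-in for a real character-to-pinyin conversion (a library such as pypinyin would normally supply this); the hot-word entry follows the database format the text describes.

```python
# Sketch of the pinyin comparison in S21-S24.

# toy character-to-pinyin table (stand-in for a real conversion library)
PINYIN = {"长": "chang", "城": "cheng", "诚": "cheng",
          "信": "xin", "用": "yong", "卡": "ka"}

# hot words stored with pinyin and character count, as in the database
HOTWORDS = [("长城信用卡", "chang+cheng+xin+yong+ka", 5)]

def to_pinyin(text):
    # S21: convert text to pinyin
    return "+".join(PINYIN[c] for c in text)

def pinyin_correct(text):
    # S22: check hot words in order of character count, fewest first
    for word, py, n in sorted(HOTWORDS, key=lambda h: h[2]):
        for i in range(len(text) - n + 1):
            if to_pinyin(text[i:i + n]) == py:   # S23: exact pinyin match
                text = text[:i] + word + text[i + n:]
    return text                                  # S24: all hot words checked
```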
S3, importing the text after secondary correction by the pinyin error correction model into the hot word replacement rule model for a third round of error correction. To further optimize the result, the hot word replacement rule model processes the text after pinyin correction: spoken and dialect expressions may be ignored by the semantic correction of the Bert error correction model, and the pinyin error correction model will also pass over them because their pronunciation differs greatly from that of the proper nouns;
for example, the required text is "personal credit" but the input is "private credit". For the Bert error correction model the semantics of "private credit" are unproblematic, while its pinyin ("si+ren+dai", 3 characters) is obviously different from that of the target ("ge+dai", 2 characters), so pinyin error correction cannot respond either;
for another example, dialects use several different words for the pronoun "I"; neither the Bert error correction model nor the pinyin error correction model can recognize these, so the hot word replacement rule model is used to correct such text and replace the expressions with the words we require;
S31, importing the text after secondary correction by the pinyin error correction model into the hot word replacement rule model;
and S32, traversing the text with the key list; when a key (a word needing correction) is detected in the text, replacing it with the corresponding value (the correct word), and outputting the final corrected text.
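The dictionary replacement of S31-S32 can be sketched as below, reusing the "private credit" → "personal credit" example from the text; the single rule entry is illustrative, not the patent's actual hot-word dictionary.

```python
# Sketch of the S32 rule replacement: words needing correction are keys,
# the canonical forms are values, and all keys sit in a key list.

RULES = {"private credit": "personal credit"}  # key -> value dictionary
KEY_LIST = list(RULES)                         # all keys kept in a key list

def hotword_replace(text):
    for key in KEY_LIST:          # S32: traverse the text with the key list
        if key in text:
            text = text.replace(key, RULES[key])
    return text                   # output the final corrected text
```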
Please refer to fig. 4 and 5. A text error correction device suitable for a specific vertical field comprises a pre-trained Bert error correction model, a pinyin error correction model, and a hot word replacement rule model. The Bert error correction model is a multi-layer bidirectional Transformer encoder whose input embedding is the sum of three embeddings: Token Embedding, Segment Embedding, and Position Embedding. The Bert error correction model encodes with Multi-Head Attention: the input embedding is expanded into the three matrices Key, Query, and Value; each is split into multiple heads; each head performs self-attention against the other tokens to obtain a new vector; the new vectors of all heads are concatenated and linearly transformed by a weight matrix to obtain the final Multi-Head Attention value;
the Bert error correction model relies on Multi-Head Attention and bidirectional encoding to make its unsupervised learning more effective. Because it uses the Transformer, it is more efficient than previous models, can capture longer-distance dependencies, and captures bidirectional context information in the true sense. To make the Bert error correction model perform better in the vertical field, corpora of the relevant vertical field are added to the training corpus, improving the model's recognition capability in that field.
Specifically, the pinyin error correction model comprises a database containing the hot words of a given field together with their pinyin and character counts; the hot words are derived from the proper nouns of that field;
the pinyin error correction model performs secondary correction on text that has already undergone Bert semantic correction, emphasizing the correction of proper nouns of the relevant field. Wrong characters inside proper nouns are hard to detect from context and are therefore likely to be ignored by semantic correction, so the proper nouns are set as hot words: when a hot word's pinyin exactly matches the text's pinyin, the corresponding characters are replaced with the hot word, guaranteeing the correctness of proper-noun text. The approach is also convenient to update: adding or deleting proper nouns in the hot word list completes the update, which saves a great deal of time in fields with frequent product changes, such as banking.
Specifically, the hot word replacement rule model comprises a dictionary in which each word needing correction is set as a key and the corresponding correct word as its value; all keys are stored in a key list.
Specifically, the pre-trained Bert error correction model is pre-trained with two tasks: a Masked Language Model and Next Sentence Prediction;
the Masked Language Model pre-trains the Bert error correction model by randomly masking tokens in the corpus and predicting the masked tokens;
Next Sentence Prediction lets the Bert error correction model pre-train on whether sentence B is the next sentence of sentence A: sentences A and B are input, where B has a 50% probability of being the actual next sentence of A and a 50% probability of being a random sentence from the corpus.
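The construction of Next Sentence Prediction training pairs described above can be sketched as a data-preparation step. This is an illustrative sketch of the 50%/50% sampling rule, not the patent's actual training code; a real implementation would also avoid accidentally sampling the true next sentence in the negative branch.

```python
import random

def make_nsp_pairs(sentences, rng):
    """Build (A, B, is_next) training pairs per the 50%/50% NSP rule."""
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            # 50%: B is the actual next sentence of A
            pairs.append((sentences[i], sentences[i + 1], True))
        else:
            # 50%: B is a random sentence from the corpus
            pairs.append((sentences[i], rng.choice(sentences), False))
    return pairs
```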
Specifically, the corpus includes corpora containing the hot words of the vertical field concerned. Pre-training requires a large amount of corpus support; to improve the Bert error correction model's recognition capability in the vertical field, corpora containing the corresponding field's hot words are added for update training. For example, to use the model in the banking field, corpora containing the banking field's hot words are added for update training.
A third round of error correction is performed on the twice-corrected text with the hot word replacement rule model to reinforce the correction effect. Different people may call the same kind of thing by different names; for NLP this causes noise and reduces task efficiency, yet these words are not, strictly speaking, wrong, so semantic error correction and pinyin error correction are likely to ignore them. The different names are therefore set as hot words: whenever a hot word appears in the text, it is replaced with the word the NLP system requires, minimizing the generation of noise. As with the pinyin error correction part, updating is very simple: only the word needing correction and its corrected counterpart need to be added to the hot word rules;
in the invention, text entered by the user is first fed into the Bert error correction model for semantic correction; the once-corrected text is then fed into the pinyin error correction model for secondary correction, so that after the semantics are corrected, the proper nouns of the vertical field are corrected as reinforcement, improving the accuracy of error correction. The twice-corrected text is then fed into the hot word replacement rule model for hot word replacement, which converts dialect and other spoken expressions into the corresponding proper nouns and strengthens the correction effect once more. With these three error correction systems, a text not only receives basic semantic correction from context; proper nouns of the vertical field and dialect expressions of the application scenario also receive a degree of replacement correction that a single Bert error correction model can hardly achieve.
The above description is only a preferred embodiment of the present invention, and the scope of the present invention is not limited thereto; any equivalent substitution or change made, within the technical scope disclosed by the present invention, by a person skilled in the art according to the technical solution and inventive concept of the present invention shall fall within the protection scope of the present invention.

Claims (6)

1. A text error correction method suitable for a specific vertical field, characterized by comprising the following steps:
S1, importing the text into a pre-trained Bert error correction model for word-sense error correction;
S11, segmenting the text into short sentences according to punctuation marks;
S12, masking the first character in the short sentence;
S13, predicting the masked character with the pre-trained Bert error correction model and storing all prediction results in a first list, the results being sorted by prediction score from high to low;
S131, if the masked character is in the first list, regarding the masked character as correct;
S132, if the masked character is not in the first list, obtaining, according to its pinyin, all common characters pronounced the same as the masked character and storing them in a second list;
S1321, if a character appears in both the first list and the second list, regarding the masked character as wrongly written and replacing it with the highest-scoring such character from the first list, thereby achieving error correction;
S1322, if the first list and the second list share no character, regarding the masked character as correct;
S14, after the first character of the short sentence has been judged, masking the next character in the short sentence and repeating step S13 until every Chinese character in the text has been checked and corrected;
S2, importing the text corrected by the Bert error correction model into the pinyin error correction model for secondary error correction;
S21, converting all text corrected by the Bert error correction model into pinyin;
S22, comparing the pinyin of each hot word with the pinyin of the text in order of character count, from fewest to most;
S23, when a hot word's pinyin exactly matches a span of the text's pinyin, replacing that span of the text with the hot word;
S24, repeating steps S22 and S23 until all hot words have been checked;
S3, importing the text after secondary correction by the pinyin error correction model into a hot word replacement rule model for a third round of error correction;
S31, importing the text after secondary correction by the pinyin error correction model into the hot word replacement rule model;
and S32, traversing the text with the key list; when a key (a word needing correction) is detected in the text, replacing it with the corresponding value (the correct word), and outputting the final corrected text.
2. A text error correction device suitable for a specific vertical field, characterized by comprising a pre-trained Bert error correction model, a pinyin error correction model, and a hot word replacement rule model, wherein the Bert error correction model is a multi-layer bidirectional Transformer encoder whose input embedding is the sum of three embeddings, namely Token Embedding, Segment Embedding, and Position Embedding; the Bert error correction model encodes with Multi-Head Attention, expanding the input embedding into the three matrices Key, Query, and Value, splitting each into multiple heads, performing self-attention between each head and the other tokens to obtain new vectors, concatenating the new vectors of all heads, and linearly transforming them by a weight matrix to obtain the final Multi-Head Attention value.
3. The apparatus of claim 2, wherein the pinyin error correction model comprises a database containing hot words of a given domain together with their pinyin and character counts, the hot words being derived from the proper nouns of that domain.
4. The apparatus of claim 2, wherein the hot word replacement rule model comprises a dictionary that sets each word needing correction as a key and the corresponding correct word as its value, all keys being stored in a key list.
5. The apparatus of claim 2, wherein the pre-trained Bert error correction model is pre-trained with two tasks, the two tasks comprising a Masked Language Model and Next Sentence Prediction;
the Masked Language Model pre-trains the Bert error correction model by randomly masking tokens in the corpus and predicting the masked tokens;
Next Sentence Prediction pre-trains the Bert error correction model by inputting a sentence A and a sentence B, where B has a 50% probability of being the actual next sentence of A and a 50% probability of being a random sentence from the corpus, and the model predicts whether B is the next sentence of A.
6. The apparatus of claim 5, wherein the corpus contains corpora of the hot words of the vertical domain concerned.
CN202110687769.5A 2021-06-21 2021-06-21 Text error correction method and device suitable for vertical field Active CN113449514B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110687769.5A CN113449514B (en) 2021-06-21 2021-06-21 Text error correction method and device suitable for vertical field

Publications (2)

Publication Number Publication Date
CN113449514A true CN113449514A (en) 2021-09-28
CN113449514B CN113449514B (en) 2023-10-31

Family

ID=77812053

Country Status (1)

Country Link
CN (1) CN113449514B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020186778A1 (en) * 2019-03-15 2020-09-24 平安科技(深圳)有限公司 Error word correction method and device, computer device, and storage medium
CN112287670A (en) * 2020-11-18 2021-01-29 北京明略软件系统有限公司 Text error correction method, system, computer device and readable storage medium
CN112395861A (en) * 2020-11-18 2021-02-23 平安普惠企业管理有限公司 Method and device for correcting Chinese text and computer equipment
WO2021189851A1 (en) * 2020-09-03 2021-09-30 平安科技(深圳)有限公司 Text error correction method, system and device, and readable storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114817469B (en) * 2022-04-27 2023-08-08 马上消费金融股份有限公司 Text enhancement method, training method and training device for text enhancement model
CN115168565A (en) * 2022-07-07 2022-10-11 北京数美时代科技有限公司 Cold start method, device, equipment and storage medium for vertical domain language model
CN116975298A (en) * 2023-09-22 2023-10-31 厦门智慧思明数据有限公司 NLP-based modernized society governance scheduling system and method
CN116975298B (en) * 2023-09-22 2023-12-05 厦门智慧思明数据有限公司 NLP-based modernized society governance scheduling system and method

Similar Documents

Publication Publication Date Title
CN113449514B (en) Text error correction method and device suitable for vertical field
CN112183094B (en) Chinese grammar debugging method and system based on multiple text features
CN114580382A (en) Text error correction method and device
CN111599340A (en) Polyphone pronunciation prediction method and device and computer readable storage medium
Alkhatib et al. Deep learning for Arabic error detection and correction
CN113268576B (en) Deep learning-based department semantic information extraction method and device
Abbad et al. Multi-components system for automatic Arabic diacritization
KR20230009564A (en) Learning data correction method and apparatus thereof using ensemble score
CN113239666A (en) Text similarity calculation method and system
KR20230061001A (en) Apparatus and method for correcting text
CN110633456B (en) Language identification method, language identification device, server and storage medium
KR101941692B1 (en) named-entity recognition method and apparatus for korean
CN115204164B (en) Method, system and storage medium for identifying communication sensitive information of power system
Muhamad et al. Proposal: A hybrid dictionary modelling approach for malay tweet normalization
CN111090720B (en) Hot word adding method and device
CN112528003B (en) Multi-item selection question-answering method based on semantic sorting and knowledge correction
CN114896966A (en) Method, system, equipment and medium for positioning grammar error of Chinese text
Nyberg Grammatical error correction for learners of swedish as a second language
Lv et al. StyleBERT: Chinese pretraining by font style information
CN114444492A (en) Non-standard word class distinguishing method and computer readable storage medium
Hasan et al. SweetCoat-2D: Two-Dimensional Bangla Spelling Correction and Suggestion Using Levenshtein Edit Distance and String Matching Algorithm
Saychum et al. Efficient Thai Grapheme-to-Phoneme Conversion Using CRF-Based Joint Sequence Modeling.
CN111428005A (en) Standard question and answer pair determining method and device and electronic equipment
Athanaselis et al. A corpus based technique for repairing ill-formed sentences with word order errors using co-occurrences of n-grams
CN113012685A (en) Audio recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: No. 2-206, No. 1399 Liangmu Road, Cangqian Street, Yuhang District, Hangzhou City, Zhejiang Province, 311100

Patentee after: Kangxu Technology Co.,Ltd.

Country or region after: China

Address before: 310000 2-206, 1399 liangmu Road, Cangqian street, Yuhang District, Hangzhou City, Zhejiang Province

Patentee before: Zhejiang kangxu Technology Co.,Ltd.

Country or region before: China