CN114492453A - Text error correction method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN114492453A
Authority
CN
China
Prior art keywords
text
error
corrected
clauses
error correction
Prior art date
Legal status
Pending
Application number
CN202111677576.8A
Other languages
Chinese (zh)
Inventor
杨子清
林旻
崔一鸣
伍大勇
陈志刚
Current Assignee
Hebei Xunfei Institute Of Artificial Intelligence
Zhongke Xunfei Internet Beijing Information Technology Co ltd
iFlytek Co Ltd
Original Assignee
Hebei Xunfei Institute Of Artificial Intelligence
Zhongke Xunfei Internet Beijing Information Technology Co ltd
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by Hebei Xunfei Institute Of Artificial Intelligence, Zhongke Xunfei Internet Beijing Information Technology Co ltd, and iFlytek Co Ltd
Priority to CN202111677576.8A
Publication of CN114492453A
Legal status: Pending

Classifications

    • G06F 40/30 Handling natural language data: Semantic analysis
    • G06F 40/232 Natural language analysis: Orthographic correction, e.g. spell checking or vowelisation
    • G06F 40/279 Natural language analysis: Recognition of textual entities
    • G06F 40/289 Natural language analysis: Phrasal analysis, e.g. finite state techniques or chunking
    • G06N 3/04 Neural networks: Architecture, e.g. interconnection topology
    • G06N 3/08 Neural networks: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The application provides a text error correction method and apparatus, a storage medium, and an electronic device, relating to the technical field of text processing. The text error correction method includes the following steps: judging, by using an error detection model, whether a text to be corrected contains an error; and if the text to be corrected contains an error, determining a corrected text corresponding to the text to be corrected by using an error correction model, where the error detection model is obtained by training the discriminator in a generative adversarial network, and the error correction model is obtained by training the generator in the generative adversarial network. Because the error correction task is performed only on text that actually contains errors, the method reduces the amount of computation of the error correction model and improves the running speed of the text error correction system.

Description

Text error correction method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of text processing technologies, and in particular, to a text error correction method and apparatus, a storage medium, and an electronic device.
Background
With the progress of science and technology and the continuous development of deep learning, a wide variety of computer technologies based on deep learning tasks have emerged. Using deep learning to assist people in revising articles is convenient and efficient, and greatly improves revision efficiency.
In the related art, text error correction is generally performed by an end-to-end neural network model whose training samples are usually constructed manually. The accuracy of such a model is therefore affected by human subjective factors, which makes the text error correction results insufficiently accurate.
Disclosure of Invention
The present application is proposed to solve the above-mentioned technical problems. The embodiment of the application provides a text error correction method and device, a storage medium and electronic equipment.
In a first aspect, an embodiment of the present application provides a text error correction method applied to a text error correction system based on a generative adversarial network. The method includes: judging, by using an error detection model, whether a text to be corrected contains an error; and if the text to be corrected contains an error, determining a corrected text corresponding to the text to be corrected by using an error correction model, where the error detection model is obtained by training the discriminator in the generative adversarial network, and the error correction model is obtained by training the generator in the generative adversarial network.
With reference to the first aspect, in some implementations of the first aspect, judging whether the text to be corrected contains an error by using the error detection model includes: dividing the text to be corrected to obtain M clauses corresponding to the text to be corrected, where M is a positive integer; inputting the M clauses into the error detection model to obtain detection results corresponding to the M clauses; and judging, based on the detection results corresponding to the M clauses, whether each of the M clauses contains an error. If the text to be corrected contains an error, determining the corrected text corresponding to the text to be corrected by using the error correction model includes: if N clauses among the M clauses contain errors, determining, by using the error correction model, corrected sentences corresponding to the N clauses respectively, where N is a positive integer less than or equal to M; and recombining, based on the order information among the M clauses, the corrected sentences corresponding to the N clauses with the error-free clauses among the M clauses to generate the corrected text.
With reference to the first aspect, in some implementations of the first aspect, the text to be corrected includes a Chinese text to be corrected, and determining, by using the error correction model, the corrected sentences corresponding to the N clauses respectively includes: for each clause of the N clauses, determining the character feature vector corresponding to each Chinese character in the clause; determining, by using the error correction model, at least one expansion word corresponding to each Chinese character in the clause based on the character feature vector corresponding to that Chinese character and the semantic information of the clause; determining, by using the error correction model, the correction result corresponding to each Chinese character in the clause based on the at least one expansion word corresponding to that Chinese character; and determining, by using the error correction model, the corrected sentence corresponding to the clause based on the correction results corresponding to the Chinese characters in the clause.
With reference to the first aspect, in some implementations of the first aspect, determining at least one expansion word corresponding to each Chinese character in the clause based on the character feature vector corresponding to each Chinese character in the clause and the semantic information of the clause includes: for each Chinese character in the clause, determining P candidate expansion words corresponding to the Chinese character and the usage probability data corresponding to each of the P candidate expansion words, based on the semantic information of the clause and the character feature vector corresponding to the Chinese character, where P is a positive integer; and selecting at least one expansion word corresponding to the Chinese character from the P candidate expansion words based on a preset usage probability threshold and the usage probability data corresponding to the P candidate expansion words.
With reference to the first aspect, in some implementations of the first aspect, determining the correction result corresponding to each Chinese character in the clause based on the at least one expansion word corresponding to each Chinese character in the clause includes: for each Chinese character in the clause, determining whether the Chinese character contains an error based on the at least one expansion word corresponding to the Chinese character; if the Chinese character contains an error, determining the correction result corresponding to the Chinese character based on the at least one expansion word corresponding to the Chinese character; and if the Chinese character is correct, taking the Chinese character itself as the correction result corresponding to the Chinese character.
With reference to the first aspect, in certain implementations of the first aspect, the text correction method further includes: inputting the N clauses into an error detection model to obtain error position data corresponding to the N clauses; and generating correction mark information of correction sentences corresponding to the N clauses based on the error position data corresponding to the N clauses and the correction sentences corresponding to the N clauses.
With reference to the first aspect, in some implementations of the first aspect, before determining whether the text to be corrected has an error by using an error detection model, the method further includes: determining S training sets, wherein the training sets comprise correct text samples, error text samples corresponding to the correct text samples and error detail labels corresponding to the error text samples; and training the generator and the discriminator based on the S training sets to obtain an error detection model and an error correction model.
With reference to the first aspect, in some implementations of the first aspect, obtaining the error detection model and the error correction model by training the generator and the discriminator based on the S training sets includes: inputting S erroneous text samples included in the S training sets into the generator to obtain error-corrected text samples corresponding to the S erroneous text samples; for each of the S erroneous text samples, inputting the error-corrected text sample corresponding to the erroneous text sample and the correct text sample corresponding to the erroneous text sample into the discriminator to obtain the text-correct probability of the error-corrected text sample, and adjusting the parameters of the generator and/or the discriminator based on the text-correct probabilities of the error-corrected text samples; and repeatedly training on the S training sets until a preset training stop condition is met, obtaining the error detection model and the error correction model.
In a second aspect, an embodiment of the present application provides a text error correction apparatus applied to a text error correction system based on a generative adversarial network. The apparatus includes: a judging module configured to judge, by using an error detection model, whether a text to be corrected contains an error; and a corrected text determining module configured to determine, when the text to be corrected contains an error, the corrected text corresponding to the text to be corrected by using an error correction model, where the error detection model is obtained by training the discriminator in the generative adversarial network, and the error correction model is obtained by training the generator in the generative adversarial network.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored, and the computer program is configured to execute the text error correction method mentioned in the first aspect.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a processor; a memory for storing processor-executable instructions; the processor is configured to perform the text error correction method mentioned in the above first aspect.
According to the text error correction method provided by the embodiments of the present application, the error detection model first judges the text to be corrected, and only text judged to contain errors is input into the error correction model. Unlike directly inputting the text to be corrected into a model for correction, this reduces the amount of computation of the error correction model and improves the running speed of the text error correction system. In addition, the error detection model is obtained by training the discriminator in a generative adversarial network and the error correction model is obtained by training the generator in the generative adversarial network, which differs from a conventional error correction model obtained by end-to-end neural network training; the error samples seen during training are closer to errors found in real situations, so the resulting corrected text is more realistic and accurate.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is a schematic view illustrating an application scenario of a text error correction method according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of a text error correction method according to an embodiment of the present application.
Fig. 3 is a schematic flow chart illustrating a process of determining whether a text to be corrected has an error by using an error detection model according to an embodiment of the present application.
Fig. 4 is a schematic flowchart illustrating a process of determining a corrected text corresponding to a text to be corrected by using an error correction model according to an embodiment of the present application.
Fig. 5 is a schematic flow chart illustrating a process of determining a corrected sentence corresponding to each of N clauses by using an error correction model according to an embodiment of the present application.
Fig. 6 is a schematic flow chart illustrating a process of determining at least one expansion word corresponding to each Chinese character in the clause according to an embodiment of the present application.
Fig. 7 is a schematic flow chart illustrating a process of determining the correction result corresponding to each Chinese character in the clause based on at least one expansion word corresponding to each Chinese character in the clause according to an embodiment of the present application.
Fig. 8 is a schematic diagram illustrating a structure of an error correction model for correcting an erroneous clause.
Fig. 9 is a schematic flow chart illustrating a process of determining a corrected text corresponding to a text to be corrected by using an error correction model according to another embodiment of the present application.
Fig. 10 is a schematic flowchart of a text error correction method according to another embodiment of the present application.
Fig. 11 is a schematic flow chart illustrating a process of obtaining an error detection model and an error correction model by training the generator and the discriminator based on S training sets according to an embodiment of the present application.
Fig. 12 is a schematic structural diagram of a text error correction apparatus according to an embodiment of the present application.
Fig. 13 is a schematic structural diagram of a text error correction apparatus according to another embodiment of the present application.
Fig. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the related art, there are various text error correction methods. For example, a key information map of the document is generated from the context information of the text to be corrected, and correction is performed based on the key information map; correction is performed using the characters' own features, their pronunciation features, and contextual semantic features; a text error correction label sequence is obtained from a text error correction model, and the corrected text is derived from that label sequence; or various semantic features are computed with a pre-trained model, and the corrected text is obtained from those semantic features.
However, existing text error correction methods encounter difficulties when training the text error correction model. For example, if only erroneous text from real situations is used, the training sample set is small, so the correction model struggles to learn the various kinds of errors and ignores errors in some texts during use; conversely, if almost all the text received during training contains errors, the model tends to over-correct its input in practical use. In addition, training samples containing abundant error information must be constructed manually, and such samples are not close to real data.
Moreover, conventional text error correction methods generally input the text to be corrected directly into a correction model obtained by end-to-end neural network training. The model does not first determine whether the text contains errors and instead corrects the entire text throughout, which increases the amount of computation of the correction model and reduces the running speed of the text error correction system.
Fig. 1 is a schematic view illustrating an application scenario of a text error correction method according to an embodiment of the present application. The scenario comprises a server 1 and a computer device 2. The computer device 2 can receive the text to be corrected input by the user, and the computer device 2 and the server 1 are connected through a communication network. Optionally, the communication network is a wired network or a wireless network.
The computer device 2 may be a general-purpose computer or a computer device built from application-specific integrated circuits, which is not limited in the embodiments of the present application. For example, the computer device 2 may be a mobile terminal device such as a tablet computer, or a personal computer. Those skilled in the art will appreciate that there may be one or more computer devices 2, of the same or different types; the number may be one, tens, hundreds, or more. Neither the number nor the type of the computer devices 2 is limited in the embodiments of the present application. The server 1 may be a single server, a cluster of several servers, a virtualization platform, or a cloud computing service center.
In an embodiment, an error detection model and an error correction model are deployed in the computer device 2. The computer device 2 may modify the text to be corrected received by the computer device by using the error detection model and the error correction model deployed thereon, so as to obtain a modified text corresponding to the text to be corrected.
Illustratively, in another practical application, the computer device 2 receives the text to be corrected input by the user and sends it to the server 1; the server 1 generates the corrected text corresponding to the text to be corrected by using the error detection model and the error correction model deployed on the server 1, and sends the corrected text to the computer device 2; and the computer device 2 presents the received corrected text to the user.
Fig. 2 is a schematic flow chart of a text error correction method according to an embodiment of the present application. Illustratively, the text error correction method is applied to a text error correction system based on a generative adversarial network. As shown in fig. 2, the text error correction method provided in the embodiment of the present application includes the following steps.
Step 30: judge, by using the error detection model, whether the text to be corrected contains an error.
First, a generative adversarial network consists of a generator, which generates data close to real data, and a discriminator, which distinguishes the data generated by the generator from real data. The discriminator treats the data generated by the generator as negative samples and the real data as positive samples. To deceive the discriminator, the generator keeps generating data that is ever closer to the real data, while the discriminator keeps strengthening its ability to tell generated data from real data. Through this adversarial process, a generator is obtained that can produce data realistic enough that the discriminator can hardly distinguish it from real data, while the discriminator can still distinguish data that is far from real. Specifically, the error detection model in the present application is obtained by training the discriminator of the generative adversarial network.
The text to be corrected may be an article, a passage, or a sentence, which is not specifically limited in the embodiments of the present application. The errors in the text to be corrected may be out-of-order characters, wrongly written characters, missing characters, redundant characters, and the like.
In practical application, the user inputs the text to be corrected into the text error correction system; the system performs some preprocessing on the text and then sends it to the error detection model, which judges whether the text to be corrected contains an error.
Step 40: if the text to be corrected contains an error, determine the corrected text corresponding to the text to be corrected by using the error correction model.
Specifically, the corrected text contains correction type information and correction mark information. The error correction model is obtained by training the generator of the generative adversarial network, whose training target shifts from generating data that passes for real to generating corrected text that is indistinguishable from real text.
Illustratively, the correction type information indicates a type such as missing character, redundant character, out-of-order characters, or wrong character. The correction mark information is a mark placed at the position in the corrected text that was changed relative to the text to be corrected, for example a highlight; of course, the mark may also be an underline, a bold font, a changed font color, and the like. Furthermore, the error information of the text to be corrected may contain one or more error types.
In practical application, the error detection model passes the text judged to contain errors to the error correction model, and the error correction model corrects the text to be corrected and outputs the corresponding corrected text.
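As a minimal sketch of this two-stage detect-then-correct flow, the following Python snippet assumes hypothetical `detect_model` and `correct_model` wrappers around the trained discriminator and generator; the function name, method names, and threshold are illustrative placeholders rather than part of the patent.

```python
# Minimal sketch of the detect-then-correct flow described above.
# `detect_model` and `correct_model` are assumed wrappers around the trained
# discriminator and generator; their interfaces are illustrative placeholders.

def correct_text(text: str, detect_model, correct_model, threshold: float = 0.5) -> str:
    """Return the corrected text; text judged error-free is returned unchanged."""
    error_prob = detect_model.predict_error_probability(text)
    if error_prob < threshold:                        # judged error-free: skip correction
        return text
    return correct_model.generate_correction(text)    # only erroneous text is corrected
```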
The text error correction method provided by the embodiments of the present application includes an error detection model and an error correction model. In practical application, erroneous sentences are first screened out by the error detection model, so that the proportion of erroneous sentences received by the error correction model stays close to the high proportion seen during training, which reduces unnecessary corrections. The error samples used when training the error detection model and the error correction model are also closer to errors found in real situations, so the corrected text obtained by performing text error correction with these two models is more realistic and accurate. In addition, the error correction model corrects the text to be corrected only after the error detection model has judged it to contain errors, which reduces the amount of computation of the error correction model and improves the operating efficiency of the text error correction system. The corrected text contains correction type information and correction mark information, which further makes it convenient for the user to review the correction traces.
Fig. 3 is a schematic flow chart illustrating a process of determining whether a text to be corrected has an error by using an error detection model according to an embodiment of the present application. The embodiment shown in fig. 3 is extended based on the embodiment shown in fig. 2, and the differences between the embodiment shown in fig. 3 and the embodiment shown in fig. 2 will be emphasized below, and the descriptions of the same parts will not be repeated.
As shown in fig. 3, the step of determining whether the text to be corrected has errors by using the error detection model includes the following steps.
Step 31: divide the text to be corrected to obtain M clauses corresponding to the text to be corrected, where M is a positive integer.
Specifically, a sentence segmentation tool in the text error correction system divides the text to be corrected to obtain the M clauses corresponding to the text to be corrected. Further, so that the error detection model and the error correction model can better analyze the semantic information of the sentences, the segmentation tool divides the text to be corrected at punctuation marks such as "。", "！" and "？", and records the order information of each clause within the text to be corrected.
As an example, suppose the text to be corrected is: "When young, we always struggle along a road paved with thorns. In the prime of life, those things that look beautiful often are true. When youth has all been squandered, we will know to cherish it! Is regret of any use?" The sentence segmentation result of this text to be corrected is: "When young, we always struggle along a road paved with thorns." is the first clause; "In the prime of life, those things that look beautiful often are true." is the second clause; "When youth has all been squandered, we will know to cherish it!" is the third clause; and "Is regret of any use?" is the fourth clause. "First clause", "second clause", "third clause" and "fourth clause" represent the order information of the clauses.
It is to be understood that the representation manner of the order information of each clause shown in the present embodiment is only an example, and those skilled in the art can select the representation manner of the order information of the clauses according to the actual situation.
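A minimal sketch of this segmentation step is given below; it assumes the clause delimiters named above (。！？) and a simple regex-based splitter, which stands in for the patent's sentence segmentation tool, and pairs each clause with its order index.

```python
import re

def split_into_clauses(text: str):
    """Split the text to be corrected at 。！？ and record each clause's order information."""
    # Keep each delimiter attached to its clause so punctuation is preserved.
    pieces = re.split(r"(?<=[。！？])", text)
    clauses = [piece for piece in pieces if piece.strip()]
    # Order information as (index, clause) pairs: (0, first clause), (1, second clause), ...
    return list(enumerate(clauses))
```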
Step 32: input the M clauses into the error detection model to obtain detection results corresponding to the M clauses.
Specifically, the detection result may be represented by a clause score, by a clause-correct probability value, or in other ways, which is not specifically limited in the embodiments of the present application. In what follows, the embodiments of the present application are further explained using the clause score as the detection result.
Following the example in step 31, the four clauses corresponding to the text to be corrected are input into the error detection model, and the output detection results are as follows: the first clause, "When young, we always struggle along a road paved with thorns.", scores 97; the second clause, "In the prime of life, those things that look beautiful often are true.", scores 89; the third clause, "When youth has all been squandered, we will know to cherish it!", scores 98; and the fourth clause, "Is regret of any use?", scores 96.
Step 33: judge, based on the detection results corresponding to the M clauses, whether each of the M clauses contains an error.
A correct-sentence score threshold is preset. If the score of a clause is smaller than the threshold, that clause is regarded as an erroneous clause; if the score of a clause is greater than or equal to the threshold, that clause is regarded as an error-free clause.
Illustratively, suppose the correct-sentence score threshold is set to 95. According to the example in steps 31 and 32, the second clause, "In the prime of life, those things that look beautiful often are true.", is an erroneous clause, while the first clause, "When young, we always struggle along a road paved with thorns.", the third clause, "When youth has all been squandered, we will know to cherish it!", and the fourth clause, "Is regret of any use?", are error-free clauses. It should be understood that the correct-sentence score threshold may be set according to the actual situation in order to improve the adaptability of the text error correction method provided by this embodiment; it is not specifically limited in the embodiments of the present application.
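A sketch of the scoring and judgment in steps 32 and 33, assuming a hypothetical `detection_model.score` interface; the threshold of 95 and the example scores follow the illustration above.

```python
CORRECT_SENTENCE_SCORE_THRESHOLD = 95   # preset threshold from the example above

def judge_clauses(indexed_clauses, detection_model,
                  threshold: int = CORRECT_SENTENCE_SCORE_THRESHOLD):
    """Split the clauses into those judged erroneous and those judged error-free."""
    erroneous, error_free = [], []
    for index, clause in indexed_clauses:
        score = detection_model.score(clause)   # assumed scoring interface
        if score < threshold:
            erroneous.append((index, clause))   # e.g. the second clause scoring 89
        else:
            error_free.append((index, clause))  # e.g. the clauses scoring 96, 97, 98
    return erroneous, error_free
```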
In the technical solution provided by this embodiment of the application, the text to be corrected is divided into clauses and each clause is judged for errors; only on that basis does the error correction model correct the text, which greatly reduces the amount of computation of the text error correction system.
Fig. 4 is a schematic flow chart illustrating a process of determining a corrected text corresponding to a text to be corrected by using an error correction model according to an embodiment of the present application. The embodiment shown in fig. 4 is extended based on the embodiment shown in fig. 3, and the differences between the embodiment shown in fig. 4 and the embodiment shown in fig. 3 will be emphasized below, and the descriptions of the same parts will not be repeated.
As shown in fig. 4, if the text to be corrected is incorrect, the step of determining the corrected text corresponding to the text to be corrected by using the error correction model includes the following steps.
In the actual use process, if the determination result in step 33 is that N clauses out of the M clauses are incorrect, step 41 is executed. If the determination result in step 33 is that W clauses out of the M clauses are correct, step 42 is executed.
Step 41: determine, by using the error correction model, the corrected sentences corresponding to the N clauses respectively, where N is a positive integer less than or equal to M.
Specifically, the N clauses judged by the error detection model to contain errors are input into the error correction model, which outputs the corrected sentences corresponding to the N clauses. Each output corrected sentence contains correction type information.
Following the example in fig. 3, after the error detection model judges the second clause, "In the prime of life, those things that look beautiful often are true.", to be an erroneous clause, the second clause is input into the error correction model, and the corrected sentence output by the error correction model is: "In the prime of (missing) life, those things that look beautiful often are true." Here, "(missing)" is the correction type information included in the corrected sentence.
Step 42: keep the W error-free clauses, where W is a positive integer less than or equal to M.
Following the example in fig. 3, the first clause, the third clause, and the fourth clause are judged as error-free clauses by the error detection model, and the first clause, the third clause, and the fourth clause are retained.
Step 44: recombine, based on the order information among the M clauses, the corrected sentences corresponding to the N clauses with the error-free clauses among the M clauses to generate the corrected text.
Following the example in fig. 3, the corrected sentence corresponding to the erroneous clause and the error-free clauses are recombined according to the order information among the four clauses, and the resulting corrected text is: "When young, we always struggle along a road paved with thorns. In the prime of (missing) life, those things that look beautiful often are true. When youth has all been squandered, we will know to cherish it! Is regret of any use?"
According to the technical solution provided by this embodiment of the application, on the basis of the error detection model judging whether each clause of the text to be corrected contains an error, the error correction model corrects only the clauses judged to be erroneous, and the corrected sentences and the error-free clauses are then recombined according to the order information recorded during sentence segmentation to obtain the corrected text corresponding to the text to be corrected. The technical solution in this embodiment greatly reduces the amount of computation of the error correction model and improves the running speed of the text error correction system.
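A sketch of steps 41 through 44 under the same assumed interfaces: only the clauses judged erroneous are passed to the error correction model, and the results are merged back with the error-free clauses by their order indices.

```python
def build_corrected_text(erroneous, error_free, correction_model) -> str:
    """Correct only the erroneous clauses, then recombine all clauses in their original order."""
    corrected = [(index, correction_model.correct(clause)) for index, clause in erroneous]
    # Error-free clauses are kept unchanged (step 42).
    merged = sorted(corrected + error_free, key=lambda item: item[0])
    return "".join(clause for _, clause in merged)
```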
Fig. 5 is a schematic flow chart illustrating a process of determining a corrected sentence corresponding to each of N clauses by using an error correction model according to an embodiment of the present application. The embodiment shown in fig. 5 is extended based on the embodiment shown in fig. 4, and the differences between the embodiment shown in fig. 5 and the embodiment shown in fig. 4 will be emphasized below, and the descriptions of the same parts will not be repeated.
As shown in fig. 5, the step of determining a corrected sentence corresponding to each of the N clauses by using the error correction model includes the following steps.
Step 411: for each clause of the N clauses, determine the character feature vector corresponding to each Chinese character in the clause. Here the text to be corrected includes a Chinese text to be corrected.
Specifically, the user inputs the text to be corrected into the text error correction system, which represents each Chinese character in each of the N clauses as a group of numbers in computer memory; this group of numbers is the character feature vector of that Chinese character.
Step 412: determine, by using the error correction model, at least one expansion word corresponding to each Chinese character in the clause based on the character feature vector corresponding to each Chinese character in the clause and the semantic information of the clause.
Specifically, the expansion words corresponding to the character feature vectors of the Chinese characters at the first through n-th positions of the erroneous clause are determined in sequence, and the expansion words corresponding to the character feature vector of the Chinese character at the x-th position are determined on the basis of the expansion words corresponding to the character feature vectors of the Chinese characters at the first through (x-1)-th positions, where 1 < x ≤ n.
Step 413: determine, by using the error correction model, the correction result corresponding to each Chinese character in the clause based on the at least one expansion word corresponding to that Chinese character.
Step 414: determine, by using the error correction model, the corrected sentence corresponding to the clause based on the correction results corresponding to the Chinese characters in the clause.
Specifically, an erroneous clause is corrected character by character, and the corrected sentence corresponding to the erroneous clause is determined from the correction results corresponding to the Chinese characters in the erroneous clause.
The correction result corresponding to each Chinese character is determined by determining at least one expansion word corresponding to each Chinese character in the erroneous clause. In this way, multiple corrected sentences corresponding to the erroneous clause can be obtained, and thus multiple possible corrected texts can be obtained without deviating from the semantics that the original text to be corrected intends to express.
Fig. 6 is a schematic flow chart illustrating a process of determining at least one expansion word corresponding to each of the chinese characters in the clause according to an embodiment of the present application. The embodiment shown in fig. 6 is extended based on the embodiment shown in fig. 5, and the differences between the embodiment shown in fig. 6 and the embodiment shown in fig. 5 will be emphasized below, and the descriptions of the same parts will not be repeated.
As shown in fig. 6, determining at least one expansion word corresponding to each Chinese character in the clause based on the character feature vector corresponding to each Chinese character in the clause and the semantic information of the clause includes the following steps.
Step 4121: for each Chinese character in the clause, determine P candidate expansion words corresponding to the Chinese character and the usage probability data corresponding to each of the P candidate expansion words, based on the semantic information of the clause and the character feature vector corresponding to the Chinese character.
Illustratively, consider the erroneous clause "天天气很好" ("Day weather is good"), and take the Chinese character "天" ("day") at the first position as an example. For the character "天" at the first position, the candidate expansion words determined based on the semantic information of the clause and the character feature vector corresponding to the character are "weather" (天气), "today" (今天), "half a day" (半天), "tomorrow" (明天), "sunny day" (晴天), and the like. Further, the usage probability data of "weather" is 89%, that of "today" is 90%, that of "half a day" is 30%, that of "tomorrow" is 77%, and that of "sunny day" is 51%.
Step 4122: select at least one expansion word corresponding to the Chinese character from the P candidate expansion words based on a preset usage probability threshold and the usage probability data corresponding to the P candidate expansion words.
For example, if the preset usage probability threshold is 85%, then given the usage probability data of the candidate expansion words in step 4121, the expansion words that satisfy the threshold for the character "天" at the first position are "today" (今天, 90%) and "weather" (天气, 89%). It can be understood that by adjusting the preset usage probability threshold, different sets of expansion words for a given Chinese character can be obtained.
It should be noted that the preset usage probability threshold may be determined according to the actual usage situation; the specific value is not limited in the embodiments of the present application.
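A sketch of steps 4121 and 4122 using the candidate expansion words and usage probabilities from the example above; the dictionary representation and the 85% threshold are illustrative assumptions.

```python
# Candidate expansion words for the first-position character in the example,
# with their usage probability data (values taken from the example above).
CANDIDATES = {"weather": 0.89, "today": 0.90, "half a day": 0.30,
              "tomorrow": 0.77, "sunny day": 0.51}

def select_expansion_words(candidates: dict, usage_threshold: float = 0.85):
    """Keep the candidate expansion words whose usage probability meets the preset threshold."""
    return [word for word, probability in candidates.items() if probability >= usage_threshold]

# select_expansion_words(CANDIDATES) -> ['weather', 'today']
```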
In this embodiment of the application, erroneous clauses are corrected character by character, and various expansion words related to a given Chinese character can be obtained by adjusting the preset usage probability threshold, so that multiple correction results for the erroneous clause are obtained for the user to choose from.
Fig. 7 is a schematic flow chart illustrating a process of determining a correction result corresponding to each of the chinese characters in the clause based on at least one expansion word corresponding to each of the chinese characters in the clause according to an embodiment of the present application. The embodiment shown in fig. 7 is extended based on the embodiment shown in fig. 5, and the differences between the embodiment shown in fig. 7 and the embodiment shown in fig. 5 will be emphasized below, and the descriptions of the same parts will not be repeated.
As shown in fig. 7, determining the correction result corresponding to each Chinese character in the clause based on the at least one expansion word corresponding to each Chinese character in the clause includes the following steps.
Step 4131: for each Chinese character in the clause, determine whether the Chinese character contains an error based on the at least one expansion word corresponding to the Chinese character.
Following the example in fig. 6, for the Chinese character "天" at the first position, the expansion words that satisfy the threshold are "today" (今天) and "weather" (天气). Taking the expansion word "today" (今天) as an example: if the error detection model detects that the position before the first "天" does not contain the character "今", the character "天" at the first position is judged to contain an error. Taking the expansion word "weather" (天气) as an example: if the error detection model detects that the position after the first "天" is not "气", the character "天" at the first position is likewise judged to contain an error.
In practical applications, if the determination result in the step 4131 is yes, that is, if it is determined that the chinese character is incorrect based on the at least one expanded word corresponding to the chinese character, the step 4132 is executed. If the determination result in the step 4131 is negative, that is, it is determined that the chinese character is correct based on the at least one expanded word corresponding to the chinese character, then step 4133 is performed.
Step 4132: determine the correction result corresponding to the Chinese character based on the at least one expansion word corresponding to the Chinese character.
Following the above example, taking the expansion word "today" (今天) as an example, the correction result corresponding to the character "天" at the first position is "今 (missing) 天", i.e. a missing "今" is inserted. Taking the expansion word "weather" (天气) as an example, the correction result corresponding to the character "天" at the first position is "天气 (redundant)", i.e. the extra "天" is marked as redundant.
Step 4133: take the Chinese character itself as the correction result corresponding to the Chinese character.
Specifically, for a Chinese character without errors, the Chinese character is directly used as a correction result.
Through the technical solution in this embodiment, at least one expansion result can be obtained for the Chinese character at each position of an erroneous clause, which gives the user more options to choose from.
Fig. 8 is a schematic diagram illustrating the structure of the error correction model when correcting an erroneous clause. As shown in fig. 8, in some embodiments, the main network structure of the error correction model is a Transformer Encoder-Decoder.
Step 1: input the erroneous clause "天天气很好" ("Day weather is good") into the error correction model, and obtain the encoding information corresponding to the erroneous clause through the Transformer-Encoder network structure in the error correction model.
Step 2: input the encoding information corresponding to the erroneous clause "天天气很好" into the Transformer-Decoder network structure, which outputs "今 (missing)".
Step 3: feed the "今 (missing)" output by the Transformer-Decoder back into the Transformer-Decoder to obtain the output "天" at the next position.
Repeat step 3 until the Transformer-Decoder outputs the sentence end mark "[end]".
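The encode-then-decode loop of steps 1 through 3, repeated until "[end]", can be sketched as follows; `encoder`, `decoder`, and their token-level methods are assumed placeholders for the Transformer-Encoder-Decoder described above, not a concrete implementation.

```python
def correct_clause(erroneous_clause: str, encoder, decoder, max_len: int = 128) -> str:
    """Autoregressively generate the corrected sentence for one erroneous clause."""
    encoding = encoder.encode(erroneous_clause)          # step 1: Transformer-Encoder
    generated = []                                       # correction tokens produced so far
    for _ in range(max_len):
        token = decoder.next_token(encoding, generated)  # steps 2-3: condition on the encoding
        if token == "[end]":                             # sentence end mark
            break
        generated.append(token)                          # e.g. "今 (missing)", then "天", ...
    return "".join(generated)
```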
Fig. 9 is a schematic flow chart illustrating a process of determining a corrected text corresponding to a text to be corrected by using an error correction model according to another embodiment of the present application. The embodiment shown in fig. 9 is extended based on the embodiment shown in fig. 4, and the differences between the embodiment shown in fig. 9 and the embodiment shown in fig. 4 will be emphasized below, and the descriptions of the same parts will not be repeated.
As shown in fig. 9, the text error correction method in the present application further includes the following steps.
Step 43: input the N clauses into the error detection model to obtain error position data corresponding to the N clauses.
Illustratively, following the example in fig. 7, the erroneous clause "天天气很好" ("Day weather is good") is input into the error detection model, and the error position data output by the error detection model is: the first character position. It is to be understood that this representation of the error position data is only an example, and those skilled in the art may choose an appropriate representation of the error position data according to the specific use case.
Step 45: generate correction mark information for the corrected sentences corresponding to the N clauses based on the error position data corresponding to the N clauses and the corrected sentences corresponding to the N clauses.
Specifically, according to the error position data, correction mark information is added at the corresponding position of the corresponding corrected text. Illustratively, the corresponding position of the corrected text can be highlighted, underlined, set in bold, or given a different font color, so as to distinguish it from the Chinese characters at unmodified positions.
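A sketch of step 45, turning the error position data into highlight-style correction marks; the HTML-like `<mark>` tag is just one illustrative way of rendering the highlight.

```python
def add_correction_marks(corrected_clause: str, error_positions) -> str:
    """Wrap the characters at the reported error positions in a highlight mark."""
    marked = []
    for position, char in enumerate(corrected_clause):
        if position in error_positions:
            marked.append(f"<mark>{char}</mark>")  # highlighted correction position
        else:
            marked.append(char)                    # unmodified characters stay plain
    return "".join(marked)
```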
Through the technical solution in this embodiment, correction mark information is added at the corrected positions of an erroneous clause, making it convenient for the user to identify the position of the error and the reason for the correction.
Fig. 10 is a schematic flowchart of a text error correction method according to another embodiment of the present application. The embodiment shown in fig. 10 is extended based on the embodiment shown in fig. 2, and the differences between the embodiment shown in fig. 10 and the embodiment shown in fig. 2 will be emphasized below, and the descriptions of the same parts will not be repeated.
As shown in fig. 10, before judging whether the text to be corrected contains an error by using the error detection model, the method further includes the following steps.
Step 10: determine S training sets.
Each training set comprises a correct text sample, an error text sample corresponding to the correct text sample and an error detail label corresponding to the error text sample.
Specifically, several characters are selected at random positions of the correct text sample, and errors are introduced through operations such as adding characters, deleting characters, and replacing characters, so as to obtain the erroneous text sample; error detail labels are then added to the erroneous text sample. Illustratively, the starting position of an error is labeled B, a middle position is labeled I, the end position is labeled E, and positions of correct text are labeled O. If the error spans only one Chinese character, only the starting position of the error is labeled; if it spans two characters, the starting and end positions of the error are labeled; and if it spans three or more characters, the starting, middle, and end positions of the error are labeled.
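A sketch of this sample-construction step, assuming a single-character corruption at a random position; the corruption operations are simplified placeholders, and the B/I/E/O labels follow the convention described above.

```python
import random

def corrupt_sample(correct_text: str):
    """Build an erroneous text sample plus B/I/E/O error-detail labels from a correct sample."""
    chars = list(correct_text)
    position = random.randrange(len(chars))
    operation = random.choice(["add", "delete", "replace"])  # add / remove / replace one character
    if operation == "add":
        chars.insert(position, random.choice(correct_text))
    elif operation == "delete":
        chars.pop(position)
    else:
        chars[position] = random.choice(correct_text)
    erroneous_text = "".join(chars)

    # Label the single-character error span: B at its start, O everywhere else.
    # (A two-character span would also get E at its end; three or more characters add I in the middle.)
    labels = ["O"] * len(erroneous_text)
    if position < len(labels):
        labels[position] = "B"
    return erroneous_text, labels
```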
Step 20: train the generator and the discriminator based on the S training sets to obtain the error detection model and the error correction model.
The erroneous samples in each training set are input into the generator of the generative adversarial network, which outputs first corrected samples corresponding to the erroneous samples; the first corrected samples are input into the discriminator of the generative adversarial network, which outputs the text-correct probability corresponding to each first corrected sample; and the parameters of the generator are adjusted according to the text-correct probability until the training target is reached, yielding the error detection model and the error correction model.
Through the technical solution in this embodiment, an error detection model and an error correction model based on a generative adversarial network are obtained. Unlike a conventional error correction model obtained by end-to-end neural network training, the error samples seen by the error detection model and the error correction model during training are closer to errors found in real situations, so the corrected text obtained by the text error correction method of the present application is more realistic and accurate.
Fig. 11 is a schematic flow chart illustrating a process of obtaining an error detection model and an error correction model based on S training sets and training generators and discriminators according to an embodiment of the present application. The embodiment shown in fig. 11 is extended based on the embodiment shown in fig. 10, and the differences between the embodiment shown in fig. 11 and the embodiment shown in fig. 10 will be emphasized below, and the descriptions of the same parts will not be repeated.
As shown in fig. 11, obtaining the error detection model and the error correction model by training the generator and the discriminator based on the S training sets includes the following steps.
Step 21: input the S erroneous text samples included in the S training sets into the generator to obtain error-corrected text samples corresponding to the S erroneous text samples.
Step 22: for each erroneous text sample, input the error-corrected text sample corresponding to the erroneous text sample and the correct text sample corresponding to the erroneous text sample into the discriminator to obtain the text-correct probability of the error-corrected text sample, and adjust the parameters of the generator and/or the discriminator based on the text-correct probabilities of the error-corrected text samples.
Step 23: repeatedly train on the S training sets until a preset training stop condition is met, obtaining the error detection model and the error correction model.
Specifically, the training stop condition is that the generator and the discriminator reach their respective objectives. The objective of the generator is

\min_{G} \; \mathbb{E}_{\tilde{x} \sim P_{err}} \big[ \log \big( 1 - D(G(\tilde{x})) \big) \big]

and the objective of the discriminator is

\max_{D} \; \mathbb{E}_{x \sim P_{real}} \big[ \log D(x) \big] + \mathbb{E}_{\tilde{x} \sim P_{err}} \big[ \log \big( 1 - D(G(\tilde{x})) \big) \big]

where \tilde{x} denotes an erroneous text sample, x denotes a correct text sample, D denotes the discriminator, G denotes the generator, x \sim P_{real} denotes that x follows the distribution of correct (real) text samples, and \tilde{x} \sim P_{err} denotes that \tilde{x} follows the distribution of erroneous text samples.
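A sketch of the training loop of steps 21 through 23 under the objectives above; `generator`, `discriminator`, and their `correct`, `prob_correct`, and `update` methods are assumed interfaces, not the patent's concrete implementation.

```python
def train_adversarially(training_sets, generator, discriminator, epochs: int = 10):
    """Alternately update the generator and the discriminator, as in steps 21-23."""
    for _ in range(epochs):                                  # repeat until the stop condition is met
        for erroneous_text, correct_text, _labels in training_sets:
            corrected = generator.correct(erroneous_text)    # step 21: error-corrected sample
            p_fake = discriminator.prob_correct(corrected)   # step 22: text-correct probability
            p_real = discriminator.prob_correct(correct_text)

            # Discriminator objective: push p_real toward 1 and p_fake toward 0.
            discriminator.update(real_prob=p_real, fake_prob=p_fake)
            # Generator objective: push p_fake toward 1 (corrected text judged as real).
            generator.update(fake_prob=p_fake)
    return discriminator, generator   # the error detection model and the error correction model
```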
When the error detection model and the error correction model are trained with a generative adversarial network, the requirements on the training data set are low: a large amount of manually labeled data is not needed, and the training data are not required to come from real text errors. This differs from the conventional training process of a text correction model, whose accuracy drops if a large amount of real error samples or manually labeled data is not used. In the adversarial training process, the error detection model must discriminate between real text and the text generated by the error correction model, so even when the training data set is not close to real data, the error detection model and the error correction model can still be trained accurately.
In some embodiments, the specific process of performing text error correction with the error detection model and the error correction model obtained from the generative adversarial network is as follows.
Setting a text to be corrected: "the institute considers that legal loan relation is protected by law. The three-dimensional original advertisement of the notice is proved to be a certificate by the borrowed notes provided by the original advertisement and the borrowed money provided by the witness Li IV and Wang V, so the relation of borrowing and lending between the original advertisement Zhao VI and the three-dimensional notice is confirmed by the court according to law. "
The text error correction system receives the text to be error corrected and performs sentence division on the text, and the sentence division result is as follows: the first sentence, "this institute believes that the legal loan relationship is protected by law. The 'and the second clause' are advised to have three-way original loan, and the borrowers provided by the original announcement and the borrowed proofs provided by the witness Li IV and Wang Wu are used as evidence, so the relation of borrowing and lending between the original announcement Zhao Liu and the advised three is confirmed by the court according to law. "
The first clause and the second clause are input into a text error correction model, the text error correction model judges that the first clause is correct and the second clause is wrong, and the wrong second clause, namely the three-way original announcement borrowing of the announced third clause, the borrowing provided by the original announcement and the borrowed certificate provided by the witness Li four and Wang five are proved to be evidence, so the borrowing relation between the original announcement Zhao six and the announced third clause is confirmed by the courts according to law. "input error correction model.
The error correction model receives the three-way original announcement borrowing of the second clause 'the announced piece', the borrowing proof provided by the original announcement and the witnesses from the Li four and the Wang five are evidences, and therefore the courtyard affirms according to law the borrowing relationship between the original announcement Zhao six and the announced piece three. "correct the error, output the corresponding correction sentence" three-way original announcement borrowing of the announced, there are the borrowed note that the original announcement provided and the borrowed evidence that witnesses Li four, Wang five issues prove to be the evidence, therefore, the relation of borrowing and lending between Zhao six and the three announced is confirmed by the law of the institute. "
The text error correction system recombines the error-free first clause and the corrected sentence corresponding to the erroneous second clause according to the sequence information of the error-free clause and the erroneous clause to obtain the corrected text, marks the modified position as a highlight, and outputs the corrected text.
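The flow of this example, namely clause splitting, per-clause detection, per-clause correction, and recombination with highlighting, can be summarized by the following non-limiting sketch. The functions detect_error and correct_clause stand in for the trained error detection model and error correction model, and the <mark> tag used for highlighting is an assumption for illustration:

import re
from typing import Callable, List

def split_clauses(text: str) -> List[str]:
    # Split on sentence-final punctuation while keeping the delimiter.
    return [p for p in re.split(r"(?<=[。！？])", text) if p.strip()]

def correct_text(text: str,
                 detect_error: Callable[[str], bool],
                 correct_clause: Callable[[str], str]) -> str:
    clauses = split_clauses(text)              # M clauses, order preserved
    output = []
    for clause in clauses:
        if detect_error(clause):               # error detection model judges the clause
            corrected = correct_clause(clause) # error correction model corrects it
            output.append("<mark>" + corrected + "</mark>")  # highlight the modification
        else:
            output.append(clause)
    return "".join(output)                     # recombine in the original order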
The text error correction method embodiments of the present application are described in detail above with reference to Figs. 1 to 11, and the text error correction apparatus embodiments of the present application are described in detail below with reference to Figs. 12 and 13. It should be understood that the descriptions of the text error correction method embodiments and the text error correction apparatus embodiments correspond to each other; therefore, for portions that are not described in detail, reference may be made to the foregoing method embodiments.
Fig. 12 is a schematic structural diagram of a text error correction apparatus according to an embodiment of the present application. As shown in fig. 12, the text error correction apparatus provided in the embodiment of the present application includes:
the judging module 300 is configured to judge whether the text to be corrected has errors by using the error detection model;
and a corrected text determining module 400, configured to determine, when the text to be corrected has an error, a corrected text corresponding to the text to be corrected by using an error correction model, where the error detection model is obtained by training a discriminator in the countermeasure generation network, and the error correction model is obtained by training a generator in the countermeasure generation network.
In an embodiment of the present application, the judging module 300 is further configured to divide the text to be corrected to obtain M clauses corresponding to the text to be corrected, where M is a positive integer; determine detection results corresponding to the M clauses based on the M clauses by using the error detection model; and judge, based on the detection results corresponding to the M clauses, whether each of the M clauses is erroneous.
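One possible, non-limiting organization of such a judging module is sketched below, assuming the error detection model returns a per-clause error probability; the class name and the 0.5 threshold are assumptions for illustration:

import re
from typing import List, Tuple

class JudgingModule:
    def __init__(self, error_detection_model, threshold: float = 0.5):
        self.model = error_detection_model   # clause -> error probability
        self.threshold = threshold

    def judge(self, text: str) -> List[Tuple[str, bool]]:
        # Divide the text to be corrected into M clauses, then judge each clause.
        clauses = [c for c in re.split(r"(?<=[。！？])", text) if c.strip()]
        return [(clause, self.model(clause) >= self.threshold) for clause in clauses]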
In an embodiment of the present application, the corrected text determining module 400 is further configured to, if N clauses of the M clauses are erroneous, respectively determine, by using the error correction model, corrected sentences corresponding to the N clauses, where N is a positive integer less than or equal to M; and recombine the corrected sentences corresponding to the N clauses and the error-free clauses of the M clauses based on the sequence information among the M clauses to generate the corrected text corresponding to the text to be corrected.
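The recombination step could, for instance, use the clause index as the sequence information, as in this minimal sketch (function and argument names are assumptions):

def recombine(clauses, is_erroneous, corrected_sentences):
    # clauses: the M clauses in their original order
    # is_erroneous[i]: True if clause i was judged erroneous
    # corrected_sentences: maps the index of an erroneous clause to its corrected sentence
    return "".join(corrected_sentences[i] if is_erroneous[i] else clause
                   for i, clause in enumerate(clauses))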
In an embodiment of the present application, the corrected text determining module 400 is further configured to, for each clause of the N clauses, determine a character feature vector corresponding to each Chinese character in the clause; determine, by using the error correction model, at least one expansion word corresponding to each Chinese character in the clause based on the character feature vector corresponding to each Chinese character in the clause and the semantic information of the clause; determine, by using the error correction model, a correction result corresponding to each Chinese character in the clause based on the at least one expansion word corresponding to each Chinese character in the clause; and determine, by using the error correction model, a corrected sentence corresponding to the clause based on the correction results corresponding to the Chinese characters in the clause.
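A highly simplified sketch of this per-character flow is given below; embed_char, expand, and resolve are hypothetical stand-ins for the corresponding internal steps of the error correction model, not its actual interfaces:

def correct_clause_by_character(clause, embed_char, expand, resolve):
    # embed_char: Chinese character -> character feature vector
    # expand: (character feature vector, clause) -> expansion words for that character
    # resolve: (character, expansion words) -> correction result for that character
    vectors = [embed_char(ch) for ch in clause]
    expansions = [expand(vec, clause) for vec in vectors]
    results = [resolve(ch, exp) for ch, exp in zip(clause, expansions)]
    return "".join(results)   # the corrected sentence for the clause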
In an embodiment of the present application, the corrected text determining module 400 is further configured to determine, for each Chinese character in the clause, P candidate expansion words corresponding to the Chinese character and use probability data corresponding to each of the P candidate expansion words, based on the semantic information of the clause and the character feature vector corresponding to the Chinese character, where P is a positive integer; and select at least one expansion word corresponding to the Chinese character from the P candidate expansion words based on a preset use probability threshold.
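Selecting the expansion words from the P candidates by a use probability threshold could look like the following sketch; the 0.3 threshold and the example words are arbitrary assumptions:

def select_expansion_words(candidates, use_probabilities, threshold=0.3):
    # candidates: P candidate expansion words for one Chinese character
    # use_probabilities: use probability data for each of the P candidates
    return [word for word, prob in zip(candidates, use_probabilities) if prob >= threshold]

selected = select_expansion_words(["证明", "借钱", "证言"], [0.72, 0.05, 0.41])
# selected == ["证明", "证言"]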
In an embodiment of the present application, the corrected text determining module 400 is further configured to determine, for each Chinese character in the clause, whether the Chinese character is erroneous based on the at least one expansion word corresponding to the Chinese character; if the Chinese character is erroneous, determine a correction result corresponding to the Chinese character based on the at least one expansion word corresponding to the Chinese character; and if the Chinese character is correct, take the Chinese character itself as the correction result corresponding to the Chinese character.
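The per-character decision can be sketched as follows. The criterion used here (a character is treated as correct when it appears among its own expansion words) is an assumption for illustration only:

def resolve_character(character, expansion_words):
    # Correct character: keep it as its own correction result.
    if not expansion_words or character in expansion_words:
        return character
    # Erroneous character: take a correction result from its expansion words.
    return expansion_words[0]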
In an embodiment of the present application, the corrected text determining module 400 is further configured to input the N clauses into the error detection model to obtain error position data corresponding to the N clauses respectively; and generate correction mark information of the corrected sentences corresponding to the N clauses based on the error position data corresponding to the N clauses and the corrected sentences corresponding to the N clauses.
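Correction mark information could be derived, for example, by pairing the error position data with the corrected sentences; the dictionary layout below is an assumption, not a required format:

def build_correction_marks(clauses, error_positions, corrected_sentences):
    # error_positions[i]: character positions flagged by the error detection model in clause i
    # corrected_sentences[i]: the corrected sentence output for clause i
    marks = []
    for clause, positions, corrected in zip(clauses, error_positions, corrected_sentences):
        marks.append({
            "original": clause,
            "corrected": corrected,
            "positions": positions,   # where the modifications occurred
        })
    return marks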
Fig. 13 is a schematic structural diagram of a text error correction apparatus according to another embodiment of the present application. As shown in fig. 13, the text correction apparatus further includes:
a training set determining module 100, configured to determine S training sets, where a training set includes a correct text sample, an incorrect text sample corresponding to the correct text sample, and an error detail label corresponding to the incorrect text sample.
A model training module 200, configured to train the generator and the discriminator in the countermeasure generation network based on the S training sets to obtain the error detection model and the error correction model.
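A possible in-memory layout for one entry of such a training set is sketched below; the field names and the position-based error detail label are assumptions for illustration:

from dataclasses import dataclass
from typing import List

@dataclass
class TrainingSample:
    correct_text: str              # correct text sample
    error_text: str                # error text sample corresponding to the correct text sample
    error_detail_label: List[int]  # e.g. the character positions of the errors

training_set = [
    TrainingSample("合法的借贷关系受法律保护。", "合法的借贷关系受法律保户。", [11]),
]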
In an embodiment of the present application, the model training module 200 is further configured to input the S error text samples included in the S training sets into the generator to obtain error correction text samples corresponding to the S error text samples; for each error text sample, input the error correction text sample corresponding to the error text sample and the correct text sample corresponding to the error text sample into the discriminator to obtain the text correct probability of the error correction text sample, and adjust the parameters of the generator and/or the discriminator based on the text correct probabilities of the error correction text samples corresponding to the S error text samples; and repeatedly use the S training sets for training until a preset training stop condition is met, so as to obtain the error detection model and the error correction model.
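A schematic training loop along these lines is sketched below in a PyTorch style. It only illustrates the adversarial setup described above; the model classes, optimizers, loss choice, and fixed epoch count standing in for the preset training stop condition are all assumptions, not the application's actual implementation:

import torch
import torch.nn.functional as F

def train_adversarial(generator, discriminator, loader, epochs=10, lr=1e-4):
    # generator: encoded error text -> corrected text (becomes the error correction model)
    # discriminator: encoded text -> probability that the text is correct (becomes the error detection model)
    g_opt = torch.optim.Adam(generator.parameters(), lr=lr)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=lr)
    for _ in range(epochs):
        for error_text, correct_text in loader:   # tensor-encoded training pairs
            corrected = generator(error_text)     # error correction text sample
            # Discriminator update: real text scored toward 1, generated text toward 0.
            p_real = discriminator(correct_text)
            p_fake = discriminator(corrected.detach())
            d_loss = F.binary_cross_entropy(p_real, torch.ones_like(p_real)) \
                   + F.binary_cross_entropy(p_fake, torch.zeros_like(p_fake))
            d_opt.zero_grad(); d_loss.backward(); d_opt.step()
            # Generator update: its output should be scored as correct text.
            p_gen = discriminator(corrected)
            g_loss = F.binary_cross_entropy(p_gen, torch.ones_like(p_gen))
            g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return discriminator, generator               # error detection model, error correction model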
It should be understood that the operations and functions of the training set determining module 100, the model training module 200, the judging module 300, and the corrected text determining module 400 in the text error correction apparatus provided in Figs. 12 and 13 may refer to the text error correction method provided in Figs. 2 to 11, and details are not repeated herein.
Next, an electronic apparatus according to an embodiment of the present application is described with reference to fig. 14. Fig. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
As shown in fig. 14, the electronic device 140 includes one or more processors 1401 and memory 1402.
The processor 1401 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 140 to perform desired functions.
Memory 1402 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, a flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 1401 to implement the text error correction methods of the various embodiments of the present application described above and/or other desired functions. Various contents, such as the corrected text, may also be stored in the computer-readable storage medium.
In one example, the electronic device 140 may further include: an input device 1403 and an output device 1404, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 1403 may include, for example, a keyboard, a mouse, and the like.
The output device 1404 may output various information to the outside, including the corrected text, the correction mark information, the correction type information, and the like. The output device 1404 may include, for example, a display, a speaker, a printer, a communication network and a remote output device connected thereto, and the like.
Of course, for simplicity, only some of the components of the electronic device 140 relevant to the present application are shown in fig. 14, and components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 140 may include any other suitable components depending on the particular application.
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the text correction method according to various embodiments of the present application described above in this specification.
The computer program product may be written in any combination of one or more programming languages to carry out operations according to embodiments of the present application, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer readable storage medium having stored thereon computer program instructions, which, when executed by a processor, cause the processor to perform the steps in the text error correction method according to various embodiments of the present application described above in the present specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are only given as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to."
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (11)

1. A text error correction method, applied to a text error correction system based on a countermeasure generation network, the method comprising the following steps:
judging whether the text to be corrected is wrong or not by using an error detection model;
and if the text to be corrected is wrong, determining a corrected text corresponding to the text to be corrected by using an error correction model, wherein the error detection model is obtained by training a discriminator in the countermeasure generation network, and the error correction model is obtained by training a generator in the countermeasure generation network.
2. The method of claim 1, wherein the determining whether the text to be corrected has errors by using the error detection model comprises:
dividing the text to be corrected to obtain M clauses corresponding to the text to be corrected, wherein M is a positive integer;
inputting the M clauses into the error detection model to obtain detection results corresponding to the M clauses;
respectively judging whether the M clauses are wrong or not based on the detection results corresponding to the M clauses;
if the text to be corrected is wrong, determining a corrected text corresponding to the text to be corrected by using an error correction model, wherein the step of determining the corrected text corresponding to the text to be corrected comprises the following steps:
if N clauses in the M clauses are wrong, respectively determining correction sentences corresponding to the N clauses by using the error correction model, wherein N is a positive integer less than or equal to M;
and recombining the corrected sentences corresponding to the N clauses and the error-free clauses in the M clauses based on the sequence information among the M clauses to generate the corrected text.
3. The text error correction method according to claim 2, wherein the text to be corrected includes a Chinese text to be corrected, and the determining, by using the error correction model, the corrected sentences corresponding to the N clauses respectively includes:
for each of the N clauses,
determining character characteristic vectors corresponding to the Chinese characters in the clauses;
determining at least one expansion word corresponding to each Chinese character in the clause based on the character feature vector corresponding to each Chinese character in the clause and the semantic information of the clause by using the error correction model;
determining correction results corresponding to the Chinese characters in the clauses based on at least one extension word corresponding to each Chinese character in the clauses by using the error correction model;
and determining a corrected sentence corresponding to the clause based on the correction result corresponding to each Chinese character in the clause by using the error correction model.
4. The text error correction method of claim 3, wherein the determining at least one expansion word corresponding to each Chinese character in the clause based on the character feature vector corresponding to each Chinese character in the clause and the semantic information of the clause comprises:
for each Chinese character in the clause,
determining P candidate expansion words corresponding to the Chinese characters and use probability data corresponding to the P candidate expansion words respectively based on the semantic information of the clauses and the character feature vectors corresponding to the Chinese characters, wherein P is a positive integer;
and selecting the at least one expansion word corresponding to the Chinese character from the P candidate expansion words based on a preset use probability threshold and the use probability data corresponding to the P candidate expansion words respectively.
5. The text error correction method according to claim 3 or 4, wherein the determining the correction result corresponding to each Chinese character in the clause based on at least one expansion word corresponding to each Chinese character in the clause comprises:
for each Chinese character in the clause,
determining whether the Chinese character is wrong or not based on at least one expansion word corresponding to the Chinese character;
if the Chinese character is wrong, determining a correction result corresponding to the Chinese character based on at least one expansion word corresponding to the Chinese character;
and if the Chinese character is correct, taking the Chinese character as a correction result corresponding to the Chinese character.
6. The text correction method according to any one of claims 2 to 4, further comprising:
inputting the N clauses into the error detection model to obtain error position data corresponding to the N clauses;
and generating correction mark information of correction sentences corresponding to the N clauses based on the error position data corresponding to the N clauses and the correction sentences corresponding to the N clauses.
7. The method according to any one of claims 1 to 4, wherein before the determining whether the text to be corrected has errors by using the error detection model, the method further comprises:
determining S training sets, wherein the training sets comprise correct text samples, error text samples corresponding to the correct text samples and error detail labels corresponding to the error text samples;
training the generator and the discriminator based on the S training sets to obtain the error detection model and the error correction model.
8. The method of claim 7, wherein training the generator and the discriminator based on the S training sets to obtain the error detection model and the error correction model comprises:
inputting S error text samples included in the S training sets into the generator to obtain error correction text samples corresponding to the S error text samples;
for each error text sample in the S error text samples, inputting an error correction text sample corresponding to the error text sample and a correct text sample corresponding to the error text sample into the discriminator to obtain a text correct probability of the error correction text sample corresponding to the error text sample, and adjusting parameters of the generator and/or the discriminator based on the text correct probabilities of the error correction text samples corresponding to the error text samples;
and repeatedly utilizing the S training sets for training until a preset training stopping condition is met, and obtaining the error detection model and the error correction model.
9. A text error correction apparatus, applied to a text error correction system based on a countermeasure generation network, the apparatus comprising:
the judging module is used for judging whether the text to be corrected has errors by using the error detection model;
and the corrected text determining module is used for determining a corrected text corresponding to the text to be corrected by using an error correction model when the text to be corrected has an error, wherein the error detection model is obtained by training a discriminator in the countermeasure generation network, and the error correction model is obtained by training a generator in the countermeasure generation network.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program for executing the text correction method according to any one of the preceding claims 1 to 8.
11. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing the processor-executable instructions;
the processor, configured to execute the text error correction method according to any one of claims 1 to 8.
CN202111677576.8A 2021-12-31 2021-12-31 Text error correction method and device, storage medium and electronic equipment Pending CN114492453A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111677576.8A CN114492453A (en) 2021-12-31 2021-12-31 Text error correction method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111677576.8A CN114492453A (en) 2021-12-31 2021-12-31 Text error correction method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114492453A true CN114492453A (en) 2022-05-13

Family

ID=81509437

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111677576.8A Pending CN114492453A (en) 2021-12-31 2021-12-31 Text error correction method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114492453A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116484811A (en) * 2023-06-16 2023-07-25 北京语言大学 Text revising method and device for multiple editing intents
CN116484811B (en) * 2023-06-16 2023-09-19 北京语言大学 Text revising method and device for multiple editing intents

Similar Documents

Publication Publication Date Title
CN110110041B (en) Wrong word correcting method, wrong word correcting device, computer device and storage medium
Bakharia Towards cross-domain MOOC forum post classification
CN112070138B (en) Construction method of multi-label mixed classification model, news classification method and system
CN111310447A (en) Grammar error correction method, grammar error correction device, electronic equipment and storage medium
KR102143745B1 (en) Method and system for error correction of korean using vector based on syllable
CN115357719B (en) Power audit text classification method and device based on improved BERT model
CN113672731B (en) Emotion analysis method, device, equipment and storage medium based on field information
CN113657098B (en) Text error correction method, device, equipment and storage medium
CN110929524A (en) Data screening method, device, equipment and computer readable storage medium
US20190180641A1 (en) System and Method for Draft-Contemporaneous Essay Evaluating and Interactive Editing
CN116306600B (en) MacBert-based Chinese text error correction method
CN111611791B (en) Text processing method and related device
CN107797981B (en) Target text recognition method and device
CN111401012A (en) Text error correction method, electronic device and computer readable storage medium
CN114492453A (en) Text error correction method and device, storage medium and electronic equipment
CN113312918B (en) Word segmentation and capsule network law named entity identification method fusing radical vectors
CN107783958B (en) Target statement identification method and device
CN117009223A (en) Software testing method, system, storage medium and terminal based on abstract grammar
CN116187304A (en) Automatic text error correction algorithm and system based on improved BERT
CN111090970A (en) Text standardization processing method after speech recognition
CN113779199B (en) Method, apparatus, device and medium for consistency detection of documents and summaries
CN116246278A (en) Character recognition method and device, storage medium and electronic equipment
CN115713082A (en) Named entity identification method, device, equipment and storage medium
CN115146644A (en) Multi-feature fusion named entity identification method for warning situation text
CN114298032A (en) Text punctuation detection method, computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 311-2, 3rd Floor, Building 5, East District, No. 10, Northwest Wangdong Road, Haidian District, Beijing 100089

Applicant after: iFLYTEK (Beijing) Co.,Ltd.

Applicant after: Hebei Xunfei Institute of Artificial Intelligence

Applicant after: IFLYTEK Co.,Ltd.

Address before: Room 311-2, third floor, building 5, East District, yard 10, northwest Wangdong Road, Haidian District, Beijing 100089

Applicant before: Zhongke Xunfei Internet (Beijing) Information Technology Co.,Ltd.

Applicant before: Hebei Xunfei Institute of Artificial Intelligence

Applicant before: IFLYTEK Co.,Ltd.
