CN115630645B

CN115630645B - Text error correction method, text error correction device, electronic equipment and medium

Info

Publication number: CN115630645B
Application number: CN202211552883.8A
Authority: CN
Inventors: 张乐平; 李文举; 侯磊; 支蕴倩
Original assignee: Beijing Deepctrl Co ltd
Current assignee: Beijing Deepctrl Co ltd
Priority date: 2022-12-06
Filing date: 2022-12-06
Publication date: 2023-04-07
Anticipated expiration: 2042-12-06
Also published as: CN115630645A

Abstract

The application provides a text error correction method, a text error correction device, electronic equipment and a medium. The method comprises the following steps: inputting a target sentence in a target business field into a trained proprietary phrase error correction model, and judging through the model whether a proprietary phrase exists in the target sentence; if no proprietary phrase exists, determining that the error correction result is that no proprietary phrase exists in the target sentence; if a proprietary phrase exists, determining the proprietary phrase tag corresponding to the target sentence, and determining a target segment from the target sentence according to the proprietary phrase tag; when the target segment is judged to be identical to the proprietary phrase tag, determining that the error correction result is that the proprietary phrase in the target sentence is correct; when the target segment is judged to be inconsistent with the proprietary phrase tag, determining that the proprietary phrase in the target sentence is wrong, and determining that the error correction result is the proprietary phrase tag together with the target segment. The method and the device take sentence semantics into account when identifying whether a proprietary phrase in a sentence is wrong, and achieve high accuracy with a small amount of calculation.

Description

Text error correction method, text error correction device, electronic equipment and medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a text error correction method, apparatus, electronic device, and medium.

Background

In text content review, a proprietary phrase is a phrase that must strictly follow a prescribed expression and may not be modified in any way. For example, in the medical field, the fixed expression "Reynolds pentad" is used to describe a patient's condition, and it may not be written with any character altered. Likewise, in the legal field, the expression "bona fide third party" may not be written as a near variant of itself, and two similar terms that both translate as "profit" carry completely different meanings. Errors in a proprietary phrase, such as wrong characters, extra characters, missing characters, and homophone characters, introduce defects into the text content and, in serious cases, change its meaning.

The existing error correction method in text content examination is to construct a proprietary phrase dictionary, divide a sentence to be detected into segments with different lengths, calculate an edit distance between each segment and each proprietary phrase in the proprietary phrase dictionary, and consider that the sentence to be detected uses the proprietary phrase incorrectly when the edit distance is smaller than a certain threshold value and the proprietary phrase is not in the sentence to be detected. The method is completely based on edit distance matching, and does not consider sentence semantics; meanwhile, in the detection process, the edit distance of each segment and each proprietary phrase in the proprietary phrase dictionary needs to be calculated, and the calculation amount is large.

Disclosure of Invention

In view of the above, an object of the present application is to provide a text error correction method, apparatus, electronic device, and medium, which consider the sentence semantics to identify whether a proprietary phrase in the sentence is incorrect, and at the same time, do not need to calculate the edit distance between each segment and each proprietary phrase in the proprietary phrase dictionary, and have a small calculation amount.

The text error correction method provided by the embodiment of the application comprises the following steps:

inputting a target sentence in a target business field into a trained proprietary phrase error correction model, and judging whether a proprietary phrase exists in the target sentence or not through the proprietary phrase error correction model; the proprietary phrase error correction model is obtained by training a sample sentence carrying a proprietary phrase based on a target business field;

if no proprietary phrase exists, determining that the error correction result is that no proprietary phrase exists in the target sentence;

if a proprietary phrase exists, determining a proprietary phrase tag corresponding to the target sentence, and determining a target segment from the target sentence according to the proprietary phrase tag;

when the target segment is judged to be the same as the proprietary phrase tag, determining that the error correction result is that the proprietary phrase in the target sentence is correct;

when the target segment is judged to be inconsistent with the proprietary phrase tag, determining that the proprietary phrase in the target sentence is wrong, and determining that an error correction result is the proprietary phrase tag and the target segment; wherein the proprietary phrase tags characterize the correct proprietary phrase corresponding to the wrong target segment.

In some embodiments, in the text error correction method, inputting a target sentence in a target business domain into a trained proprietary phrase error correction model, and determining whether a proprietary phrase exists in the target sentence through the proprietary phrase error correction model includes:

inputting a target sentence in a target business field into a trained proprietary phrase error correction model, analyzing the semantics of the target sentence through a classification module in the proprietary phrase error correction model, and identifying a classification label of the target sentence; wherein the classification labels include proprietary phrase labels and a no-proprietary-phrase label;

when the classification label of the target sentence is the no-proprietary-phrase label, judging that no proprietary phrase exists in the target sentence;

and when the classification label of the target sentence is a proprietary phrase label, judging that the proprietary phrase exists in the target sentence.

In some embodiments, the text error correction method, wherein determining a target segment from the target sentence according to the proprietary phrase tag, includes:

dividing the target sentence into a plurality of segments based on the proprietary phrase tags;

and comparing the edit distance between the proprietary phrase tag and each segment of the target sentence to determine the target segment matched with the proprietary phrase tag.

In some embodiments, before inputting a target sentence in a target business domain into a trained proprietary phrase correction model, the text correction method further includes:

inputting a target text in a target service field into a preprocessing module, and cleaning the target text through the preprocessing module to obtain a cleaned target text;

and dividing the cleaned target text into at least one target sentence.
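The preprocessing step above can be sketched as follows. This is a minimal illustration only: the source does not specify the cleaning rules or the sentence delimiters, so the tag stripping, the whitespace collapsing, the punctuation set, and the function names `clean_text` and `split_sentences` are all assumptions.

```python
import re

# Assumed sentence boundaries: Chinese and Western terminal punctuation.
# A Western full stop only counts when followed by whitespace, so that
# decimals such as "0.91" are not split.
_SENTENCE_END = re.compile(r"(?<=[。！？!?])\s*|(?<=\.)\s+")


def clean_text(text):
    """Hypothetical cleaning: drop HTML-like tags and collapse whitespace."""
    text = re.sub(r"<[^>]+>", "", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def split_sentences(text):
    """Divide the cleaned text into non-empty target sentences."""
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]
```

Each sentence returned by `split_sentences` would then be passed to the error correction model one at a time.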

In some embodiments, in the text error correction method, the classification module in the proprietary phrase error correction model is trained based on the following training methods:

constructing a training sample set based on the target business field; the training sample set comprises sample sentences carrying proprietary phrases, and each sample sentence corresponds to a classification label;

and inputting the sample sentences in the training sample set into a classification module in a proprietary phrase error correction model, and identifying classification labels of the sample sentences so as to train the proprietary phrase error correction model until a training end condition is met.

In some embodiments, in the text error correction method, the constructing a training sample set based on a target business domain includes:

acquiring a text data source, and performing sentence division on a sample text in the text data source to obtain a sample sentence set; the set of sample sentences includes sample sentences;

screening a first sample sentence carrying a proprietary phrase and a second sample sentence not carrying the proprietary phrase from the sample sentence set;

determining that the classification label of the first sample sentence is the proprietary phrase carried by that sample sentence, and determining that the classification label of the second sample sentence is that no proprietary phrase exists;

updating the sample sentence set for the first time to obtain a first updated sample sentence set;

modifying the proprietary phrase in the first sample sentence according to a preset modification rule, thereby augmenting the set with a plurality of first sample sentences which carry a wrong proprietary phrase and whose classification label is the correct proprietary phrase;

and updating the sample sentence set for the second time to obtain the training sample set.

In some embodiments, in the text error correction method, modifying the proprietary phrase in the first sample sentence according to a preset modification rule includes:

applying multiple types of modification to the proprietary phrase of the first sample sentence to determine multiple types of wrong proprietary phrases;

wherein the multiple types of modification include: adding characters, deleting characters, replacing characters with homophone characters, and replacing characters with similar-shaped characters.
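The four modification types can be illustrated with a small augmentation sketch. The confusion tables `homophones` and `lookalikes`, the `filler` character, and both function names are hypothetical; the source states only the four types of modification and that each erroneous sentence keeps the correct proprietary phrase as its label.

```python
def augment_phrase(phrase, homophones, lookalikes, filler="x"):
    """Return wrong variants of `phrase`, grouped by modification type.

    `homophones` and `lookalikes` map a character to replacement
    characters; both tables are assumed inputs, not part of the source.
    """
    variants = {"add": set(), "delete": set(), "homophone": set(), "lookalike": set()}
    for i, ch in enumerate(phrase):
        variants["add"].add(phrase[:i] + filler + phrase[i:])   # extra character
        variants["delete"].add(phrase[:i] + phrase[i + 1:])     # missing character
        for alt in homophones.get(ch, ()):
            variants["homophone"].add(phrase[:i] + alt + phrase[i + 1:])
        for alt in lookalikes.get(ch, ()):
            variants["lookalike"].add(phrase[:i] + alt + phrase[i + 1:])
    for group in variants.values():
        group.discard(phrase)  # keep only variants that are actually wrong
    return variants


def build_error_samples(sentence, phrase, variants):
    """Pair each erroneous sentence with the correct phrase as its label."""
    samples = []
    for kind in sorted(variants):
        for wrong in sorted(variants[kind]):
            samples.append((sentence.replace(phrase, wrong), phrase))
    return samples
```

Because the label stays the correct phrase, the classifier is trained to map a sentence containing a miswritten phrase to the phrase that should have been used.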

There is also provided in some embodiments a text correction apparatus, the apparatus comprising:

the judging module is used for inputting a target sentence in a target service field into a trained proprietary phrase error correction model and judging whether a proprietary phrase exists in the target sentence or not through the proprietary phrase error correction model; the proprietary phrase error correction model is obtained by training a sample sentence carrying a proprietary phrase based on a target business field;

the first determining module is used for determining, when it is judged that no proprietary phrase exists in the target sentence, that the error correction result is that no proprietary phrase exists in the target sentence;

the second determining module is used for determining a proprietary phrase tag corresponding to the target sentence when the proprietary phrase is judged to exist in the target sentence, and determining a target segment from the target sentence according to the proprietary phrase tag;

the third determining module is used for determining that the error correction result is that the proprietary phrase in the target sentence is correct when the target segment is judged to be the same as the proprietary phrase tag;

the fourth determining module is used for determining the proprietary phrase error in the target sentence when the target segment is judged to be inconsistent with the proprietary phrase tag, and determining the error correction result as the proprietary phrase tag and the target segment; wherein the proprietary phrase tags characterize the correct proprietary phrase corresponding to the wrong target segment.

There is also provided in some embodiments an electronic device comprising: a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate via the bus when the electronic device is operating, and the machine-readable instructions, when executed by the processor, perform the steps of the text error correction method.

There is also provided in some embodiments a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the text correction method.

The embodiments of the present application provide a text error correction method, a text error correction device, electronic equipment and a medium. A target sentence in a target business field is input into a trained proprietary phrase error correction model, the semantics of the target sentence are analyzed through the model, and whether a proprietary phrase exists in the target sentence is judged. If so, a proprietary phrase tag corresponding to the target sentence is output based on the semantics of the sentence, a target segment is determined from the target sentence according to the proprietary phrase tag, and whether the proprietary phrase is wrong is judged according to the target segment; if it is wrong, the wrong proprietary phrase and the correct proprietary phrase are output at the same time, which facilitates error correction. In this error correction method, the existence of a proprietary phrase is analyzed based on sentence semantics and the target proprietary phrase corresponding to the sentence is identified, so the accuracy is higher and the amount of calculation is smaller. Meanwhile, the wrong proprietary phrase in the sentence is accurately identified based on the target proprietary phrase, and the correct and wrong proprietary phrases are output together, which makes modification by the user easier.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.

FIG. 1 is a flowchart illustrating a method for text error correction according to an embodiment of the present application;

FIG. 2 is a flowchart illustrating a method for inputting a target sentence in a target business domain into a trained proprietary phrase error correction model and determining, by the proprietary phrase error correction model, whether a proprietary phrase exists in the target sentence, according to an embodiment of the present application;

FIG. 3 is a flowchart illustrating a method for determining a target segment from the target sentence according to the proprietary phrase tag, according to an embodiment of the present application;

FIG. 4 is a flow chart of another method for text correction according to an embodiment of the present application;

FIG. 5 is a diagram illustrating a training method of a classification module in the proprietary phrase error correction model according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram illustrating a text error correction apparatus according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram illustrating an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are only for illustration and description purposes and are not used to limit the protection scope of the present application. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be performed out of order, and steps without logical context may be performed in reverse order or simultaneously. One skilled in the art, under the guidance of this application, may add one or more other operations to, or remove one or more operations from, the flowchart.

In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the term "comprising" will be used in the embodiments of the present application to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.

In text content review, a proprietary phrase is a phrase that must be expressed strictly in a prescribed manner and may not be modified. For example, in the medical field, the fixed expression "Reynolds pentad" is used to describe a patient's condition and may not be written with any character altered. In the legal field, the expression "bona fide third party" likewise may not be written as a near variant of itself. Errors in a proprietary phrase, such as wrong characters, extra characters, missing characters, and homophone characters, introduce flaws into the text content and, in serious cases, change its meaning. For example, two similar terms that both translate as "profit" have completely different meanings: the first refers to subjectively seeking to earn a profit, although objectively a profit may not be earned; the second refers to a profit that has objectively been obtained after costs are deducted. In an expression such as "for the purpose of profit", substituting one term for the other changes the meaning substantially, and because of habitual thinking the error is easily overlooked.

Based on the above, the present application provides a text error correction method, apparatus, electronic device, and medium, where a target sentence in a target business field is input into a trained proprietary phrase error correction model, and the semantic meaning of the target sentence is analyzed by the proprietary phrase error correction model, so as to determine whether a proprietary phrase exists in the target sentence; if the target sentence exists, outputting a proprietary phrase label corresponding to the target sentence based on the semantics of the sentence, determining a target segment from the target sentence according to the proprietary phrase label, judging whether the proprietary phrase is wrong or not according to the target segment, and if the proprietary phrase is wrong, outputting the wrong proprietary phrase and the correct proprietary phrase at the same time, so that error correction is facilitated; in the error correction method, the existence of the proprietary phrases is analyzed based on sentence semantics, and the target proprietary phrases corresponding to the sentences are identified, so that the accuracy is higher and the calculated amount is less; meanwhile, the wrong proprietary phrase in the sentence is accurately identified based on the target proprietary phrase, and the correct and wrong proprietary phrases are output simultaneously, so that the modification by a user is facilitated.

For example, the prior art is based entirely on edit distance matching and does not consider sentence semantics. Take a sentence stating that, after examination, the patient was found to have Reynolds pentad, in which the phrase is miswritten. The existing detection technology judges entirely by string distance and cannot determine, from the semantics of the sentence, that the sentence concerns a medical topic. The prior art must also calculate edit distances many times, so the amount of calculation is large. Taking the same sentence as an example, when detecting misuse of a proprietary phrase it is not known in advance whether the sentence misuses a medical proprietary expression, nor where the error might be located. Therefore, many n-gram segments of the sentence, including segments that only partly overlap the phrase, must each be compared by edit distance with every correct expression in the proprietary phrase dictionary, such as "Reynolds pentad" and "Down syndrome", so the amount of calculation is large.
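The cost of the dictionary-based approach can be made concrete with a short count. This is an illustrative sketch only: the sentence length, the n-gram length range, and the dictionary size used below are hypothetical figures, and the function names are not from the source.

```python
def ngram_count(sentence_len, min_len, max_len):
    """Number of contiguous segments with length in [min_len, max_len]."""
    return sum(max(0, sentence_len - n + 1) for n in range(min_len, max_len + 1))


def baseline_comparisons(sentence_len, dictionary_size, min_len, max_len):
    """Edit-distance computations when every segment meets every phrase."""
    return ngram_count(sentence_len, min_len, max_len) * dictionary_size


# Hypothetical figures: a 20-character sentence, segments of 3 to 5
# characters, and a dictionary of 1000 proprietary phrases.
segments = ngram_count(20, 3, 5)                      # 18 + 17 + 16 = 51
comparisons = baseline_comparisons(20, 1000, 3, 5)    # 51 * 1000 = 51000
```

Under these assumed figures, a single sentence already requires tens of thousands of edit-distance computations, and the count grows linearly with the dictionary size.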

Referring to fig. 1, fig. 1 shows a flowchart of a method of text error correction according to an embodiment of the present application, specifically, the method includes the following steps S101 to S105;

S101, inputting a target sentence in a target business field into a trained proprietary phrase error correction model, and judging whether a proprietary phrase exists in the target sentence through the proprietary phrase error correction model; the proprietary phrase error correction model is obtained by training on sample sentences which carry proprietary phrases and are based on the target business field;

S102, if no proprietary phrase exists, determining that the error correction result is that no proprietary phrase exists in the target sentence;

S103, if a proprietary phrase exists, determining a proprietary phrase tag corresponding to the target sentence, and determining a target segment from the target sentence according to the proprietary phrase tag;

S104, when the target segment is judged to be the same as the proprietary phrase tag, determining that the error correction result is that the proprietary phrase in the target sentence is correct;

S105, when the target segment is judged to be inconsistent with the proprietary phrase tag, determining that the proprietary phrase in the target sentence is wrong, and determining that the error correction result is the proprietary phrase tag and the target segment; wherein the proprietary phrase tag characterizes the correct proprietary phrase corresponding to the wrong target segment.
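The control flow of steps S101 to S105 can be sketched as below. The classifier and the segment finder are passed in as callables because their internals are described separately; the `CorrectionResult` type, its field names, and the status strings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CorrectionResult:
    status: str                    # "no_phrase", "correct", or "error"
    tag: Optional[str] = None      # correct proprietary phrase, if any
    segment: Optional[str] = None  # segment actually found in the sentence


def correct_sentence(
    sentence: str,
    classify: Callable[[str], Optional[str]],
    find_segment: Callable[[str, str], Optional[str]],
) -> CorrectionResult:
    tag = classify(sentence)                  # S101: tag, or None if absent
    if tag is None:                           # S102: no proprietary phrase
        return CorrectionResult("no_phrase")
    segment = find_segment(sentence, tag)     # S103: locate the target segment
    if segment == tag:                        # S104: phrase is written correctly
        return CorrectionResult("correct", tag, segment)
    return CorrectionResult("error", tag, segment)  # S105: report both
```

In the error case, the caller receives both the correct phrase (`tag`) and the text as written (`segment`), which matches the error correction result described in S105.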

The embodiments of the present application provide a text error correction method in which a target sentence in a target business field is input into a trained proprietary phrase error correction model, the semantics of the target sentence are analyzed through the model, and whether a proprietary phrase exists in the target sentence is judged. If so, a proprietary phrase tag corresponding to the target sentence is output based on the semantics of the sentence, a target segment is determined from the target sentence according to the proprietary phrase tag, and whether the proprietary phrase is wrong is judged according to the target segment; if it is wrong, the wrong proprietary phrase and the correct proprietary phrase are output at the same time, which facilitates error correction. In this error correction method, the existence of a proprietary phrase is analyzed based on sentence semantics and the target proprietary phrase corresponding to the sentence is identified, so the accuracy is higher and the amount of calculation is smaller. Meanwhile, the wrong proprietary phrase in the sentence is accurately identified based on the target proprietary phrase, and the correct and wrong proprietary phrases are output together, which makes modification by the user easier.

In step S101, a target sentence in a target business domain is input to a trained proprietary phrase error correction model, and whether a proprietary phrase exists in the target sentence is determined by the proprietary phrase error correction model.

Specifically, referring to fig. 2, the step of inputting a target sentence in a target business field into a trained proprietary phrase error correction model, and determining whether a proprietary phrase exists in the target sentence through the proprietary phrase error correction model includes the following steps S201 to S203;

S201, inputting a target sentence in a target business field into a trained proprietary phrase error correction model, analyzing the semantics of the target sentence through a classification module in the proprietary phrase error correction model, and identifying a classification label of the target sentence; wherein the classification labels include proprietary phrase labels and a no-proprietary-phrase label;

S202, when the classification label of the target sentence is the no-proprietary-phrase label, judging that no proprietary phrase exists in the target sentence;

S203, when the classification label of the target sentence is a proprietary phrase label, judging that the proprietary phrase exists in the target sentence.

Specifically, inputting a target sentence in a target business field into a trained proprietary phrase error correction model, analyzing the semantics of the target sentence through a classification module in the proprietary phrase error correction model, and identifying a classification label of the target sentence includes: inputting the target sentence into the trained proprietary phrase error correction model, converting the target sentence into a target semantic vector through the classification module, processing the target semantic vector, and identifying the classification label of the target sentence.

Here, the classification module in the proprietary phrase error correction model adopts a textCNN model, which is a model that performs text classification using convolutional neural networks. Its structure includes an embedding layer, a convolution layer, a pooling layer, a fully connected layer, and a softmax layer. The embedding layer represents the sentence input into the classification model as a text vector and passes that text vector to the convolution layer. The convolution layer, also referred to here as the convolutional neural network, extracts features from the text vector and thereby captures the features of local segments of the text; by stacking multiple convolution layers, the semantic features of the whole sentence can be extracted. In addition, the convolution operation is efficient and makes effective use of GPU parallelism, so inference is fast.
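The layer sequence named above can be illustrated with a framework-free forward pass. This is a sketch under stated assumptions, not the model of the application: it uses a single convolution layer with max-over-time pooling, untrained hand-set weights, and hypothetical function names; a practical implementation would normally use a deep learning framework and learned parameters.

```python
import math


def conv1d_max(embedded, kernel, bias=0.0):
    """Slide `kernel` (h x d) over `embedded` (n x d), apply ReLU, then
    take the maximum over all positions (max-over-time pooling)."""
    h = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(embedded) - h + 1):
        window = embedded[start:start + h]
        act = bias + sum(
            w * x
            for k_row, e_row in zip(kernel, window)
            for w, x in zip(k_row, e_row)
        )
        best = max(best, act)
    return best


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def textcnn_forward(token_ids, embedding, kernels, fc_weights, fc_bias):
    embedded = [embedding[t] for t in token_ids]            # embedding layer
    features = [conv1d_max(embedded, k) for k in kernels]   # convolution + pooling
    logits = [
        b + sum(w * f for w, f in zip(row, features))       # fully connected layer
        for row, b in zip(fc_weights, fc_bias)
    ]
    return softmax(logits)                                  # softmax layer
```

Each kernel responds to one local pattern of adjacent tokens, and pooling keeps the strongest response regardless of where in the sentence it occurs, which is why the classifier does not need to know the position of the phrase in advance.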

In the embodiments of the present application, the proprietary phrase error correction model is obtained by training on sample sentences which carry proprietary phrases and are based on the target business field. The sample sentences carrying proprietary phrases comprise sample sentences carrying the correct proprietary phrase and a plurality of sample sentences carrying wrong proprietary phrases, where the wrong proprietary phrase differs from one such sample sentence to another.

The semantics of the target sentence are analyzed through the classification module in the proprietary phrase error correction model, and a classification label of the target sentence is identified, wherein the classification labels comprise proprietary phrase labels and a no-proprietary-phrase label. For K proprietary phrases in the target field, the classification module has K+1 labels, that is, the classification label of the target sentence falls into one of K+1 categories: one label indicates that no proprietary phrase exists, and the other K labels are the proprietary phrases.

Here, the classification module may output the specific classification label or a label number. For example, when the no-proprietary-phrase label is identified, the number K+1 of that label is output; when a proprietary phrase label such as "bona fide third party" is identified, the number J of that proprietary phrase label is output, where J takes a value from 1 to K.

That is to say, judging through the proprietary phrase error correction model whether a proprietary phrase exists in the target sentence means classifying the target sentence to obtain its classification label, and determining from that label whether a proprietary phrase exists.

For example, assuming that the target domain has only one proprietary phrase, "Reynolds pentad", the trained textCNN model (classifier) is a binary classification model. If the sentence "from the patient's indicators, it can be seen that the patient has Reynolds pentad" is input and the textCNN model outputs (0.91, 0.09), then the probability that "Reynolds pentad" exists in the sentence is 0.91 and the probability that no proprietary phrase exists is 0.09, so it is determined that the proprietary phrase "Reynolds pentad" exists in the sentence.
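Mapping the classifier output to one of the K+1 labels can be sketched as follows. The function name `decode_label` and the convention that the last probability belongs to the no-proprietary-phrase label are assumptions made for illustration; the source states only that labels 1 to K are proprietary phrases and label K+1 means that no proprietary phrase exists.

```python
def decode_label(probs, phrases):
    """Return (label_number, tag) for a probability vector of length K+1.

    Numbers 1..K refer to `phrases` in order; number K+1 is the
    no-proprietary-phrase label, for which the tag is None.
    """
    if len(probs) != len(phrases) + 1:
        raise ValueError("expected one probability per phrase plus one")
    index = max(range(len(probs)), key=probs.__getitem__)  # argmax
    tag = phrases[index] if index < len(phrases) else None
    return index + 1, tag


# The binary example from the text: one phrase, output (0.91, 0.09).
number, tag = decode_label((0.91, 0.09), ["Reynolds pentad"])
```

With the output (0.91, 0.09), the first label has the larger probability, so the sentence is assigned label number 1 and the corresponding phrase.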

In the embodiments of the present application, analyzing sentence semantics makes it possible to identify more accurately whether a sentence contains a proprietary phrase and which proprietary phrase tag it corresponds to, because proprietary phrases in a professional field are usually used in specific contexts. Consider, for example, the sentence "property rights in ships, aircraft, motor vehicles, and the like that are created, altered, transferred, or extinguished without registration may not be asserted against a bona fide third party". When the textCNN model, in the process of recognizing the sentence category, recognizes the context up to "without registration may not be asserted against", it can infer from that context that what follows is, with high probability, "bona fide third party". Therefore, even if the input sentence contains a miswritten form of the phrase, the input target sentence can still be recognized as having the classification label "bona fide third party".

In step S102, if no proprietary phrase exists, it is determined that the error correction result is that no proprietary phrase exists in the target sentence.

Specifically, when the classification label of the target sentence is the no-proprietary-phrase label, it is determined that no proprietary phrase exists in the target sentence, and the error correction result that no proprietary phrase exists in the target sentence is output.

In practice, the output error correction result need not be the literal text "no proprietary phrase"; it may be "none", or the number K+1 corresponding to the no-proprietary-phrase label, and so on. Even a result such as "sentence normal" may be output.

In step S103, a specific phrase tag corresponding to the target sentence is determined, and a target segment is determined from the target sentence according to the specific phrase tag.

Here, after the target sentence is input into the proprietary phrase error correction model, the classification module of the proprietary phrase error correction model outputs the proprietary phrase tag of the sentence. For example, for the sentence "the patient's indicators show that the patient has Reynolds pentad", the corresponding proprietary phrase tag is "Reynolds pentad".

Referring to fig. 3, determining a target segment from the target sentence according to the proprietary phrase tag includes the following steps S301 to S302:

S301, dividing the target sentence into a plurality of segments based on the proprietary phrase tag;

S302, comparing the edit distance between the proprietary phrase tag and each segment in the target sentence, and determining the target segment matched with the proprietary phrase tag.

Here, the target sentence is divided into a plurality of segments based on the proprietary phrase tags, in various ways.

In some embodiments, the position of the target segment is quickly located by matching the characters and words in the proprietary phrase tag against the characters and words in the target sentence, and the target sentence is divided into a plurality of segments only around that position, so that the number of segments divided from the target sentence is greatly reduced.

For example, take the proprietary phrase tag "third person of good will" and the sentence "the establishment, alteration, transfer and elimination of property rights in ships, aircraft, motor vehicles and the like, if unregistered, may not be asserted against a person of good will". According to the "good will" in "third person of good will", the "good will" in the sentence is quickly matched, and segments around that position are divided out, such as "asserted against", "against good will" and "person of good will"; the number of segments into which the target sentence is divided is thus greatly reduced.
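The segment division around the matched position can be sketched as follows. This is a minimal illustration and not the patented implementation: the function name `candidate_segments`, the `slack` parameter and the window rule are assumptions, and the Chinese strings are an assumed rendering of the example above rather than text taken from the source.

```python
from typing import List


def candidate_segments(sentence: str, tag: str, slack: int = 2) -> List[str]:
    """Return substrings of `sentence` located near characters shared with `tag`.

    `slack` (hypothetical parameter) widens the search window and the allowed
    segment lengths by a few characters on each side.
    """
    tag_chars = set(tag)
    # Positions in the sentence whose character also occurs in the tag.
    hits = [i for i, ch in enumerate(sentence) if ch in tag_chars]
    if not hits:
        return []
    # Only look inside a window around the matched characters.
    lo = max(0, min(hits) - slack)
    hi = min(len(sentence), max(hits) + 1 + slack)
    min_len = max(1, len(tag) - slack)
    max_len = len(tag) + slack
    segments: List[str] = []
    for start in range(lo, hi):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > hi:
                break
            seg = sentence[start:end]
            if seg not in segments:
                segments.append(seg)
    return segments


# Assumed example: the tag is the correct phrase, the sentence holds a wrong form.
segments = candidate_segments("未经登记,不得对抗善意人", "善意第三人", slack=2)
print(segments)
```

Because only a small window of the sentence is enumerated, the number of candidate segments stays far below the number of all substrings of the sentence.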

After the segments are divided, only the edit distance between the proprietary phrase tag and each segment in the target sentence needs to be compared to determine the target segment matched with the tag; the edit distance between the divided segments and every proprietary phrase in the proprietary phrase dictionary does not need to be calculated, so the calculation amount is greatly reduced. For example, for segments such as "against good will" and "person of good will", only the edit distance to "third person of good will" is calculated, and no edit distance to any other proprietary phrase in the dictionary is needed.

The edit distance between the proprietary phrase tag and each segment in the target sentence is compared, and the segment with the smallest edit distance is determined to be the target segment matched with the proprietary phrase tag. For example, "person of good will" is determined to be the target segment for "third person of good will"; similarly, the segment of the sentence closest to "Reynolds pentad" is determined to be the target segment for "Reynolds pentad".
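A minimal sketch of step S302, assuming the edit distance is the standard Levenshtein distance (the text does not name a specific variant). The function names and the Chinese example strings are illustrative assumptions.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def pick_target_segment(tag: str, segments: List[str]) -> str:
    """Return the segment with the smallest edit distance to the tag."""
    return min(segments, key=lambda seg: edit_distance(tag, seg))


# Assumed example: only the single tag is compared, not a whole dictionary.
target = pick_target_segment("善意第三人", ["对抗善意", "抗善意人", "善意人"])
print(target)
```

Each segment is compared against one tag only, which is where the reduction in calculation described above comes from.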

In step S104, when the target segment is judged to be the same as the proprietary phrase tag, the error correction result is determined to be that the proprietary phrase in the target sentence is correct.

That is, when the target segment is identical to the proprietary phrase tag, for example when the located segment "third person of good will" is identical to the classification tag "third person of good will", it can be determined that the proprietary phrase in the target sentence is correct.

When the error correction result is determined to be that the proprietary phrase in the target sentence is correct, "correct" or "normal" can be output, or nothing may be output, and so on.

In step S105, when the target segment is judged to be inconsistent with the proprietary phrase tag, the proprietary phrase in the target sentence is determined to be wrong, and the error correction result is determined to be the proprietary phrase tag and the target segment; wherein the proprietary phrase tag characterizes the correct proprietary phrase corresponding to the wrong target segment.

If only the result "the proprietary phrase is wrong" were output when the proprietary phrase in the target sentence is judged to be wrong, staff would still have to check again where the phrase is wrong and how to correct it. In the embodiment of the application, the error correction result is determined to be the proprietary phrase tag and the target segment, which are output together; the proprietary phrase tag characterizes the correct proprietary phrase corresponding to the wrong target segment, so staff can quickly determine which proprietary phrase is wrong and how to modify it. For example, if "person of good will" is the target segment for "third person of good will" and both are output together, it is immediately apparent that "person of good will" should be modified to "third person of good will"; likewise, outputting the erroneous form of "Reynolds pentad" found in the sentence together with the tag "Reynolds pentad" shows at once what the erroneous form should be changed to.

When the error correction result is output, it can be output according to a preset template; for example, the output error correction result is: "person of good will" is wrong, and modification to "third person of good will" is suggested.
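The three outcomes of steps S102, S104 and S105 and the template output can be sketched as below. The class `CorrectionResult`, the function `decide`, the status strings and the message wording are all hypothetical names chosen for illustration; the text only requires that the tag and the target segment be output together when the phrase is wrong.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CorrectionResult:
    status: str                    # "no_phrase", "correct" or "wrong"
    tag: Optional[str] = None      # correct proprietary phrase (classifier label)
    segment: Optional[str] = None  # segment found in the sentence

    def message(self) -> str:
        """Render the result with a simple (assumed) output template."""
        if self.status == "no_phrase":
            return "no proprietary phrase"
        if self.status == "correct":
            return "correct"
        return f'"{self.segment}" is wrong; suggested modification: "{self.tag}"'


def decide(tag: Optional[str], target_segment: Optional[str]) -> CorrectionResult:
    """Map the classifier tag and the matched segment to an error correction result.

    `tag` is None when the classifier reports that no proprietary phrase exists.
    """
    if tag is None:
        return CorrectionResult("no_phrase")
    if target_segment == tag:
        return CorrectionResult("correct", tag, target_segment)
    return CorrectionResult("wrong", tag, target_segment)


print(decide("third person of good will", "person of good will").message())
```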

Referring to fig. 4, in the embodiment of the present application, before inputting a target sentence in a target business domain into a trained proprietary phrase error correction model, the method further includes the following steps S401-S402;

S401, inputting a target text in a target service field into a preprocessing module, and cleaning the target text through the preprocessing module to obtain a cleaned target text;

S402, dividing the cleaned target text into at least one target sentence.

In other words, in actual use the target text can also be input directly: the preprocessing module divides the target text into target sentences, and the proprietary phrase error correction model then detects erroneous proprietary phrases in each target sentence, which is more efficient.

Cleaning the target text through the preprocessing module means retaining only numbers, Chinese characters, English characters and punctuation marks, and removing everything else (such as emoticons); optionally, stop words in the target text are also removed.
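Steps S401 and S402 can be sketched with regular expressions as follows. This is an illustration under stated assumptions: the set of retained punctuation marks, the sentence-ending marks used for splitting and the function names are choices made for the example, and stop-word removal is omitted.

```python
import re
from typing import List

# Assumed set of punctuation marks to keep (full-width and ASCII).
_PUNCT = ",。!?;:、,.!?;:"
# Anything that is not a digit, an ASCII letter, a CJK unified ideograph,
# whitespace or one of the punctuation marks above is dropped.
_DROP = re.compile(r"[^0-9A-Za-z\u4e00-\u9fff\s" + re.escape(_PUNCT) + "]")
# Split right after a sentence-ending mark (assumed: 。!? and ASCII !?).
_SPLIT = re.compile(r"(?<=[。!?!?])")


def clean_text(text: str) -> str:
    """Remove characters such as emoticons and decorative symbols."""
    return _DROP.sub("", text)


def split_sentences(text: str) -> List[str]:
    """Divide cleaned text into sentences, keeping the ending mark."""
    return [s.strip() for s in _SPLIT.split(text) if s.strip()]


def preprocess(text: str) -> List[str]:
    """Clean first, then split, as in S401 and S402."""
    return split_sentences(clean_text(text))


print(preprocess("你好😀世界!OK★。"))
```

The ASCII period is deliberately not used as a sentence boundary here, so that decimal numbers are not split; a real system would choose its own boundary rules.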

In some embodiments, the target sentence may be divided first and then cleaned.

Referring to fig. 5, in the text error correction method according to the embodiment of the present application, the classification module in the proprietary phrase error correction model is obtained by training based on the following training method, where the training method includes the following steps S501-S502;

S501, constructing a training sample set based on a target service field; the training sample set comprises sample sentences carrying proprietary phrases, and each sample sentence corresponds to a classification label;

S502, inputting the sample sentences in the training sample set into a classification module in a proprietary phrase error correction model, and identifying classification labels of the sample sentences to train the proprietary phrase error correction model until a training end condition is met.

Here, the training end condition is that a preset training number is reached, or the loss function output value of the classification module meets a preset threshold condition.

The sample sentences in the training sample set are input into the classification module in the proprietary phrase error correction model. After the classification labels of the sample sentences are recognized, the recognized classification labels and the classification labels corresponding to the sample sentences are input into a loss function, and it is judged whether the loss function output value of the classification module meets a preset threshold condition. If it does, training is stopped and the classification module in the trained proprietary phrase error correction model is obtained; if not, the training process is repeated.
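The training end condition can be sketched as a loop that stops either after a preset number of training rounds or once the loss meets the threshold. The sketch assumes that "meets a preset threshold condition" means "is less than or equal to a threshold"; `train_one_epoch` is a hypothetical callable standing in for one pass of the classification module over the training set, and the toy loss below is not real training.

```python
from typing import Callable, Tuple


def train_until_done(train_one_epoch: Callable[[], float],
                     max_epochs: int,
                     loss_threshold: float) -> Tuple[int, float]:
    """Run training rounds until the loss is small enough or the preset
    number of rounds is reached. Returns (rounds run, last loss)."""
    if max_epochs < 1:
        raise ValueError("max_epochs must be at least 1")
    loss = float("inf")
    for epoch in range(1, max_epochs + 1):
        loss = train_one_epoch()
        if loss <= loss_threshold:  # assumed form of the threshold condition
            break
    return epoch, loss


def make_toy_epoch(start: float = 1.0) -> Callable[[], float]:
    """Toy stand-in for a training round: the loss halves on every call."""
    state = {"loss": start}

    def step() -> float:
        state["loss"] /= 2
        return state["loss"]

    return step


print(train_until_done(make_toy_epoch(), max_epochs=10, loss_threshold=0.2))
```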

It should be noted that the classification module in the proprietary phrase error correction model may be trained separately or together with the proprietary phrase error correction model.

The construction of the training sample based on the target business field comprises the following steps:

acquiring a text data source, and carrying out sentence segmentation on a sample text in the text data source to obtain a sample sentence subset; the sample sentence subset includes first sample sentences, each containing one proprietary phrase, and sample sentences containing no proprietary phrase;

modifying the proprietary phrase in a first sample sentence according to a preset modification rule, and augmenting a plurality of first sample sentences which carry a wrong proprietary phrase and whose classification label is the proprietary phrase;

Specifically, modifying the exclusive phrase in the first sample sentence according to a preset modification rule includes:

wherein the plurality of types of modifications comprise: adding a character, deleting a character, replacing a character with a similar-sounding character, and replacing a character with a similar-shaped character.

Specifically, to obtain the text data source, open-source text data sets available on the network can be collected, and text data from websites related to the target field can also be crawled with a crawler.

The sample text in the collected text data source is split into sentences and further filtered, leaving only sample sentences containing exactly one proprietary phrase and sample sentences containing no proprietary phrase at all. If a sentence contains two or more proprietary phrases, the sentence is screened out. At this point all sample sentences are normal, that is, unmodified: the classification label of a sample sentence containing one proprietary phrase is that proprietary phrase, and the classification label of a sample sentence containing no proprietary phrase is "no proprietary phrase".
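The filtering and labelling rule above can be sketched as follows. The sketch assumes plain substring matching against a proprietary phrase dictionary, which the text does not specify; the function name, the label constant and the example phrases are illustrative.

```python
from typing import Iterable, List, Tuple

NO_PHRASE = "no proprietary phrase"  # assumed name for the extra category


def label_sentences(sentences: Iterable[str],
                    phrase_dict: Iterable[str]) -> List[Tuple[str, str]]:
    """Keep sentences with zero or exactly one dictionary phrase and label them.

    Matching is a plain substring test (an assumption); distinct phrases are
    counted, so nested dictionary entries would need extra handling.
    """
    phrases = list(phrase_dict)
    labelled: List[Tuple[str, str]] = []
    for sentence in sentences:
        found = [p for p in phrases if p in sentence]
        if not found:
            labelled.append((sentence, NO_PHRASE))
        elif len(found) == 1:
            labelled.append((sentence, found[0]))
        # Sentences with two or more proprietary phrases are screened out.
    return labelled


phrases = ["third person of good will", "force majeure"]
corpus = [
    "It may not be asserted against a third person of good will.",
    "The weather is fine today.",
    "A third person of good will may invoke force majeure.",
]
for sentence, label in label_sentences(corpus, phrases):
    print(label, "|", sentence)
```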

The proprietary phrase in a first sample sentence is modified according to the preset modification rule, so as to augment a plurality of first sample sentences which carry a wrong proprietary phrase and whose classification label is the proprietary phrase; that is, one sentence containing a proprietary phrase is modified to obtain a plurality of sentences labeled with that proprietary phrase. There are four specific modification methods, as follows.

First: addition. A character is randomly selected from the proprietary phrase in the sentence, the character being neither the first nor the last character of the proprietary phrase, and a random character is added in front of it.

Second: deletion. A character is randomly selected from the proprietary phrase in the sentence and deleted.

Third: similar-sound replacement. A character is randomly selected from the proprietary phrase in the sentence and replaced with a similar-sounding character.

Fourth: similar-shape replacement. A character is randomly selected from the proprietary phrase in the sentence and replaced with a similar-shaped character.

In a specific operation, assume that the proprietary phrase in the sample sentence is s and that the length of s is L; the maximum number of modified characters is then max(L/3, 1). Each sample sentence is modified to generate 5 sentences whose classification label is s. Each time an augmented sample sentence is generated, a number n is randomly selected from (1, 2, ..., L/3) as the number of characters to modify; n modification operations are then randomly selected from the four modification operations above, n characters are randomly selected from the proprietary phrase in the sample sentence, one for each selected operation, the sample sentence is modified accordingly, and the modified sample sentence is added to the sample sentence subset to update it.

Here, if the modified sample sentence has already been generated, the above modification operation is repeated until a usable sample sentence, one that has not been generated before, is obtained.
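The augmentation procedure can be sketched as below. Several points are assumptions rather than details from the text: L/3 is taken as integer division, the n operations are drawn with repetition allowed, the similar-sound and similar-shape lookups are small hand-made tables (`homophones`, `lookalikes`), the characters available for insertion come from a hypothetical `insert_pool`, and `max_tries` is added only to guarantee termination.

```python
import random
from typing import Dict, List, Sequence, Tuple

OPS = ("add", "delete", "sound", "shape")


def modify_phrase(phrase: str, n: int, homophones: Dict[str, List[str]],
                  lookalikes: Dict[str, List[str]], insert_pool: Sequence[str],
                  rng: random.Random) -> str:
    """Apply n randomly chosen modification operations to the phrase."""
    chars = list(phrase)
    for op in rng.choices(OPS, k=n):
        if op == "add" and len(chars) >= 3:
            # Insert before a character that is neither first nor last.
            i = rng.randrange(1, len(chars) - 1)
            chars.insert(i, rng.choice(insert_pool))
        elif op == "delete" and len(chars) >= 2:
            del chars[rng.randrange(len(chars))]
        elif op in ("sound", "shape"):
            table = homophones if op == "sound" else lookalikes
            replaceable = [i for i, c in enumerate(chars) if c in table]
            if replaceable:
                i = rng.choice(replaceable)
                chars[i] = rng.choice(table[chars[i]])
    return "".join(chars)


def augment(sentence: str, phrase: str, homophones: Dict[str, List[str]],
            lookalikes: Dict[str, List[str]], insert_pool: Sequence[str],
            n_variants: int = 5, seed: int = 0,
            max_tries: int = 1000) -> List[Tuple[str, str]]:
    """Generate up to n_variants distinct wrong-phrase sentences labelled `phrase`."""
    if phrase not in sentence:
        raise ValueError("sentence does not contain the phrase")
    rng = random.Random(seed)
    max_n = max(len(phrase) // 3, 1)  # assumed integer division
    seen = set()
    out: List[Tuple[str, str]] = []
    for _ in range(max_tries):
        if len(out) == n_variants:
            break
        n = rng.randint(1, max_n)
        wrong = modify_phrase(phrase, n, homophones, lookalikes, insert_pool, rng)
        new_sentence = sentence.replace(phrase, wrong, 1)
        # Repeat when nothing changed or the sentence was generated before.
        if wrong == phrase or new_sentence in seen:
            continue
        seen.add(new_sentence)
        out.append((new_sentence, phrase))
    return out


# Toy tables, purely illustrative.
samples = augment("不得对抗善意第三人", "善意第三人",
                  homophones={"意": ["义"], "善": ["擅"]},
                  lookalikes={"人": ["入"]}, insert_pool="的之", seed=1)
for s, label in samples:
    print(label, "|", s)
```

Every generated sentence keeps the original, correct phrase as its classification label, which is what lets the classifier learn to map erroneous forms back to the correct proprietary phrase.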

In the embodiment of the application, when the proprietary phrase error correction model is trained based on sample sentences carrying proprietary phrases in the target business field, training uses not only sample sentences carrying correct proprietary phrases but also a plurality of augmented sample sentences carrying wrong proprietary phrases, so that the trained proprietary phrase error correction model can accurately identify the category labels of sentences in which proprietary phrases are used incorrectly, and the error correction precision for texts is improved.

Based on the same inventive concept, a text error correction device corresponding to the text error correction method is also provided in the embodiments of the present application, and because the principle of solving the problem of the device in the embodiments of the present application is similar to the text error correction method in the embodiments of the present application, the implementation of the device can refer to the implementation of the method, and repeated details are not repeated.

Referring to fig. 6, fig. 6 is a schematic structural diagram of a text error correction apparatus according to an embodiment of the present application; specifically, a text error correction apparatus includes:

a judging module 601, configured to input a target sentence in a target business field into a trained proprietary phrase error correction model, and judge whether a proprietary phrase exists in the target sentence through the proprietary phrase error correction model; the proprietary phrase error correction model is obtained by training a sample sentence carrying a proprietary phrase based on a target business field;

a first determining module 602, configured to determine that no proprietary phrase exists in a target sentence as an error correction result when it is determined that no proprietary phrase exists in the target sentence;

a second determining module 603, configured to determine, when it is determined that a proprietary phrase exists in a target sentence, a proprietary phrase tag corresponding to the target sentence, and determine a target segment from the target sentence according to the proprietary phrase tag;

a third determining module 604, configured to determine that the error correction result is that the proprietary phrase in the target sentence is correct when the target segment is determined to be the same as the proprietary phrase tag;

a fourth determining module 605, configured to determine that the proprietary phrase in the target sentence is incorrect and determine that an error correction result is the proprietary phrase tag and the target segment when the target segment is determined to be inconsistent with the proprietary phrase tag; wherein the proprietary phrase tags characterize the correct proprietary phrase corresponding to the wrong target segment.

Based on this, the present application provides a text error correction apparatus, which inputs a target sentence in a target business field into a trained proprietary phrase error correction model, analyzes the semantics of the target sentence through the proprietary phrase error correction model, and determines whether a proprietary phrase exists in the target sentence; if the target sentence exists, outputting a proprietary phrase label corresponding to the target sentence based on the semantics of the sentence, determining a target segment from the target sentence according to the proprietary phrase label, judging whether the proprietary phrase is wrong or not according to the target segment, and if the proprietary phrase is wrong, outputting the wrong proprietary phrase and the correct proprietary phrase at the same time, so that error correction is facilitated; in the error correction method, the existence of the proprietary phrase is analyzed based on the sentence semantics, and the target proprietary phrase corresponding to the sentence is identified, so that the accuracy is higher and the calculation amount is less; meanwhile, the wrong proprietary phrase in the sentence is accurately identified based on the target proprietary phrase, and the correct and wrong proprietary phrases are output simultaneously, so that the modification by a user is facilitated.

In some embodiments, in the text error correction apparatus, when the determining module inputs a target sentence in a target business domain into a trained proprietary phrase error correction model and determines whether a proprietary phrase exists in the target sentence through the proprietary phrase error correction model, the determining module is specifically configured to:

inputting a target sentence in a target service field into a trained proprietary phrase error correction model, analyzing the semantics of the target sentence through a classification module in the proprietary phrase error correction model, and identifying a classification label of the target sentence; wherein the classification labels include a proprietary phrase label and a no-proprietary-phrase label;

when the classification label of the target sentence is that no proprietary phrase label exists, judging that no proprietary phrase exists in the target sentence;

and when the classification label of the target sentence is a proprietary phrase label, judging that the proprietary phrase exists in the target sentence.

In some embodiments, when the target segment is determined from the target sentence according to the exclusive phrase tag, the second determining module in the text error correction apparatus is specifically configured to:

In some embodiments, the text error correction apparatus further includes a preprocessing module, where the preprocessing module is configured to input a target text in a target business field to the preprocessing module before inputting a target sentence in the target business field to the trained proprietary phrase error correction model, and clean the target text through the preprocessing module to obtain a cleaned target text;

and dividing the cleaned target text into at least one target sentence.

In some embodiments, the text error correction device further comprises a training module, wherein the training module is used for training a classification module in the proprietary phrase error correction model based on the following training method;

In some embodiments, the training module in the text error correction apparatus is specifically configured to, when constructing the training sample based on the target service domain:

In some embodiments, the training module in the text error correction apparatus, when modifying the proprietary phrase in the first sample sentence according to the preset modification rule, is specifically configured to:

wherein the multiple types of modifications include: adding a character, deleting a character, replacing a character with a similar-sounding character, and replacing a character with a similar-shaped character.

Based on the same inventive concept, the embodiment of the present application further provides an electronic device corresponding to the text error correction method, and as the principle of solving the problem of the electronic device in the embodiment of the present application is similar to that of the text error correction method in the embodiment of the present application, the implementation of the electronic device may refer to the implementation of the method, and repeated details are not described again.

Referring to fig. 7, fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application, specifically, the electronic device 700 includes: a processor 701, a memory 702 and a bus, wherein the memory 702 stores machine-readable instructions executable by the processor 701, the processor 701 and the memory 702 communicate via the bus when the electronic device 700 is running, and the machine-readable instructions, when executed by the processor 701, perform the steps of the text error correction method.

Based on the same inventive concept, a computer-readable storage medium corresponding to the text error correction method is also provided in the embodiments of the present application, and since the principle of solving the problem of the computer-readable storage medium in the embodiments of the present application is similar to that of the text error correction method in the embodiments of the present application, the implementation of the computer-readable storage medium can refer to the implementation of the method, and repeated details are not repeated.

A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, carries out the steps of the text error correction method.

In the embodiment of the present application, the text error correction method may be executed in a terminal device or a server; the terminal device may be a local terminal device, and when the text error correction method is executed on a server, the text error correction method may be implemented and executed based on a cloud interactive system, where the cloud interactive system at least includes the server and a client device (that is, the terminal device).

Specifically, for example, when the text error correction method is applied to a terminal device, the text error correction method is used to identify an incorrect proprietary phrase in a sentence and determine a correct proprietary phrase corresponding to the incorrect proprietary phrase.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be through some communication interfaces, indirect coupling or communication connection between devices or modules, and may be in an electrical, mechanical or other form.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solutions of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a platform server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of the changes or substitutions within the technical scope of the present application, and shall cover the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method for correcting text, the method comprising:

inputting a target sentence in a target business field into a trained proprietary phrase error correction model, analyzing the semantics of the target sentence through a classification module in the proprietary phrase error correction model, and identifying a classification label of the target sentence; wherein the classification labels include a proprietary phrase label and a no-proprietary-phrase label; the proprietary phrase error correction model is obtained by training a sample sentence carrying a proprietary phrase based on the target service field;

when the classification tag of the target sentence is a proprietary phrase tag, judging that the proprietary phrase exists in the target sentence;

when the target segment is judged to be the same as the proprietary phrase tag, determining that the error correction result is that the proprietary phrase in the target sentence is correct;

when the target segment is judged to be inconsistent with the proprietary phrase tag, determining that the proprietary phrase in the target sentence is wrong, and determining that the error correction result is the proprietary phrase tag and the target segment; wherein the proprietary phrase tag characterizes the correct proprietary phrase corresponding to the wrong target segment.

2. The text correction method of claim 1 wherein determining a target segment from the target sentence based on the proprietary phrase tags comprises:

3. The text error correction method of claim 1, wherein before inputting the target sentences of the target business domain into the trained proprietary phrase error correction model, the method further comprises:

and dividing the cleaned target text into at least one target sentence.

4. The text error correction method of claim 1, wherein the classification module in the proprietary phrase error correction model is trained based on the following training method:

constructing a training sample set based on the target service field; the training sample set comprises sample sentences carrying proprietary phrases, and each sample sentence corresponds to a classification label;

5. The text error correction method of claim 4, wherein the constructing of the training sample based on the target business domain comprises:

6. The text error correction method of claim 5 wherein modifying the proprietary phrase in the first sample sentence according to a predetermined modification rule comprises:

7. A text correction apparatus, characterized in that the apparatus comprises:

a judging module, configured to input a target sentence in a target service field into a trained proprietary phrase error correction model, analyze the semantics of the target sentence through a classification module in the proprietary phrase error correction model, and identify the classification label of the target sentence; wherein the classification labels include a proprietary phrase label and a no-proprietary-phrase label; the proprietary phrase error correction model is obtained by training a sample sentence carrying a proprietary phrase based on the target service field;

8. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the text correction method of any one of claims 1 to 6.

9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the text correction method according to any one of claims 1 to 6.