CN110619119B - Intelligent text editing method and device and computer readable storage medium - Google Patents

Info

Publication number: CN110619119B
Authority: CN (China)
Prior art keywords: text, error, intelligent, training, editing
Legal status: Active (the status listed is an assumption by Google Patents, not a legal conclusion)
Application number: CN201910668831.9A
Other languages: Chinese (zh)
Other versions: CN110619119A
Inventor: 乔佳
Current and original assignee: Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd; priority to CN201910668831.9A; publication of application CN110619119A; application granted; publication of grant CN110619119B

Classifications

    • G06F16/35 Information retrieval of unstructured textual data: clustering; classification (G Physics; G06 Computing; G06F Electric digital data processing)
    • G06F18/241 Pattern recognition: classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification

Abstract

The invention relates to artificial intelligence technology and discloses an intelligent text editing method comprising the following steps: acquiring a correct text set and an error text set, preprocessing the error text set to obtain a standard error text set, and establishing a corresponding label set for the correct text set and the standard error text set; converting the correct text set and the standard error text set into word vectors through a bag-of-words model, and storing the word vectors as a training set in a corpus; training a pre-constructed intelligent text editing model with the training set and the label set to obtain a trained intelligent text editing model; and receiving text data input by a user, intelligently editing the text data input by the user with the trained intelligent text editing model, and outputting the corresponding correct text data. The invention also provides an intelligent text editing device and a computer readable storage medium. The invention realizes intelligent editing of text.

Description

Intelligent text editing method and device and computer readable storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an intelligent text editing method and device for text error correction, and a computer readable storage medium.
Background
With the increasing informatization of society, people increasingly want to interact with computers in natural language. Natural language processing is an attractive and challenging topic in computer science. From the point of view of computer science, and of artificial intelligence in particular, the task of natural language processing is to build a computer model that understands, analyzes, and answers natural language (the everyday languages people use) with human-like results.
Natural language processing studies how to make a computer understand and generate the languages people use every day (such as Chinese and English), so that the computer can grasp the meaning of natural language and answer, through dialogue, the questions people put to it in natural language. In text editing, natural language processing also has great potential for correcting text errors. The existing correction approach is a language proofreading method in which a speech synthesis system reads an input sentence aloud while the typist or a proofreader checks it against the original manuscript. This method can find differences between the input manuscript and the original and reduce the proofreading workload, but it cannot detect homophone errors, provides no error prompting function, and cannot find punctuation errors in the original manuscript.
Disclosure of Invention
The invention provides an intelligent text editing method and device and a computer readable storage medium, with the main aim of presenting intelligently corrected text to the user when the user edits text.
In order to achieve the above object, the present invention provides an intelligent text editing method, which includes:
receiving a correct text set and an error text set, performing a preprocessing operation on the error text set to obtain a standard error text set, and establishing a corresponding label set for the correct text set and the standard error text set;
converting the correct text set and the standard error text set into word vectors through a bag-of-words model, and storing the word vectors as a training set in a corpus;
inputting the training set and the label set into a pre-constructed intelligent text editing model, training the intelligent text editing model by using the training set to obtain a training value, inputting the training value and the label set into a loss function of the intelligent text editing model to obtain a loss function value, and quitting training of the intelligent text editing model when the loss function value is smaller than a preset threshold value;
and receiving text data input by a user, intelligently editing the text data input by the user by using the intelligent text editing model, and outputting corresponding correct text data.
Optionally, the preprocessing operation comprises:
performing word segmentation processing on the error text set to obtain a word segmentation result, and performing punctuation proofreading on the error text set using the word segmentation result and punctuation proofreading rules, to obtain the set of erroneous punctuation marks in the error text set;
and checking the word continuation relations near the target word strings of the error text set by establishing an N-gram model using the binary word continuation relation, to obtain the erroneous word strings of the error text set.
Optionally, the word segmentation processing includes:
segmenting the error text set by using a full segmentation method to obtain a plurality of word segmentation modes;
and calculating the probability of each word segmentation mode according to the Markov property, and selecting the word segmentation result of the mode with the highest probability as the word segmentation result of the error text set.
Optionally, converting the correct text set and the standard error text set into word vectors through a bag-of-words model includes:
calculating the distance between the data objects of the correct text set and the standard error text set with the Euclidean formula, and presetting n clusters according to a clustering algorithm, where the cluster center of the kth cluster is Center_k; calculating the distance from each data object of the correct text set and the standard error text set to each of the n cluster centers, and obtaining the features of each data object at each cluster center;
and training the features with a classifier and calculating the probability of each data object at each cluster center, thereby converting the correct text set and the standard error text set into word vectors.
Optionally, the training the intelligent text editing model by using the training set to obtain a training value includes:
inputting the training set into the input layer of the convolutional neural network of the intelligent text editing model, and performing a convolution operation on the training set through a group of filters preset in the convolutional layer of the network to extract feature vectors;
and performing a pooling operation on the feature vectors with the pooling layer of the convolutional neural network, inputting the pooled feature vectors to the fully connected layer, and normalizing and computing them through an activation function to obtain a training value.
In addition, in order to achieve the above object, the present invention further provides an intelligent text editing apparatus, which includes a memory and a processor, where the memory stores an intelligent text editing program that can run on the processor, and when the program is executed by the processor, the following steps are implemented:
receiving a correct text set and an error text set, performing a preprocessing operation on the error text set to obtain a standard error text set, and establishing a corresponding label set for the correct text set and the standard error text set;
converting the correct text set and the standard error text set into word vectors through a bag-of-words model, and storing the word vectors as a training set in a corpus;
inputting the training set and the label set into a pre-constructed intelligent text editing model, training the intelligent text editing model by using the training set to obtain a training value, inputting the training value and the label set into a loss function of the intelligent text editing model to obtain a loss function value, and quitting training of the intelligent text editing model when the loss function value is smaller than a preset threshold value;
and receiving text data input by a user, intelligently editing the text data input by the user by using the intelligent text editing model, and outputting corresponding correct text data.
Optionally, the preprocessing operation comprises:
performing word segmentation processing on the error text set to obtain a word segmentation result, and performing punctuation proofreading on the error text set using the word segmentation result and punctuation proofreading rules, to obtain the set of erroneous punctuation marks in the error text set;
and checking the word continuation relations near the target word strings of the error text set by establishing an N-gram model using the binary word continuation relation, to obtain an erroneous word string set of the error text set.
Optionally, the word segmentation processing includes:
segmenting the error text set by using a full segmentation method to obtain a plurality of word segmentation modes;
and calculating the probability of each word segmentation mode according to the Markov property, and selecting the word segmentation result of the mode with the highest probability as the word segmentation result of the error text set.
Optionally, converting the correct text set and the standard error text set into word vectors through a bag-of-words model includes:
calculating the distance between the data objects of the correct text set and the standard error text set with the Euclidean formula, and presetting n clusters according to a clustering algorithm, where the cluster center of the kth cluster is Center_k; calculating the distance from each data object of the correct text set and the standard error text set to each of the n cluster centers, and obtaining the features of each data object at each cluster center;
and training the features with a classifier and calculating the probability of each data object at each cluster center, thereby converting the correct text set and the standard error text set into word vectors.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium storing an intelligent text editing program, where the program can be executed by one or more processors to implement the steps of the intelligent text editing method described above.
According to the intelligent text editing method and device and the computer readable storage medium of the invention, when a user edits a text containing errors, the received correct text set and error text set, together with the established label set, are used to train a pre-constructed intelligent text editing model; the text containing errors is then input into the trained model, so that an accurate editing result can be presented to the user.
Drawings
Fig. 1 is a schematic flowchart of an intelligent text editing method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an internal structure of an intelligent text editing apparatus according to an embodiment of the present invention;
fig. 3 is a schematic block diagram of the intelligent text editing program in the intelligent text editing apparatus according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides an intelligent text editing method. Fig. 1 is a schematic flow chart of the intelligent text editing method according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the intelligent text editing method includes:
s1, receiving a correct text set and an error text set, preprocessing the error text set to obtain a standard error text set, and establishing a corresponding label set for the correct text set and the standard error text set.
In a preferred embodiment of the present invention, the correct text set and the error text set contain the same text data, but the error text set contains errors such as wrong words or grammatical ambiguities, while the correct text set contains no such errors.
Further, in a preferred embodiment of the present invention, the preprocessing operation includes: performing word segmentation processing on the error text set to obtain word segmentation results; performing punctuation proofreading on the error text set according to the punctuation proofreading rules, to obtain and label the set of erroneous punctuation marks in the text; and checking the word continuation relations near the target word strings by establishing an N-gram model using the binary word continuation relation, to obtain and label the erroneous word string set of the error text set. The specific implementation steps of the preprocessing are as follows:
a. Performing word segmentation processing on the error text set to obtain a word segmentation result.
In the preferred embodiment of the invention, word segmentation processing is performed on the error text set through a Markov model to obtain a word segmentation result.
The Markov model is a statistical model widely applied in natural language processing, in fields such as speech recognition, automatic part-of-speech tagging, phonetic-to-character conversion, and probabilistic grammar. In the preferred embodiment of the present invention, a sentence in the error text set is preset as S; the sentence S is segmented with a full segmentation method to obtain all possible Chinese word segmentation modes; the probability of each segmentation mode is calculated according to the Markov property; and the segmentation result of the mode with the highest probability is selected as the final text word segmentation result.
The Markov property means that the probability of the ith word appearing in the text is related only to the n−1 words appearing before it, not to the words that follow it. Thus, for the sentence S formed by the word sequence $\{W_1, W_2, \ldots, W_m\}$, the probability that the ith word $W_i$ appears, given that the preceding words have appeared, is:
$$P(W_i \mid W_1, \ldots, W_{i-1}) = P(W_i \mid W_{i-n+1}, \ldots, W_{i-1})$$
Therefore, the probability of the sentence S arranged in this word order is:
$$P(S) = P(W_1 W_2 \ldots W_m) = P(W_1)\,P(W_2 \mid W_1) \cdots P(W_m \mid W_{m-n+1}, \ldots, W_{m-1})$$
where the conditional probability $P(W_m \mid W_{m-n+1}, \ldots, W_{m-1})$ represents the probability that $W_m$ appears given that the string $W_{m-n+1}, \ldots, W_{m-1}$ has appeared; it is determined with a binary (bigram) language model trained on a large-scale corpus, so the probability model of the sentence S is:
$$P(S) = P(W_1) \prod_{i=2}^{m} P(W_i \mid W_{i-1})$$
The invention selects, among all the calculated values of P(S), the word segmentation corresponding to the maximum of P(S) as the word segmentation result of the scheme:
$$\hat{W} = \arg\max_{W_1 \ldots W_m} P(W_1 W_2 \ldots W_m)$$
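As an illustration of the procedure above, the following is a minimal Python sketch of full segmentation scored by a bigram Markov model. The toy lexicon and the floor value for unseen bigrams are assumptions made for this example; a real system would estimate the unigram and bigram probabilities from a large-scale corpus.

```python
# Sketch: enumerate all segmentations of a sentence over a lexicon (full
# segmentation), then pick the one maximizing the bigram Markov probability
# P(S) = P(W1) * prod_i P(Wi | Wi-1). Lexicon and probabilities are toy data.
LEXICON = {"南京", "市长", "南京市", "长江", "大桥", "长江大桥"}

def full_segmentations(sent, lexicon=LEXICON, max_word_len=4):
    """Return every way of splitting `sent` into lexicon words."""
    if not sent:
        return [[]]
    results = []
    for i in range(1, min(max_word_len, len(sent)) + 1):
        word = sent[:i]
        if word in lexicon:
            for rest in full_segmentations(sent[i:], lexicon, max_word_len):
                results.append([word] + rest)
    return results

def sentence_prob(words, unigram, bigram, floor=1e-8):
    """P(S) under the bigram model; `floor` stands in for unseen events."""
    p = unigram.get(words[0], floor)
    for prev, cur in zip(words, words[1:]):
        p *= bigram.get((prev, cur), floor)
    return p

def best_segmentation(sent, unigram, bigram):
    """Select the segmentation with the maximum P(S), as in the formula above."""
    return max(full_segmentations(sent),
               key=lambda ws: sentence_prob(ws, unigram, bigram))
```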
b. Performing punctuation proofreading on the error text set according to the punctuation proofreading rules, to obtain and label the set of erroneous punctuation marks in the error text set.
In the preferred embodiment of the invention, punctuation marks in the error text set are proofread with a punctuation-driven method that targets specific error types, applies preset rules, scans in multiple passes, and combines context.
In detail, the invention proofreads the error text set sentence by sentence, paragraph by paragraph, and over the full text by constructing a local analyzer. Preferably, the principle of the local analyzer is as follows: the error text set is divided into single sentences according to the punctuation marks, and these sentences are input into the local analyzer in text order. If a sentence conforms to the language rules within the local range, it passes normally; if a local anomaly is found, the analyzer refuses to accept it and judges the text erroneous; this continues until the whole error text set has been input into the local analyzer. For each punctuation mark appearing in the text, the local analyzer determines which type the mark belongs to, judges with the corresponding proofreading rule whether the mark is erroneous, and stores an error correction suggestion in the error correction suggestion buffer. The proofreading rules are as follows:
When the punctuation mark being proofread is a comma: if a punctuation mark other than a quotation mark stands at the position immediately before the comma, or immediately after it, the mark is shown in italics in the text to indicate an error, and an error correction suggestion ("redundant punctuation; delete this mark") is stored in the buffer. Proofreading then continues sequentially downward through the punctuation marks.
When the punctuation mark being proofread is a pause mark (the Chinese enumeration comma), the judgment uses automatic word segmentation and part-of-speech tagging combined with context information. If the words immediately before and after the pause mark are both numerals, the mark is shown in italics in the text to indicate an error, and an error correction suggestion ("redundant punctuation; delete the pause mark") is stored in the buffer. Proofreading then continues sequentially downward through the punctuation marks.
When the punctuation mark being proofread is an ellipsis, the following three cases are considered:
(1) if the ellipsis is immediately preceded by a punctuation mark other than "。", "!", or "?", that mark is shown in italics in the text to indicate an error, and the error correction suggestion ("redundant punctuation; delete the preceding mark") is stored in the buffer;
(2) if a punctuation mark immediately follows the ellipsis, that mark is shown in italics in the text to indicate an error, and the error correction suggestion ("redundant punctuation; delete the following mark") is stored in the buffer;
(3) if the ellipsis is followed by one of the expressions meaning "etc.", "and so on", or "the like", the ellipsis is shown in italics in the text to indicate an error, and the error correction suggestion ("redundant punctuation; delete the ellipsis") is stored in the buffer. Proofreading then continues sequentially downward through the punctuation marks.
For the error correction suggestions stored in the error correction suggestion buffer under these punctuation proofreading rules, the preferred embodiment of the present invention, once punctuation proofreading is completed, displays the corresponding erroneous punctuation marks in the interface in the order in which the errors occur in the sentences, thereby obtaining the set of erroneous punctuation marks in the error text set.
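To make the buffered-suggestion mechanism concrete, here is a simplified Python sketch of just the comma rule described above. The punctuation sets and the suggestion text are illustrative assumptions; a full analyzer would add the pause-mark and ellipsis rules in the same pattern.

```python
# Sketch of the comma proofreading rule: flag a comma immediately preceded
# or followed by another punctuation mark (quotation marks excepted) and
# store an error correction suggestion in a buffer.
PUNCT = set("，。、！？；：…,.!?;:")
QUOTES = set("\"'“”‘’")

def check_commas(sentence):
    suggestions = []  # plays the role of the error correction suggestion buffer
    for i, ch in enumerate(sentence):
        if ch not in "，,":
            continue
        prev_ch = sentence[i - 1] if i > 0 else ""
        next_ch = sentence[i + 1] if i + 1 < len(sentence) else ""
        if (prev_ch in PUNCT - QUOTES) or (next_ch in PUNCT - QUOTES):
            suggestions.append((i, "redundant punctuation: delete this mark"))
    return suggestions
```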
c. Checking the word continuation relations near the target word strings of the error text set by establishing an N-gram model using the binary word continuation relation, to obtain and label the erroneous word string set of the error text set.
The continuation relation refers to the adjacency relation between words. The binary continuation relation refers to examining, in the character string $z_1 z_2 z_3 \ldots z_{i-1} z_i \ldots z_n$, the adjacency relations of $z_i$ with its neighboring words: according to the N-gram model of corpus linguistics, the bigram model obtained when N = 2 only needs to consider the relation between $z_{i-1}$ and $z_i$ and the relation between $z_i$ and $z_{i+1}$. The invention analyzes and processes a large-scale corpus; when $p(z_i \mid z_{i-1})$ satisfies a certain threshold, $z_{i-1}$ and $z_i$ are judged continuous, and from the result of this continuation judgment it is identified whether the character string $z_i$ is erroneous. The preferred embodiment of the present invention first checks the continuation relation between $z_{i-1}$ and $z_i$; if they are not continuous, it then checks the relation between $z_i$ and $z_{i+1}$; if that relation is also not continuous, the character string $z_i$ is judged erroneous.
In detail, the preferred embodiment of the present invention presets a sentence in the error text set as $S = z_1 z_2 z_3 \ldots z_{i-1} z_i \ldots z_n$, where $z_i$ and $z_{i+1}$ are two adjacent character strings, the capacity of the Chinese corpus is N, the number of times $z_i$ and $z_{i+1}$ appear adjacent is $r(z_i, z_{i+1})$, and the numbers of independent occurrences of $z_i$ and $z_{i+1}$ are $r(z_i)$ and $r(z_{i+1})$ respectively. The probabilities of $z_i$ and $z_{i+1}$ occurring independently are then:
$$p(z_i) = r(z_i)/N, \qquad p(z_{i+1}) = r(z_{i+1})/N;$$
and the co-occurrence probability of $z_i$ and $z_{i+1}$ as neighbors is:
$$p(z_i, z_{i+1}) = r(z_i, z_{i+1})/N.$$
When $r(z_i, z_{i+1}) = N \cdot p(z_i, z_{i+1}) \geq \tau$, $z_i$ and $z_{i+1}$ have a high co-occurrence frequency and are judged continuous, indicating that the word string $z_i$ is correct; conversely, when $r(z_i, z_{i+1}) = N \cdot p(z_i, z_{i+1}) < \tau$, the word string $z_i$ is erroneous. Here $\tau$ is a threshold, preset to $\tau = 0.8$. Preferably, the invention obtains the erroneous word string set of the error text set by a traversal check over the error text set.
Further, in the preferred embodiment of the present invention, a standard error text set is obtained according to the error punctuation mark set and the error string set obtained by the preprocessing.
And S2, converting the correct text set and the standard error text set into word vectors through a bag-of-words model, and storing the word vectors as a training set into a corpus.
The bag-of-words model represents text as feature vectors; its basic idea is that, for a given text, word order, grammar, and syntax are ignored, and the text is treated only as a collection of words.
In detail, converting the correct text set and the standard error text set into word vectors through the bag-of-words model in the preferred embodiment of the present invention includes:
A. Calculating the distance between the data objects of the correct text set and the standard error text set with the Euclidean formula.
Preset $x_i$ and $x_j$ as data objects of the correct text set and the standard error text set respectively, and D as the number of attributes of those data objects. The Euclidean formula is:
$$d(x_i, x_j) = \sqrt{\sum_{k=1}^{D} (x_{ik} - x_{jk})^2}$$
B. Presetting n clusters according to a clustering algorithm, where the cluster center of the kth cluster is $Center_k$, a vector containing the attributes of the data objects. The formula for $Center_k$ is:
$$Center_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i$$
where $|C_k|$ denotes the number of data objects in the kth cluster.
Further, using the Euclidean formula and the $Center_k$ update formula, the invention calculates the distance from each data object of the correct text set and the standard error text set to each of the n cluster centers, and obtains the features of each data object at each cluster center.
C. Training the features with a classifier, and calculating the probability of each data object of the correct text set and the standard error text set at each cluster center, thereby converting the correct text set and the standard error text set into word vectors.
The classifier is a naive Bayes classifier: a family of simple probabilistic classifiers that apply Bayes' theorem under a strong (naive) assumption of independence between the features.
In a preferred embodiment of the present invention, the probability of each data object of the correct text set and the standard error text set at a cluster center is calculated as follows:
Assume independence between the features, with a preset data sample $x = (x_1, x_2, \ldots, x_d)^{T}$. The probability of the data belonging to the cluster center $w_i$ is:
$$P(w_i \mid x) \propto P(w_i) \prod_{k=1}^{d} P(x_k \mid w_i)$$
where d is the feature dimension of the data in the preset data sample and $x_k$ is the value of the sample on the kth feature.
The data in the preset data sample are smoothed with the following formula to avoid data sparseness:
$$P(x_k \mid w_i) = \frac{|D_{i,x_k}| + \alpha}{|D_i| + \alpha c_k}$$
where $c_k$ represents the number of possible values of the kth feature and $\alpha$ is a coefficient.
Maximum likelihood estimation gives:
$$P(x_k \mid w_i) = \frac{|D_{i,x_k}|}{|D_i|}$$
where the numerator $|D_{i,x_k}|$ represents the number of samples in the set $D_i$ of cluster center $w_i$ whose kth feature takes the value $x_k$.
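The following numpy sketch ties the three pieces together: cluster centers, Euclidean distance features, and the smoothed naive Bayes estimate. All array shapes and the Laplace coefficient α = 1 are assumptions for illustration.

```python
# Sketch of the word-vector construction: k-means-style centers, distance
# features per cluster, and Laplace-smoothed likelihoods for the naive
# Bayes probability of a text at each cluster center.
import numpy as np

def cluster_centers(X, assign, n_clusters):
    """Center_k: mean of the data objects in cluster k (assumes none empty)."""
    return np.array([X[assign == k].mean(axis=0) for k in range(n_clusters)])

def distance_features(X, centers):
    """Euclidean distance from every data object to every cluster center."""
    return np.sqrt(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))

def smoothed_likelihood(count, total, c_k, alpha=1.0):
    """P(x_k | w_i) = (count + alpha) / (total + alpha * c_k), as above."""
    return (count + alpha) / (total + alpha * c_k)
```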
S3, inputting the training set and the label set into a pre-constructed intelligent text editing model, training the intelligent text editing model by using the training set to obtain a training value, inputting the training value and the label set into a loss function of the intelligent text editing model to obtain a loss function value, and quitting training of the intelligent text editing model when the loss function value is smaller than a preset threshold value.
In a preferred embodiment of the present invention, the intelligent text editing model includes a convolutional neural network. A convolutional neural network is a feedforward neural network whose artificial neurons respond to surrounding units within a limited coverage range. Its basic structure comprises two kinds of layers. One is the feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer, and the local features are extracted; once a local feature is extracted, its positional relation to the other features is also determined. The other is the feature mapping layer: each computation layer of the network is composed of multiple feature maps, each feature map is a plane, and all neurons in a plane have equal weights.
In a preferred embodiment of the present invention, the convolutional neural network comprises an input layer, a convolutional layer, a pooling layer, and an output layer. The input layer of the convolutional neural network model receives the training set and the label set, and a convolution operation is performed on the training set through a group of filters preset in the convolutional layer to extract feature vectors; the filters may be $\{filter_0, filter_1\}$, generating feature sets on similar channels and dissimilar channels respectively. The pooling layer then performs a pooling operation on the feature vectors; the pooled feature vectors are input to the fully connected layer and are normalized and computed through an activation function to obtain a training value; the computation result is input to the output layer, which outputs correct text data. The normalization "compresses" a K-dimensional vector containing arbitrary real numbers into another K-dimensional real vector in which every element lies in (0, 1) and all elements sum to 1.
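A minimal PyTorch sketch of such a network is shown below. The embedding dimension, the two filter widths standing in for {filter_0, filter_1}, and the class count are assumptions; the patent does not fix these values.

```python
# Sketch: input -> two parallel convolution filter groups -> max pooling ->
# fully connected layer -> softmax normalization, mirroring the layer
# sequence described above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextEditCNN(nn.Module):
    def __init__(self, embed_dim=128, num_filters=64, num_classes=2):
        super().__init__()
        self.conv0 = nn.Conv1d(embed_dim, num_filters, kernel_size=3, padding=1)
        self.conv1 = nn.Conv1d(embed_dim, num_filters, kernel_size=5, padding=2)
        self.fc = nn.Linear(2 * num_filters, num_classes)

    def forward(self, x):                 # x: (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)             # Conv1d expects (batch, channels, seq_len)
        f0 = F.relu(self.conv0(x))        # filter group 0
        f1 = F.relu(self.conv1(x))        # filter group 1
        p0 = F.max_pool1d(f0, f0.size(2)).squeeze(2)   # pooling layer
        p1 = F.max_pool1d(f1, f1.size(2)).squeeze(2)
        feats = torch.cat([p0, p1], dim=1)
        # Softmax "compresses" the K-dimensional output into (0, 1), summing to 1.
        return F.softmax(self.fc(feats), dim=1)
```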
In the embodiment of the present invention, the activation function is the softmax function, calculated as:
$$O_j = \frac{e^{I_j}}{\sum_{k=1}^{t} e^{I_k}}$$
where $O_j$ represents the correct-text-data output value of the jth neuron of the convolutional neural network output layer, $I_j$ represents the input value of the jth neuron of the output layer, t represents the total number of neurons in the output layer, and e is the base of the natural logarithm.
In a preferred embodiment of the present invention, the preset threshold for the loss function value is 0.01, and the loss function is the least squares method:
$$s = \sum_{i=1}^{k} (y_i - y_i')^2$$
where s is the error value between the output correct text data and the erroneous text data, k is the number of text sets, $y_i$ is the erroneous text data, and $y_i'$ is the correct text data.
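Putting the loss and the exit condition together, a hedged training-loop sketch might look as follows; the optimizer, learning rate, and one-hot float labels are assumptions, with only the sum-of-squares loss and the 0.01 threshold taken from the text.

```python
# Sketch: train until the least-squares loss s = sum_i (y_i - y'_i)^2
# between training values and labels drops below the preset threshold 0.01.
import torch

def train_until_converged(model, train_x, labels, threshold=0.01, max_epochs=1000):
    # `labels` is assumed to be a one-hot float tensor matching the model output.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.MSELoss(reduction="sum")  # sum of squared errors
    for _ in range(max_epochs):
        optimizer.zero_grad()
        training_values = model(train_x)         # forward pass: the training value
        loss = loss_fn(training_values, labels)
        if loss.item() < threshold:              # exit training when s < 0.01
            break
        loss.backward()
        optimizer.step()
    return model
```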
And S4, receiving text data input by a user, intelligently editing the text data input by the user by using the intelligent text editing model, and outputting corresponding correct text data.
The preferred embodiment of the invention uses the intelligent text editing model to automatically correct and edit the text data input by the user, obtaining corrected text data; it can output both the text data with correction marks and the correct text data.
The invention also provides an intelligent text editing device. Fig. 2 is a schematic diagram of an internal structure of an intelligent text editing apparatus according to an embodiment of the present invention.
In the present embodiment, the intelligent text editing apparatus 1 may be a PC (Personal Computer), a terminal device such as a smartphone, a tablet computer, or a mobile computer, or a server. The intelligent text editing apparatus 1 comprises at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.
The memory 11 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 11 may be an internal storage unit of the intelligent text editing apparatus 1, for example, a hard disk of the apparatus. In other embodiments, the memory 11 may also be an external storage device of the intelligent text editing apparatus 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the apparatus. Further, the memory 11 may also include both an internal storage unit and an external storage device of the apparatus 1. The memory 11 can be used not only for storing the application software installed in the intelligent text editing apparatus 1 and various data, such as the code of the intelligent text editing program 01, but also for temporarily storing data that has been output or is to be output.
The processor 12, which in some embodiments may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, is configured to execute program code or process data stored in the memory 11, such as executing the intelligent text editing program 01.
The communication bus 13 is used to realize connection communication between these components.
The network interface 14 may optionally include a standard wired interface, a wireless interface (e.g., a WI-FI interface), and is typically used to establish a communication link between the apparatus 1 and other electronic devices.
Optionally, the apparatus 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or a display unit, where appropriate, for displaying information processed in the intelligent text editing apparatus 1 and for displaying a visual user interface.
Fig. 2 shows only the intelligent text editing apparatus 1 with the components 11 to 14 and the intelligent text editing program 01; those skilled in the art will understand that the structure shown in fig. 2 does not constitute a limitation of the apparatus 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
In the embodiment of the apparatus 1 shown in fig. 2, an intelligent text editing program 01 is stored in the memory 11; the following steps are implemented when the processor 12 executes the intelligent text editing program 01 stored in the memory 11:
step one, receiving a correct text set and an error text set, carrying out preprocessing operation on the error text set to obtain a standard error text set, and establishing a corresponding label set for the correct text set and the standard error text set.
In a preferred embodiment of the present invention, the correct text set and the error text set contain the same text data, but the error text set contains errors such as wrong words or grammatical ambiguities, while the correct text set contains no such errors.
Further, in a preferred embodiment of the present invention, the preprocessing operation includes: performing word segmentation processing on the error text set to obtain word segmentation results; performing punctuation proofreading on the error text set according to the punctuation proofreading rules, to obtain and label the set of erroneous punctuation marks in the text; and checking the word continuation relations near the target word strings by establishing an N-gram model using the binary word continuation relation, to obtain and label the erroneous word string set of the error text set. The specific implementation steps are as follows:
a. Performing word segmentation processing on the error text set to obtain a word segmentation result.
In the preferred embodiment of the invention, word segmentation processing is performed on the error text set through a Markov model to obtain a word segmentation result.
The Markov model is a statistical model widely applied in natural language processing, in fields such as speech recognition, automatic part-of-speech tagging, phonetic-to-character conversion, and probabilistic grammar. In the preferred embodiment of the present invention, a sentence in the error text set is preset as S; the sentence S is segmented with a full segmentation method to obtain all possible Chinese word segmentation modes; the probability of each segmentation mode is calculated according to the Markov property; and the segmentation result of the mode with the highest probability is selected as the final text word segmentation result.
The Markov property means that the probability of the ith word appearing in the text is related only to the n−1 words appearing before it, not to the words that follow it. Thus, for the sentence S formed by the word sequence $\{W_1, W_2, \ldots, W_m\}$, the probability that the ith word $W_i$ appears, given that the preceding words have appeared, is:
$$P(W_i \mid W_1, \ldots, W_{i-1}) = P(W_i \mid W_{i-n+1}, \ldots, W_{i-1})$$
Therefore, the probability of the sentence S arranged in this word order is:
$$P(S) = P(W_1 W_2 \ldots W_m) = P(W_1)\,P(W_2 \mid W_1) \cdots P(W_m \mid W_{m-n+1}, \ldots, W_{m-1})$$
where the conditional probability $P(W_m \mid W_{m-n+1}, \ldots, W_{m-1})$ represents the probability that $W_m$ appears given that the string $W_{m-n+1}, \ldots, W_{m-1}$ has appeared; it is determined with a binary (bigram) language model trained on a large-scale corpus, so the probability model of the sentence S is:
$$P(S) = P(W_1) \prod_{i=2}^{m} P(W_i \mid W_{i-1})$$
The invention selects, among all the calculated values of P(S), the word segmentation corresponding to the maximum of P(S) as the word segmentation result of the scheme:
$$\hat{W} = \arg\max_{W_1 \ldots W_m} P(W_1 W_2 \ldots W_m)$$
b. Performing punctuation proofreading on the error text set according to the punctuation proofreading rules, to obtain and label the set of erroneous punctuation marks in the error text set.
In the preferred embodiment of the invention, punctuation marks in the error text set are proofread with a punctuation-driven method that targets specific error types, applies preset rules, scans in multiple passes, and combines context.
In detail, the invention proofreads the error text set sentence by sentence, paragraph by paragraph, and over the full text by constructing a local analyzer. Preferably, the principle of the local analyzer is as follows: the error text set is divided into single sentences according to the punctuation marks, and these sentences are input into the local analyzer in text order. If a sentence conforms to the language rules within the local range, it passes normally; if a local anomaly is found, the analyzer refuses to accept it and judges the text erroneous; this continues until the whole error text set has been input into the local analyzer. For each punctuation mark appearing in the text, the local analyzer determines which type the mark belongs to, judges with the corresponding proofreading rule whether the mark is erroneous, and stores an error correction suggestion in the error correction suggestion buffer. The proofreading rules are as follows:
When the punctuation mark being proofread is a comma: if a punctuation mark other than a quotation mark stands at the position immediately before the comma, or immediately after it, the mark is shown in italics in the text to indicate an error, and an error correction suggestion ("redundant punctuation; delete this mark") is stored in the buffer. Proofreading then continues sequentially downward through the punctuation marks.
When the punctuation mark being proofread is a pause mark (the Chinese enumeration comma), the judgment uses automatic word segmentation and part-of-speech tagging combined with context information. If the words immediately before and after the pause mark are both numerals, the mark is shown in italics in the text to indicate an error, and an error correction suggestion ("redundant punctuation; delete the pause mark") is stored in the buffer. Proofreading then continues sequentially downward through the punctuation marks.
When the punctuation mark being proofread is an ellipsis, the following three cases are considered:
(1) if the ellipsis is immediately preceded by a punctuation mark other than "。", "!", or "?", that mark is shown in italics in the text to indicate an error, and the error correction suggestion ("redundant punctuation; delete the preceding mark") is stored in the buffer;
(2) if a punctuation mark immediately follows the ellipsis, that mark is shown in italics in the text to indicate an error, and the error correction suggestion ("redundant punctuation; delete the following mark") is stored in the buffer;
(3) if the ellipsis is followed by one of the expressions meaning "etc.", "and so on", or "the like", the ellipsis is shown in italics in the text to indicate an error, and the error correction suggestion ("redundant punctuation; delete the ellipsis") is stored in the buffer. Proofreading then continues sequentially downward through the punctuation marks.
For the error correction suggestions stored in the error correction suggestion buffer under these punctuation proofreading rules, the preferred embodiment of the present invention, once punctuation proofreading is completed, displays the corresponding erroneous punctuation marks in the interface in the order in which the errors occur in the sentences, thereby obtaining the set of erroneous punctuation marks in the error text set.
c. Checking the word continuation relations near the target word strings of the error text set by establishing an N-gram model using the binary word continuation relation, to obtain and label the erroneous word string set of the error text set.
The continuation relation refers to the adjacency relation between words. The binary continuation relation refers to examining, in the character string $z_1 z_2 z_3 \ldots z_{i-1} z_i \ldots z_n$, the adjacency relations of $z_i$ with its neighboring words: according to the N-gram model of corpus linguistics, the bigram model obtained when N = 2 only needs to consider the relation between $z_{i-1}$ and $z_i$ and the relation between $z_i$ and $z_{i+1}$. The invention analyzes and processes a large-scale corpus; when $p(z_i \mid z_{i-1})$ satisfies a certain threshold, $z_{i-1}$ and $z_i$ are judged continuous, and from the result of this continuation judgment it is identified whether the character string $z_i$ is erroneous. The preferred embodiment of the present invention first checks the continuation relation between $z_{i-1}$ and $z_i$; if they are not continuous, it then checks the relation between $z_i$ and $z_{i+1}$; if that relation is also not continuous, the character string $z_i$ is judged erroneous.
In detail, the preferred embodiment of the present invention presets a sentence in the error text set as $S = z_1 z_2 z_3 \ldots z_{i-1} z_i \ldots z_n$, where $z_i$ and $z_{i+1}$ are two adjacent character strings, the capacity of the Chinese corpus is N, the number of times $z_i$ and $z_{i+1}$ appear adjacent is $r(z_i, z_{i+1})$, and the numbers of independent occurrences of $z_i$ and $z_{i+1}$ are $r(z_i)$ and $r(z_{i+1})$ respectively. The probabilities of $z_i$ and $z_{i+1}$ occurring independently are then:
$$p(z_i) = r(z_i)/N, \qquad p(z_{i+1}) = r(z_{i+1})/N;$$
and the co-occurrence probability of $z_i$ and $z_{i+1}$ as neighbors is:
$$p(z_i, z_{i+1}) = r(z_i, z_{i+1})/N.$$
When $r(z_i, z_{i+1}) = N \cdot p(z_i, z_{i+1}) \geq \tau$, $z_i$ and $z_{i+1}$ have a high co-occurrence frequency and are judged continuous, indicating that the word string $z_i$ is correct; conversely, when $r(z_i, z_{i+1}) = N \cdot p(z_i, z_{i+1}) < \tau$, the word string $z_i$ is erroneous. Here $\tau$ is a threshold, preset to $\tau = 0.8$. Preferably, the invention obtains the erroneous word string set of the error text set by a traversal check over the error text set.
Further, in the preferred embodiment of the present invention, a standard error text set is obtained according to the error punctuation mark set and the error string set obtained by the preprocessing.
And step two, converting the correct text set and the standard error text set into word vectors through a bag-of-words model, and storing the word vectors as a training set in a corpus.
The bag-of-words model represents text as feature vectors; its basic idea is that, for a given text, word order, grammar, and syntax are ignored, and the text is treated only as a collection of words.
In detail, converting the correct text set and the standard error text set into word vectors through the bag-of-words model in the preferred embodiment of the present invention includes:
A. Calculating the distance between the data objects of the correct text set and the standard error text set with the Euclidean formula.
Preset $x_i$ and $x_j$ as data objects of the correct text set and the standard error text set respectively, and D as the number of attributes of those data objects. The Euclidean formula is:
$$d(x_i, x_j) = \sqrt{\sum_{k=1}^{D} (x_{ik} - x_{jk})^2}$$
B. Presetting n clusters according to a clustering algorithm, where the cluster center of the kth cluster is $Center_k$, a vector containing the attributes of the data objects. The formula for $Center_k$ is:
$$Center_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i$$
where $|C_k|$ denotes the number of data objects in the kth cluster.
Further, using the Euclidean formula and the $Center_k$ update formula, the invention calculates the distance from each data object of the correct text set and the standard error text set to each of the n cluster centers, and obtains the features of each data object at each cluster center.
C. Training the features with a classifier, and calculating the probability of each data object of the correct text set and the standard error text set at each cluster center, thereby converting the correct text set and the standard error text set into word vectors.
The classifier is a naive Bayes classifier: a family of simple probabilistic classifiers that apply Bayes' theorem under a strong (naive) assumption of independence between the features.
In a preferred embodiment of the present invention, the probability of each data object of the correct text set and the standard error text set at a cluster center is calculated as follows:
Assume independence between the features, with a preset data sample $x = (x_1, x_2, \ldots, x_d)^{T}$. The probability of the data belonging to the cluster center $w_i$ is:
$$P(w_i \mid x) \propto P(w_i) \prod_{k=1}^{d} P(x_k \mid w_i)$$
where d is the feature dimension of the data in the preset data sample and $x_k$ is the value of the sample on the kth feature.
The data in the preset data sample are smoothed with the following formula to avoid data sparseness:
$$P(x_k \mid w_i) = \frac{|D_{i,x_k}| + \alpha}{|D_i| + \alpha c_k}$$
where $c_k$ represents the number of possible values of the kth feature and $\alpha$ is a coefficient.
Maximum likelihood estimation gives:
$$P(x_k \mid w_i) = \frac{|D_{i,x_k}|}{|D_i|}$$
where the numerator $|D_{i,x_k}|$ represents the number of samples in the set $D_i$ of cluster center $w_i$ whose kth feature takes the value $x_k$.
And step three, inputting the training set and the label set into a pre-constructed intelligent text editing model, training the intelligent text editing model with the training set to obtain a training value, inputting the training value and the label set into the loss function of the intelligent text editing model to obtain a loss function value, and exiting the training of the intelligent text editing model when the loss function value is smaller than a preset threshold value.
In a preferred embodiment of the present invention, the intelligent text editing model includes a convolutional neural network. A convolutional neural network is a feedforward neural network whose artificial neurons respond to surrounding units within a limited coverage range. Its basic structure comprises two kinds of layers. One is the feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer, and the local features are extracted; once a local feature is extracted, its positional relation to the other features is also determined. The other is the feature mapping layer: each computation layer of the network is composed of multiple feature maps, each feature map is a plane, and all neurons in a plane have equal weights.
In a preferred embodiment of the present invention, the convolutional neural network comprises an input layer, a convolutional layer, a pooling layer, and an output layer. The input layer of the convolutional neural network model receives the training set and the label set, and a convolution operation is performed on the training set through a group of filters preset in the convolutional layer to extract feature vectors; the filters may be $\{filter_0, filter_1\}$, generating feature sets on similar channels and dissimilar channels respectively. The pooling layer then performs a pooling operation on the feature vectors; the pooled feature vectors are input to the fully connected layer and are normalized and computed through an activation function to obtain a training value; the computation result is input to the output layer, which outputs correct text data. The normalization "compresses" a K-dimensional vector containing arbitrary real numbers into another K-dimensional real vector in which every element lies in (0, 1) and all elements sum to 1.
In the embodiment of the present invention, the activation function is the softmax function, calculated as:
$$O_j = \frac{e^{I_j}}{\sum_{k=1}^{t} e^{I_k}}$$
where $O_j$ represents the correct-text-data output value of the jth neuron of the convolutional neural network output layer, $I_j$ represents the input value of the jth neuron of the output layer, t represents the total number of neurons in the output layer, and e is the base of the natural logarithm.
In a preferred embodiment of the present invention, the preset threshold for the loss function value is 0.01, and the loss function is the least squares method:
$$s = \sum_{i=1}^{k} (y_i - y_i')^2$$
where s is the error value between the output correct text data and the erroneous text data, k is the number of text sets, $y_i$ is the erroneous text data, and $y_i'$ is the correct text data.
And step four, receiving text data input by a user, intelligently editing the text data input by the user with the intelligent text editing model, and outputting the corresponding correct text data.
The preferred embodiment of the invention uses the intelligent text editing model to automatically correct and edit the text data input by the user, obtaining corrected text data; it can output both the text data with correction marks and the correct text data.
Alternatively, in other embodiments, the intelligent text editing program may be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to implement the present invention.
For example, referring to fig. 3, which shows a schematic diagram of the program modules of the intelligent text editing program in an embodiment of the intelligent text editing apparatus of the present invention, the intelligent text editing program may be divided into a text preprocessing module 10, a model training module 20, and an intelligent text editing module 30, exemplarily:
the keyword received text preprocessing module 10 is configured to: receiving a correct text set and an error text set, preprocessing the error text set to obtain a standard error text set, establishing a corresponding label set for the correct text set and the standard error text set, converting the correct text set and the standard error text set into word vectors through a word bag model, and storing the word vectors as a training set in a corpus.
The model training module 20 is configured to: input the training set and the label set into a pre-constructed intelligent text editing model, train the model with the training set to obtain a training value, input the training value and the label set into the loss function of the model to obtain a loss function value, and exit training of the model when the loss function value is smaller than a preset threshold value.
The intelligent text editing module 30 is configured to: receive text data input by a user, intelligently edit the text data input by the user with the intelligent text editing model, and output the corresponding correct text data.
The functions or operation steps implemented by the program modules such as the text preprocessing module 10, the model training module 20, and the intelligent text editing module 30 when executed are substantially the same as those of the above embodiments and are not repeated here.
Furthermore, an embodiment of the present invention provides a computer-readable storage medium storing an intelligent text editing program, where the program is executable by one or more processors to implement the following operations:
receiving text data input by a user, intelligently editing the text data input by the user with the intelligent text editing model, and outputting the corresponding correct text data.
The embodiment of the computer-readable storage medium of the present invention is substantially the same as the embodiments of the intelligent text editing apparatus and method described above and will not be repeated here.
It should be noted that the above numbering of the embodiments of the present invention is merely for description and does not represent the merits of the embodiments. The terms "comprises", "comprising", and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, apparatus, article, or method.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly also by hardware alone, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product stored in a storage medium as described above (e.g., ROM/RAM, magnetic disk, optical disk), including instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. All equivalent structural or process modifications made using the contents of the present specification and drawings, whether applied directly or indirectly in other related fields, are likewise included within the scope of the present invention.

Claims (6)

1. An intelligent text editing method, characterized by comprising the following steps:
receiving a correct text set and an error text set, performing a preprocessing operation on the error text set to obtain a standard error text set, and establishing a corresponding label set for the correct text set and the standard error text set, wherein the preprocessing operation comprises: performing word segmentation processing on the error text set to obtain a word segmentation result; performing punctuation correction on the error text set according to punctuation correction rules by using the word segmentation result, to obtain an error punctuation set of the error text set; and performing a word continuation relation check on the error text set by establishing an N-gram model and using word bigram continuation relations, to obtain an error string set of the error text set;
calculating the distance between the data objects of the correct text set and the standard error text set by the Euclidean distance formula, and presetting n clusters according to a clustering algorithm, wherein the cluster center of the kth cluster is Center_k; calculating the distance from each data item of the correct text set and the standard error text set to each of the n cluster centers, and obtaining the feature of each data item with respect to each cluster center;
training a classifier on the features, calculating the probability of each data item with respect to each cluster center, converting the correct text set and the standard error text set into word vectors, and storing the word vectors as a training set in a corpus;
inputting the training set and the label set into a pre-constructed intelligent text editing model, training the intelligent text editing model with the training set to obtain a training value, inputting the training value and the label set into a loss function of the intelligent text editing model to obtain a loss function value, and stopping training of the intelligent text editing model when the loss function value is smaller than a preset threshold;
and receiving text data input by a user, intelligently editing the received text data by using the intelligent text editing model, and outputting the corresponding correct text data.
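By way of illustration only, the word bigram continuation check recited in claim 1 could be sketched as follows; the tiny corpus, the count threshold of 1, and all names are assumptions of this sketch rather than the claimed implementation:

```python
# Sketch of the bigram continuation check: adjacent word pairs whose
# continuation is unattested in the reference corpus mark candidate
# error strings.
from collections import Counter

def build_bigram_counts(corpus_sentences):
    counts = Counter()
    for tokens in corpus_sentences:
        counts.update(zip(tokens, tokens[1:]))
    return counts

def find_error_strings(tokens, bigram_counts, min_count=1):
    # Flag adjacent word pairs seen fewer than min_count times.
    return [(left, right)
            for left, right in zip(tokens, tokens[1:])
            if bigram_counts[(left, right)] < min_count]

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
counts = build_bigram_counts(corpus)
print(find_error_strings(["the", "sat", "cat"], counts))
# [('the', 'sat'), ('sat', 'cat')]
```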
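Likewise, the distance-to-cluster-center features of claim 1 admit a k-means-style sketch; scikit-learn's KMeans and the choice of n = 3 clusters are assumptions, as the claim requires only Euclidean distances from each data item to n preset cluster centers:

```python
# Sketch of the clustering feature step: the Euclidean distance from
# each sample to each of the n cluster centers Center_1 .. Center_n
# serves as that sample's per-center feature.
import numpy as np
from sklearn.cluster import KMeans

data = np.random.rand(100, 20)  # stand-in for the combined text vectors
n = 3                           # preset number of clusters (assumed)
kmeans = KMeans(n_clusters=n, n_init=10, random_state=0).fit(data)

# transform() returns the distance of every sample to each center.
features = kmeans.transform(data)
print(features.shape)  # (100, 3)
```

A classifier trained on these per-center features can then produce the per-center probabilities that claim 1 goes on to recite.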
2. The intelligent text editing method according to claim 1, wherein the word segmentation process comprises:
segmenting the error text set by using a full segmentation method to obtain a plurality of word segmentation modes;
and calculating the probability of each word segmentation mode according to a Markov model, and selecting the word segmentation mode with the highest probability as the word segmentation result of the error text set.
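An illustrative sketch of claim 2 follows: full segmentation enumerates every split of the input permitted by a vocabulary, and a first-order Markov chain scores each candidate so that the most probable one is kept. The toy vocabulary and transition probabilities are invented for the example:

```python
# Sketch of full segmentation plus Markov scoring. VOCAB and TRANS are
# toy stand-ins for a real dictionary and a corpus-estimated model.
VOCAB = {"研究", "研究生", "生命", "命", "起源"}
TRANS = {("<s>", "研究"): 0.5, ("研究", "生命"): 0.3, ("生命", "起源"): 0.4,
         ("<s>", "研究生"): 0.4, ("研究生", "命"): 0.01, ("命", "起源"): 0.1}

def segmentations(text):
    # Full segmentation: recursively split off every vocabulary prefix.
    if not text:
        yield []
    for i in range(1, len(text) + 1):
        if text[:i] in VOCAB:
            for rest in segmentations(text[i:]):
                yield [text[:i]] + rest

def markov_score(words, default=1e-6):
    score, prev = 1.0, "<s>"
    for word in words:
        score *= TRANS.get((prev, word), default)  # P(word | previous word)
        prev = word
    return score

best = max(segmentations("研究生命起源"), key=markov_score)
print(best)  # ['研究', '生命', '起源'] beats ['研究生', '命', '起源']
```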
3. The intelligent text editing method of claim 1, wherein training the intelligent text editing model with the training set to obtain a training value comprises:
inputting the training set into an input layer of a convolutional neural network of the intelligent text editing model, and performing a convolution operation on the training set through a group of filters preset in a convolutional layer of the convolutional neural network to extract feature vectors;
and performing a pooling operation on the feature vectors by using a pooling layer of the convolutional neural network, inputting the pooled feature vectors to a fully connected layer, and performing normalization and calculation on the pooled feature vectors through an activation function to obtain the training value.
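The layer sequence of claim 3 can be sketched as a small text CNN; the layer sizes, the kernel width, and the use of softmax as the normalizing activation are assumptions, since the claim fixes only the order of convolution, pooling, fully connected layer, and activation:

```python
# Sketch of the claimed structure: filter bank -> pooling -> fully
# connected layer -> normalizing activation producing the training value.
import torch
import torch.nn as nn

class TextEditCNN(nn.Module):
    def __init__(self, embed_dim=128, n_filters=64, n_classes=2):
        super().__init__()
        self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)         # pooling layer
        self.fc = nn.Linear(n_filters, n_classes)   # fully connected layer

    def forward(self, x):                           # x: (batch, embed_dim, seq_len)
        features = torch.relu(self.conv(x))         # preset filters extract features
        pooled = self.pool(features).squeeze(-1)    # pooled feature vectors
        # softmax plays the role of the normalizing activation function
        return torch.softmax(self.fc(pooled), dim=1)

out = TextEditCNN()(torch.randn(4, 128, 10))
print(out.shape)  # torch.Size([4, 2]); each row sums to 1
```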
4. An intelligent text editing apparatus, comprising a memory and a processor, wherein the memory stores an intelligent text editing program operable on the processor, and the intelligent text editing program, when executed by the processor, implements the following steps:
receiving a correct text set and an error text set, performing a preprocessing operation on the error text set to obtain a standard error text set, and establishing a corresponding label set for the correct text set and the standard error text set, wherein the preprocessing operation comprises: performing word segmentation processing on the error text set to obtain a word segmentation result; performing punctuation correction on the error text set according to punctuation correction rules by using the word segmentation result, to obtain an error punctuation set of the error text set; and performing a word continuation relation check on the error text set by establishing an N-gram model and using word bigram continuation relations, to obtain an error string set of the error text set, wherein the word continuation relation check is performed on words near a target string;
calculating the distance between the data objects of the correct text set and the standard error text set by the Euclidean distance formula, and presetting n clusters according to a clustering algorithm, wherein the cluster center of the kth cluster is Center_k; calculating the distance from each data item of the correct text set and the standard error text set to each of the n cluster centers, and obtaining the feature of each data item with respect to each cluster center;
training a classifier on the features, calculating the probability of each data item with respect to each cluster center, converting the correct text set and the standard error text set into word vectors, and storing the word vectors as a training set in a corpus;
inputting the training set and the label set into a pre-constructed intelligent text editing model, training the intelligent text editing model with the training set to obtain a training value, inputting the training value and the label set into a loss function of the intelligent text editing model to obtain a loss function value, and stopping training of the intelligent text editing model when the loss function value is smaller than a preset threshold;
and receiving text data input by a user, intelligently editing the received text data by using the intelligent text editing model, and outputting the corresponding correct text data.
5. The intelligent text editing apparatus according to claim 4, wherein the word segmentation process comprises:
segmenting the error text set by using a full segmentation method to obtain a plurality of word segmentation modes;
and calculating the probability of each word segmentation mode according to a Markov model, and selecting the word segmentation mode with the highest probability as the word segmentation result of the error text set.
6. A computer-readable storage medium having stored thereon an intelligent text editing program executable by one or more processors to perform the steps of the intelligent text editing method of any one of claims 1 to 3.
CN201910668831.9A 2019-07-23 2019-07-23 Intelligent text editing method and device and computer readable storage medium Active CN110619119B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910668831.9A CN110619119B (en) 2019-07-23 2019-07-23 Intelligent text editing method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110619119A CN110619119A (en) 2019-12-27
CN110619119B (en) 2022-06-10

Family

ID=68921735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910668831.9A Active CN110619119B (en) 2019-07-23 2019-07-23 Intelligent text editing method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110619119B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626047A (en) * 2020-04-23 2020-09-04 平安科技(深圳)有限公司 Intelligent text error correction method and device, electronic equipment and readable storage medium
CN111985491A (en) * 2020-09-03 2020-11-24 深圳壹账通智能科技有限公司 Similar information merging method, device, equipment and medium based on deep learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105045778A (en) * 2015-06-24 2015-11-11 江苏科技大学 Chinese homonym error auto-proofreading method
KR20160054751A (en) * 2014-11-07 2016-05-17 한국전자통신연구원 System for editing a text and method thereof
CN107357775A (en) * 2017-06-05 2017-11-17 百度在线网络技术(北京)有限公司 The text error correction method and device of Recognition with Recurrent Neural Network based on artificial intelligence
CN109766538A (en) * 2018-11-21 2019-05-17 北京捷通华声科技股份有限公司 A kind of text error correction method, device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140214401A1 (en) * 2013-01-29 2014-07-31 Tencent Technology (Shenzhen) Company Limited Method and device for error correction model training and text error correction


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant