CN113255652A - Text correction method, device, equipment and medium - Google Patents


Info

Publication number
CN113255652A
Authority
CN
China
Prior art keywords
text
modification
recognition result
symbol
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110775077.6A
Other languages
Chinese (zh)
Other versions
CN113255652B (en)
Inventor
赵明
田科
阳锋
章宏武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Century TAL Education Technology Co Ltd
Original Assignee
Beijing Century TAL Education Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Century TAL Education Technology Co Ltd
Priority to CN202110775077.6A
Publication of CN113255652A
Application granted
Publication of CN113255652B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Discrimination (AREA)

Abstract

The present disclosure provides a text correction method, apparatus, device, and medium, wherein the method includes: acquiring a text image to be processed; detecting text characters and modification symbols contained in the text image by using a pre-trained detection model to obtain a first position of the text characters and a second position of the modification symbols; extracting the content to be recognized from the text image based on the first position and the second position; recognizing the content to be recognized by using a pre-trained recognition model to obtain a text character recognition result and a modification symbol recognition result; and correcting the text character recognition result based on the modification symbol recognition result. The method helps ensure the accuracy and reliability of text recognition and can effectively improve user experience.

Description

Text correction method, device, equipment and medium
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a method, an apparatus, a device, and a medium for text correction.
Background
Image recognition technology is an important application branch of artificial intelligence. Within it, text detection/recognition technology, which can automatically detect and recognize characters in an image, is gradually emerging and is widely applied in many fields, such as online education (for example, grading photographed questions), logistics (for example, recognizing photographed addresses), and editing (for example, entering text by photographing it).
However, the inventors found through research that, because of stroke errors, word-order errors, missing words and the like during writing, a user may need to modify the written content with various manuscript modification symbols such as a deletion mark, a correction mark, or an insertion mark. The related text recognition technology can only recognize the content as originally written, so the recognized content is substantially wrong and does not express what the user meant after modification. In other words, the related text recognition technology has difficulty correctly recognizing text images that contain manuscript modification symbols, and the user experience is poor.
Disclosure of Invention
To solve the above technical problem or at least partially solve the above technical problem, the present disclosure provides a text correction method, apparatus, device, and medium.
According to an aspect of the present disclosure, there is provided a text correction method including: acquiring a text image to be processed; detecting text characters and modification symbols contained in the text image by adopting a detection model obtained by pre-training to obtain a first position of the text characters and a second position of the modification symbols; extracting content to be identified from the text image based on the first position and the second position; recognizing the content to be recognized by using a recognition model obtained by pre-training to obtain a text character recognition result and a modification symbol recognition result; and correcting the text character recognition result based on the modified symbol recognition result.
According to another aspect of the present disclosure, there is provided a text correction apparatus including: the image acquisition module is used for acquiring a text image to be processed; the detection module is used for detecting the text characters and the modification symbols contained in the text image by adopting a detection model obtained by pre-training to obtain a first position of the text characters and a second position of the modification symbols; the content extraction module is used for extracting content to be identified from the text image based on the first position and the second position; the recognition module is used for recognizing the content to be recognized by using a recognition model obtained by pre-training to obtain a text character recognition result and a modified symbol recognition result; and the correction module is used for correcting the text character recognition result based on the modified symbol recognition result.
According to another aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory storing a program, wherein the program includes instructions that, when executed by the processor, cause the processor to perform the text correction method described above.
According to another aspect of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the above-described text correction method.
According to the technical scheme provided by the embodiment of the disclosure, a text image to be processed is obtained, a detection model obtained through pre-training is adopted to detect text characters and modification symbols contained in the text image, a first position of the text characters and a second position of the modification symbols are obtained, contents to be recognized are extracted from the text image based on the first position and the second position, then the recognition model obtained through pre-training can be further used for recognizing the contents to be recognized, a text character recognition result and a modification symbol recognition result are obtained, and finally the text character recognition result can be corrected based on the modification symbol recognition result. By the mode, the text characters and the modification symbols in the text image can be detected and recognized by means of the model, the text character recognition result can be corrected based on the modification symbol recognition result, and the corrected text character recognition result is the real expression meaning of the user, so that the accuracy and the reliability of text recognition are well guaranteed, and the user experience can be effectively improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments or technical solutions in the prior art of the present disclosure, the drawings used in the description of the embodiments or prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
FIG. 1 is a schematic flowchart of a text correction method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a modification symbol provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of encoding provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a character insertion position according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of decoding provided by an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a text correction system according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of another text correction system provided in an embodiment of the present disclosure;
FIG. 8 is a flowchart of a text correction method provided in an embodiment of the present disclosure;
FIG. 9 is a block diagram of a text correction apparatus according to an embodiment of the present disclosure;
FIG. 10 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and its variants as used in this disclosure are intended to be inclusive, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description. It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In view of the fact that the related text recognition technology is difficult to correctly recognize text images with the manuscript modification symbols, the embodiments of the present disclosure provide a text correction method, apparatus, device, and medium, which can be applied to any scene that needs to recognize text images, and have a better recognition effect on text images containing the manuscript modification symbols, so as to obtain a true expression of a user after modifying an original text with the manuscript modification symbols, thereby better ensuring accuracy and reliability of text recognition, and effectively improving user experience. For ease of understanding, embodiments of the present disclosure are described in detail as follows:
FIG. 1 is a flowchart of a text correction method according to an embodiment of the present disclosure. The method may be executed by a text correction apparatus, which may be implemented by software and/or hardware and may generally be integrated in an electronic device. As shown in FIG. 1, the method mainly includes the following steps S102 to S110:
Step S102, a text image to be processed is obtained. The text image is an image containing text characters in languages such as Chinese or English, and may further contain modification symbols. A modification symbol may also be referred to as a manuscript modification symbol or an article modification symbol, and includes but is not limited to a deletion mark, a restoration mark, a transposition mark, a correction mark, an insertion mark, a reset mark, a hint mark, a dispatch mark, a start mark, a merge mark, a reduction mark, a forward mark, an empty-space mark, a split mark, etc. These are listed by way of example only, and the list is not exhaustive. For ease of understanding, reference may be made to the schematic diagram of modification symbols shown in FIG. 2, which illustrates the appearance of some of the modification symbols.
In practical application, an image acquired by a camera of the electronic device may be used as a text image to be processed, and an image downloaded from a network or an image uploaded by a user through a designated port may also be used as a text image to be processed.
Step S104, detecting the text characters and the modification symbols contained in the text image by using a pre-trained detection model to obtain a first position of the text characters and a second position of the modification symbols.
In some specific implementations, the first position and the second position are both expressed as pixel coordinates, that is, the detection model outputs the pixel coordinates of each text character and each modification symbol. In other embodiments, the positions of the text characters and the modification symbols may instead be represented by image boxes, for example by directly framing the text characters and the modification symbols in the text image.
Step S106, extracting the content to be recognized from the text image based on the first position and the second position.
The content to be recognized includes the text characters and the modification symbols in the text image. In some embodiments, since the positions of the text characters and the modification symbols in the text image are known, the text characters and the modification symbols can be extracted directly based on those positions and used as the content to be recognized. In other embodiments, based on the first position and the second position, the text characters and the modification symbols may be framed with an image box, and the framed region is extracted and used as the content to be recognized.
In addition, it is understood that, in a normal case, the text character and the modification symbol are combined together (a user usually adds the modification symbol directly on the basis of the original text to modify the original text), and the obtained content to be recognized may still be the combined text character and the modification symbol or may be the separated text character and the modification symbol. For example, if the frame selection area is used as the content to be recognized, the frame selection area may include text characters and modification symbols which are combined together, and then the whole content can be recognized; if the text characters and the modification symbols are directly extracted based on the positions as the content to be recognized, the extracted text characters and the modification symbols can be separated from each other and can be recognized independently subsequently.
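The two extraction modes described above (one framed region containing characters and symbols together, or separately extracted items) can be sketched as follows. This is a minimal illustration only: the box format `(left, top, right, bottom)`, the nested-list image, and the function names `crop`, `union_box` and `extract_content` are assumptions made for the example and are not specified by the disclosure.

```python
from typing import List, Tuple

# Assumed box format: (left, top, right, bottom) in pixels, right/bottom exclusive.
Box = Tuple[int, int, int, int]
# Assumed image format: grayscale pixels as a row-major nested list.
Image = List[List[int]]


def crop(image: Image, box: Box) -> Image:
    """Return the pixels of `image` that fall inside `box`."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def union_box(boxes: List[Box]) -> Box:
    """Smallest box that encloses every box in `boxes`."""
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))


def extract_content(image: Image, char_boxes: List[Box],
                    symbol_boxes: List[Box], combined: bool = True) -> List[Image]:
    """Extract the content to be recognized from the first and second positions."""
    if combined:
        # One framed region that keeps characters and symbols together.
        return [crop(image, union_box(char_boxes + symbol_boxes))]
    # Separate crops: characters first, then modification symbols.
    return [crop(image, b) for b in char_boxes + symbol_boxes]


# Toy 4x6 image whose pixel value encodes its own (row, column).
image = [[10 * r + c for c in range(6)] for r in range(4)]
char_boxes = [(0, 1, 2, 3), (3, 1, 5, 3)]
symbol_boxes = [(2, 0, 3, 2)]

combined = extract_content(image, char_boxes, symbol_boxes, combined=True)
separate = extract_content(image, char_boxes, symbol_boxes, combined=False)
```

With `combined=True` the recognizer receives a single region and must recognize characters and symbols together; with `combined=False` each item can be recognized on its own.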
Step S108, recognizing the content to be recognized by using a pre-trained recognition model to obtain a text character recognition result and a modification symbol recognition result.
In some embodiments, the text characters and the modification symbols in the content to be recognized are combined together, so the resulting text character recognition result and modification symbol recognition result are also combined together, in the same way as the text characters and the modification symbols are combined in the content to be recognized. For example, if a modification symbol M is added between the text character A1 and the text character A2 in the text image, the modification symbol recognition result M output by the recognition model is likewise inserted between the text character recognition results A1 and A2, i.e., A1MA2. In other embodiments, the text characters and the modification symbols in the content to be recognized are separated from each other, and the text character recognition result and the modification symbol recognition result may also be separated from each other; for example, the recognition model outputs the text character recognition result A1A2 and the modification symbol recognition result M separately.
Step S110, correcting the text character recognition result based on the modification symbol recognition result.
After the modification symbol recognition result is obtained, the modification mode corresponding to it can be looked up, and the text character recognition result is then corrected according to that modification mode. For example, suppose the original text character recognition result is "teacher goes to class and lets me write a math question" and the modification symbol recognition result is a transposition mark. Because the positions of the text characters and of the modification symbol are known from the detection model, the transposition mark can be associated with "teacher" and "goes to class". The mark indicates that the positions of these two parts are to be exchanged, so the corrected result is the original sentence with "teacher" and "goes to class" swapped.
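The lookup-and-correct step can be illustrated with a small sketch. The dictionary layout of a modification (`type`, `span`, `first`, `second`, `at`, `text`) and the three supported kinds are hypothetical choices made for this example; the disclosure does not prescribe a data format. The sketch also assumes that the regions touched by different modifications do not overlap.

```python
from typing import Dict, List


def _start(mod: Dict) -> int:
    """Index of the first token a modification touches (used for ordering)."""
    if mod["type"] == "delete":
        return mod["span"][0]
    if mod["type"] == "transpose":
        return mod["first"][0]
    if mod["type"] == "insert":
        return mod["at"]
    raise ValueError(f"unknown modification type: {mod['type']}")


def apply_modification(tokens: List[str], mod: Dict) -> List[str]:
    """Apply one recognized modification symbol to the recognized tokens."""
    kind = mod["type"]
    if kind == "delete":
        start, end = mod["span"]
        return tokens[:start] + tokens[end:]
    if kind == "transpose":
        # Assumes the first span ends at or before the start of the second span.
        (s1, e1), (s2, e2) = mod["first"], mod["second"]
        return (tokens[:s1] + tokens[s2:e2] + tokens[e1:s2]
                + tokens[s1:e1] + tokens[e2:])
    if kind == "insert":
        at = mod["at"]
        return tokens[:at] + list(mod["text"]) + tokens[at:]
    raise ValueError(f"unknown modification type: {kind}")


def correct(tokens: List[str], mods: List[Dict]) -> List[str]:
    """Apply modifications right to left so earlier indices stay valid."""
    for mod in sorted(mods, key=_start, reverse=True):
        tokens = apply_modification(tokens, mod)
    return tokens


tokens = ["A", "B", "C", "D", "E"]
mods = [
    {"type": "transpose", "first": (0, 1), "second": (1, 2)},  # swap A and B
    {"type": "delete", "span": (3, 4)},                        # remove D
    {"type": "insert", "at": 5, "text": ["F"]},                # append F
]
corrected = correct(tokens, mods)
```

Applying the modifications from right to left is a design choice that keeps the index spans, which refer to the uncorrected token sequence, valid while earlier parts of the sequence are still untouched.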
According to the text correction method provided by the embodiment of the disclosure, the text characters and the modification symbols in the text image can be detected and recognized by means of the model, the text character recognition result can be corrected based on the modification symbol recognition result, and the corrected text character recognition result is the real expression meaning of the user, so that the accuracy and reliability of text recognition are better guaranteed, and the user experience can be effectively improved.
It can be understood that, in the embodiments of the present disclosure, the text characters and the modification symbols in the text image are detected by the pre-trained detection model, and their specific contents are recognized by the recognition model. For ease of understanding, the detection model and the recognition model are described separately below.
(I) Detection model
The detection model provided by the embodiments of the present disclosure detects a text image to obtain the text characters and the modification symbols in it. To achieve this, the pre-trained detection model can be obtained by training according to the following steps:
(1) Acquiring a first training sample set. The first training sample set includes text image samples labeled with modification symbol positions and text character positions. It will be appreciated that the first training sample set contains a plurality of text image samples, each labeled with modification symbol positions and text character positions. In some specific embodiments, the text image samples in the first training sample set may be randomly divided into two parts: one part is used for training the neural network model to obtain the detection model, and the other part is used for verifying the detection effect of the detection model.
(2) Training a first neural network model for detection according to the first training sample set until a training end condition is met, to obtain the detection model. In some embodiments, a preset loss function and a back propagation algorithm may be used to adjust the parameters of the first neural network model until the training end condition is met. At that point the first neural network model can output results that meet expectations, and the first neural network model at the time training stops is used as the detection model. The training end condition includes, for example, that the loss function converges to a specified degree and/or that the number of training iterations reaches a preset value. The loss function compares the text character positions and modification symbol positions output by the first neural network model with the real text character positions and modification symbol positions labeled on the text image sample; the larger the difference, the larger the loss value, which indicates that the parameters of the first neural network model still need to be adjusted. In a specific implementation example, the loss function is a DIoU (Distance Intersection over Union) loss function, and image boxes are used to mark the text character positions and the modification symbol positions. The DIoU (which can be loosely understood as a degree of overlap) between the boxes output by the first neural network model and the boxes labeled on the text image sample is computed, one or more indexes of the model's detection results, such as accuracy, recall, and F1, are calculated from it, and whether the currently obtained model meets the requirement is judged according to these indexes.
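For reference, the DIoU between two axis-aligned boxes is commonly defined as the IoU minus the squared distance between the box centers divided by the squared diagonal of the smallest enclosing box, and the DIoU loss as one minus that value. The sketch below follows this common definition; the box format `(x1, y1, x2, y2)` and the function names are assumptions for illustration, and the disclosure does not state how its own implementation computes the term.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def diou(box_a: Box, box_b: Box) -> float:
    """Distance-IoU: IoU minus normalized squared center distance."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centers.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
        + (max(ay2, by2) - min(ay1, by1)) ** 2

    return iou - d2 / c2 if c2 > 0 else iou


def diou_loss(pred: Box, target: Box) -> float:
    """Loss is 0 for a perfect match and grows as the boxes drift apart."""
    return 1.0 - diou(pred, target)


perfect = diou_loss((0, 0, 2, 2), (0, 0, 2, 2))
partial = diou((0, 0, 2, 2), (1, 1, 3, 3))
disjoint = diou((0, 0, 1, 1), (2, 2, 3, 3))
```

Unlike plain IoU, the value stays informative for non-overlapping boxes: it becomes negative and still decreases as the centers move apart.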
The detection model obtained by training in the above way can accurately and reliably detect the text image to obtain the first position of the text character and the second position of the modification symbol in the text image, and in some embodiments, the detection model directly outputs the pixel point coordinates of the text character and the modification symbol in the text image; in other embodiments, the detection model outputs a boxed region of the text character and the modification symbol in the text image.
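The random division of the sample set into a training part and a verification part, mentioned in step (1) above, can be sketched as follows. The 80/20 ratio, the fixed seed, and the function name `split_samples` are assumptions for the example; the disclosure only states that the samples are randomly divided into two parts.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(samples: Sequence[T], train_fraction: float = 0.8,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly split samples into a training part and a verification part."""
    shuffled = list(samples)
    # A local Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample identifiers standing in for labeled text image samples.
samples = [f"sample_{i:02d}" for i in range(10)]
train_part, verify_part = split_samples(samples)
```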
The embodiments of the present disclosure do not limit the network structure of the detection model; for example, it may be implemented with a YOLOv5 model structure, or with the network structure of another object detection model.
(II) Recognition model
The recognition model provided by the embodiments of the present disclosure recognizes the content to be recognized (the text characters and modification symbols in the text image) to obtain a text character recognition result and a modification symbol recognition result. To achieve this, the pre-trained recognition model can be obtained by training according to the following steps:
(1) Acquiring a second training sample set. The second training sample set includes text image samples labeled with recognition results; the recognition results include the recognition result of the modification symbols contained in the text image sample and the recognition result of the text characters. It will be appreciated that the second training sample set contains a plurality of text image samples, each labeled with a modification symbol recognition result and a text character recognition result. Illustratively, a text image sample may be labeled directly with the type of the modification symbol (such as deletion mark, tone mark, etc.) as the modification symbol recognition result, or may be labeled with the type code of the modification symbol as the modification symbol recognition result. In the latter case a code is preset for each type of modification symbol; illustratively, the type code of the deletion mark is /rmv.
In some specific embodiments, the text image samples in the second training sample set may be randomly divided into two parts, and one part is used for training the neural network model to obtain the recognition model; and the other part is used for verifying the recognition effect of the recognition model.
In addition, in order to improve the acquisition efficiency of the training samples, in some embodiments, the text image samples selected in the second training sample set for training the recognition model and the first training sample set for training the detection model may be the same, but the labeling content is different.
(2) Training a second neural network model for recognition according to the second training sample set until a training end condition is met, to obtain the recognition model. In some embodiments, a preset loss function and a back propagation algorithm may be used to adjust the parameters of the second neural network model until the training end condition is met. At that point the second neural network model can output results that meet expectations, and the second neural network model at the time training stops is used as the recognition model. The training end condition includes, for example, that the loss function converges to a specified degree and/or that the number of training iterations reaches a preset value. It should be noted that the training end condition for the detection model may be referred to as a first training end condition and the training end condition for the recognition model as a second training end condition; the two may be the same or different. The loss function compares the text character recognition result and modification symbol recognition result output by the second neural network model with the real text character recognition result and modification symbol recognition result labeled on the text image sample; the larger the difference, the larger the loss value, which indicates that the parameters of the second neural network model still need to be adjusted. In a specific implementation example, the loss function is related to the CER (Character Error Rate): a CER index of the second neural network model is calculated from the recognition result output by the model and the recognition result labeled on the text image sample, and whether the currently obtained model meets the requirement is judged from the CER index.
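The CER index is conventionally computed as the edit (Levenshtein) distance between the recognized sequence and the labeled sequence, divided by the length of the labeled sequence. The sketch below shows that conventional metric only; it is not the loss function itself, and how the disclosure relates its loss to CER is not specified. The function names are illustrative.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: Sequence, hyp: Sequence) -> float:
    """Character error rate of hypothesis `hyp` against reference `ref`."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


exact = cer("abcd", "abcd")
one_sub = cer("abcd", "abxd")
```

Because the functions accept any sequence, a label may also be passed as a list of tokens in which a modification symbol type code counts as a single token.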
The recognition model obtained by training in the above way can more accurately and reliably recognize the content to be recognized, and obtain the recognition result of the text character and the recognition result of the modified symbol in the text image.
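The labeling with type codes described in step (1) above can be pictured with a short sketch. Only the code `/rmv` for the deletion mark comes from the description; the codes `/swp` and `/ins`, the token-list label format, and the function `split_label` are hypothetical and introduced purely for illustration.

```python
from typing import Dict, List, Tuple

# "/rmv" is the example given in the description; the other codes are assumed.
TYPE_CODES: Dict[str, str] = {
    "deletion": "/rmv",
    "transposition": "/swp",  # hypothetical code
    "insertion": "/ins",      # hypothetical code
}
CODE_TO_TYPE: Dict[str, str] = {code: name for name, code in TYPE_CODES.items()}


def split_label(label_tokens: List[str]) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Split a combined label into text characters and modification symbols.

    Each symbol is returned with its index in the combined sequence, so the
    position of the symbol relative to the characters is preserved.
    """
    chars = [t for t in label_tokens if t not in CODE_TO_TYPE]
    symbols = [(i, CODE_TO_TYPE[t])
               for i, t in enumerate(label_tokens) if t in CODE_TO_TYPE]
    return chars, symbols


# A combined label of the form A1 M A2, with M being a deletion mark.
chars, symbols = split_label(["A1", "/rmv", "A2"])
```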
In some embodiments, the step of recognizing the content to be recognized by using the pre-trained recognition model to obtain the text character recognition result and the modification symbol recognition result (i.e. step S108) mainly includes the following two steps. Step one: performing region division on the content to be recognized in feature form by using the pre-trained recognition model, to obtain a first region feature vector sequence. Step two: decoding according to the first region feature vector sequence to obtain the text character recognition result and the modification symbol recognition result. Both steps can be carried out by the pre-trained recognition model. By first dividing the content into regions in feature form and then decoding the resulting vector sequence, the recognition result can be analyzed comprehensively and reliably. For ease of understanding, a specific embodiment in which the recognition model recognizes the content to be recognized to obtain the text character recognition result and the modification symbol recognition result is given below, and may be implemented by referring to the following steps 1 to 3 (where step 1 below corresponds to step one above, and steps 2 and 3 below together correspond to step two above).
Step 1, carrying out region division on the content to be recognized in a characteristic mode by using a recognition model obtained by pre-training to obtain a first region characteristic vector sequence.
In some implementations, the recognition model can treat the recognized content first (assume I)n) Extracting features to obtain multiple feature vectors, dividing the feature vectors according to regions to obtain a feature vector (region feature vector) corresponding to each region, forming a vector sequence by the multiple region feature vectors, and using [ a ] for distinguishing from subsequent vector sequences, the vector sequence is referred to as a first region feature vector sequence1,a2,…aW*H]Characterization, each vector representing InA regional characteristic of (1).
Step 2: Generate, based on an attention mechanism algorithm and the first region feature vector sequence, a second region feature vector sequence that reflects the positional relationships among the region feature vectors. The recognition model is provided with an attention mechanism (that is, configured with an attention mechanism algorithm); processing the first region feature vector sequence with the attention mechanism algorithm yields the second region feature vector sequence, in which every region feature vector carries positional relationship information.
In some embodiments, step 2 above mainly comprises: step 2a, converting the feature vectors in the first region feature vector sequence into two-dimensional feature vectors with a specified number of feature points; and step 2b, generating, based on the attention mechanism algorithm and the two-dimensional feature vectors, a second region feature vector sequence that reflects the positional relationships among the region feature vectors. Processing two-dimensional feature vectors with a specified number of feature points using the attention mechanism algorithm allows the positional relationships among the region feature vectors to be determined more accurately, so that the second region feature vector sequence, which reflects these positional relationships, can subsequently be recognized effectively.
For ease of understanding, reference may be made, for example, to the following steps 2.1 to 2.3 (step 2.1 and step 2.2 together correspond to step 2a above, and step 2.3 corresponds to step 2b above).
Step 2.1: Convert the feature vectors in the first region feature vector sequence into a three-dimensional feature vector. For example, if $I_n$ has size $[W \times H]$, performing several groups of convolution and pooling operations on the feature vectors in the first region feature vector sequence yields a three-dimensional feature vector of size $[W' \times H' \times 512]$.
Step 2.2: Process the three-dimensional feature vector with a matrix transformation operation to obtain two-dimensional feature vectors with a specified number of feature points. The matrix transformation operation, also called a reshape operation, adjusts the dimensions and shape of a matrix and can convert the three-dimensional feature vector into a two-dimensional feature vector. For example, performing a reshape operation on the three-dimensional feature vector of size $[W' \times H' \times 512]$ yields a two-dimensional feature vector having 512 feature points, which can also be described as a two-dimensional feature vector sequence with 512 features and $W' \times H'$ slices.
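The reshape operation of step 2.2 can be illustrated with a small sketch in plain Python. The nested-list layout `[W'][H'][C]`, the toy sizes, and the name `reshape_to_sequence` are assumptions made for illustration; a real model would use a tensor library.

```python
def reshape_to_sequence(feature_map):
    """Flatten a [W'][H'][C] nested list into a [W'*H'][C] list of vectors."""
    return [vector for column in feature_map for vector in column]


# Toy feature map: W' = 3, H' = 2, C = 512 feature points per region.
W, H, C = 3, 2, 512
feature_map = [[[float(w * H + h)] * C for h in range(H)] for w in range(W)]

sequence = reshape_to_sequence(feature_map)
# The region at (w, h) ends up at index w * H + h, so its two-dimensional
# position is no longer explicit in the flattened sequence.
```

After flattening, only the sequence index remains, which is why position information has to be added back in the following step.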
Step 2.3: Add position codes to the two-dimensional feature vectors based on the attention mechanism algorithm, generating a second region feature vector sequence that reflects the positional relationships among the region feature vectors. To compensate for the loss of image position information that the reshape operation can cause, the attention mechanism algorithm adds position codes to the two-dimensional feature vectors: the position of each feature vector is encoded and the corresponding position code is attached. The dimension of the position code matches that of the feature vector to facilitate concatenation or element-wise operations. In some embodiments, the $2i$-th and $(2i+1)$-th features of the position code at position $p$, denoted $v_{2i}$ and $v_{2i+1}$, are expressed as:
[Formula image not reproduced in this text]
where $f$ is a constant, which may take the value 10000 in some embodiments. Indexing by $2i$ makes each dimension of the position code correspond to a sinusoid, with wavelengths that are multiples of $2\pi$, which facilitates model learning. In addition, the trend of the vector at position $p + k$ can be expressed with the sine and cosine formulas, which determine the positional relationships among the feature vectors, and a second region feature vector sequence $[\beta_1, \beta_2, \ldots, \beta_{W \times H}]$ reflecting the positional relationships among the region feature vectors is generated.
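The position coding described above is consistent with the widely used sinusoidal position encoding. The sketch below assumes that form, namely $v_{2i} = \sin(p / f^{2i/d})$ and $v_{2i+1} = \cos(p / f^{2i/d})$, where $d$ (the feature dimension) is a symbol introduced here and not named in the text. Adding the code to the feature vector element-wise is likewise an assumption; the text also allows concatenation.

```python
import math


def position_code(p, d, f=10000.0):
    """Assumed sinusoidal position code of even dimension d for position p."""
    if d % 2:
        raise ValueError("d must be even")
    code = []
    for i in range(d // 2):
        angle = p / (f ** (2 * i / d))
        code.append(math.sin(angle))  # feature 2i
        code.append(math.cos(angle))  # feature 2i + 1
    return code


def add_position_codes(sequence):
    """Add a position code to every feature vector (element-wise sum)."""
    d = len(sequence[0])
    return [[x + c for x, c in zip(vector, position_code(p, d))]
            for p, vector in enumerate(sequence)]


coded = add_position_codes([[0.0] * 4, [0.0] * 4])
```

Because the code has the same dimension as the feature vector, the sum keeps the sequence shape unchanged.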
The processes of step 1 and step 2 may be referred to as the encoding process and may be implemented by the encoder of the recognition model. For ease of understanding, refer to the encoding schematic diagram shown in fig. 3: the content to be recognized is the region "today teacher class lets me write math problems", which for illustration is simply divided into 4 regions, and encoding yields the region feature vector sequence (i.e., the second region feature vector sequence).
Step 3: Decode the second region feature vector sequence to obtain the text character recognition result and the modification symbol recognition result.
In some embodiments, $\beta = \{\beta_1, \beta_2, \ldots, \beta_{W \times H}\}$ is decoded to generate a prediction result $Y = \{y_1, y_2, \ldots, y_m\}$, where $p(y_{i+1} \mid y_1, y_2, \ldots, y_i, \beta) = \mathrm{softmax}(\omega h + b) \in \mathbb{R}^K$. Here $\omega$ denotes the weight of the linear layer, $h$ denotes the output of the residual connection, $b$ denotes the bias, and $K$ denotes the size of the word vector table. The decoding process may be implemented by the decoder of the recognition model. The prediction result $Y = \{y_1, y_2, \ldots, y_m\}$ is the recognition result containing both the text character recognition result and the modification symbol recognition result, and can be regarded as the combined text character recognition result and modification symbol recognition result.
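The output layer $\mathrm{softmax}(\omega h + b)$ can be sketched in plain Python as follows. The four-entry vocabulary, the weights, and the hidden vector are made-up illustrative values, and `next_token_distribution` is a hypothetical name; the actual decoder that produces $h$ is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_distribution(omega, h, b):
    """softmax(omega * h + b), with omega of shape K x D, h of length D, b of length K."""
    logits = [sum(w * x for w, x in zip(row, h)) + bias
              for row, bias in zip(omega, b)]
    return softmax(logits)


# Illustrative values only: K = 4 vocabulary entries, D = 2 hidden features.
vocab = ["today", "teacher", "/swap", "/end"]
omega = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.0, 0.0]]
b = [0.0, 0.0, 0.0, 0.0]
h = [0.5, 0.5]

probs = next_token_distribution(omega, h, b)
best = vocab[probs.index(max(probs))]
```

Text characters and type codes share one vocabulary here, which is how a single decoder can emit the combined recognition result.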
To facilitate subsequent processing, in some embodiments the modification symbol recognition result includes a type code of the modification symbol; that is, the modification symbol is represented by its type code. In practical applications, each type of modification symbol may be assigned a corresponding type code in advance and is then represented directly by that type code in the recognition result.
After the step of recognizing the content to be recognized using the pre-trained recognition model to obtain the text character recognition result and the modification symbol recognition result, the method further comprises: marking, according to the relative positional relationship between the first position and the second position, the target text characters associated with the modification symbol in the text character recognition result, and using the type code of the modification symbol to replace the modification symbol appearing among the target text characters. In some embodiments, if the first position of the text characters (such as the pixel coordinates of each text character) and the second position of the modification symbol (such as the pixel coordinates of each modification symbol) are known, the target text characters associated with each modification symbol can be determined according to a preset rule, including but not limited to: the distance between the first position and the second position is within a preset distance threshold range, where the preset distance threshold ranges corresponding to different modification symbols may differ. After the target text characters associated with the modification symbol are found, the type code of the modification symbol may be used to replace the modification symbol appearing among the target text characters for subsequent processing.
For ease of understanding, in some embodiments the step of marking the target text characters associated with the modification symbol in the text character recognition result and using the type code of the modification symbol to replace the modification symbol appearing among the target text characters may be implemented with reference to the following steps (1) to (3):
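The distance-threshold rule can be sketched as follows. The threshold values, the use of centre-point pixel coordinates, and the names `DISTANCE_THRESHOLDS` and `find_target_indices` are all assumptions for illustration; the text only states that the threshold range may depend on the modification symbol.

```python
import math

# Hypothetical per-type distance thresholds in pixels (illustrative values).
DISTANCE_THRESHOLDS = {"/swap": 40.0, "/rmv": 15.0}


def find_target_indices(char_positions, symbol_position, type_code):
    """Indices of characters whose centre lies within the threshold distance
    of the modification symbol's centre."""
    threshold = DISTANCE_THRESHOLDS[type_code]
    sx, sy = symbol_position
    return [i for i, (x, y) in enumerate(char_positions)
            if math.hypot(x - sx, y - sy) <= threshold]


# Five characters on one text line; a "/swap" symbol sits between the 2nd and 3rd.
char_positions = [(10, 20), (50, 20), (90, 20), (130, 20), (170, 20)]
targets = find_target_indices(char_positions, (70, 20), "/swap")
```

With a smaller threshold, such as the one assumed for "/rmv", the same symbol position would select no characters, which shows why the threshold is looked up per type code.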
(1) Determine a plurality of target text characters associated with the modification symbol based on the type code of the modification symbol, the first position, and the second position. For example, a preset distance threshold range may be determined from the type code of the modification symbol; then, using the first position of the text characters and the second position of the modification symbol output by the detection model, the text characters whose first position lies within the distance threshold range of the second position are taken as the target text characters associated with the modification symbol.
(2) Insert a preset modification start symbol between the first target text character and its adjacent non-target text character, and insert a preset modification end symbol between the last target text character and its adjacent non-target text character, so as to mark the characters between the modification start symbol and the modification end symbol as the target text characters associated with the modification symbol. That is, the modification start symbol and the modification end symbol are placed at the head and the tail of the target text characters respectively, and the characters enclosed between them are the target text characters.
(3) Determine, according to the relative positional relationship between the first position and the second position, the insertion position of the type code of the modification symbol among the target text characters, and insert the type code at that position to replace the modification symbol appearing among the target text characters.
For ease of understanding, refer to the schematic diagram of character insertion positions shown in fig. 4. In "today teacher class lets me write math problems", a transposition mark appears between "teacher" and "class", so the target text characters associated with the transposition mark are determined to be "teacher class". A preset modification start symbol "/start" is then inserted between the first target text character "teacher" and its adjacent non-target text character "today", and a preset modification end symbol "/end" is inserted between the last target text character "class" and its adjacent non-target text character "lets", so that the characters "teacher class" between "/start" and "/end" are marked as the target text characters associated with the modification symbol. In addition, the type code "/swap" of the transposition mark is inserted at the position where the transposition mark appears in the text, replacing the mark for subsequent processing. The final result is "today /start teacher /swap class /end lets me write math problems".
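Steps (2) and (3) can be sketched on a token list as follows. The function name `mark_targets` and its `insert_before` parameter are hypothetical; the sketch assumes the target indices and the insertion position have already been derived from the first and second positions.

```python
def mark_targets(tokens, target_indices, type_code, insert_before):
    """Wrap the target tokens in /start ... /end and insert the type code.

    tokens:         recognized text tokens in reading order
    target_indices: indices of tokens associated with the modification symbol
    type_code:      type code of the modification symbol, e.g. "/swap"
    insert_before:  index of the target token before which the code is placed
    """
    first, last = min(target_indices), max(target_indices)
    if not first <= insert_before <= last:
        raise ValueError("type code must be inserted among the target tokens")
    marked = []
    for i, token in enumerate(tokens):
        if i == first:
            marked.append("/start")
        if i == insert_before:
            marked.append(type_code)
        marked.append(token)
        if i == last:
            marked.append("/end")
    return marked


tokens = "today teacher class lets me write math problems".split()
marked = mark_targets(tokens, [1, 2], "/swap", insert_before=2)
```

Joining `marked` with spaces gives the marked string from the fig. 4 example.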
In some embodiments, the step of marking the target text characters associated with the modification symbol in the text character recognition result and using the type code of the modification symbol to replace the modification symbol appearing among the target text characters may be performed by the recognition model itself. In that case the recognition model outputs the text character recognition result and the modification symbol recognition result combined together; that is, the modification symbol recognition result is written directly, in the form of a type code, at the corresponding position among the target text characters in the text character recognition result, replacing the modification symbol that originally appeared there. In other embodiments, the recognition model may output the text character recognition result and the modification symbol recognition result independently, and the independent results are then post-processed in the manner described above: the target text characters associated with the modification symbol are marked in the text character recognition result, and the type code of the modification symbol replaces the modification symbol among the target text characters.
On this basis, the step of correcting the text character recognition result based on the modification symbol recognition result comprises: looking up the modification mode corresponding to the type code, and correcting the target text characters based on that modification mode. For example, the modification mode corresponding to the type code "/swap" is "exchange the positions of adjacent characters, words, or phrases". The target text characters "teacher class" in "today teacher class lets me write math problems" are therefore swapped, and the corrected text character recognition result is "today class teacher lets me write math problems". In practical applications, the step of looking up the modification mode corresponding to the type code includes looking it up in a coding table. The coding table records, for each type of modification symbol, its preset type code and its modification mode.
In practical applications, the coding table may be constructed in advance. For ease of understanding, an example coding table is given in Table 1:
Name | Type code | Modification mode
--- | --- | ---
Deletion mark | /rmv | Delete characters, punctuation, words, phrases, long sentences, or paragraphs
Restoration mark (also called retention mark) | /rov | Restore deleted characters or symbols
Transposition mark | /swap | Exchange the positions of adjacent characters, words, or phrases
Correction mark | /corr | Correct wrong characters or symbols
Insertion mark | /add | Add new characters or symbols between characters, sentences, or paragraphs
Relocation mark | /mov | Move characters, punctuation, words, sentences, or paragraphs over a long distance
Paragraph-split mark | /n | Split one paragraph into two, indicating a new paragraph
Paragraph-merge mark | /sp | Join the following text to the preceding text, indicating that no paragraph break should occur
Indent mark | /tab | Indent text that starts flush with the line start by two character spaces to indicate a new paragraph, shifting the text backward
Move-forward mark | /back | Move text forward or flush with the line start
Start mark | /start | Start of modification symbol
End mark | /end | End of modification symbol

Table 1
It should be understood that Table 1 above only illustrates the correspondence between some modification symbols and their type codes and modification modes; it does not list all modification symbols. In addition, the coding table also shows the type codes corresponding to the start mark (modification start symbol) and the end mark (modification end symbol) of a modification symbol.
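A coding table of this kind can be held in a simple dictionary keyed by type code. The sketch below is one possible representation, with the English wording of names and modification modes paraphrased from Table 1; `CODING_TABLE` and `lookup_modification_mode` are hypothetical names.

```python
# Type code -> (symbol name, modification mode), paraphrasing Table 1.
CODING_TABLE = {
    "/rmv": ("deletion mark", "delete the marked content"),
    "/rov": ("restoration mark", "restore deleted characters or symbols"),
    "/swap": ("transposition mark",
              "exchange the positions of adjacent characters, words, or phrases"),
    "/corr": ("correction mark", "correct wrong characters or symbols"),
    "/add": ("insertion mark", "add new characters or symbols"),
    "/mov": ("relocation mark", "move content over a long distance"),
    "/n": ("paragraph-split mark", "split one paragraph into two"),
    "/sp": ("paragraph-merge mark", "join the following text to the preceding text"),
    "/tab": ("indent mark", "indent the line start by two character spaces"),
    "/back": ("move-forward mark", "move text forward or flush with the line start"),
    "/start": ("start mark", "start of modification symbol"),
    "/end": ("end mark", "end of modification symbol"),
}


def lookup_modification_mode(type_code):
    """Return the modification mode recorded for a type code (KeyError if unknown)."""
    return CODING_TABLE[type_code][1]


mode = lookup_modification_mode("/swap")
```

Keeping the table as data rather than code means new modification symbols can be supported by adding entries, in line with the note that Table 1 is not exhaustive.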
In practical applications, refer to the decoding schematic diagram shown in fig. 5, which, building on fig. 3, illustrates how the second region feature vector sequence $[\beta_1, \beta_2, \ldots, \beta_{W \times H}]$ is decoded to obtain the final corrected text character recognition result. Fig. 5 illustrates two decoding passes. In the first pass, the recognition model decodes the second region feature vector sequence $[\beta_1, \beta_2, \ldots, \beta_{W \times H}]$ to obtain the combined text character recognition result and modification symbol recognition result: "today /start teacher /swap class /end lets me write math problems". In the second pass, the post-processing unit decodes the result of the first pass (i.e., the output of the recognition model): it identifies the type code, the modification start symbol, and the modification end symbol in the first-pass result, determines the target text characters to be modified and the modification mode of the modification symbol, and corrects the first-pass result accordingly, finally obtaining "today class teacher lets me write math problems".
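The second decoding pass can be sketched as a small token-level post-processor. Only "/swap" and "/rmv" are handled, and the behaviour assumed for "/rmv" (the whole marked span is deleted) is an illustrative guess rather than something the text specifies; `apply_corrections` is a hypothetical name.

```python
def apply_corrections(tokens):
    """Apply the modification encoded between /start and /end markers."""
    corrected = []
    i = 0
    while i < len(tokens):
        if tokens[i] != "/start":
            corrected.append(tokens[i])
            i += 1
            continue
        end = tokens.index("/end", i)  # raises ValueError if the span is unclosed
        span = tokens[i + 1:end]
        if "/swap" in span:
            k = span.index("/swap")
            # Exchange the parts on either side of the transposition code.
            corrected.extend(span[k + 1:] + span[:k])
        elif "/rmv" in span:
            pass  # assumed behaviour: drop the whole marked span
        else:
            raise ValueError("unsupported or missing type code in span")
        i = end + 1
    return corrected


first_pass = "today /start teacher /swap class /end lets me write math problems".split()
second_pass = " ".join(apply_corrections(first_pass))
```

Tokens outside a marked span pass through unchanged, so text without modification symbols is returned as recognized.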
On the basis of the foregoing, an embodiment of the present disclosure provides a text correction system. Referring to the schematic structural diagram of a text correction system shown in fig. 6, the system includes a detection model, a recognition model, and a post-processing unit; the input of the system (i.e., the input of the detection model) is a text image, and the output of the system (i.e., the output of the post-processing unit) is the corrected text character recognition result. The specific implementation and outputs of the detection model and the recognition model are as described above and are not repeated here. In fig. 6 the output of the recognition model is summarized as a recognition result, which includes the text character recognition result and the modification symbol recognition result. The post-processing unit, which may also be called a post-processing module, mainly corrects the text character recognition result based on the modification symbol recognition result to obtain the corrected text character recognition result. Building on fig. 6, the embodiment of the present disclosure further provides a specific implementation example of the text correction system: referring to the schematic structural diagram of another text correction system shown in fig. 7, it includes a Yolo5 network model, a Seq2Seq network model with an attention mechanism, and a post-processing unit, where the Yolo5 network model is the detection model and the Seq2Seq network model with an attention mechanism is the recognition model. Of course, fig. 7 is only an example; the detection model and the recognition model may be implemented with other network structures, which is not limited here.
It should be understood that fig. 6 and fig. 7 are only exemplary; in practical applications the text correction system may include further functional modules, which is not limited here.
Based on the text correction system shown in fig. 7, and referring to the flowchart of a text correction method shown in fig. 8, the method includes the following steps S802 to S814:
Step S802: acquire a text image to be processed;
Step S804: detect the text image using the Yolo5 network model to obtain the pixel coordinates of the text characters and the pixel coordinates of the modification symbols in the text image;
Step S806: extract the content to be recognized based on the pixel coordinates of the text characters and the pixel coordinates of the modification symbols in the text image;
Step S808: recognize the content to be recognized using the Seq2Seq network model with an attention mechanism to obtain the text character recognition result and the type code of the modification symbol;
Step S810: mark the target text characters associated with the modification symbol in the text character recognition result using a modification start symbol and a modification end symbol, and use the type code of the modification symbol to replace the modification symbol appearing among the target text characters;
Step S812: look up the modification mode corresponding to the type code in the coding table;
Step S814: correct the target text characters based on the modification mode to obtain the corrected text character recognition result.
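The data flow of steps S804 to S814 can be summarized in a short sketch in which every stage is passed in as a callable. The function `run_pipeline`, its parameter names, and the stub stages in the example are all hypothetical; a single type code per image is a simplification, and the real stages would be the Yolo5 detector, the Seq2Seq recognizer, and the post-processing unit.

```python
def run_pipeline(image, detect, extract, recognize, mark, lookup_mode, correct):
    """Chain the stages of the text correction method (S804 to S814)."""
    char_positions, symbol_positions = detect(image)                    # S804
    content = extract(image, char_positions, symbol_positions)          # S806
    tokens, type_code = recognize(content)                              # S808
    marked = mark(tokens, type_code, char_positions, symbol_positions)  # S810
    mode = lookup_mode(type_code)                                       # S812
    return correct(marked, mode)                                        # S814


# Stub stages with fixed outputs, used only to show how data moves between steps.
result = run_pipeline(
    image="<text image>",
    detect=lambda img: ([(10, 20), (50, 20)], [(30, 20)]),
    extract=lambda img, chars, symbols: img,
    recognize=lambda content: (["teacher", "class"], "/swap"),
    mark=lambda tokens, code, chars, symbols:
        ["/start", tokens[0], code, tokens[1], "/end"],
    lookup_mode=lambda code: {"/swap": "exchange"}[code],
    correct=lambda marked, mode:
        [marked[3], marked[1]] if mode == "exchange" else marked,
)
```

Passing the stages as callables keeps the sketch independent of any particular network structure, matching the statement that other models may replace the ones shown in fig. 7.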
With the above text correction method, the text characters and modification symbols in the text image can be detected and recognized by the models, the target text characters associated with a modification symbol can be marked intuitively and clearly with the modification start symbol and the modification end symbol, and the modification symbol appearing among the target text characters is replaced by its type code, which facilitates subsequent correction. The corrected text character recognition result reflects what the user actually intended to express, so the accuracy and reliability of text recognition are better guaranteed and the user experience can be effectively improved.
Corresponding to any of the foregoing text correction methods, an embodiment of the present disclosure also provides a text correction device, which can be implemented in software and/or hardware and can generally be integrated in an electronic device. Referring to the structural block diagram of a text correction device shown in fig. 9, the text correction device 900 mainly includes the following modules:
an image obtaining module 902, configured to obtain a text image to be processed;
a detection module 904, configured to detect a text character and a modification symbol included in a text image by using a detection model obtained through pre-training to obtain a first position of the text character and a second position of the modification symbol;
a content extraction module 906, configured to extract content to be identified from the text image based on the first position and the second position;
a recognition module 908, configured to recognize the content to be recognized by using a recognition model obtained through pre-training, so as to obtain a text character recognition result and a modification symbol recognition result;
and a modification module 910, configured to correct the text character recognition result based on the modification symbol recognition result.
The text correction device provided by the embodiment of the present disclosure can detect and recognize the text characters and modification symbols in a text image by means of the models, and can correct the text character recognition result based on the modification symbol recognition result. The corrected text character recognition result reflects what the user actually intended to express, so the accuracy and reliability of text recognition are better guaranteed and the user experience can be effectively improved.
In some embodiments, the recognition module 908 is specifically configured to: divide the content to be recognized into regions in feature form by using the pre-trained recognition model to obtain a first region feature vector sequence; and decode according to the first region feature vector sequence to obtain the text character recognition result and the modification symbol recognition result.
In some embodiments, the recognition module 908 is specifically configured to: generate, based on the attention mechanism algorithm and the first region feature vector sequence, a second region feature vector sequence that reflects the positional relationships among the region feature vectors; and decode the second region feature vector sequence to obtain the text character recognition result and the modification symbol recognition result.
In some embodiments, the recognition module 908 is specifically configured to: convert the feature vectors in the first region feature vector sequence into two-dimensional feature vectors with a specified number of feature points; and generate, based on the attention mechanism algorithm and the two-dimensional feature vectors, a second region feature vector sequence that reflects the positional relationships among the region feature vectors.
In some embodiments, the recognition module 908 is specifically configured to: convert the feature vectors in the first region feature vector sequence into a three-dimensional feature vector; process the three-dimensional feature vector with a matrix transformation operation to obtain two-dimensional feature vectors with a specified number of feature points; and add position codes to the two-dimensional feature vectors based on the attention mechanism algorithm to generate a second region feature vector sequence that reflects the positional relationships among the region feature vectors.
In some embodiments, the modification symbol recognition result comprises a type code of the modification symbol; the recognition module 908 is further configured to: mark, according to the relative positional relationship between the first position and the second position, the target text characters associated with the modification symbol in the text character recognition result, and use the type code of the modification symbol to replace the modification symbol appearing among the target text characters.
In some embodiments, the recognition module 908 is further configured to: determine a plurality of target text characters associated with the modification symbol according to the type code of the modification symbol and the relative positional relationship between the first position and the second position; insert a preset modification start symbol between the first target text character and its adjacent non-target text character, and insert a preset modification end symbol between the last target text character and its adjacent non-target text character, so as to mark the characters between the modification start symbol and the modification end symbol as the target text characters associated with the modification symbol; and determine, according to the relative positional relationship between the first position and the second position, the insertion position of the type code of the modification symbol among the target text characters, and insert the type code at that position to replace the modification symbol appearing among the target text characters.
In some embodiments, the modification module 910 is specifically configured to: searching a modification mode corresponding to the type code; and correcting the target text character based on the modification mode.
In some embodiments, the above apparatus further comprises: the coding table constructing module is used for constructing a coding table; the coding table records the corresponding relation between the modification symbols of each type and the preset type codes and modification modes respectively; on this basis, the modification module 910 is specifically configured to: and searching a modification mode corresponding to the type coding through a coding table.
In some embodiments, the apparatus further comprises a detection model training module for obtaining a first training sample set; the first training sample set comprises text image samples marked with modified symbol positions and text character positions; and training a first neural network model for detection according to the first training sample set until a training end condition is met, and obtaining a detection model.
In some embodiments, the apparatus further comprises a recognition model training module for obtaining a second training sample set; the second training sample set comprises text image samples marked with recognition results; the recognition result comprises a recognition result of a modification symbol contained in the text image sample and a recognition result of a text character; and training a second neural network model for recognition according to the second training sample set until a training end condition is met, and obtaining a recognition model.
The text correction device provided by the embodiment of the disclosure can execute the text correction method provided by any embodiment of the disclosure, and has corresponding functional modules and beneficial effects of the execution method.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatus embodiments may refer to corresponding processes in the method embodiments, and are not described herein again.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
An exemplary embodiment of the present disclosure also provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor, the computer program, when executed by the at least one processor, is for causing the electronic device to perform a method according to an embodiment of the disclosure.
Exemplary embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is adapted to cause the computer to perform a method according to an embodiment of the present disclosure.
The exemplary embodiments of the present disclosure also provide a computer program product comprising a computer program, wherein the computer program, when executed by a processor of a computer, is adapted to cause the computer to perform a method according to an embodiment of the present disclosure.
The program code of the computer program product for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the text correction method provided by embodiments of the present disclosure. The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Referring to fig. 10, a block diagram of a structure of an electronic device 1000, which may be a server or a client of the present disclosure, which is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. Electronic device is intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 10, the electronic device 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a random access memory (RAM) 1003. The RAM 1003 can also store various programs and data necessary for the operation of the electronic device 1000. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
A number of components in the electronic device 1000 are connected to the I/O interface 1005, including an input unit 1006, an output unit 1007, a storage unit 1008, and a communication unit 1009. The input unit 1006 may be any type of device capable of inputting information to the electronic device 1000; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit 1007 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 1008 may include, but is not limited to, a magnetic disk or an optical disk. The communication unit 1009 allows the electronic device 1000 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers, and/or chipsets, such as Bluetooth(TM) devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 1001 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 1001 executes the respective methods and processes described above. For example, in some embodiments, the text correction method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1000 via the ROM 1002 and/or the communication unit 1009. In some embodiments, the computing unit 1001 may be configured to perform the text correction method in any other suitable manner (e.g., by means of firmware).
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used in this disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A text correction method, comprising:
acquiring a text image to be processed;
detecting text characters and modification symbols contained in the text image by adopting a detection model obtained by pre-training to obtain a first position of the text characters and a second position of the modification symbols;
extracting content to be recognized from the text image based on the first position and the second position;
recognizing the content to be recognized by using a recognition model obtained by pre-training to obtain a text character recognition result and a modification symbol recognition result;
and correcting the text character recognition result based on the modification symbol recognition result.
2. The text correction method of claim 1, wherein the step of recognizing the content to be recognized by using the recognition model obtained by pre-training to obtain the text character recognition result and the modification symbol recognition result comprises:
performing region division on the content to be recognized in feature form by using the recognition model obtained by pre-training to obtain a first region feature vector sequence, and decoding according to the first region feature vector sequence to obtain the text character recognition result and the modification symbol recognition result.
3. The text correction method of claim 2, wherein the step of decoding according to the first region feature vector sequence to obtain the text character recognition result and the modification symbol recognition result comprises:
generating a second region feature vector sequence that reflects the positional relationship among the region feature vectors based on an attention mechanism algorithm and the first region feature vector sequence;
and decoding the second region feature vector sequence to obtain the text character recognition result and the modification symbol recognition result.
4. The text correction method of claim 3, wherein the step of generating a second region feature vector sequence that reflects the positional relationship among the region feature vectors based on the attention mechanism algorithm and the first region feature vector sequence comprises:
converting the feature vectors in the first region feature vector sequence into two-dimensional feature vectors with a specified number of feature points;
and generating the second region feature vector sequence that reflects the positional relationship among the region feature vectors based on the attention mechanism algorithm and the two-dimensional feature vectors.
5. The text correction method of claim 4, wherein the step of converting the feature vectors in the first region feature vector sequence into two-dimensional feature vectors with a specified number of feature points comprises:
converting the feature vectors in the first region feature vector sequence into three-dimensional feature vectors;
processing the three-dimensional feature vectors by a matrix transformation operation to obtain the two-dimensional feature vectors with the specified number of feature points;
and the step of generating the second region feature vector sequence that reflects the positional relationship among the region feature vectors based on the attention mechanism algorithm and the two-dimensional feature vectors comprises:
adding a position code to the two-dimensional feature vectors based on the attention mechanism algorithm to generate the second region feature vector sequence that reflects the positional relationship among the region feature vectors.
6. The text correction method according to any one of claims 1 to 5, wherein the modification symbol recognition result includes a type code of a modification symbol;
after the step of recognizing the content to be recognized by using the recognition model obtained by pre-training to obtain the text character recognition result and the modification symbol recognition result, the method further comprises:
marking, according to the relative positional relationship between the first position and the second position, a target text character related to the modification symbol in the text character recognition result, and replacing the modification symbol appearing among the target text characters with the type code of the modification symbol.
7. The text correction method according to claim 6, wherein the step of marking, according to the relative positional relationship between the first position and the second position, a target text character related to the modification symbol in the text character recognition result, and replacing the modification symbol appearing among the target text characters with the type code of the modification symbol comprises:
determining a plurality of target text characters related to the modification symbol according to the type code of the modification symbol and the relative positional relationship between the first position and the second position;
inserting a preset modification start character between a first one of the target text characters and a non-target text character adjacent thereto, and inserting a preset modification stop character between a last one of the target text characters and a non-target text character adjacent thereto, so as to mark the characters between the modification start character and the modification stop character as target text characters associated with the modification symbol;
and determining, according to the relative positional relationship between the first position and the second position, the insertion position of the type code of the modification symbol among the target text characters, and inserting the type code of the modification symbol at the insertion position to replace the modification symbol appearing among the target text characters.
8. The text correction method of claim 6, wherein the step of correcting the text character recognition result based on the modification symbol recognition result comprises:
searching a modification mode corresponding to the type code;
and correcting the target text character based on the modification mode.
9. The text correction method of claim 8, wherein the method further comprises:
constructing a coding table, wherein the coding table records, for each type of modification symbol, the corresponding preset type code and modification mode;
the step of searching for the modification mode corresponding to the type code includes:
and searching the coding table for the modification mode corresponding to the type code.
10. The text correction method of claim 1, wherein the pre-trained detection model is trained according to the following steps:
acquiring a first training sample set; the first training sample set comprises text image samples marked with modification symbol positions and text character positions;
and training a first neural network model for detection according to the first training sample set until a training end condition is met, and obtaining a detection model.
11. The text correction method of claim 1, wherein the pre-trained recognition model is trained according to the following steps:
acquiring a second training sample set; the second training sample set comprises text image samples marked with recognition results; the recognition result comprises a recognition result of a modification symbol contained in the text image sample and a recognition result of a text character;
and training a second neural network model for recognition according to the second training sample set until a training end condition is met to obtain a recognition model.
12. A text correction apparatus, comprising:
the image acquisition module is used for acquiring a text image to be processed;
the detection module is used for detecting the text characters and the modification symbols contained in the text image by adopting a detection model obtained by pre-training to obtain a first position of the text characters and a second position of the modification symbols;
the content extraction module is used for extracting content to be recognized from the text image based on the first position and the second position;
the recognition module is used for recognizing the content to be recognized by using a recognition model obtained by pre-training to obtain a text character recognition result and a modification symbol recognition result;
and the correction module is used for correcting the text character recognition result based on the modification symbol recognition result.
13. An electronic device, comprising:
a processor; and
a memory for storing a program,
wherein the program comprises instructions which, when executed by the processor, cause the processor to carry out the text correction method according to any one of claims 1-11.
14. A computer-readable storage medium, characterized in that the storage medium stores a computer program for executing the text correction method according to any one of claims 1 to 11.
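
The marking and correction steps of claims 6 to 9 can be illustrated with a minimal Python sketch. It is not the patented implementation: the start/stop characters "[" and "]", the type codes "<DEL>", "<INS>" and "<SWAP>", the three modification modes, and the helper names mark, apply_modification and correct are all hypothetical choices made for illustration. The sketch also assumes the recognized text itself contains no literal brackets and that marked spans are not nested.

```python
import re

# Hypothetical coding table (claim 9): type code -> modification mode.
CODING_TABLE = {
    "<DEL>": "delete",  # strike-through: drop the marked characters
    "<INS>": "insert",  # caret: the marked characters are inserted text
    "<SWAP>": "swap",   # transposition: exchange the text on either side
}
START, STOP = "[", "]"  # assumed modification start/stop characters

SEGMENT = re.compile(r"\[([^\[\]]*)\]")  # one marked span, no nesting
CODE = re.compile(r"<[A-Z]+>")           # a type code inside a span


def mark(chars: str, first: int, last: int, code: str, offset: int) -> str:
    """Wrap chars[first..last] in start/stop characters (claim 7) and
    insert the type code `offset` characters into the target span."""
    target = chars[first:last + 1]
    return (chars[:first] + START + target[:offset] + code
            + target[offset:] + STOP + chars[last + 1:])


def apply_modification(segment: str) -> str:
    """Look up the modification mode for the type code (claims 8 and 9)
    and apply it to the target text characters of one marked span."""
    match = CODE.search(segment)
    if match is None:
        return segment  # no type code found: keep the text unchanged
    mode = CODING_TABLE.get(match.group())
    left, right = segment[:match.start()], segment[match.end():]
    if mode == "delete":
        return ""
    if mode == "swap":
        return right + left
    return left + right  # "insert" or unknown code: keep the characters


def correct(marked: str) -> str:
    """Replace every marked span with its corrected text."""
    return SEGMENT.sub(lambda m: apply_modification(m.group(1)), marked)


marked = mark("the cat teh sat", 8, 11, "<DEL>", 0)
print(marked)           # the cat [<DEL>teh ]sat
print(correct(marked))  # the cat sat
```

In this sketch the coding table is the only place that binds a symbol type to a behaviour, so supporting a further modification symbol would mean adding one table entry and one branch in apply_modification.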
CN202110775077.6A 2021-07-09 2021-07-09 Text correction method, device, equipment and medium Active CN113255652B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110775077.6A CN113255652B (en) 2021-07-09 2021-07-09 Text correction method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110775077.6A CN113255652B (en) 2021-07-09 2021-07-09 Text correction method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN113255652A true CN113255652A (en) 2021-08-13
CN113255652B CN113255652B (en) 2021-10-29

Family

ID=77191026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110775077.6A Active CN113255652B (en) 2021-07-09 2021-07-09 Text correction method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113255652B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5907852A (en) * 1995-02-01 1999-05-25 Nec Corporation Document editing apparatus
CN101452444A (en) * 2007-12-04 2009-06-10 哈尔滨工业大学深圳研究生院 Rapid editing and typesetting method for handwriting information and edition symbol identification method
CN102664832A (en) * 2012-05-21 2012-09-12 李艳平 Method and device for commenting and additionally modifying message
CN110135427A (en) * 2019-04-11 2019-08-16 北京百度网讯科技有限公司 The method, apparatus, equipment and medium of character in image for identification
CN110633673A (en) * 2019-09-17 2019-12-31 江苏科技大学 Handwriting track modification method
CN112749695A (en) * 2019-10-31 2021-05-04 北京京东尚科信息技术有限公司 Text recognition method and device
CN112883968A (en) * 2021-02-24 2021-06-01 北京有竹居网络技术有限公司 Image character recognition method, device, medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
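Wang Guoyu et al.: "Handwritten electrical component symbol recognition system based on neural networks", Journal of Jiangsu University of Science and Technology *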

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113807416A (en) * 2021-08-30 2021-12-17 国泰新点软件股份有限公司 Model training method and device, electronic equipment and storage medium
CN113807416B (en) * 2021-08-30 2024-04-05 国泰新点软件股份有限公司 Model training method and device, electronic equipment and storage medium
CN114089676A (en) * 2021-11-23 2022-02-25 中国航空工业集团公司洛阳电光设备研究所 Key symbol monitoring link and monitoring method
WO2023138361A1 (en) * 2022-01-21 2023-07-27 北京有竹居网络技术有限公司 Image processing method and apparatus, and readable storage medium and electronic device
WO2023222097A1 (en) * 2022-05-20 2023-11-23 华为技术有限公司 Text recognition method and related apparatus
CN115984470A (en) * 2022-12-30 2023-04-18 江苏创英医疗器械有限公司 Dental implant modeling method and system based on image recognition technology
CN115984470B (en) * 2022-12-30 2023-11-28 江苏创英医疗器械有限公司 Dental implant modeling method and system based on image recognition technology

Also Published As

Publication number Publication date
CN113255652B (en) 2021-10-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant