CN112036405A - Detection and identification method for handwritten document text - Google Patents
Detection and identification method for handwritten document text
- Publication number
- CN112036405A CN112036405A CN202010896671.6A CN202010896671A CN112036405A CN 112036405 A CN112036405 A CN 112036405A CN 202010896671 A CN202010896671 A CN 202010896671A CN 112036405 A CN112036405 A CN 112036405A
- Authority
- CN
- China
- Prior art keywords
- text
- text line
- network
- picture
- line
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention relates in particular to a method for detecting and recognizing handwritten document text. The method comprises two parts, text line positioning and text line detection. The text line positioning network trains a modified VGG-11 on a picture to find the possible starting positions of text lines in the picture. The text line detection network advances incrementally forward along each text line: from the obtained starting position and rotation angle, a viewing window is obtained by resampling and input to a CNN, which regresses the rotation angle of the next position; this is repeated until the edge of the picture is reached, and a normalized text line picture is finally generated and input to a text line recognition network, which recognizes it and outputs the recognition result. The method not only overcomes interference factors in natural scenes and detects and recognizes text accurately, but also advances recursively along the extension direction of each text line, so that even curved text lines are finally detected.
Description
Technical Field
The invention relates to the technical field of deep learning, in particular to a detection and identification method for a handwritten document text.
Background
The problem of detecting the positions of text blocks in complex color images of natural scenes was first raised at the end of the twentieth century. Solving it promises great economic and cultural benefits, so it quickly became a hotspot in the fields of computer vision and document analysis. In the decades since the problem was raised, a variety of text detection and recognition methods have been proposed.
For text detection, there are currently mainly the following methods:
1. Energy-minimization-based methods: most are built on conditional random fields and Markov random fields and treat text line detection as an energy minimization problem, so as to resolve interference between text lines;
2. Connected-component-based methods: the core idea is to find small parts and assemble them into larger parts, remove the non-character parts with a classifier, and finally extract the characters from the image and merge them into text regions; the most representative examples are maximally stable extremal regions (MSER) and the stroke width transform (SWT);
3. Deep-learning-based methods: a convolutional neural network extracts high-dimensional features from the image to realize text detection and recognition.
For text recognition, there are currently mainly the following methods:
1. Character-based methods, which perform character-level recognition; reliable character recognition makes bottom-up text recognition easier to implement;
2. Word-based methods, which treat text recognition as word recognition;
3. Sequence-based methods, which convert text recognition into a sequence recognition problem: the text is represented as a character sequence, and a convolutional recurrent neural network handles sequences of arbitrary length.
Text detection and recognition in handwritten documents photographed in natural scenes differs from conventional OCR and poses much greater challenges:
first, scene complexity: noise, deformation, non-uniform illumination, partial occlusion, and confusion between characters and background all degrade detection and recognition;
second, character diversity: color, size, orientation, font, language, and partial character distortion also degrade detection and recognition.
Solving the problem brings great cultural and economic benefits, such as helping visually impaired people read documents and enabling real-time photo translation. However, because handwritten document pictures shot in natural scenes contain many interference factors, traditional text detection and recognition methods do not transfer well to them. On this basis, the invention provides a method for detecting and recognizing handwritten document text.
Disclosure of Invention
In order to make up for the defects of the prior art, the invention provides a simple and efficient method for detecting and identifying the handwritten document text.
The invention is realized by the following technical scheme:
a detection and recognition method of handwritten document text is characterized by comprising the following steps: the method comprises two parts of text line positioning and text line detection;
the text line positioning network trains a modified VGG-11 on a picture and regresses the (x0, y0) coordinates, the scale s0, the rotation angle θ0, and the text line occurrence likelihood p0, thereby finding the possible starting positions of text lines in the picture;
the text line detection network advances incrementally forward along the text line: from the text line start position and rotation angle (xi, yi, θi) obtained by the text line positioning network, a viewing window is obtained by resampling and input to a CNN, which regresses the (xi+1, yi+1, θi+1) of the next position; the process is repeated until the picture edge is reached, and a normalized text line picture is finally generated and input to a text line recognition network, which recognizes it and outputs the recognition result.
Before input to the text line positioning network, the data set is processed: all text line pictures are output together with json labeling information, which comprises the image path, the region coordinates of each line of text, the coordinates of the region occupied by each word within the line, and the text content of each line.
The processing method of the text line positioning network comprises the following steps:
S1, reading the json file of the image labels, traversing it, and removing entries with erroneous labels;
S2, resizing the input image to 512 pixels wide and sampling 256×256 image patches over the whole picture; each patch is allowed to extend outside the image, with the out-of-image area filled with the average color of the image patch edges;
S3, inputting each 16×16 input image block into the modified VGG-11 network for training; the network regresses the (x0, y0) coordinates, the scale s0, the rotation angle θ0, and the text line occurrence likelihood p0;
S4, after training, setting p0 = 1 and setting the (x0, y0) coordinates, the scale s0, and the rotation angle θ0 equal to 0;
S5, after the text line positioning module has determined the starting position of a text line in the picture, the text line detection network advances incrementally along the path of the text line to determine the complete text line region.
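The average-color padding of step S2 can be sketched as follows. This is a minimal illustration of the idea; the function and parameter names (`sample_patch`, `cx`, `cy`) are assumptions, not taken from the patent:

```python
import numpy as np

def sample_patch(img, cx, cy, size=256):
    """Crop a size x size patch centred at (cx, cy) from img (H, W, 3).

    Parts of the patch falling outside the image are filled with the
    average colour of the image border, approximating the padding step
    described above.
    """
    h, w = img.shape[:2]
    # average colour of the 1-pixel border of the image
    border = np.concatenate([
        img[0, :].reshape(-1, 3), img[-1, :].reshape(-1, 3),
        img[:, 0].reshape(-1, 3), img[:, -1].reshape(-1, 3),
    ])
    fill = border.mean(axis=0)

    patch = np.tile(fill, (size, size, 1)).astype(img.dtype)
    half = size // 2
    # overlapping region between the patch window and the image
    x0, x1 = max(cx - half, 0), min(cx + half, w)
    y0, y1 = max(cy - half, 0), min(cy + half, h)
    if x0 < x1 and y0 < y1:
        patch[y0 - (cy - half):y1 - (cy - half),
              x0 - (cx - half):x1 - (cx - half)] = img[y0:y1, x0:x1]
    return patch
```

A patch centred near an image corner then comes back half image content and half border-average fill.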
The modified VGG-11 network omits the fully connected layers and the last pooling layer of the classic VGG-11 network; all convolution layers use kernels of the same size, 3×3, with stride 1 and padding 1.
In step S4, the training process uses a loss function proposed for the multi-box object detection problem to align the maximum-probability predicted text line start positions with the target positions; the loss function is as follows:
wherein tm is the target position, pn is the likelihood of SOL (start-of-line) occurrence, Xnm is a bidirectional alignment matrix between the N predicted positions and the M target positions, α is a parameter weighing the relative importance of the position loss against the confidence loss (0.01 by default), and ln is the initial prediction (xn, yn, sn, θn) of the convolutional neural network; given (l, p, t), Xnm is computed so as to minimize L; the calculation formula of ln is as follows:
ln = (-sin(θn)·sn + xn, -cos(θn)·sn + yn, sin(θn)·sn + xn, cos(θn)·sn + yn)
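The formula for ln above translates directly into code. A small sketch (the helper name `initial_prediction_points` is illustrative, not from the patent):

```python
import math

def initial_prediction_points(x, y, s, theta):
    """Expand a network prediction (x, y, s, theta) into the four-tuple
    ln of the formula above: the two endpoints of the start-of-line
    segment, offset from (x, y) by the scale s along direction theta.
    """
    return (-math.sin(theta) * s + x,
            -math.cos(theta) * s + y,
            math.sin(theta) * s + x,
            math.cos(theta) * s + y)
```

With theta = 0 the two points sit vertically above and below (x, y) at distance s.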
the processing method of the text line detection network comprises the following steps:
S1, reading the json file of the image labels, traversing it, and removing entries with erroneous labels;
S2, the text line detection network operates recursively and incrementally: from the text line start position and rotation angle (xi, yi, θi) obtained by the text line positioning network, a viewing window is obtained by resampling;
S3, inputting the viewing window to the CNN, which regresses the (xi+1, yi+1, θi+1) of the next position;
S4, repeating the above steps until the picture edge is reached; the size of the viewing window is determined by the scale s0 predicted by the text line positioning module and remains unchanged.
In step S2, the resampling of the viewing window is similar to a spatial transformer network: image coordinates are mapped to viewing window coordinates by an affine transformation matrix;
the first viewing window matrix is W0 = A·WSOL, where the matrix A is a forward propagation matrix responsible for providing context information so that the text line detection network can position the text line correctly;
the calculation formulas of the matrix A and the matrix WSOL are as follows:
the parameters are obtained by prediction of a text line positioning network;
according to the matrix Wi, a 32×32 viewing window is extracted; the text line detection network then regresses xi, yi and θi; from the regressed xi, yi, θi a prediction matrix Pi is formed, and the next matrix is computed as Wi = Pi·Wi-1;
the calculation formula of the prediction matrix Pi is as follows:
to locate a text line, the text line is treated as a series of pairs of upper and lower coordinate points pu,i and pl,i; each coordinate pair is computed from the upper and lower midpoints of the prediction window;
a mean squared error (MSE) loss function is used when training the convolutional neural network; its calculation formula is as follows:
the text detection network starts from the first target positions tu,0 and tl,0 and resets to the corresponding target position every 4 steps, so that when the text line detection network drifts away from the handwritten text line it can recover the correct path without introducing large errors during training;
to enhance the robustness of the text line detection network, after the target position is reset, a translation of Δx, Δy ∈ [-2, 2] pixels and a rotation of Δθ ∈ [-0.1, 0.1] radians are randomly applied to the target position.
The text line detection network outputs a normalized text line picture, which is input to the text line recognition network; the text line recognition network uses a conventional convolutional neural network and a bidirectional recurrent neural network, with CTC loss computed at the top of the framework, so that it recognizes input text line images of variable length and outputs the text line recognition result.
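The CTC layer mentioned above aligns per-frame network outputs with a label sequence. Its decoding side can be sketched as a generic greedy decode (merge repeated labels, then drop blanks); this is standard CTC post-processing, not the patent's own implementation:

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame argmax label sequence into an output label
    sequence, as CTC decoding does: merge repeated labels, then drop
    the blank symbol.
    """
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out
```

For example, the frame sequence [blank, a, a, blank, a, b, b, blank] decodes to "a a b": the blank between the two runs of `a` is what keeps a doubled letter from collapsing.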
The beneficial effects of the invention are: the method for detecting and recognizing handwritten document text not only overcomes interference factors in natural scenes and detects and recognizes text accurately, but also advances recursively along the extension direction of each text line, so that even curved text lines are finally detected.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of a system for detecting and recognizing handwritten document texts according to the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the embodiment of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The detection and identification method of the handwritten document text is based on a deep learning technology and comprises two parts, namely text line positioning and text line detection;
the text line positioning network trains a modified VGG-11 on a picture and regresses the (x0, y0) coordinates, the scale s0, the rotation angle θ0, and the text line occurrence likelihood p0, thereby finding the possible starting positions of text lines in the picture;
the text line detection network advances incrementally forward along the text line: from the text line start position and rotation angle (xi, yi, θi) obtained by the text line positioning network, a viewing window is obtained by resampling and input to a CNN, which regresses the (xi+1, yi+1, θi+1) of the next position; the process is repeated until the picture edge is reached, and a normalized text line picture is finally generated and input to a text line recognition network, which recognizes it and outputs the recognition result.
Before input to the text line positioning network, the data set is processed: all text line pictures are output together with json labeling information, which comprises the image path, the region coordinates of each line of text, the coordinates of the region occupied by each word within the line, and the text content of each line.
The processing method of the text line positioning network comprises the following steps:
S1, reading the json file of the image labels, traversing it, and removing entries with erroneous labels;
S2, resizing the input image to 512 pixels wide and sampling 256×256 image patches over the whole picture; each patch is allowed to extend outside the image, with the out-of-image area filled with the average color of the image patch edges;
S3, inputting each 16×16 input image block into the modified VGG-11 network for training; the network regresses the (x0, y0) coordinates, the scale s0, the rotation angle θ0, and the text line occurrence likelihood p0;
S4, after training, setting p0 = 1 and setting the (x0, y0) coordinates, the scale s0, and the rotation angle θ0 equal to 0;
S5, after the text line positioning module has determined the starting position of a text line in the picture, the text line detection network advances incrementally along the path of the text line to determine the complete text line region.
The modified VGG-11 network omits the fully connected layers and the last pooling layer of the classic VGG-11 network; all convolution layers use kernels of the same size, 3×3, with stride 1 and padding 1.
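Assuming the standard VGG-11 convolutional configuration, removing the final pooling layer leaves four 2× down-samplings, while the 3×3, stride-1, padding-1 convolutions preserve spatial size. A shape-only sketch (the layer list is taken from the classic VGG-11; its use here is an assumption based on the description above):

```python
# Classic VGG-11 convolutional configuration with the final max-pool
# (and all fully connected layers) removed.  Integers are conv output
# channels; "M" is a 2x2 max-pool with stride 2.
VGG11_TRUNK = [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512]

def output_shape(h, w, cfg=VGG11_TRUNK):
    """Track (channels, height, width) through the truncated network.

    3x3 convs with stride 1 and padding 1 keep h and w unchanged;
    each max-pool halves them.
    """
    channels = 3
    for layer in cfg:
        if layer == "M":
            h, w = h // 2, w // 2
        else:
            channels = layer
    return channels, h, w
```

A 256×256 patch then yields a 512-channel 16×16 feature grid, which would be consistent with the 16×16 blocks mentioned in step S3.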
In step S4, the training process uses a loss function proposed for the multi-box object detection problem to align the maximum-probability predicted text line start positions with the target positions; the loss function is as follows:
wherein tm is the target position, pn is the likelihood of SOL (start-of-line) occurrence, Xnm is a bidirectional alignment matrix between the N predicted positions and the M target positions, α is a parameter weighing the relative importance of the position loss against the confidence loss (0.01 by default), and ln is the initial prediction (xn, yn, sn, θn) of the convolutional neural network; given (l, p, t), Xnm is computed so as to minimize L; the calculation formula of ln is as follows:
ln = (-sin(θn)·sn + xn, -cos(θn)·sn + yn, sin(θn)·sn + xn, cos(θn)·sn + yn)
the processing method of the text line detection network comprises the following steps:
S1, reading the json file of the image labels, traversing it, and removing entries with erroneous labels;
S2, the text line detection network operates recursively and incrementally: from the text line start position and rotation angle (xi, yi, θi) obtained by the text line positioning network, a viewing window is obtained by resampling;
S3, inputting the viewing window to the CNN, which regresses the (xi+1, yi+1, θi+1) of the next position;
S4, repeating the above steps until the picture edge is reached; the size of the viewing window is determined by the scale s0 predicted by the text line positioning module and remains unchanged.
In step S2, the resampling of the viewing window is similar to a spatial transformer network: image coordinates are mapped to viewing window coordinates by an affine transformation matrix;
the first viewing window matrix is W0 = A·WSOL, where the matrix A is a forward propagation matrix responsible for providing context information so that the text line detection network can position the text line correctly;
the calculation formulas of the matrix A and the matrix WSOL are as follows:
the parameters are obtained by prediction of a text line positioning network;
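Since the formulas for A and WSOL are given only as a figure, the resampling step itself can still be sketched: under the assumption that W maps image coordinates to viewing window coordinates, each window pixel is pulled through the inverse of W. Nearest-neighbour sampling is used for brevity, and all names are illustrative:

```python
import numpy as np

def resample_window(img, W, size=32):
    """Extract a size x size viewing window from img using a 3x3
    homogeneous affine matrix W (image coords -> window coords), in the
    spirit of a spatial transformer: each window pixel is mapped back
    into image coordinates via W's inverse and sampled.
    """
    h, w = img.shape[:2]
    Winv = np.linalg.inv(W)
    out = np.zeros((size, size) + img.shape[2:], dtype=img.dtype)
    for j in range(size):
        for i in range(size):
            x, y, _ = Winv @ np.array([i, j, 1.0])
            xi, yi = int(round(x)), int(round(y))
            if 0 <= xi < w and 0 <= yi < h:  # outside pixels stay zero
                out[j, i] = img[yi, xi]
    return out
```

With W equal to the identity the window is simply a crop of the top-left corner of the image.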
according to the matrix Wi, a 32×32 viewing window is extracted; the text line detection network then regresses xi, yi and θi; from the regressed xi, yi, θi a prediction matrix Pi is formed, and the next matrix is computed as Wi = Pi·Wi-1;
the calculation formula of the prediction matrix Pi is as follows:
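The recursion Wi = Pi·Wi-1 composes homogeneous 2-D transforms. The exact entries of Pi are given by a figure omitted here, so the sketch below assumes Pi is a rotation plus translation built from the regressed step; the names are illustrative:

```python
import numpy as np

def step_matrix(dx, dy, dtheta):
    """Homogeneous 2-D transform for one follow step: rotate by dtheta,
    then translate by (dx, dy).  Treating the regressed (xi, yi, thetai)
    as a relative step is an assumption, not the patent's exact form.
    """
    c, s = np.cos(dtheta), np.sin(dtheta)
    return np.array([[c, -s, dx],
                     [s,  c, dy],
                     [0.0, 0.0, 1.0]])

def next_window(W_prev, dx, dy, dtheta):
    # Wi = Pi @ Wi-1: compose the new step with the previous window matrix.
    return step_matrix(dx, dy, dtheta) @ W_prev
```

Two pure-translation steps compose additively, which is the behaviour the incremental follow loop relies on.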
to locate a text line, the text line is treated as a series of pairs of upper and lower coordinate points pu,i and pl,i; each coordinate pair is computed from the upper and lower midpoints of the prediction window;
a mean squared error (MSE) loss function is used when training the convolutional neural network; its calculation formula is as follows:
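Because the MSE formula itself is given only as a figure, a plain-Python stand-in over the upper/lower point pairs is sketched below; the exact normalization used in the patent may differ:

```python
def point_pair_mse(pred, target):
    """Mean squared error between predicted and target point sequences,
    each a list of (x, y) tuples covering the upper and lower text line
    points.  Function and argument names are illustrative.
    """
    assert len(pred) == len(target) and pred
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, target):
        total += (px - tx) ** 2 + (py - ty) ** 2
    return total / len(pred)
```
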
the text detection network starts from the first target positions tu,0 and tl,0 and resets to the corresponding target position every 4 steps, so that when the text line detection network drifts away from the handwritten text line it can recover the correct path without introducing large errors during training;
to enhance the robustness of the text line detection network, after the target position is reset, a translation of Δx, Δy ∈ [-2, 2] pixels and a rotation of Δθ ∈ [-0.1, 0.1] radians are randomly applied to the target position.
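The random perturbation applied after each target reset can be sketched as follows; the helper name is illustrative:

```python
import random

def jitter_target(x, y, theta, rng=random):
    """Randomly perturb a reset target position: a translation of
    dx, dy in [-2, 2] pixels and a rotation of dtheta in [-0.1, 0.1]
    radians, as described above.
    """
    return (x + rng.uniform(-2.0, 2.0),
            y + rng.uniform(-2.0, 2.0),
            theta + rng.uniform(-0.1, 0.1))
```
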
The text line detection network outputs a normalized text line picture, which is input to the text line recognition network; the text line recognition network uses a conventional convolutional neural network and a bidirectional recurrent neural network, with CTC loss computed at the top of the framework, so that it recognizes input text line images of variable length and outputs the text line recognition result.
The above-described embodiment is only one specific embodiment of the present invention, and general changes and substitutions by those skilled in the art within the technical scope of the present invention are included in the protection scope of the present invention.
Claims (9)
1. A detection and recognition method of handwritten document text is characterized by comprising the following steps: the method comprises two parts of text line positioning and text line detection;
the text line positioning network trains a modified VGG-11 on a picture and regresses the (x0, y0) coordinates, the scale s0, the rotation angle θ0, and the text line occurrence likelihood p0, thereby finding the possible starting positions of text lines in the picture;
the text line detection network advances incrementally forward along the text line: from the text line start position and rotation angle (xi, yi, θi) obtained by the text line positioning network, a viewing window is obtained by resampling and input to a CNN, which regresses the (xi+1, yi+1, θi+1) of the next position; the process is repeated until the picture edge is reached, and a normalized text line picture is finally generated and input to a text line recognition network, which recognizes it and outputs the recognition result.
2. The method for detecting and recognizing handwritten document text according to claim 1, characterized in that: before input to the text line positioning network, the data set is processed: all text line pictures are output together with json labeling information, which comprises the image path, the region coordinates of each line of text, the coordinates of the region occupied by each word within the line, and the text content of each line.
3. The method for detecting and recognizing handwritten document text according to claim 1 or 2, characterized in that: the processing method of the text line positioning network comprises the following steps:
S1, reading the json file of the image labels, traversing it, and removing entries with erroneous labels;
S2, resizing the input image to 512 pixels wide and sampling 256×256 image patches over the whole picture; each patch is allowed to extend outside the image, with the out-of-image area filled with the average color of the image patch edges;
S3, inputting each 16×16 input image block into the modified VGG-11 network for training; the network regresses the (x0, y0) coordinates, the scale s0, the rotation angle θ0, and the text line occurrence likelihood p0;
S4, after training, setting p0 = 1 and setting the (x0, y0) coordinates, the scale s0, and the rotation angle θ0 equal to 0;
S5, after the text line positioning module has determined the starting position of a text line in the picture, the text line detection network advances incrementally along the path of the text line to determine the complete text line region.
4. The method for detecting and recognizing handwritten document text according to claim 3, characterized in that: the modified VGG-11 network omits the fully connected layers and the last pooling layer of the classic VGG-11 network; all convolution layers use kernels of the same size, 3×3, with stride 1 and padding 1.
5. The method for detecting and recognizing handwritten document text according to claim 3 or 4, characterized in that: in step S4, the training process uses a loss function proposed for the multi-box object detection problem to align the maximum-probability predicted text line start positions with the target positions; the loss function is as follows:
wherein tm is the target position, pn is the likelihood of SOL (start-of-line) occurrence, Xnm is a bidirectional alignment matrix between the N predicted positions and the M target positions, α is a parameter weighing the relative importance of the position loss against the confidence loss (0.01 by default), and ln is the initial prediction (xn, yn, sn, θn) of the convolutional neural network; given (l, p, t), Xnm is computed so as to minimize L; the calculation formula of ln is as follows:
ln = (-sin(θn)·sn + xn, -cos(θn)·sn + yn, sin(θn)·sn + xn, cos(θn)·sn + yn).
6. the method for detecting and recognizing handwritten document text according to claim 1 or 2, characterized in that: the processing method of the text line detection network comprises the following steps:
s1, reading a json file of an image label, traversing the json file, and removing a part with an error label;
S2, the text line detection network operates recursively and incrementally: from the text line start position and rotation angle (xi, yi, θi) obtained by the text line positioning network, a viewing window is obtained by resampling;
S3, inputting the viewing window to the CNN, which regresses the (xi+1, yi+1, θi+1) of the next position;
S4, repeating the above steps until the picture edge is reached; the size of the viewing window is determined by the scale s0 predicted by the text line positioning module and remains unchanged.
7. The method for detecting and recognizing handwritten document text according to claim 6, characterized in that: in step S2, the resampling of the viewing window is similar to a spatial transformer network: image coordinates are mapped to viewing window coordinates by an affine transformation matrix;
the first viewing window matrix is W0 = A·WSOL, where the matrix A is a forward propagation matrix responsible for providing context information so that the text line detection network can position the text line correctly;
the calculation formulas of the matrix A and the matrix WSOL are as follows:
the parameters are obtained by prediction of a text line positioning network;
according to the matrix Wi, a 32×32 viewing window is extracted; the text line detection network then regresses xi, yi and θi; from the regressed xi, yi, θi a prediction matrix Pi is formed, and the next matrix is computed as Wi = Pi·Wi-1;
The prediction matrix PiThe calculation formula of (a) is as follows:
to locate a text line, the text line is treated as a series of pairs of upper and lower coordinate points pu,i and pl,i; each coordinate pair is computed from the upper and lower midpoints of the prediction window;
8. The method for detecting and recognizing handwritten document text according to claim 7, characterized in that: a mean squared error (MSE) loss function is used when training the convolutional neural network; its calculation formula is as follows:
the text detection network starts from the first target positions tu,0 and tl,0 and resets to the corresponding target position every 4 steps, so that when the text line detection network drifts away from the handwritten text line it can recover the correct path without introducing large errors during training;
to enhance the robustness of the text line detection network, after the target position is reset, a translation of Δx, Δy ∈ [-2, 2] pixels and a rotation of Δθ ∈ [-0.1, 0.1] radians are randomly applied to the target position.
9. The method for detecting and recognizing handwritten document text according to claim 6, 7 or 8, characterized in that: the text line detection network outputs a normalized text line picture, which is input to the text line recognition network; the text line recognition network uses a conventional convolutional neural network and a bidirectional recurrent neural network, with CTC loss computed at the top of the framework, so that it recognizes input text line images of variable length and outputs the text line recognition result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010896671.6A CN112036405A (en) | 2020-08-31 | 2020-08-31 | Detection and identification method for handwritten document text |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112036405A true CN112036405A (en) | 2020-12-04 |
Family
ID=73586026
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010896671.6A Pending CN112036405A (en) | 2020-08-31 | 2020-08-31 | Detection and identification method for handwritten document text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112036405A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109447078A (en) * | 2018-10-23 | 2019-03-08 | 四川大学 | A kind of detection recognition method of natural scene image sensitivity text |
CN110287960A (en) * | 2019-07-02 | 2019-09-27 | 中国科学院信息工程研究所 | The detection recognition method of curve text in natural scene image |
WO2019192397A1 (en) * | 2018-04-04 | 2019-10-10 | 华中科技大学 | End-to-end recognition method for scene text in any shape |
CN110837835A (en) * | 2019-10-29 | 2020-02-25 | 华中科技大学 | End-to-end scene text identification method based on boundary point detection |
Non-Patent Citations (2)
Title |
---|
朱健菲; 应自炉; 陈鹏飞: "Handwritten text line extraction under a joint regression-clustering framework", Journal of Image and Graphics, no. 08 *
王涛; 江加和: "Arbitrary-orientation text recognition based on semantic segmentation", Applied Science and Technology, no. 03 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | Show, attend and read: A simple and strong baseline for irregular text recognition | |
US10817741B2 (en) | Word segmentation system, method and device | |
US8755595B1 (en) | Automatic extraction of character ground truth data from images | |
CN113591546A (en) | Semantic enhanced scene text recognition method and device | |
CN110751154B (en) | Complex environment multi-shape text detection method based on pixel-level segmentation | |
Wang et al. | Towards end-to-end text spotting in natural scenes | |
Bukhari et al. | High performance layout analysis of Arabic and Urdu document images | |
CN113537227B (en) | Structured text recognition method and system | |
CN112364862B (en) | Histogram similarity-based disturbance deformation Chinese character picture matching method | |
CN109886274A (en) | Social security card identification method and system based on opencv and deep learning | |
CN111476232A (en) | Water washing label detection method, equipment and storage medium | |
Sahare et al. | Robust character segmentation and recognition schemes for multilingual Indian document images | |
Sanjrani et al. | Handwritten optical character recognition system for Sindhi numerals | |
CN113313113A (en) | Certificate information acquisition method, device, equipment and storage medium | |
CN115810197A (en) | Multi-mode electric power form recognition method and device | |
CN112686219B (en) | Handwritten text recognition method and computer storage medium | |
Panda et al. | Odia offline typewritten character recognition using template matching with unicode mapping | |
Wicht et al. | Camera-based sudoku recognition with deep belief network | |
CN116704523B (en) | Text typesetting image recognition system for publishing and printing equipment | |
CN112949523A (en) | Method and system for extracting key information from identity card image picture | |
Razzak et al. | Fuzzy based preprocessing using fusion of online and offline trait for online urdu script based languages character recognition | |
Dat et al. | An improved CRNN for Vietnamese Identity Card Information Recognition. | |
Karthik et al. | Segmentation and Recognition of Handwritten Kannada Text Using Relevance Feedback and Histogram of Oriented Gradients–A Novel Approach | |
Gao et al. | Recurrent calibration network for irregular text recognition | |
CN112036405A (en) | Detection and identification method for handwritten document text |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||