CN112036405A - Detection and identification method for handwritten document text - Google Patents

Detection and identification method for handwritten document text

Info

Publication number
CN112036405A
CN112036405A (application CN202010896671.6A)
Authority
CN
China
Prior art keywords
text
text line
network
picture
line
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010896671.6A
Other languages
Chinese (zh)
Inventor
崔炜炜
魏金雷
尹洪义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Cloud Information Technology Co Ltd filed Critical Inspur Cloud Information Technology Co Ltd
Priority to CN202010896671.6A
Publication of CN112036405A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates in particular to a method for detecting and recognizing handwritten document text. The method comprises two parts: text line positioning and text line detection. The text line positioning network trains a modified VGG-11 on a picture so as to find the possible starting positions of text lines on the picture. The text line detection network then advances incrementally along each text line: from the start position and rotation angle obtained by the positioning network, it resamples a viewing window and inputs the window to a CNN, which regresses the rotation angle of the next position; this repeats until the picture edge is reached. A normalized text line picture is finally generated and input to a text line recognition network, which recognizes the text line picture and outputs the recognition result. The method not only overcomes interference factors in natural scenes and detects and recognizes text accurately, but also advances recursively along the extension direction of each text line, so that even curved text lines are finally detected.

Description

Detection and identification method for handwritten document text
Technical Field
The invention relates to the technical field of deep learning, in particular to a detection and identification method for a handwritten document text.
Background
The problem of detecting the positions of text blocks in complex color images of natural scenes was first posed at the end of the twentieth century. Solving it carries great economic and cultural benefits, so it quickly became a hotspot in the fields of computer vision and document analysis. In the decades since the problem was posed, a variety of text detection and recognition methods have been proposed.
For text detection, there are currently mainly the following methods:
1. Energy-minimization-based methods: most of these build on conditional random fields and Markov random fields, treating text line detection as an energy minimization problem in order to resolve interference between text lines;
2. Connected-component-based methods: the core idea is to find small parts and assemble them into larger ones, remove non-character parts with a classifier, and finally extract characters from the image and combine them into text regions; the most representative techniques are maximally stable extremal regions (MSER) and the stroke width transform (SWT);
3. Deep-learning-based methods: a convolutional neural network extracts high-dimensional features from the image to achieve text detection and recognition.
For text recognition, there are currently mainly the following methods:
1. Character-based methods perform character-level text recognition; once individual characters are recognized successfully, bottom-up text recognition becomes easier to implement;
2. Word-based methods treat text recognition as word recognition;
3. Sequence-based methods convert the text recognition problem into a sequence recognition problem: the text is represented as a character sequence, and a convolutional recurrent neural network handles sequences of arbitrary length.
Text detection and recognition in handwritten documents in natural scenes differs from conventional OCR and poses far greater challenges:
First, scene complexity: noise, deformation, non-uniform illumination, partial occlusion, and confusion between characters and background all degrade detection and recognition;
Second, character diversity: color, size, orientation, font, language, and partially malformed characters likewise degrade detection and recognition.
Solving the problem carries great cultural and economic benefits, such as helping visually impaired people read documents and enabling real-time photo translation. However, handwritten document pictures shot in natural scenes contain many interference factors, so traditional text detection and recognition methods do not apply well to natural scenes. On this basis, the invention provides a method for detecting and recognizing handwritten document text.
Disclosure of Invention
To remedy the deficiencies of the prior art, the invention provides a simple and efficient method for detecting and recognizing handwritten document text.
The invention is realized by the following technical scheme:
a detection and recognition method of handwritten document text is characterized by comprising the following steps: the method comprises two parts of text line positioning and text line detection;
the text line positioning network trains a modified VGG-11 on a picture and regresses the (x0, y0) coordinates, the scale s0, the rotation angle θ0, and the text-line likelihood p0, thereby finding the possible starting positions of text lines on the picture;
the text line detection network advances incrementally along the text line: from the text line start position and rotation angle (xi, yi, θi) obtained from the text line positioning network, it resamples a viewing window and inputs the window to a CNN, which regresses the next position (xi+1, yi+1, θi+1); this repeats until the picture edge is reached. A normalized text line picture is finally generated and input to a text line recognition network, which recognizes the text line picture and outputs the recognition result.
Before input to the text line positioning network, the data set is processed to output all text line pictures together with json annotation information, which comprises the image path, the region coordinates of each line of text, the coordinates of the region occupied by each word within each line, and the text content of each line.
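For illustration, a minimal sketch of one such annotation entry as a Python literal. The key names below are assumptions chosen for readability; the source specifies only what the annotation contains, not its exact schema:

```python
# Hypothetical annotation entry; key names are illustrative, not from the source.
annotation = {
    "image_path": "images/doc_0001.jpg",                 # image path
    "lines": [
        {
            # region coordinates of this line of text (polygon corners)
            "line_region": [[12, 40], [480, 38], [482, 72], [14, 74]],
            # region of each word within the line, plus its transcription
            "words": [
                {"word_region": [[12, 40], [95, 39], [96, 72], [14, 74]],
                 "text": "hello"},
            ],
            "text": "hello world",                       # text content of the line
        },
    ],
}
```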
The processing method of the text line positioning network comprises the following steps:
S1. Read the json annotation file of the images, traverse it, and remove entries with erroneous labels;
S2. Resize the input image to 512 pixels wide and sample 256×256 image patches over the entire picture; each patch is allowed to extend beyond the image, with the overflow filled by the average color of the patch edges (a preprocessing sketch follows this list);
S3. Input each 16×16 image block into the modified VGG-11 network for training; network training regresses the (x0, y0) coordinates, the scale s0, the rotation angle θ0, and the text-line likelihood p0;
S4. After training, set p0 = 1 and set the (x0, y0) coordinates, the scale s0, and the rotation angle θ0 equal to 0;
S5. After the text line positioning module has determined the starting position of a text line in the picture, the text line detection network advances incrementally along the path of the text line to determine the complete text line region.
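A minimal preprocessing sketch for step S2, assuming OpenCV for resizing; function and variable names are illustrative rather than taken from the source:

```python
import numpy as np
import cv2  # assumed dependency, used only for resizing

def sample_patches(image, target_width=512, patch=256):
    """Resize to 512 px wide, then tile 256x256 patches over the picture.
    A patch may extend past the image border; the uncovered area is filled
    with the average color of the covered region's edge pixels."""
    h, w = image.shape[:2]
    scale = target_width / float(w)
    image = cv2.resize(image, (target_width, max(1, round(h * scale))))
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            sub = image[y:y + patch, x:x + patch]
            edges = np.concatenate([sub[0], sub[-1], sub[:, 0], sub[:, -1]])
            canvas = np.empty((patch, patch, 3), dtype=image.dtype)
            canvas[:] = edges.mean(axis=0).astype(image.dtype)
            canvas[:sub.shape[0], :sub.shape[1]] = sub
            patches.append(canvas)
    return patches
```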
The modified VGG-11 network omits the fully connected layers and the last pooling layer of the classic VGG-11 network; all convolution layers use kernels of the same size, 3×3, with stride 1 and padding 1.
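A PyTorch sketch of such a modified VGG-11, assuming classic VGG-11 channel widths with the fully connected layers and the final pool removed; the 1×1 regression head producing five channels (x, y, s, θ, p) per grid cell is an illustrative addition consistent with the regression targets described above:

```python
import torch.nn as nn

class SOLNetwork(nn.Module):
    """Modified VGG-11: fully connected layers and the last max-pool removed.
    With a 256x256 input, the four remaining pools yield a 16x16 grid of
    predictions, each cell regressing (x, y, s, theta, p)."""
    def __init__(self):
        super().__init__()
        # classic VGG-11 configuration with the final "M" (max-pool) omitted
        cfg = [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512]
        layers, in_ch = [], 3
        for v in cfg:
            if v == "M":
                layers.append(nn.MaxPool2d(2, 2))
            else:
                layers += [nn.Conv2d(in_ch, v, 3, stride=1, padding=1),
                           nn.ReLU(inplace=True)]
                in_ch = v
        self.features = nn.Sequential(*layers)
        self.head = nn.Conv2d(512, 5, 1)  # per-cell regression: x, y, s, theta, p

    def forward(self, x):
        return self.head(self.features(x))  # (B, 5, 16, 16) for 256x256 input
```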
In step S4, the training process uses a loss function proposed for the multi-box object detection problem to align the maximum-probability predicted text line start positions with the target positions; the loss function is as follows:
L(l, p, t) = α·Σn Σm Xnm·||ln − tm||² − Σn Σm Xnm·log(pn) − Σn (1 − Σm Xnm)·log(1 − pn)
where tm is a target position, pn is the likelihood of a start-of-line (SOL) occurrence, Xnm is a bipartite alignment matrix between the N predicted positions and the M target positions, α is a parameter weighing the relative importance of the position loss against the confidence loss (0.01 by default), and ln is the initial prediction (xn, yn, sn, θn) of the convolutional neural network; given (l, p, t), the alignment Xnm is chosen so as to minimize L. ln is computed as follows:
ln = (−sin(θn)·sn + xn, −cos(θn)·sn + yn, sin(θn)·sn + xn, cos(θn)·sn + yn)
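The ln formula can be transcribed directly; a numpy sketch with the component order exactly as written above:

```python
import numpy as np

def sol_prediction(x, y, s, theta):
    """l_n as defined above: the pair of points spanning the start-of-line,
    offset from (x, y) by the scale s perpendicular to the angle theta."""
    return np.array([
        -np.sin(theta) * s + x,   # first point, x component
        -np.cos(theta) * s + y,   # first point, y component
         np.sin(theta) * s + x,   # second point, x component
         np.cos(theta) * s + y,   # second point, y component
    ])
```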
the processing method of the text line detection network comprises the following steps:
S1. Read the json annotation file of the images, traverse it, and remove entries with erroneous labels;
S2. The text line detection network operates recursively and incrementally: from the text line start position and rotation angle (xi, yi, θi) obtained from the text line positioning network, resample a viewing window (a sketch of the follow loop appears after this list);
S3. Input the window to a CNN, which regresses the next position (xi+1, yi+1, θi+1);
S4. Repeat until the picture edge is reached; the size of the viewing window is determined by the scale s0 predicted by the text line positioning module and remains unchanged.
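A high-level sketch of this recursive follow loop; resample_window and step_cnn stand in for the affine window resampling and the regression CNN described here, and their names and signatures are assumptions:

```python
def follow_text_line(image, x, y, theta, s0, step_cnn, resample_window):
    """Advance incrementally from the start-of-line until the picture edge.
    The viewing-window size is fixed by the scale s0 from the SOL module."""
    h, w = image.shape[:2]
    path = [(x, y, theta)]
    while 0 <= x < w and 0 <= y < h:          # stop at the picture edge
        window = resample_window(image, x, y, theta, s0)
        x, y, theta = step_cnn(window)        # regress the next position
        path.append((x, y, theta))
    return path
```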
In step S2, the viewing window is resampled in the manner of a spatial transformer network: an affine transformation matrix maps image coordinates to viewing-window coordinates;
the first viewing-window matrix is W0 = A·W_SOL, where the matrix A is a forward-propagation matrix responsible for providing context information so that the text line detection network can position the text line correctly;
the matrices A and W_SOL are computed as follows:
[Equation images in the original: A and W_SOL are 3×3 affine matrices built from the predicted x0, y0, s0 and θ0.]
where the parameters are obtained from the predictions of the text line positioning network;
a 32×32 viewing window is extracted according to the matrix Wi; the text line follower network then regresses xi, yi and θi, and from the regressed xi, yi, θi the prediction matrix Pi is formed to calculate the next matrix Wi = Pi·Wi−1.
The prediction matrix Pi is computed as follows:
[Equation image in the original: Pi is a 3×3 affine matrix built from the regressed xi, yi and θi.]
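A sketch of the spatial-transformer-style window extraction, assuming PyTorch's affine_grid/grid_sample and a window matrix W (W0 = A·W_SOL for the first window, Wi = Pi·Wi−1 afterwards) already expressed in the normalized [−1, 1] coordinate convention those functions use:

```python
import torch
import torch.nn.functional as F

def extract_window(image, W, size=32):
    """Resample a size x size viewing window via the 2x3 top of the 3x3
    affine matrix W, as in a spatial transformer network.
    image: (1, C, H, W) float tensor."""
    theta = W[:2, :].unsqueeze(0)  # (1, 2, 3) affine part of the 3x3 matrix
    grid = F.affine_grid(theta, [1, image.shape[1], size, size],
                         align_corners=False)
    return F.grid_sample(image, grid, align_corners=False)
```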
to locate a line of text, the line is treated as a series of upper and lower coordinate point pairs pu,i and pl,i, computed from the upper and lower midpoints of the prediction window;
[Equation image in the original: pu,i and pl,i are computed from the upper and lower midpoints of the prediction window Wi.]
a mean square error (MSE) loss function is used when training the convolutional neural network, computed as follows:
L = Σi ( ||pu,i − tu,i||² + ||pl,i − tl,i||² )
the text detection network starts at the first target positions tu,0 and tl,0; the corresponding position points are reset every 4 steps, so that when the text line detection network drifts away from the handwritten text line it can recover the correct path without introducing a large amount of error into the training process;
to enhance the robustness of the text line detection network, after a target position is reset, a random translation of Δx, Δy ∈ [−2, 2] pixels and a random rotation of Δθ ∈ [−0.1, 0.1] radians are applied to the target position.
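A minimal sketch of that reset-time perturbation, following the stated ranges:

```python
import random

def jitter_reset_target(x, y, theta):
    """Randomly perturb a reset target position for robustness:
    dx, dy ~ U(-2, 2) pixels and dtheta ~ U(-0.1, 0.1) radians."""
    return (x + random.uniform(-2.0, 2.0),
            y + random.uniform(-2.0, 2.0),
            theta + random.uniform(-0.1, 0.1))
```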
The text line detection network outputs a normalized text line picture, which is input to the text line recognition network; the text line recognition network uses a conventional convolutional neural network together with a bidirectional recurrent neural network, with CTC loss computed at the top of the framework, so as to recognize the variable-length input text line image and output the text line recognition result.
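A compact CRNN-style sketch of such a recognition network in PyTorch: a conventional CNN feature extractor, a bidirectional LSTM, and CTC loss on top. Layer sizes, the input height, and the charset size are illustrative assumptions:

```python
import torch.nn as nn

class CRNN(nn.Module):
    """CNN + bidirectional RNN recognizer trained with CTC, consuming
    height-normalized text line images of arbitrary width."""
    def __init__(self, num_classes=100, img_height=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((img_height // 4, 1)),   # collapse height, keep width
        )
        self.rnn = nn.LSTM(256, 256, bidirectional=True)
        self.fc = nn.Linear(512, num_classes + 1)  # +1 for the CTC blank

    def forward(self, x):                   # x: (B, 1, 32, W)
        f = self.cnn(x)                     # (B, 256, 1, W/4)
        f = f.squeeze(2).permute(2, 0, 1)   # (T = W/4, B, 256)
        out, _ = self.rnn(f)
        return self.fc(out)                 # (T, B, num_classes + 1)

# Training pairs the (log-softmaxed) outputs with nn.CTCLoss:
#   loss = nn.CTCLoss(blank=num_classes)(log_probs, targets,
#                                        input_lengths, target_lengths)
```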
The invention has the following beneficial effects: the method for detecting and recognizing handwritten document text not only overcomes interference factors in natural scenes and detects and recognizes text accurately, but also advances recursively along the extension direction of each text line, so that even curved text lines are finally detected.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of a system for detecting and recognizing handwritten document texts according to the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the embodiment of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method for detecting and recognizing handwritten document text is based on deep learning technology and comprises two parts: text line positioning and text line detection;
the text line positioning network trains a modified VGG-11 on a picture and regresses the (x0, y0) coordinates, the scale s0, the rotation angle θ0, and the text-line likelihood p0, thereby finding the possible starting positions of text lines on the picture;
the text line detection network advances incrementally along the text line: from the text line start position and rotation angle (xi, yi, θi) obtained from the text line positioning network, it resamples a viewing window and inputs the window to a CNN, which regresses the next position (xi+1, yi+1, θi+1); this repeats until the picture edge is reached. A normalized text line picture is finally generated and input to a text line recognition network, which recognizes the text line picture and outputs the recognition result.
Before input to the text line positioning network, the data set is processed to output all text line pictures together with json annotation information, which comprises the image path, the region coordinates of each line of text, the coordinates of the region occupied by each word within each line, and the text content of each line.
The processing method of the text line positioning network comprises the following steps:
S1. Read the json annotation file of the images, traverse it, and remove entries with erroneous labels;
S2. Resize the input image to 512 pixels wide and sample 256×256 image patches over the entire picture; each patch is allowed to extend beyond the image, with the overflow filled by the average color of the patch edges;
S3. Input each 16×16 image block into the modified VGG-11 network for training; network training regresses the (x0, y0) coordinates, the scale s0, the rotation angle θ0, and the text-line likelihood p0;
S4. After training, set p0 = 1 and set the (x0, y0) coordinates, the scale s0, and the rotation angle θ0 equal to 0;
S5. After the text line positioning module has determined the starting position of a text line in the picture, the text line detection network advances incrementally along the path of the text line to determine the complete text line region.
The modified VGG-11 network omits the fully connected layers and the last pooling layer of the classic VGG-11 network; all convolution layers use kernels of the same size, 3×3, with stride 1 and padding 1.
In step S4, the training process uses a loss function proposed for the multi-box object detection problem to align the maximum-probability predicted text line start positions with the target positions; the loss function is as follows:
L(l, p, t) = α·Σn Σm Xnm·||ln − tm||² − Σn Σm Xnm·log(pn) − Σn (1 − Σm Xnm)·log(1 − pn)
where tm is a target position, pn is the likelihood of a start-of-line (SOL) occurrence, Xnm is a bipartite alignment matrix between the N predicted positions and the M target positions, α is a parameter weighing the relative importance of the position loss against the confidence loss (0.01 by default), and ln is the initial prediction (xn, yn, sn, θn) of the convolutional neural network; given (l, p, t), the alignment Xnm is chosen so as to minimize L. ln is computed as follows:
ln = (−sin(θn)·sn + xn, −cos(θn)·sn + yn, sin(θn)·sn + xn, cos(θn)·sn + yn)
the processing method of the text line detection network comprises the following steps:
S1. Read the json annotation file of the images, traverse it, and remove entries with erroneous labels;
S2. The text line detection network operates recursively and incrementally: from the text line start position and rotation angle (xi, yi, θi) obtained from the text line positioning network, resample a viewing window;
S3. Input the window to a CNN, which regresses the next position (xi+1, yi+1, θi+1);
S4. Repeat until the picture edge is reached; the size of the viewing window is determined by the scale s0 predicted by the text line positioning module and remains unchanged.
In step S2, the viewing window is resampled in the manner of a spatial transformer network: an affine transformation matrix maps image coordinates to viewing-window coordinates;
the first viewing-window matrix is W0 = A·W_SOL, where the matrix A is a forward-propagation matrix responsible for providing context information so that the text line detection network can position the text line correctly;
the matrices A and W_SOL are computed as follows:
[Equation images in the original: A and W_SOL are 3×3 affine matrices built from the predicted x0, y0, s0 and θ0.]
where the parameters are obtained from the predictions of the text line positioning network;
a 32×32 viewing window is extracted according to the matrix Wi; the text line follower network then regresses xi, yi and θi, and from the regressed xi, yi, θi the prediction matrix Pi is formed to calculate the next matrix Wi = Pi·Wi−1.
The prediction matrix Pi is computed as follows:
[Equation image in the original: Pi is a 3×3 affine matrix built from the regressed xi, yi and θi.]
to locate a line of text, the line is treated as a series of upper and lower coordinate point pairs pu,i and pl,i, computed from the upper and lower midpoints of the prediction window;
[Equation image in the original: pu,i and pl,i are computed from the upper and lower midpoints of the prediction window Wi.]
a mean square error (MSE) loss function is used when training the convolutional neural network, computed as follows:
L = Σi ( ||pu,i − tu,i||² + ||pl,i − tl,i||² )
the text detection network starts at the first target positions tu,0 and tl,0; the corresponding position points are reset every 4 steps, so that when the text line detection network drifts away from the handwritten text line it can recover the correct path without introducing a large amount of error into the training process;
to enhance the robustness of the text line detection network, after a target position is reset, a random translation of Δx, Δy ∈ [−2, 2] pixels and a random rotation of Δθ ∈ [−0.1, 0.1] radians are applied to the target position.
The text line detection network outputs a normalized text line picture, which is input to the text line recognition network; the text line recognition network uses a conventional convolutional neural network together with a bidirectional recurrent neural network, with CTC loss computed at the top of the framework, so as to recognize the variable-length input text line image and output the text line recognition result.
The above-described embodiment is only one specific embodiment of the present invention, and general changes and substitutions by those skilled in the art within the technical scope of the present invention are included in the protection scope of the present invention.

Claims (9)

1. A method for detecting and recognizing handwritten document text, characterized in that the method comprises two parts: text line positioning and text line detection;
the text line positioning network trains a modified VGG-11 on a picture and regresses the (x0, y0) coordinates, the scale s0, the rotation angle θ0, and the text-line likelihood p0, thereby finding the possible starting positions of text lines on the picture;
the text line detection network advances incrementally along the text line: from the text line start position and rotation angle (xi, yi, θi) obtained from the text line positioning network, it resamples a viewing window and inputs the window to a CNN, which regresses the next position (xi+1, yi+1, θi+1); this repeats until the picture edge is reached. A normalized text line picture is finally generated and input to a text line recognition network, which recognizes the text line picture and outputs the recognition result.
2. The method for detecting and recognizing handwritten document text according to claim 1, characterized in that: before input to the text line positioning network, the data set is processed to output all text line pictures together with json annotation information, which comprises the image path, the region coordinates of each line of text, the coordinates of the region occupied by each word within each line, and the text content of each line.
3. The method for detecting and recognizing handwritten document text according to claim 1 or 2, characterized in that: the processing method of the text line positioning network comprises the following steps:
S1. Read the json annotation file of the images, traverse it, and remove entries with erroneous labels;
S2. Resize the input image to 512 pixels wide and sample 256×256 image patches over the entire picture; each patch is allowed to extend beyond the image, with the overflow filled by the average color of the patch edges;
S3. Input each 16×16 image block into the modified VGG-11 network for training; network training regresses the (x0, y0) coordinates, the scale s0, the rotation angle θ0, and the text-line likelihood p0;
S4. After training, set p0 = 1 and set the (x0, y0) coordinates, the scale s0, and the rotation angle θ0 equal to 0;
S5. After the text line positioning module has determined the starting position of a text line in the picture, the text line detection network advances incrementally along the path of the text line to determine the complete text line region.
4. The method for detecting and recognizing handwritten document text according to claim 3, characterized in that: the modified VGG-11 network omits the fully connected layers and the last pooling layer of the classic VGG-11 network; all convolution layers use kernels of the same size, 3×3, with stride 1 and padding 1.
5. The method for detecting and recognizing handwritten document text according to claim 3 or 4, characterized in that: in step S4, the training process uses a loss function proposed for the multi-box object detection problem to align the maximum-probability predicted text line start positions with the target positions; the loss function is as follows:
L(l, p, t) = α·Σn Σm Xnm·||ln − tm||² − Σn Σm Xnm·log(pn) − Σn (1 − Σm Xnm)·log(1 − pn)
where tm is a target position, pn is the likelihood of a start-of-line (SOL) occurrence, Xnm is a bipartite alignment matrix between the N predicted positions and the M target positions, α is a parameter weighing the relative importance of the position loss against the confidence loss (0.01 by default), and ln is the initial prediction (xn, yn, sn, θn) of the convolutional neural network; given (l, p, t), the alignment Xnm is chosen so as to minimize L. ln is computed as follows:
ln = (−sin(θn)·sn + xn, −cos(θn)·sn + yn, sin(θn)·sn + xn, cos(θn)·sn + yn).
6. The method for detecting and recognizing handwritten document text according to claim 1 or 2, characterized in that: the processing method of the text line detection network comprises the following steps:
S1. Read the json annotation file of the images, traverse it, and remove entries with erroneous labels;
S2. The text line detection network operates recursively and incrementally: from the text line start position and rotation angle (xi, yi, θi) obtained from the text line positioning network, resample a viewing window;
S3. Input the window to a CNN, which regresses the next position (xi+1, yi+1, θi+1);
S4. Repeat until the picture edge is reached; the size of the viewing window is determined by the scale s0 predicted by the text line positioning module and remains unchanged.
7. The method for detecting and recognizing handwritten document text according to claim 6, characterized in that: in step S2, the viewing window is resampled in the manner of a spatial transformer network: an affine transformation matrix maps image coordinates to viewing-window coordinates;
the first viewing-window matrix is W0 = A·W_SOL, where the matrix A is a forward-propagation matrix responsible for providing context information so that the text line detection network can position the text line correctly;
the matrices A and W_SOL are computed as follows:
[Equation images in the original: A and W_SOL are 3×3 affine matrices built from the predicted x0, y0, s0 and θ0.]
where the parameters are obtained from the predictions of the text line positioning network;
a 32×32 viewing window is extracted according to the matrix Wi; the text line follower network then regresses xi, yi and θi, and from the regressed xi, yi, θi the prediction matrix Pi is formed to calculate the next matrix Wi = Pi·Wi−1.
The prediction matrix Pi is computed as follows:
[Equation image in the original: Pi is a 3×3 affine matrix built from the regressed xi, yi and θi.]
to locate a line of text, the line is treated as a series of upper and lower coordinate point pairs pu,i and pl,i, computed from the upper and lower midpoints of the prediction window;
[Equation image in the original: pu,i and pl,i are computed from the upper and lower midpoints of the prediction window Wi.]
8. The method for detecting and recognizing handwritten document text according to claim 7, characterized in that: a mean square error (MSE) loss function is used when training the convolutional neural network, computed as follows:
L = Σi ( ||pu,i − tu,i||² + ||pl,i − tl,i||² )
the text detection network starts at the first target positions tu,0 and tl,0; the corresponding position points are reset every 4 steps, so that when the text line detection network drifts away from the handwritten text line it can recover the correct path without introducing a large amount of error into the training process;
to enhance the robustness of the text line detection network, after a target position is reset, a random translation of Δx, Δy ∈ [−2, 2] pixels and a random rotation of Δθ ∈ [−0.1, 0.1] radians are applied to the target position.
9. The method for detecting and recognizing handwritten document text according to claim 6, 7 or 8, characterized in that: the text line detection network outputs a normalized text line picture, which is input to the text line recognition network; the text line recognition network uses a conventional convolutional neural network together with a bidirectional recurrent neural network, with CTC loss computed at the top of the framework, so as to recognize the variable-length input text line image and output the text line recognition result.
CN202010896671.6A 2020-08-31 2020-08-31 Detection and identification method for handwritten document text Pending CN112036405A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010896671.6A CN112036405A (en) 2020-08-31 2020-08-31 Detection and identification method for handwritten document text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010896671.6A CN112036405A (en) 2020-08-31 2020-08-31 Detection and identification method for handwritten document text

Publications (1)

Publication Number Publication Date
CN112036405A true CN112036405A (en) 2020-12-04

Family

ID=73586026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010896671.6A Pending CN112036405A (en) 2020-08-31 2020-08-31 Detection and identification method for handwritten document text

Country Status (1)

Country Link
CN (1) CN112036405A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN109447078A (en) * 2018-10-23 2019-03-08 四川大学 A kind of detection recognition method of natural scene image sensitivity text
CN110287960A (en) * 2019-07-02 2019-09-27 中国科学院信息工程研究所 The detection recognition method of curve text in natural scene image
CN110837835A (en) * 2019-10-29 2020-02-25 华中科技大学 End-to-end scene text identification method based on boundary point detection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
朱健菲; 应自炉; 陈鹏飞: "Handwritten text line extraction under a joint regression and clustering framework" (回归――聚类联合框架下的手写文本行提取), 中国图象图形学报 (Journal of Image and Graphics), no. 08 *
王涛; 江加和: "Arbitrary-orientation text recognition based on semantic segmentation" (基于语义分割技术的任意方向文字识别), 应用科技 (Applied Science and Technology), no. 03 *

Similar Documents

Publication Publication Date Title
Li et al. Show, attend and read: A simple and strong baseline for irregular text recognition
US10817741B2 (en) Word segmentation system, method and device
US8755595B1 (en) Automatic extraction of character ground truth data from images
CN113591546A (en) Semantic enhanced scene text recognition method and device
CN110751154B (en) Complex environment multi-shape text detection method based on pixel-level segmentation
Wang et al. Towards end-to-end text spotting in natural scenes
Bukhari et al. High performance layout analysis of Arabic and Urdu document images
CN113537227B (en) Structured text recognition method and system
CN112364862B (en) Histogram similarity-based disturbance deformation Chinese character picture matching method
CN109886274A (en) Social security card identification method and system based on opencv and deep learning
CN111476232A (en) Water washing label detection method, equipment and storage medium
Sahare et al. Robust character segmentation and recognition schemes for multilingual Indian document images
Sanjrani et al. Handwritten optical character recognition system for Sindhi numerals
CN113313113A (en) Certificate information acquisition method, device, equipment and storage medium
CN115810197A (en) Multi-mode electric power form recognition method and device
CN112686219B (en) Handwritten text recognition method and computer storage medium
Panda et al. Odia offline typewritten character recognition using template matching with unicode mapping
Wicht et al. Camera-based sudoku recognition with deep belief network
CN116704523B (en) Text typesetting image recognition system for publishing and printing equipment
CN112949523A (en) Method and system for extracting key information from identity card image picture
Razzak et al. Fuzzy based preprocessing using fusion of online and offline trait for online urdu script based languages character recognition
Dat et al. An improved CRNN for Vietnamese Identity Card Information Recognition.
Karthik et al. Segmentation and Recognition of Handwritten Kannada Text Using Relevance Feedback and Histogram of Oriented Gradients–A Novel Approach
Gao et al. Recurrent calibration network for irregular text recognition
CN112036405A (en) Detection and identification method for handwritten document text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination