CN107609549B - Text detection method for certificate image in natural scene - Google Patents

Text detection method for certificate image in natural scene

Info

Publication number: CN107609549B (application CN201710854505.8A)
Authority: CN (China)
Prior art keywords: text, image, pixel, training, model
Legal status: Active (granted)
Other versions: CN107609549A (Chinese, zh)
Inventors: 张楠, 靳晓宁, 张文文, 段禹心, 贺思源
Assignee (current and original): Beijing University of Technology
Application filed by Beijing University of Technology

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a text detection method for certificate images in natural scenes, comprising the following steps: selecting common Chinese characters and rendering Chinese character images to form data set 1; applying random rotation and cropping to labeled certificate images and fusing them with different background pictures by Poisson cloning to form data set 2; training a character classification model on the VGG16 network with data set 1 and, after the model converges, using the obtained parameters to initialize a fully convolutional network model that is then trained with data set 2; processing an image with the trained fully convolutional network model and classifying each pixel by the maximum-probability rule to form a text/non-text binary image; obtaining text regions by the connected-region method, binarizing the original image, and extracting only the character information inside the text regions of the text/non-text binary image to obtain a text binary image; rectifying the image by the maximum-variance method; and projecting the rectified image again to refine the text/non-text binary image.

Description

Text detection method for certificate image in natural scene
Technical Field
The invention belongs to the field of image processing, and particularly relates to a text detection method for certificate images in natural scenes.
Background
The rapid development of Internet technology and the popularization of smartphones have greatly facilitated daily life. In many scenarios, an operator requires a user to upload a certificate (such as an identity card, a business license, or another credential) to verify the user's identity and qualifications. Photographing the certificate with a mobile phone and uploading it for verification is convenient and efficient. However, the shooting background in a natural scene is complex and full of environmental interference: users may shoot against richly textured everyday surfaces such as desktops or bed sheets, whose textures are hard to distinguish from characters, and text in the captured picture may be partially occluded, which also poses a significant challenge to text detection. Because users shoot in different environments, with different shooting modes and devices, the images may exhibit text rotation, text inclination, uneven illumination, blur, deformation, heavy noise, and the like. Traditional text detection techniques designed for scanned images struggle to achieve good results under these conditions.
Detecting characters in natural scenes is an important research subject of computer vision and pattern recognition in the field of object detection and recognition. Its ultimate purpose is to support subsequent character recognition and semantic understanding. As an important component of a character recognition system, natural scene text detection helps people understand natural scene content. It is the first processing step after image acquisition in a natural scene character recognition system, and its performance directly determines the recognition rate of the whole system. How to detect characters quickly and accurately is therefore a critical problem in natural scene character recognition.
At present there are two main families of algorithms for text detection in pictures: sliding-window methods and connected-region methods. Sliding-window methods scan all possible positions of a picture with sliding sub-windows of variable size and use a trained classifier to judge whether text is present in each window. Connected-region methods first rapidly separate text from non-text pixels with low-level filters, then connect text pixels with similar attributes into text components. Such methods treat text in an image as particular areas or as regions with particular textural features. First, features or methods such as color features, texture features, edge features, stroke width transform, and extremal regions are used to extract candidate regions in natural images as text candidates. Candidate regions without characters are filtered out; the remaining regions are regarded as characters and merged into text-line candidates, which are screened to obtain the final text detection result. The filtering and screening can use thresholds on manually designed features, or learn features with a statistical model or machine learning algorithm to adaptively screen the character candidate regions.
The Stroke Width Transform (SWT) and Maximally Stable Extremal Region (MSER) algorithms are representative of the second class of methods and have been the predominant classical algorithms in recent years.
The SWT (Stroke Width Transform) method extracts text candidates based on a series of general assumptions: characters are composed of strokes; strokes have a certain width; stroke widths within the same line of text are similar; and non-character parts are not composed of strokes and thus have no consistent stroke width. Based on these assumptions, a stroke width transform is applied to the image, computing for each pixel in the input image the width of the stroke it lies on, and connected regions are taken as character candidates.
The MSER (Maximally Stable Extremal Region) method uses MSER regions: regions that maintain their shape and size over a range of gray-level thresholds. They have sharp edges and strong gray-value contrast with the background. Owing to their morphology, characters generally contain rich edge information, and as a medium of information transmission they must be clearly legible, so they contrast strongly in color and gray value with the background; characters are therefore essentially MSER regions.
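For illustration only, this prior-art MSER extraction can be sketched with OpenCV's built-in detector; this is background, not the claimed method, and the input path is hypothetical:

```python
import cv2

# Sketch of prior-art MSER candidate extraction (background only).
img = cv2.imread("certificate.jpg")            # hypothetical input path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
mser = cv2.MSER_create()
regions, boxes = mser.detectRegions(gray)      # extremal regions + bounding boxes
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)
```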
The existing method has the following defects:
(1) The sliding window must traverse the image at multiple scales and evaluate every detection window, so detection is slow and inefficient.
(2) The methods lack precision and have difficulty coping with complex backgrounds.
SWT is heavily affected by noise and blur, because it relies on successful edge detection before detecting by character stroke width. When the background is complex and edges are missed, the method fails. It also falsely detects many objects with regular, character-like lines, such as rings, grids, and bricks, as characters, and so cannot meet the requirements of users shooting in varied natural environments.
MSER handles blurred, unevenly illuminated, color- and texture-varying, and low-contrast text poorly.
Both the SWT and MSER methods detect single characters, whose results are inconvenient for an OCR module to use directly; the detected characters must be merged by character spacing, height difference, and other features, which increases the amount of computation.
Disclosure of Invention
The invention provides a text detection method for certificate images in natural scenes, which detects the text-region information in a certificate image shot by a user in a natural scene and outputs the independent text-line regions in the image; it tolerates image distortion, inclination, rotation within a certain angle, lighting changes, complex backgrounds, and the like.
In order to achieve the purpose, the invention adopts the following technical scheme:
a text detection method of a certificate image in a natural scene comprises the following steps:
step 1, establishing a training data set: selecting 3816 common Chinese characters and rendering Chinese character pictures in different typefaces to form data set 1, wherein the training images in data set 1 are the Chinese characters in different typefaces and the labels are the designated labels corresponding to the Chinese characters;
step 2, applying random rotation, cropping, blurring, inversion, brightness transformation, gamma transformation, and the like to the labeled certificate images, and fusing them with different background images by Poisson cloning to form data set 2, wherein the training images in data set 2 are text-containing images and the labels are text/non-text binary images of corresponding size;
step 3, training a character classification model of the VGG16 (Visual Geometry Group 16) network with data set 1; after the model converges, removing the fully connected layers of the VGG16 network to turn the model into a fully convolutional network (FCN), initializing the FCN model with the obtained parameters, and training the FCN model with data set 2;
step 4, processing the image with the trained fully convolutional network model to obtain a text/non-text probability map, and classifying each pixel by the maximum-probability rule to form a text/non-text binary image;
step 5, obtaining text regions from the text/non-text binary image by the connected-region method;
step 6, binarizing the original image and extracting only the character information inside the text regions of the step-5 text/non-text binary image, obtaining a text binary image;
step 7, rotating the text image obtained in step 6 by different angles, projecting it horizontally, and rectifying the image by the maximum-variance method;
and step 8, projecting the rectified image again, judging whether each region is horizontal or vertical from the number of its horizontally/vertically projected pixels, segmenting the character lines, and refining the text/non-text binary image obtained in step 5.
By convolving the image and fusing convolutional features from different layers, the method needs no multi-scale sliding window to traverse the image. Pixel-by-pixel prediction makes the result more accurate. Features are extracted by convolution; the network is fully convolutional with no fully connected layer, so processing can run in real time. Fusing the features of different convolutional layers provides both spatial and textural features, so text regions are still detected well when the image background is complex. Enlarging the training set by Poisson cloning effectively prevents the model from overfitting and enriches the scenes of the training samples.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 shows a network structure of a convolution layer portion shared by VGG16 and a text region detection model;
FIG. 3 is a network structure of VGG 16;
FIG. 4 is an overall network structure of a text detection model;
FIG. 5a is a sample view of a document image;
FIG. 5b is a background image to be fused;
FIG. 5c is a fused document image;
FIG. 5d is a text-to-non-text binary image of full convolution neural network prediction;
FIG. 5e is a certificate text information graph derived from a text-to-non-text binary image;
FIG. 5f is a diagram of the document text information after maximum variance correction;
FIG. 5g is a binary image of the refined text region;
FIG. 5h is a text region map of the rectified certificate image determined from the refined text region binary map.
Detailed Description
As shown in fig. 1, the present invention provides a text detection method for a certificate image in a natural scene, which includes the following steps:
Step 1: 3816 common Chinese characters are selected, Chinese character pictures are rendered in different typefaces such as Song (SimSun), Hei (bold sans), Kai (regular script), and Li (clerical script), and a certain amount of salt-and-pepper noise and Gaussian noise is added to the pictures to form data set 1, wherein the training images in data set 1 are the Chinese characters in different typefaces and the labels are the designated labels corresponding to the Chinese characters.
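As a non-limiting sketch, data set 1 can be generated along these lines with Pillow and NumPy; the font file paths and noise levels are assumptions, not values fixed by the invention:

```python
from PIL import Image, ImageDraw, ImageFont
import numpy as np

# Hypothetical font files for Song/Hei/Kai/Li typefaces.
FONTS = ["simsun.ttc", "simhei.ttf", "simkai.ttf", "SIMLI.TTF"]

def render_char(ch, font_path, size=28):
    """Render one Chinese character and add Gaussian + salt-and-pepper noise."""
    img = Image.new("L", (size, size), color=255)
    font = ImageFont.truetype(font_path, int(size * 0.8))
    ImageDraw.Draw(img).text((2, 2), ch, fill=0, font=font)
    arr = np.asarray(img, dtype=np.float32)
    arr = arr + np.random.normal(0, 8, arr.shape)   # Gaussian noise (assumed sigma)
    mask = np.random.rand(*arr.shape)               # ~2% salt-and-pepper (assumed rate)
    arr[mask < 0.01] = 0
    arr[mask > 0.99] = 255
    return np.clip(arr, 0, 255).astype(np.uint8)

# usage: sample = render_char("永", FONTS[0])
```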
Step 2: the neural network model has many parameters and needs a large amount of training data to prevent overfitting. Because labeled samples are expensive, the limited labeled samples must be expanded. Each labeled certificate image is randomly rotated with rotation angle rotate ∈ [-30, 30] degrees, and randomly cropped with newWidth ∈ [0.7 × width, width] and newHeight ∈ [0.7 × height, height], where width and height are the dimensions of the original image. Random Gaussian blur is applied with kernelSize ∈ [3, 9] and sigma ∈ [1, 9]. The BGR image is converted to an HSV representation; after splitting the channels, a random value hue_vari ∈ [-8, 8] is added to the hue H, the saturation S is multiplied by a random sat_vari ∈ [0.5, 1.5], and the value V is multiplied by a random val_vari ∈ [0.7, 1.3]. A random gamma transformation with gamma ∈ [0.5, 2.0] is applied using the lookup table
table[i] = (i / 255)^gamma × 255,  i ∈ [0, 255]

and each image pixel value pixel_i is mapped through this gamma table. The images are then randomly fused with different backgrounds by Poisson cloning; for example, fusing FIG. 5a with FIG. 5b gives FIG. 5c, which enriches the samples and image scenes. This forms data set 2, in which the training images are text-containing images and the labels are text/non-text binary images of corresponding size.
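A minimal sketch of this augmentation pipeline with OpenCV, using the parameter ranges quoted above; the helper names and the choice of cv2.NORMAL_CLONE for the Poisson fusion are assumptions:

```python
import cv2
import numpy as np

rng = np.random.default_rng()

def augment(img):
    """Sketch of step-2 augmentation: rotation, crop, blur, HSV jitter, gamma."""
    h, w = img.shape[:2]
    # random rotation, rotate in [-30, 30] degrees
    M = cv2.getRotationMatrix2D((w / 2, h / 2), rng.uniform(-30, 30), 1.0)
    img = cv2.warpAffine(img, M, (w, h))
    # random crop: new size in [0.7*dim, dim]
    nw, nh = int(rng.uniform(0.7, 1.0) * w), int(rng.uniform(0.7, 1.0) * h)
    x, y = rng.integers(0, w - nw + 1), rng.integers(0, h - nh + 1)
    img = img[y:y + nh, x:x + nw]
    # random Gaussian blur, odd kernel size in [3, 9], sigma in [1, 9]
    k = int(rng.choice([3, 5, 7, 9]))
    img = cv2.GaussianBlur(img, (k, k), rng.uniform(1, 9))
    # HSV jitter: H shift in [-8, 8], S scale in [0.5, 1.5], V scale in [0.7, 1.3]
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 0] = (hsv[..., 0] + rng.uniform(-8, 8)) % 180
    hsv[..., 1] = np.clip(hsv[..., 1] * rng.uniform(0.5, 1.5), 0, 255)
    hsv[..., 2] = np.clip(hsv[..., 2] * rng.uniform(0.7, 1.3), 0, 255)
    img = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
    # random gamma via lookup table: table[i] = (i/255)**gamma * 255
    gamma = rng.uniform(0.5, 2.0)
    table = ((np.arange(256) / 255.0) ** gamma * 255).astype(np.uint8)
    return cv2.LUT(img, table)

def poisson_fuse(doc, background, center):
    """Poisson (seamless) cloning of the document onto a new background."""
    mask = 255 * np.ones(doc.shape[:2], np.uint8)
    return cv2.seamlessClone(doc, background, mask, center, cv2.NORMAL_CLONE)
```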
Step 3: the pictures in data set 1 are resized to a fixed size of 28 × 28 pixels, the pixel values are normalized to between 0 and 1, and they are fed into the VGG16 neural network model, whose structure is shown in FIG. 3. The VGG16 network is pre-trained by gradient descent, and the neurons use the ReLU activation function:
f(x) = max(0, W^T x + b)
where W^T is the weight parameter to be trained in the neuron, x is the neuron input, and b is the bias parameter to be trained. The loss function of the VGG16 network is the softmax cross-entropy loss L(y_i, H_i):

H_i = softmax(f_i) = e^{f_i} / Σ_j e^{f_j}

L(y_i, H_i) = -(1/m) Σ_{i=1}^{m} y_i · log(H_i)

where m is the number of samples in a training batch, f_i is the predicted output of the i-th sample in the batch, and y_i is the true label of the i-th sample.
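For concreteness, the softmax cross-entropy loss above can be written in a few lines of NumPy (a sketch of the formula, not the patent's training code):

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch, as in L(y_i, H_i).
    logits: (m, num_classes) raw scores f_i; labels: (m,) integer class ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    m = logits.shape[0]
    return -np.log(probs[np.arange(m), labels] + 1e-12).mean()
```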
Step 4: when the accuracy of the VGG16 model of step 3 reaches a preset level, pre-training is stopped, the fully connected layers of the VGG16 model are removed, two convolutional layers with 1 × 1 kernels are added, and part of the parameters (W^T and b) are dropped out with probability 0.5. The feature maps are enlarged by transposed convolution, and the pool-4 and pool-3 layers are fused pixel by pixel. Finally the output is restored to the original image size by transposed convolution. After the two dropout layers, every multi-channel layer is first reduced to 2 channels by a 1 × 1 convolutional layer before further operations. The FCN model structure is shown in FIG. 4. The model is trained by gradient descent with an exponentially decaying learning rate, and the neurons still use the ReLU activation function:
f(x) = max(0, W^T x + b)
where W^T is the weight parameter to be trained in the neuron, x is the neuron input, and b is the bias parameter to be trained.
The loss function is the softmax cross-entropy loss:

L = -(1/(m·M·N)) Σ_{i=1}^{m} Σ_{j=1}^{M×N} y_{ij} · log(softmax(f_{ij}))

where m is the number of samples in a training batch, M and N are the height and width of the input image, f_{ij} is the predicted value of the j-th pixel of the i-th sample in the batch, and y_{ij} is the true value of that pixel.
In the FCN design, the fully convolutional network uses 3 × 3 kernels: two cascaded convolutional layers have a receptive field equal to a 5 × 5 kernel, and three cascaded layers equal a 7 × 7 kernel, so the receptive field grows while the number of trainable parameters shrinks. The 1 × 1 convolutions effectively replace the fully connected layers, and dropping out part of the parameters prevents overfitting, reduces the data dimensionality, and cuts computation. Reusing the trained convolutional and pooling parameters of the VGG16 model greatly accelerates convergence and shortens training time. Because the FCN has no fully connected layer, the input image can be of any size, eliminating the distortion that occurs when an image must be resized to a fixed size. The FCN predicts text/non-text pixel by pixel, so its detection precision is higher than that of sliding-window and connected-region methods.
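The decoder described above (1 × 1 score layers after dropout, two 2× transposed convolutions fusing pool-4 and pool-3, then an 8× transposed convolution back to input size) matches the familiar FCN-8s pattern; a minimal PyTorch sketch follows, with VGG16 channel counts (256/512/512) assumed:

```python
import torch.nn as nn

class FCNHead(nn.Module):
    """Sketch of the FCN decoder described above: 1x1 convolutions replace the
    fully connected layers, dropout (p=0.5) discards parameters, and transposed
    convolutions upsample and fuse pool-3/pool-4 features pixel by pixel."""
    def __init__(self, c3=256, c4=512, c5=512, num_classes=2):
        super().__init__()
        self.drop = nn.Dropout2d(p=0.5)
        self.score5 = nn.Conv2d(c5, num_classes, kernel_size=1)   # reduce to 2 channels
        self.score4 = nn.Conv2d(c4, num_classes, kernel_size=1)
        self.score3 = nn.Conv2d(c3, num_classes, kernel_size=1)
        self.up2a = nn.ConvTranspose2d(num_classes, num_classes, 4, stride=2, padding=1)
        self.up2b = nn.ConvTranspose2d(num_classes, num_classes, 4, stride=2, padding=1)
        self.up8 = nn.ConvTranspose2d(num_classes, num_classes, 16, stride=8, padding=4)

    def forward(self, pool3, pool4, pool5):
        x = self.up2a(self.score5(self.drop(pool5)))   # stride 32 -> 16
        x = x + self.score4(pool4)                      # fuse pool-4 pixel by pixel
        x = self.up2b(x)                                # stride 16 -> 8
        x = x + self.score3(pool3)                      # fuse pool-3
        return self.up8(x)                              # back to input resolution
```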
Step 5: the image pixel values in data set 2 are normalized to between 0 and 1 and fed into the FCN model of step 4 for training. As shown in FIG. 4, the FCN parameters shared with VGG16 are initialized from the convolutional and pooling layer parameters pre-trained in step 3, and the newly added layers are initialized with truncated-normal random numbers. The model outputs a text/non-text probability map of the input image: for any pixel pixel_ij of the input image it gives the probability P_True(pixel_ij) that the pixel lies in a text region and the probability P_False(pixel_ij) that it lies in a non-text region, and the two probabilities are compared. If:

P_True(pixel_ij) > P_False(pixel_ij)

the pixel pixel_ij is considered to belong to the text region; otherwise it belongs to the non-text region. Text-region pixels are labeled 1 and non-text pixels 0, finally yielding a text/non-text distribution map of the whole image, as shown in FIG. 5d, whose cross entropy with the label is computed for training.
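The maximum-probability decision is a per-pixel comparison; a one-line NumPy sketch, where the (2, H, W) probability layout is an assumption:

```python
import numpy as np

# prob: assumed (2, H, W) array of per-pixel [text, non-text] probabilities
prob = np.random.rand(2, 480, 640)               # stand-in for an FCN output
binary_map = (prob[0] > prob[1]).astype(np.uint8)  # 1 where P_True > P_False
```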
Step 6: clustering the text-region pixels by connected regions detects the text areas of the image well. However, this method cannot accurately segment text lines: many lines stick together in the resulting text regions, whereas the purpose of text detection is to output independent text-line regions. Therefore the certificate image is binarized and only the character information inside the text regions of the step-5 text/non-text distribution map is kept, giving a binary image that contains only text information, as shown in FIG. 5e.
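A sketch of step 6 with OpenCV, where connected components stand in for the connected-region clustering and Otsu thresholding is an assumed choice of binarization:

```python
import cv2
import numpy as np

def extract_text_binary(image, binary_map):
    """Keep only character pixels inside detected text regions.
    image: BGR certificate photo; binary_map: uint8 text/non-text map (step 5)."""
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary_map, connectivity=8)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, bin_img = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    text_only = np.zeros_like(bin_img)
    for i in range(1, n):                       # label 0 is the background
        x, y, w, h, area = stats[i]
        if area < 20:                           # assumed noise-size threshold
            continue
        region = labels[y:y + h, x:x + w] == i
        text_only[y:y + h, x:x + w][region] = bin_img[y:y + h, x:x + w][region]
    return text_only
```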
Step 7: an image shot by a user may carry some rotation, and a complex background strongly degrades certificate rectification. After the detection model has run, the text-region information is available, so the text can be extracted from the image and the certificate rectified without interference from the complex background. In a certificate image the characters of a text region are arranged in lines with obvious blanks between them, so the closer the projection direction is to the orientation of the certificate, the sharper the difference between the peaks and valleys of the projection curve and the larger the variance of the text-region projection. The text binary image of step 6 (of size N × M) is rotated and projected repeatedly, and the rotation angle at which the projection variance is maximal is the rectification angle. The image is projected horizontally, and the sum of text pixels in the row with ordinate i is recorded as sum_i:

sum_i = Σ_{j=1}^{N} I(pixel_ij ∈ text)
where I is an indicator function: I = 1 when pixel_ij ∈ text, and I = 0 otherwise. The mean over all M rows is

mean = (1/M) Σ_{i=1}^{M} sum_i

The image is rotated about its center point by different angles θ_k and the variance is calculated each time. The projection variance of the image text region is:

var(θ_k) = (1/M) Σ_{i=1}^{M} (sum_i − mean)²

The rotation angle θ_k at which the variance is maximal is the tilt angle θ of the image:

θ = argmax_{θ_k} var(θ_k)
The image and the text/non-text region map are rectified by this angle, as shown in FIG. 5f.
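The maximum-variance rectification of step 7 can be sketched as follows; the candidate angle grid is an assumption:

```python
import cv2
import numpy as np

def deskew_by_projection_variance(text_bin, angles=np.arange(-30, 30.5, 0.5)):
    """Rotate the text binary image through candidate angles and keep the one
    maximising the variance of the horizontal projection profile (step 7)."""
    h, w = text_bin.shape
    center = (w / 2, h / 2)
    best_angle, best_var = 0.0, -1.0
    for theta in angles:
        M = cv2.getRotationMatrix2D(center, theta, 1.0)
        rot = cv2.warpAffine(text_bin, M, (w, h))
        sums = (rot > 0).sum(axis=1)            # sum_i: text pixels per row
        var = sums.var()                        # projection variance var(theta_k)
        if var > best_var:
            best_var, best_angle = var, theta
    M = cv2.getRotationMatrix2D(center, best_angle, 1.0)
    return best_angle, cv2.warpAffine(text_bin, M, (w, h))
```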
Step 8: the text/non-text distribution map produced by the detection model may contain stuck-together character lines, especially when the gaps between lines in the certificate are small. Line adhesion seriously harms subsequent character recognition, so the stuck lines are split to output independent text-line regions. After the rectification of step 7, the text lines in the certificate are essentially horizontal or vertical. The rectified image is projected horizontally and vertically, and the orientation of the text in each region is judged from the number of pixels projected onto each row/column; for example, when a region's vertical projection is much longer than its horizontal projection, the text in that region is considered horizontal. The region is then projected perpendicular to the text direction. The peak and valley points of each projection curve are determined from the trend of the projected pixel counts: a peak is a local extremum whose value exceeds that of the surrounding points, and a valley is a local extremum whose value is below them. Segmenting the text lines relies mainly on finding valleys. To eliminate false valleys, statistics are taken over the mean of the sums of the preceding 5 rows; if

sum_i < (1/5) Σ_{k=i−5}^{i−1} sum_k

the row is considered a non-text region. The text lines are segmented accordingly, and the text/non-text binary image is refined to obtain the position information of the text-line regions, as shown in FIGS. 5g and 5h.
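A sketch of the valley-based line splitting of step 8 for a horizontal region; the 0.2 damping factor is an assumed false-valley filter, since the patent states only that the mean of the preceding 5 rows is used:

```python
import numpy as np

def split_text_lines(region_bin):
    """Project a horizontal text region onto rows and cut at valleys. A row is
    a valley when its sum drops well below the mean of the preceding 5 rows
    (the 0.2 factor is an assumption, not a value fixed by the patent)."""
    sums = (region_bin > 0).sum(axis=1)         # sum_i: text pixels per row
    lines, start = [], None
    for i, s in enumerate(sums):
        prev = sums[max(0, i - 5):i]
        valley = s == 0 or (len(prev) == 5 and s < 0.2 * prev.mean())
        if not valley and start is None:
            start = i                           # a text line begins
        elif valley and start is not None:
            lines.append((start, i))            # rows [start, i) form one line
            start = None
    if start is not None:
        lines.append((start, len(sums)))
    return lines

# usage: line_spans = split_text_lines(text_region)  # list of (top, bottom) rows
```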

Claims (1)

1. A text detection method of a certificate image in a natural scene is characterized by comprising the following steps:
step 1, establishing a training data set: selecting common Chinese characters and rendering Chinese character pictures in different typefaces to form data set 1, wherein the training images in data set 1 are the Chinese characters in different typefaces and the labels are the designated labels corresponding to the Chinese characters;
step 2, applying random rotation, cropping, blurring, inversion, brightness transformation, and gamma transformation to the labeled certificate images and fusing them with different background images by Poisson cloning to form data set 2, wherein the training images in data set 2 are text-containing images and the labels are text/non-text binary images of corresponding size;
step 3, training a character classification model of the VGG16 (Visual Geometry Group 16) network with data set 1; after the model converges, removing the fully connected layers of the VGG16 network to turn the model into a fully convolutional network (FCN), initializing the FCN model with the obtained VGG16 character classification model parameters, and training the FCN model with data set 2;
the step 3 specifically comprises: resizing the pictures in data set 1 to a fixed size of 28 × 28 pixels, normalizing the pixel values to between 0 and 1, feeding them into the VGG16 neural network model, and pre-training the VGG16 network by gradient descent, wherein the neurons use the ReLU activation function:
f(x) = max(0, W^T x + b)
where W^T is the weight parameter to be trained in the neuron, x is the neuron input, and b is the bias parameter to be trained; the loss function of the VGG16 network is the softmax cross-entropy loss L(y_i, H_i):

H_i = softmax(f_i) = e^{f_i} / Σ_j e^{f_j}

L(y_i, H_i) = -(1/m) Σ_{i=1}^{m} y_i · log(H_i)

where m is the number of samples in a training batch, f_i is the predicted output of the i-th sample in the batch, and y_i is the true label of the i-th sample;
stopping pre-training when the accuracy of the VGG16 model reaches a preset level, removing the fully connected layers of the VGG16 model, adding two convolutional layers with 1 × 1 kernels, and dropping out part of the parameters with probability 0.5; enlarging the feature maps by transposed convolution and fusing the pool-4 and pool-3 layers pixel by pixel; finally restoring the convolutional output to the original image size by transposed convolution; after the two dropout layers, reducing every multi-channel layer to 2 channels by a 1 × 1 convolutional layer before further operations; training the model by gradient descent, wherein the neurons still use the ReLU activation function:
f(x) = max(0, W^T x + b)
where W^T is the weight parameter to be trained in the neuron, x is the neuron input, and b is the bias parameter to be trained,
the loss function adopts the softmax cross-entropy loss,

L = -(1/(m·M·N)) Σ_{i=1}^{m} Σ_{j=1}^{M×N} y_{ij} · log(softmax(f_{ij}))

where m is the number of samples in a training batch, M and N are the height and width of the image, f_{ij} is the predicted value of the j-th pixel of the i-th sample in the batch, and y_{ij} is the true value of that pixel;
step 4, processing the image with the trained fully convolutional network model to obtain a text/non-text probability map, and classifying each pixel by the maximum-probability rule to form a text/non-text binary image;
step 5, obtaining text regions from the text/non-text binary image by the connected-region method;
the step 5 specifically comprises: normalizing the image pixel values in data set 2 to between 0 and 1 and feeding them into the FCN model of step 4 for training, initializing the pre-trained FCN parameters from the step-3 VGG16 convolutional and pooling layer parameters, and initializing the newly added layers with truncated-normal random numbers; the FCN model outputs a text/non-text probability map of the image; for any pixel pixel_ij of the image it gives the probability P_True(pixel_ij) that the pixel lies in a text region and the probability P_False(pixel_ij) that it lies in a non-text region, and the two probabilities are compared; if:

P_True(pixel_ij) > P_False(pixel_ij)

the pixel pixel_ij is considered to belong to the text region, otherwise to the non-text region; text-region pixels are labeled 1 and non-text pixels 0, finally yielding a text/non-text distribution map of the whole image;
step 6, binarizing the image and extracting only the character information inside the text regions of the step-5 text/non-text binary image, obtaining a text binary image;
step 7, rotating the text image obtained in step 6 by different angles, projecting it horizontally, and rectifying the image by the maximum-variance method;
the step 7 specifically comprises: rotating and projecting the step-6 text binary image repeatedly, wherein the rotation angle at which the projection variance is maximal is the rectification angle of the image; the image is projected horizontally, and the sum of text pixels in the row with ordinate i is recorded as sum_i:
sum_i = Σ_{j=1}^{N} I(pixel_ij ∈ text)
where I is an indicator function: I = 1 when pixel_ij ∈ text, and I = 0 otherwise; the mean over all M rows is

mean = (1/M) Σ_{i=1}^{M} sum_i

the image is rotated about its center point by different angles θ_k and the variance is calculated each time, the projection variance of the image text region being:

var(θ_k) = (1/M) Σ_{i=1}^{M} (sum_i − mean)²

when the variance is maximal, the corresponding rotation angle θ_k is the tilt angle θ of the image:

θ = argmax_{θ_k} var(θ_k)
step 8, projecting the rectified image again, judging whether each region is horizontal or vertical from the number of its horizontally/vertically projected pixels, segmenting the character lines, and refining the text/non-text binary image obtained in step 5;
the step 8 specifically comprises: projecting the rectified image horizontally and vertically and judging the orientation of the text in each region from the number of pixels projected onto each row/column; when a region's vertical projection is much longer than its horizontal projection, the text in that region is considered horizontal, and the region is then projected perpendicular to the text direction; the peak and valley points of each projection curve are determined from the trend of the projected pixel counts, a peak being a local extremum whose value exceeds that of the surrounding points and a valley a local extremum whose value is below them; the character lines are divided by searching for valleys, and to eliminate false valleys, statistics are taken over the mean of the sums of the preceding 5 rows; if

sum_i < (1/5) Σ_{k=i−5}^{i−1} sum_k

the row is considered a non-text region; the text lines are segmented accordingly, and the text/non-text binary image is refined to obtain the position information of the text-line regions.
CN201710854505.8A 2017-09-20 2017-09-20 Text detection method for certificate image in natural scene Active CN107609549B (en)

Priority Application (1)

Application Number: CN201710854505.8A; Priority Date / Filing Date: 2017-09-20; Title: Text detection method for certificate image in natural scene

Publications (2)

CN107609549A, published 2018-01-19
CN107609549B, granted 2021-01-08

Family ID: 61061405

Country Status (1)

CN (1) CN107609549B (en)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant