CN107609549B - Text detection method for certificate image in natural scene - Google Patents
- Publication number
- CN107609549B (application CN201710854505.8A)
- Authority
- CN
- China
- Prior art keywords
- text
- image
- pixel
- training
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a text detection method for certificate images in natural scenes, comprising the following steps: selecting common Chinese characters and producing Chinese-character pictures to form data set 1; randomly rotating and cropping the labeled certificate images and fusing them with different background pictures by Poisson cloning to form data set 2; training a character classification model of the VGG16 network with data set 1, using the obtained parameters after convergence to initialize a fully convolutional neural network model, and training that model with data set 2; processing the image with the trained fully convolutional network and obtaining the classification of each pixel by the maximum-probability rule to form a text/non-text binary map; obtaining text regions by the connected-region method, binarizing the original image, and keeping only the character information inside the text regions of the text/non-text map to obtain a text binary image; rectifying the image by the maximum-variance method; and projecting the rectified image again to refine the text/non-text binary map.
Description
Technical Field
The invention belongs to the field of image processing and particularly relates to a text detection method for certificate images in natural scenes.
Background
The rapid development of Internet technology and the popularity of smartphones have greatly eased our lives. In many scenarios an operator needs the user to upload a certificate (such as an identity card, a business license, or another credential) to verify the user's identity and qualifications. Photographing the certificate with a mobile phone and uploading it for verification is convenient and efficient. But the shooting backgrounds in natural scenes are varied and full of environmental interference: a user may shoot on a desktop, a bed sheet, or another everyday surface with complex textures that are hard to distinguish from characters. Text may also be partially occluded in the captured picture, which likewise poses a significant challenge to text detection. With different environments, shooting modes, and devices, the image may exhibit text rotation, text tilt, uneven illumination, blur, deformation, heavy noise, and similar conditions. Traditional text detection techniques aimed at scanned images struggle to achieve good results here.
Detecting characters in natural scenes is one of the important research topics of computer vision and pattern recognition in the field of object detection and recognition; its ultimate purpose is to support subsequent character recognition and semantic understanding. As an important component of a character recognition system, natural-scene text detection helps people understand natural-scene content. It is the first processing step after image acquisition in a natural-scene character recognition system, and its performance directly affects the recognition rate of the whole system. How to detect characters quickly and accurately is therefore a critical problem in natural-scene character recognition.
At present there are two main classes of algorithms for text detection in pictures: sliding-window methods and connected-region methods. Sliding-window methods scan all possible positions of a picture with a variable-size sub-window and use a trained classifier to judge whether the window contains text. Connected-region methods first rapidly separate text from non-text pixels with low-level filters and then connect text pixels with similar attributes into text components. Such methods view text in an image as particular areas or as having particular textural features: first, candidate text regions are extracted from the natural image using features such as color, texture, edges, stroke width transform, or extremal regions; candidate regions without characters are filtered out, the remaining regions are treated as characters and merged into text-line candidates, and the text-line candidates are screened to obtain the final detection result. The filtering and screening can use thresholds on hand-designed features, or learn features with a statistical model or machine learning algorithm and screen the character candidates adaptively.
The Stroke Width Transform (SWT) and Maximally Stable Extremal Region (MSER) algorithms are representative of the second class of methods and have been the dominant classical algorithms in recent years.
The SWT (Stroke Width Transform) method extracts text candidates based on a series of general assumptions: characters are composed of strokes, strokes have a certain width, the stroke widths within one line of text are similar, and non-character parts are not composed of strokes and therefore have no stroke width. Based on these assumptions, a stroke width transform is applied to the image, the stroke width at each pixel of the input image is computed, and connected regions are taken as character candidates.
The MSER (Maximally Stable Extremal Region) method uses MSER regions, i.e. regions that maintain their shape and size over a range of gray-level thresholds. Such regions have sharp edges and strong gray-value contrast with the background. Characters generally contain rich edge information because of their morphology, and since text is a means of conveying information it must be clearly visible, giving it strong color and gray-value contrast with the background; characters are therefore essentially MSER regions.
The existing method has the following defects:
(1) Sliding windows must traverse the image at multiple scales and evaluate every detection window, so detection is slow and inefficient.
(2) Lack of precision and difficulty in coping with complex backgrounds.
SWT is strongly affected by noise and blur, because it depends on successful edge detection before measuring character stroke widths. When the background is complex and edges cannot be detected, the method fails. It also falsely detects objects with regular, character-like line patterns, such as rings, grids, and bricks, as characters, so it cannot meet users' multi-scene shooting requirements in natural environments.
MSER handles blurred, unevenly lit, and low-contrast text, and text with color or texture variation, poorly.
Both the SWT and MSER methods detect single characters, whose results are inconvenient for an OCR module to consume; the detected characters must be merged by features such as character spacing and height difference, which adds computation.
Disclosure of Invention
The invention provides a text detection method for certificate images in natural scenes, which detects the text-region information in a certificate image captured by a user in a natural scene and outputs the independent text-line regions of the image; it tolerates a certain degree of image distortion, tilt, rotation, lighting change, complex background, and similar conditions.
In order to achieve the purpose, the invention adopts the following technical scheme:
a text detection method of a certificate image in a natural scene comprises the following steps:
step 1, establishing a training data set: selecting common Chinese characters and producing Chinese-character pictures in different font types to form data set 1, wherein the training images in data set 1 are Chinese characters in different font types and the labels are the designated labels of the corresponding characters;
step 2, randomly rotating, cropping, blurring, inverting, brightness-changing, and gamma-changing the labeled certificate images, and fusing them with different background images by Poisson cloning to form data set 2, wherein the training images in data set 2 are text images and the labels are text/non-text binary maps of corresponding size;
step 3, training a character classification model of the VGG16 network with data set 1; after the model converges, removing the fully connected layers of the VGG16 network to change it into a fully convolutional network (FCN), initializing the FCN model with the obtained VGG16 character-classification parameters, and training it with data set 2;
step 4, processing the image with the trained fully convolutional network model to obtain a text/non-text probability distribution map, and obtaining the classification of each pixel by the maximum-probability rule to form a text/non-text binary map;
step 5, obtaining a text region by using a region connection method according to the text-non-text region binary image;
step 6, binarizing the original image, and extracting only the character information in the text region in the text-non-text region binary image in the step 5 to obtain a text binary image;
step 7, rotating the text image obtained in the step 6 by different angles, transversely projecting, and correcting the image by a maximum variance method;
step 8, projecting the rectified image again, judging the horizontal/vertical orientation of each region from its number of horizontal/vertical projected pixels, dividing the character lines, and refining the text/non-text binary map obtained in step 5.
By convolving the image and fusing the convolution features of different layers, the method needs no multi-scale sliding window to traverse the image. Prediction is pixel by pixel, so the result is more accurate. Features are extracted by convolution; the network is fully convolutional with no fully connected layer, so processing can run in real time. Fusing the features of different convolution layers provides both spatial and texture features, so text regions are still detected well when the image background is complex. The training samples are enlarged by Poisson cloning, which effectively prevents model overfitting and enriches the scenes of the training samples.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 shows a network structure of a convolution layer portion shared by VGG16 and a text region detection model;
FIG. 3 is a network structure of VGG 16;
FIG. 4 is an overall network structure of a text detection model;
FIG. 5a is a sample view of a document image;
FIG. 5b is a background image to be fused;
FIG. 5c is a fused document image;
FIG. 5d is a text-to-non-text binary image of full convolution neural network prediction;
FIG. 5e is a certificate text information graph derived from a text-to-non-text binary image;
FIG. 5f is a diagram of the document text information after maximum variance correction;
FIG. 5g is a binary image of the refined text region;
FIG. 5h is a text region map of the rectified certificate image determined from the refined text region binary map.
Detailed Description
As shown in fig. 1, the present invention provides a text detection method for a certificate image in a natural scene, which includes the following steps:
Step 2. A neural network model has many parameters and requires a large amount of data for training to prevent overfitting. Because labeled samples are costly, the limited labeled samples must be augmented. The labeled certificate images are randomly rotated with rotation angle rotate ∈ [-30, 30], and randomly cropped with newWidth ∈ [0.7×width, width] and newHeight ∈ [0.7×height, height], where width and height are the width and height of the original image. Random Gaussian blur is applied with kernelSize ∈ [3, 9] and sigma ∈ [1, 9]. The BGR image is converted to the HSV representation; after the channels are split, a random value hue_vari ∈ [-8, 8] is added to the hue H, the saturation S is multiplied by a random sat_vari ∈ [0.5, 1.5], and the value V is multiplied by a random val_vari ∈ [0.7, 1.3]. A random gamma transform with gamma ∈ [0.5, 2.0] is also applied.
The image pixel values pixel_i are mapped according to the gamma table. The images are then randomly fused with different backgrounds by Poisson cloning; for example, fusing fig. 5a and fig. 5b gives fig. 5c, which enriches the samples and the image scenes. This forms data set 2: the training images in data set 2 are text-containing images, and the labels are text/non-text binary maps of corresponding size.
Step 3. The pictures in data set 1 are resized to a fixed size of 28×28 pixels, the pixel values are normalized to between 0 and 1, and they are input to the VGG16 neural network model; the network structure of the VGG16 model is shown in FIG. 2. The VGG16 network is pre-trained by gradient descent, and the neurons of the network use the ReLU activation function:
f(x) = max(0, W^T x + b)
where W^T is the weight parameter to be trained in the neuron, x is the neuron input, and b is the bias parameter to be trained in the neuron. The loss function of the VGG16 neural network is the softmax cross-entropy loss L(y_i, f_i):

L(y_i, f_i) = -(1/m) Σ_{i=1}^{m} log( exp(f_{i, y_i}) / Σ_c exp(f_{i, c}) )

where m is the number of samples in a batch, f_i is the actual predicted value output for the i-th sample in the training batch, and y_i is the true value of the i-th sample in the training batch.
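As a check on the loss definition above, softmax cross-entropy can be computed in a few lines of NumPy; this is a generic sketch, not the patent's training code.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch.
    logits: (m, classes) raw scores f_i; labels: (m,) integer classes y_i."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 3.0, 0.3]])
loss = softmax_cross_entropy(logits, np.array([0, 1]))
```

With uniform logits over c classes the loss reduces to log(c), a handy sanity check.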
Step 4. When the accuracy of the VGG16 model of step 3 reaches a preset level, pre-training is stopped. The fully connected layers of the VGG16 model are removed, two convolution layers with 1×1 kernels are added, and part of the parameters (W^T and b) are dropped with probability 0.5. The size of the convolution output is changed by transposed convolution, and the pool-4 and pool-3 feature maps are fused pixel by pixel. Finally the convolution output is reshaped to the size of the original image by transposed convolution. After the two dropout layers, every multi-channel layer is first reduced to 2 channels by a 1×1 convolution and then operated on. The FCN model structure is shown in fig. 3. The model is trained by gradient descent with an exponentially decayed step size, and the neurons of the model still use the ReLU activation function:
f(x) = max(0, W^T x + b)
where W^T is the weight parameter to be trained in the neuron, x is the neuron input, and b is the bias parameter to be trained in the neuron.
The loss function is the softmax cross-entropy loss, applied pixel by pixel:

L = -(1/(m·M·N)) Σ_{i=1}^{m} Σ_{j=1}^{M×N} log( exp(f_{ij, y_ij}) / Σ_c exp(f_{ij, c}) )

where m is the number of samples in a batch, M and N are the height and width of the input image, f_ij is the predicted value for the j-th pixel of the i-th sample in the training batch, and y_ij is the true value of the j-th pixel of the i-th sample in the training batch.
In the design of the FCN, the fully convolutional network uses convolution kernels of size 3×3: cascading two such convolution layers gives a receptive field equal to a 5×5 kernel, and cascading three gives a receptive field equal to a 7×7 kernel, enlarging the receptive field while reducing the number of parameters to train. The 1×1 convolution effectively replaces the fully connected layer, and dropping part of the parameters prevents model overfitting, reduces the data dimensionality, and lowers the computation. Training reuses the convolution- and pooling-layer parameters of the VGG16 model, which greatly accelerates convergence and shortens training time. Because the FCN has no fully connected layer and is fully convolutional, the input image can be of any size, eliminating the distortion and detection failures caused by resizing images to a fixed size. The FCN predicts text/non-text pixel by pixel, so its detection precision is higher than that of the sliding-window and connected-region methods.
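The skip fusion described above (1×1 score layers, upsampling, pixel-by-pixel addition of pool-4 and pool-3) can be sketched shape-wise in NumPy. Nearest-neighbour upsampling stands in for the transposed convolutions and random arrays stand in for real feature maps, so this only illustrates the plumbing, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(feat, w):
    """A 1x1 convolution is a per-pixel matmul over channels: (H, W, Cin) -> (H, W, Cout)."""
    return feat @ w

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling, standing in for a transposed convolution."""
    return feat.repeat(2, axis=0).repeat(2, axis=1)

pool4 = rng.normal(size=(8, 8, 512))    # deeper, coarser feature map
pool3 = rng.normal(size=(16, 16, 256))  # shallower, finer feature map

w4 = rng.normal(size=(512, 2))          # score layers: reduce to 2 channels (text / non-text)
w3 = rng.normal(size=(256, 2))

fused = upsample2x(conv1x1(pool4, w4)) + conv1x1(pool3, w3)   # pixel-by-pixel fusion
scores = upsample2x(upsample2x(upsample2x(fused)))            # back to a 128x128 "original"
```

Reducing every branch to 2 channels before fusing mirrors the step-4 description of the 1×1 convolutions preceding the pixel-wise addition.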
Step 5. The pixel values of the images in data set 2 are normalized to between 0 and 1 and input to the FCN model of step 4 for training. As shown in FIG. 4, the FCN parameters are pre-trained with the convolution- and pooling-layer parameters of VGG16 from step 3, and the newly added layers are initialized with truncated-normal random numbers. The model outputs a text/non-text probability map of the input image. For any pixel pixel_ij of the input image, the model gives the probability P_True(pixel_ij) that it is a text-region pixel and the probability P_False(pixel_ij) that it is a non-text pixel, and the two are compared. If:

P_True(pixel_ij) > P_False(pixel_ij)

the pixel pixel_ij is considered to belong to the text region; otherwise it belongs to the non-text region. Text-region pixels are marked 1 and non-text pixels 0, yielding a text/non-text distribution map of the whole image, as shown in fig. 5d; its cross-entropy with the label is computed.
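The maximum-probability rule above reduces to an argmax over the two channels of the probability map; a minimal NumPy sketch with a random map standing in for the FCN output:

```python
import numpy as np

# prob_map[..., 0] = P_False and prob_map[..., 1] = P_True for each pixel,
# obtained here by softmaxing random logits in place of real FCN scores.
rng = np.random.default_rng(1)
logits = rng.normal(size=(4, 6, 2))
e = np.exp(logits - logits.max(axis=-1, keepdims=True))
prob_map = e / e.sum(axis=-1, keepdims=True)

# Maximum-probability rule: a pixel is text (1) iff P_True > P_False.
binary = (prob_map[..., 1] > prob_map[..., 0]).astype(np.uint8)
```

Because there are only two classes, comparing the two probabilities is exactly an argmax over the channel axis.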
Step 6. Clustering the text-region pixels by connected regions detects the image's text areas well. However, this method cannot accurately segment text lines: many lines stick together in the obtained text regions, while the goal of text detection is to output independent text-line regions. Therefore the certificate image is binarized and only the character information inside the text regions of the step-5 text/non-text distribution map is kept, giving a binary image that contains only text information, as shown in fig. 5e.
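A dependency-free sketch of the connected-region clustering used in step 6, with BFS flood fill for 4-connected component labelling; a real system would typically call `cv2.connectedComponents` instead.

```python
import numpy as np
from collections import deque

def connected_regions(binary):
    """4-connected component labelling of a 0/1 text map by BFS flood fill.
    Returns a label image (0 = background) and the number of regions."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    current = 0
    for sy in range(h):
        for sx in range(w):
            if binary[sy, sx] and not labels[sy, sx]:
                current += 1                      # start a new region
                queue = deque([(sy, sx)])
                labels[sy, sx] = current
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not labels[ny, nx]:
                            labels[ny, nx] = current
                            queue.append((ny, nx))
    return labels, current

text_map = np.array([[1, 1, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 0, 1]])
labels, n = connected_regions(text_map)
```

Each label then delimits one candidate text region whose pixels are kept when the original image is binarized.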
Step 7. The image captured by the user may be rotated by some angle, and when the image background is complex the rectification of the certificate is greatly affected. After processing by the detection model, the text-region information is available, the text can be extracted from the image, and the certificate can be rectified without interference from the complex background. In a certificate image the characters of the text regions are arranged in lines with obvious blanks between lines, so the closer the projection direction is to the orientation of the certificate image, the more pronounced the difference between peaks and troughs of the projection curve and the larger the variance of the text-region projection. The binarized text image of step 6 (of size N×M) is rotated and projected repeatedly; the rotation angle at which the projection variance is maximal is the rectification angle of the image. The image is projected horizontally, and the sum of the pixels of row i after horizontal projection is recorded as sum_i:

sum_i = Σ_{j=1}^{M} I(pixel_ij)

where I is an indicator function: I = 1 when pixel_ij ∈ text, otherwise I = 0. The mean over all rows is

avg = (1/N) Σ_{i=1}^{N} sum_i

The image is rotated about its center by different angles θ_k and the variance is computed. The projection variance of the image text region is:

var(θ_k) = (1/N) Σ_{i=1}^{N} (sum_i − avg)²

When the variance is maximal, the corresponding rotation angle θ_k is the tilt angle θ of the image.
The image and the text/non-text region map are rectified accordingly, as shown in fig. 5f.
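The maximum-variance rectification of step 7 can be sketched by rotating the coordinates of the text pixels rather than resampling the image; this simplification, and the 5° angle grid, are assumptions of the sketch, not the patent's exact procedure.

```python
import numpy as np

def projection_variance(points, theta):
    """Variance of the horizontal-projection profile after rotating the
    text-pixel coordinates by theta (rotating coordinates instead of
    resampling the image keeps this sketch dependency-free)."""
    ys, xs = points
    rot_y = xs * np.sin(theta) + ys * np.cos(theta)
    rows = np.round(rot_y - rot_y.min()).astype(int)
    profile = np.bincount(rows)        # the per-row sum_i of the patent
    return profile.var()

def estimate_tilt(binary, angles):
    points = np.nonzero(binary)
    variances = [projection_variance(points, a) for a in angles]
    return angles[int(np.argmax(variances))]   # maximum-variance angle

# Two perfectly horizontal text lines: angle 0 should maximize the variance.
binary = np.zeros((40, 120), dtype=np.uint8)
binary[10:14, 10:110] = 1
binary[25:29, 10:110] = 1
angles = np.deg2rad(np.arange(-30, 31, 5))
tilt = estimate_tilt(binary, angles)
```

At the correct angle the projection piles all text pixels into a few rows, so the profile's peaks and troughs, and hence its variance, are sharpest.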
Step 8. The text/non-text map produced by the detection model may contain text lines stuck together, especially when the gaps between lines in the certificate are small. Stuck text lines seriously degrade subsequent character recognition, so they are split in order to output independent text-line regions. After the rectification of step 7, the text lines in the certificate are essentially horizontal or vertical. The rectified image is projected horizontally and vertically, and the orientation of the text in each text region is judged from the number of projected pixels per row/column; for example, when the vertical projection of a region's outline is much longer than its horizontal projection, the text in that region is considered horizontal. The region is then projected in the direction perpendicular to the text direction. The peak and valley points of each projection curve are determined from the trend of the projected pixel counts: a peak is an extreme point whose value on the projection curve is greater than that of the surrounding points, and a valley an extreme point whose value is smaller. Segmenting the text lines relies mainly on finding valleys. To eliminate false valleys, a statistic can be computed from the average of the sums of the first 5 rows of pixels; if a row's sum falls below this statistic, the row is considered a non-text region. The text lines are segmented accordingly, and the text/non-text binary map is refined to obtain the position information of the text-line regions, as shown in figs. 5g and 5h.
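The valley-based line segmentation of step 8 can be sketched as splitting the horizontal projection profile at blank-row runs; using zero rows as valleys is a simplification of the patent's false-valley threshold.

```python
import numpy as np

def split_lines(binary):
    """Split a horizontal text map into line regions at projection valleys:
    runs of rows whose projection is zero (blank gaps) separate the lines.
    Returns (start_row, end_row) pairs, end exclusive."""
    profile = binary.sum(axis=1)       # horizontal projection per row
    in_line, lines, start = False, [], 0
    for i, v in enumerate(profile):
        if v > 0 and not in_line:
            in_line, start = True, i   # entering a text line
        elif v == 0 and in_line:
            in_line = False            # hit a valley: close the line
            lines.append((start, i))
    if in_line:
        lines.append((start, len(profile)))
    return lines

binary = np.zeros((30, 50), dtype=np.uint8)
binary[3:7, 5:45] = 1      # first text line
binary[14:18, 5:45] = 1    # second text line
lines = split_lines(binary)
```

Replacing the `v == 0` test with a comparison against a threshold derived from the largest row sums would recover the false-valley rejection described above.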
Claims (1)
1. A text detection method of a certificate image in a natural scene is characterized by comprising the following steps:
step 1, establishing a training data set: selecting common Chinese characters and producing Chinese-character pictures in different font types to form data set 1, wherein the training images in data set 1 are Chinese characters in different font types and the labels are the designated labels corresponding to the Chinese characters;
step 2, randomly rotating, cropping, blurring, inverting, brightness-changing, and gamma-changing the labeled certificate images, and fusing them with different background images by Poisson cloning to form data set 2, wherein the training images in data set 2 are text images and the labels are text/non-text binary maps of corresponding size;
step 3, training a character classification model of the VGG16 (Visual Geometry Group 16) network with data set 1; after the model converges, removing the fully connected layers of the VGG16 network to change it into a fully convolutional network (FCN), initializing the fully convolutional network model with the obtained VGG16 character-classification parameters, and training the fully convolutional network model with data set 2;
the step 3 specifically comprises the following steps: resizing the pictures in data set 1 to a fixed size of 28×28 pixels, normalizing the pixel values to between 0 and 1, inputting them into the VGG16 neural network model, and pre-training the VGG16 network by gradient descent, wherein the neurons of the network use the ReLU activation function:
f(x) = max(0, W^T x + b)
where W^T is the weight parameter to be trained in the neuron, x is the neuron input, and b is the bias parameter to be trained in the neuron; the loss function of the VGG16 neural network is the softmax cross-entropy loss L(y_i, f_i):

L(y_i, f_i) = -(1/m) Σ_{i=1}^{m} log( exp(f_{i, y_i}) / Σ_c exp(f_{i, c}) )

where m is the number of samples in a batch, f_i is the actual predicted value output for the i-th sample in the training batch, and y_i is the true value of the i-th sample in the training batch;
stopping pre-training when the accuracy of the VGG16 model reaches a preset level, removing the fully connected layers of the VGG16 model, adding two convolution layers with 1×1 kernels, and dropping part of the parameters with probability 0.5; changing the size of the convolution output by transposed convolution and fusing the pool-4 and pool-3 feature maps pixel by pixel; finally reshaping the convolution output to the size of the original image by transposed convolution; after the two dropout layers, first reducing every multi-channel layer to 2 channels by the 1×1 convolution and then operating on it; training the model by gradient descent, wherein the neurons of the model still use the ReLU activation function:
f(x) = max(0, W^T x + b)
where W^T is the weight parameter to be trained in the neuron, x is the neuron input, and b is the bias parameter to be trained in the neuron;

the loss function is the softmax cross-entropy loss, applied pixel by pixel:

L = -(1/(m·M·N)) Σ_{i=1}^{m} Σ_{j=1}^{M×N} log( exp(f_{ij, y_ij}) / Σ_c exp(f_{ij, c}) )

where m is the number of samples in a batch, M and N are the height and width of the image, f_ij is the predicted value for the j-th pixel of the i-th sample in the training batch, and y_ij is the true value of the j-th pixel of the i-th sample in the training batch;
step 4, processing the image by adopting the trained full convolution neural network model to obtain a text-non-text probability distribution map, and obtaining the classification condition of each pixel point by a maximum probability method to form a text-non-text binary image;
step 5, obtaining a text region by using a region connection method according to the text-non-text region binary image;
the step 5 specifically comprises the following steps: normalizing the image pixel values in data set 2 to between 0 and 1 and inputting them into the FCN model of step 4 for training; pre-training the FCN parameters with the convolution- and pooling-layer parameters of VGG16 from step 3 and initializing the newly added layers with truncated-normal random numbers; the FCN model outputs a text/non-text probability map of the image; for any pixel pixel_ij of the image, the model gives the probability P_True(pixel_ij) that it is a text-region pixel and the probability P_False(pixel_ij) that it is a non-text pixel, and the two are compared; if:

P_True(pixel_ij) > P_False(pixel_ij)

the pixel pixel_ij is considered to belong to the text region, otherwise to the non-text region; text-region pixels are marked 1 and non-text pixels 0, finally yielding a text/non-text distribution map of the whole image;
step 6, binarizing the image, and extracting only character information in a text region in the text-non-text region binary image in the step 5 to obtain a text binary image;
step 7, rotating the text image obtained in the step 6 by different angles, transversely projecting, and correcting the image by a maximum variance method;
the step 7 specifically comprises the following steps: rotating and projecting the binarized text image of step 6 repeatedly, the rotation angle at which the projection variance is maximal being the rectification angle of the image; projecting the image horizontally, and recording the sum of the pixels of row i after horizontal projection as sum_i:

sum_i = Σ_{j=1}^{M} I(pixel_ij)

where I is an indicator function: I = 1 when pixel_ij ∈ text, otherwise I = 0; the mean over all rows is

avg = (1/N) Σ_{i=1}^{N} sum_i

rotating the image about its center by different angles θ_k and computing the variance, the projection variance of the image text region being:

var(θ_k) = (1/N) Σ_{i=1}^{N} (sum_i − avg)²

when the variance is maximal, the corresponding rotation angle θ_k is the tilt angle θ of the image;
step 8, projecting the rectified image again, judging the horizontal/vertical orientation of each region from its number of horizontal/vertical projected pixels, dividing the character lines, and refining the text/non-text binary map obtained in step 5;
the step 8 specifically comprises the following steps: for the horizontal/vertical projections of the rectified image, judging the horizontal/vertical orientation of the text in each text region from the number of pixels projected per row/column; when the vertical projection of a region's outline is much longer than its horizontal projection, the text in that region is considered horizontal, and the region is then projected in the direction perpendicular to the text direction; determining the peak and valley points of each projection curve from the trend of the projected pixel counts, a peak being an extreme point whose value on the projection curve is greater than that of the surrounding points and a valley an extreme point whose value is smaller; segmenting the character lines by searching for valleys, and, to eliminate false valleys, computing a statistic from the average of the sums of the first 5 rows of pixels; if a row's sum falls below this statistic, the row is considered a non-text region; segmenting the text lines accordingly and refining the text/non-text binary map to obtain the position information of the text-line regions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710854505.8A CN107609549B (en) | 2017-09-20 | 2017-09-20 | Text detection method for certificate image in natural scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710854505.8A CN107609549B (en) | 2017-09-20 | 2017-09-20 | Text detection method for certificate image in natural scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107609549A CN107609549A (en) | 2018-01-19 |
CN107609549B true CN107609549B (en) | 2021-01-08 |
Family
ID=61061405
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710854505.8A Active CN107609549B (en) | 2017-09-20 | 2017-09-20 | Text detection method for certificate image in natural scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107609549B (en) |
Families Citing this family (54)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108376244B (en) * | 2018-02-02 | 2022-03-25 | 北京大学 | Method for identifying text font in natural scene picture |
CN108229463A (en) * | 2018-02-07 | 2018-06-29 | 众安信息技术服务有限公司 | Character recognition method based on image |
CN108280839A (en) * | 2018-02-27 | 2018-07-13 | 北京尚睿通教育科技股份有限公司 | A kind of operation framing and dividing method and its device |
CN108681729B (en) * | 2018-05-08 | 2023-06-23 | 腾讯科技(深圳)有限公司 | Text image correction method, device, storage medium and equipment |
CN108694393A (en) * | 2018-05-30 | 2018-10-23 | 深圳市思迪信息技术股份有限公司 | A kind of certificate image text area extraction method based on depth convolution |
CN110569835B (en) * | 2018-06-06 | 2024-03-05 | 北京搜狗科技发展有限公司 | Image recognition method and device and electronic equipment |
CN108921163A (en) * | 2018-06-08 | 2018-11-30 | 南京大学 | A kind of packaging coding detection method based on deep learning |
CN109035216A (en) * | 2018-07-06 | 2018-12-18 | 北京羽医甘蓝信息技术有限公司 | Handle the method and device of cervical cell sectioning image |
CN109214322A (en) * | 2018-08-27 | 2019-01-15 | 厦门哲林软件科技有限公司 | A kind of optimization method and system of file and picture visual effect |
CN109271910A (en) * | 2018-09-04 | 2019-01-25 | 阿里巴巴集团控股有限公司 | A kind of Text region, character translation method and apparatus |
WO2020047756A1 (en) * | 2018-09-04 | 2020-03-12 | 深圳市大疆创新科技有限公司 | Image encoding method and apparatus |
CN109344727B (en) * | 2018-09-07 | 2020-11-27 | 苏州创旅天下信息技术有限公司 | Identity card text information detection method and device, readable storage medium and terminal |
CN109522975A (en) * | 2018-09-18 | 2019-03-26 | 平安科技(深圳)有限公司 | Handwriting samples generation method, device, computer equipment and storage medium |
CN109409363B (en) * | 2018-10-13 | 2021-11-12 | 长沙芯希电子科技有限公司 | Content-based text image inversion judgment and correction method |
CN109492630A (en) * | 2018-10-26 | 2019-03-19 | 信雅达系统工程股份有限公司 | A method of the word area detection positioning in the financial industry image based on deep learning |
CN109583367A (en) * | 2018-11-28 | 2019-04-05 | 网易(杭州)网络有限公司 | Image text row detection method and device, storage medium and electronic equipment |
CN109582946B (en) * | 2018-11-28 | 2019-10-25 | 龙马智芯(珠海横琴)科技有限公司 | The determination method and device of character area writing direction |
CN111259878A (en) * | 2018-11-30 | 2020-06-09 | 中移(杭州)信息技术有限公司 | Method and equipment for detecting text |
CN109635714B (en) * | 2018-12-07 | 2023-05-30 | 光典信息发展有限公司 | Correction method and device for document scanning image |
CN111325194B (en) * | 2018-12-13 | 2023-12-29 | 杭州海康威视数字技术股份有限公司 | Character recognition method, device and equipment and storage medium |
CN110032997B (en) * | 2019-01-07 | 2021-02-19 | 武汉大学 | Natural scene text positioning method based on image segmentation |
CN110059539A (en) * | 2019-02-27 | 2019-07-26 | 天津大学 | A kind of natural scene text position detection method based on image segmentation |
CN109902680A (en) * | 2019-03-04 | 2019-06-18 | 四川长虹电器股份有限公司 | The detection of picture rotation angle and bearing calibration based on convolutional neural networks |
CN109993164A (en) * | 2019-03-20 | 2019-07-09 | 上海电力学院 | A kind of natural scene character recognition method based on RCRNN neural network |
CN109977950A (en) * | 2019-03-22 | 2019-07-05 | 上海电力学院 | A kind of character recognition method based on mixing CNN-LSTM network |
CN111723627B (en) * | 2019-03-22 | 2024-07-23 | 北京搜狗科技发展有限公司 | Image processing method and device and electronic equipment |
CN110175520A (en) * | 2019-04-22 | 2019-08-27 | 南方电网科学研究院有限责任公司 | Text position detection method and device for robot inspection image and storage medium |
CN110210297B (en) * | 2019-04-25 | 2023-12-26 | 上海海事大学 | Method for locating and extracting Chinese characters in customs clearance image |
CN110738201B (en) * | 2019-04-25 | 2024-04-19 | 上海海事大学 | Self-adaptive multi-convolution neural network character recognition method based on fusion morphological characteristics |
CN109948598B (en) * | 2019-05-15 | 2019-09-06 | 达而观信息科技(上海)有限公司 | Document layout intelligent analysis method and device |
CN110222680A (en) * | 2019-05-19 | 2019-09-10 | 天津大学 | A kind of domestic waste article outer packing Method for text detection |
CN112001406B (en) * | 2019-05-27 | 2023-09-08 | 杭州海康威视数字技术股份有限公司 | Text region detection method and device |
CN110276279B (en) * | 2019-06-06 | 2020-06-16 | 华东师范大学 | Method for detecting arbitrary-shape scene text based on image segmentation |
CN112241736B (en) * | 2019-07-19 | 2024-01-26 | 上海高德威智能交通系统有限公司 | Text detection method and device |
CN110415177A (en) * | 2019-07-26 | 2019-11-05 | 四川长虹电器股份有限公司 | A kind of image rotation sanction drawing method based on Java |
CN110458238A (en) * | 2019-08-02 | 2019-11-15 | 南通使爱智能科技有限公司 | A kind of method and system of certificate arc point detection and positioning |
CN110610166B (en) * | 2019-09-18 | 2022-06-07 | 北京猎户星空科技有限公司 | Text region detection model training method and device, electronic equipment and storage medium |
CN110751154B (en) * | 2019-09-27 | 2022-04-08 | 西北工业大学 | Complex environment multi-shape text detection method based on pixel-level segmentation |
CN112836696B (en) * | 2019-11-22 | 2024-08-02 | 北京搜狗科技发展有限公司 | Text data detection method and device and electronic equipment |
CN111062264A (en) * | 2019-11-27 | 2020-04-24 | 重庆邮电大学 | Document object classification method based on dual-channel hybrid convolution network |
CN111310746B (en) * | 2020-01-15 | 2024-03-01 | 支付宝实验室(新加坡)有限公司 | Text line detection method, model training method, device, server and medium |
CN111260586B (en) | 2020-01-20 | 2023-07-04 | 北京百度网讯科技有限公司 | Correction method and device for distorted document image |
CN111368842A (en) * | 2020-02-29 | 2020-07-03 | 贵州电网有限责任公司 | Natural scene text detection method based on multi-level maximum stable extremum region |
CN111428710A (en) * | 2020-03-16 | 2020-07-17 | 五邑大学 | File classification collaboration robot and image character recognition method based on same |
CN111461122B (en) * | 2020-05-18 | 2024-03-22 | 南京大学 | Certificate information detection and extraction method |
CN111709420B (en) * | 2020-06-18 | 2022-06-24 | 北京易真学思教育科技有限公司 | Text detection method, electronic device and computer readable medium |
CN111797922B (en) * | 2020-07-03 | 2023-11-28 | 泰康保险集团股份有限公司 | Text image classification method and device |
CN111967469B (en) * | 2020-08-13 | 2023-12-15 | 上海明略人工智能(集团)有限公司 | Method and system for correcting malformed text and character recognition method |
CN112528776B (en) * | 2020-11-27 | 2024-04-09 | 京东科技控股股份有限公司 | Text line correction method and device |
CN113011409A (en) * | 2021-04-02 | 2021-06-22 | 北京世纪好未来教育科技有限公司 | Image identification method and device, electronic equipment and storage medium |
CN112990220B (en) * | 2021-04-19 | 2022-08-05 | 烟台中科网络技术研究所 | Intelligent identification method and system for target text in image |
CN113033558B (en) * | 2021-04-19 | 2024-03-19 | 深圳市华汉伟业科技有限公司 | Text detection method and device for natural scene and storage medium |
CN113505536A (en) * | 2021-07-09 | 2021-10-15 | 兰州理工大学 | Optimized traffic flow prediction model based on space-time diagram convolution network |
CN114219946B (en) * | 2021-12-29 | 2022-11-15 | 北京百度网讯科技有限公司 | Text image binarization method and device, electronic equipment and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140003723A1 (en) * | 2012-06-27 | 2014-01-02 | Agency For Science, Technology And Research | Text Detection Devices and Text Detection Methods |
CN106384112A (en) * | 2016-09-08 | 2017-02-08 | 西安电子科技大学 | Rapid image text detection method based on multi-channel and multi-dimensional cascade filter |
CN107066972A (en) * | 2017-04-17 | 2017-08-18 | 武汉理工大学 | Natural scene Method for text detection based on multichannel extremal region |
2017-09-20: Application CN201710854505.8A filed in China (CN); granted as CN107609549B, status Active.
Also Published As
Publication number | Publication date |
---|---|
CN107609549A (en) | 2018-01-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107609549B (en) | Text detection method for certificate image in natural scene | |
CN111325203B (en) | American license plate recognition method and system based on image correction | |
US10817741B2 (en) | Word segmentation system, method and device | |
CN109086714A (en) | Table recognition method, identifying system and computer installation | |
US11295417B2 (en) | Enhancing the legibility of images using monochromatic light sources | |
US20140112582A1 (en) | Character recognition | |
US20140193029A1 (en) | Text Detection in Images of Graphical User Interfaces | |
CN109389115B (en) | Text recognition method, device, storage medium and computer equipment | |
CN110689003A (en) | Low-illumination imaging license plate recognition method and system, computer equipment and storage medium | |
Gilly et al. | A survey on license plate recognition systems | |
Biswas et al. | A global-to-local approach to binarization of degraded document images | |
WO2022121021A1 (en) | Identity card number detection method and apparatus, and readable storage medium and terminal | |
Polyakova et al. | Improvement of the Color Text Image Binarization Method using the Minimum-Distance Classifier
Khan et al. | Text detection and recognition on traffic panel in roadside imagery | |
US9269126B2 (en) | System and method for enhancing the legibility of images | |
CN112712080B (en) | Character recognition processing method for acquiring image by moving character screen | |
Vu et al. | Automatic extraction of text regions from document images by multilevel thresholding and k-means clustering | |
CN112784830A (en) | Character recognition method and device | |
Zhang et al. | Improving optical character recognition accuracy for indonesia identification card using generative adversarial network | |
Ghavidel et al. | Natural scene text localization using edge color signature | |
Sagum | Incorporating deblurring techniques in multiple recognition of license plates from video sequences | |
CN111598080B (en) | Binary division tree license plate recognition method based on convex polyhedron piecewise linear classification | |
Zhu et al. | Robust text segmentation in low quality images via adaptive stroke width estimation and stroke based superpixel grouping | |
Elmore et al. | A morphological image preprocessing suite for ocr on natural scene images | |
Martyshkin et al. | Research of the Handwriting Recognition Methods |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||