CN108038486A - A kind of character detecting method - Google Patents

A kind of character detecting method

Info

Publication number
CN108038486A
CN108038486A (application number CN201711267804.8A)
Authority
CN
China
Prior art keywords
value
region
representing
character
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711267804.8A
Other languages
Chinese (zh)
Inventor
巫义锐
黄多辉
冯钧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN201711267804.8A priority Critical patent/CN108038486A/en
Publication of CN108038486A publication Critical patent/CN108038486A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a character detection method. The method includes: extracting extremal regions of a text image to be detected and filtering them to obtain character candidate regions; calculating MSSH features and deep convolutional features and fusing them through a self-coding neural network to obtain a fused feature; further screening character regions from the character candidate regions according to the fused feature; and merging all character regions to obtain the final text region. The detection method is highly robust and efficient, and can complete the text detection task rapidly.

Description

Character detection method
Technical Field
The invention relates to a character detection method.
Background
Text, as one of the most influential human inventions, plays an important role in daily life. The rich and precise information it carries is of great significance for visual-semantics-based natural scene understanding. More and more multimedia applications, such as street scene understanding, traffic sign interpretation for autonomous vehicles, and semantics-based image retrieval, require accurate and robust text detection. The basic task of text detection is to determine whether text is present in a scene image or video and, if so, to mark its location. In recent years, as image capture devices have grown in capability and number, the quantity of images and videos containing scene text has increased dramatically. Interest in text detection in natural scene images and videos has therefore been growing. With the deepening of research on computer vision technology, detecting scene text with computer algorithms has become an important and active research topic internationally.
Detecting and recognizing scene text in low-quality images with complex backgrounds is extremely challenging. Unlike document text, which has a uniform format and a plain background, scene text often suffers from low resolution, complex backgrounds, arbitrary orientation, perspective distortion, and uneven illumination.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a character detection method that addresses the technical problems of low detection success rate and poor robustness in existing character detection.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows: a character detection method comprises the following steps:
extracting an extreme value area of the character picture to be detected, and filtering the extreme value area to obtain a character candidate area;
calculating MSSH characteristics and deep convolution characteristics, and fusing the MSSH characteristics and the deep convolution characteristics through a self-coding neural network to obtain fused characteristics;
further screening out a character region from the character candidate region according to the fusion characteristics;
and merging all the character areas to obtain a final character area.
The specific method for extracting the extremum region is as follows:
converting the character picture to be detected into a grayscale image I_gray, an R-value map I_R, a G-value map I_G and a B-value map I_B;
obtaining extremum regions from I_R, I_G and I_B respectively, specifically as follows:
the extremum region A_R of the R-value map I_R is defined as:
A_R = { p | I_R(q) − I_R(p) > θ, ∀ q ∈ ∂A_R }
wherein I_R(p) denotes the value of pixel p in the R-value map; I_R(q) denotes the value of pixel q in the R-value map; θ denotes the threshold of the extremum region; and ∂A_R denotes the set of pixels adjacent to but not belonging to the extremum region A_R;
the extremum region A_G of the G-value map I_G is defined as:
A_G = { p | I_G(q) − I_G(p) > θ, ∀ q ∈ ∂A_G }
wherein I_G(p) denotes the value of pixel p in the G-value map; I_G(q) denotes the value of pixel q in the G-value map; θ denotes the threshold of the extremum region; and ∂A_G denotes the set of pixels adjacent to but not belonging to the extremum region A_G;
the extremum region A_B of the B-value map I_B is defined as:
A_B = { p | I_B(q) − I_B(p) > θ, ∀ q ∈ ∂A_B }
wherein I_B(p) denotes the value of pixel p in the B-value map; I_B(q) denotes the value of pixel q in the B-value map; θ denotes the threshold of the extremum region; and ∂A_B denotes the set of pixels adjacent to but not belonging to the extremum region A_B.
The method for acquiring the character candidate region comprises the following steps:
calculating the area S, the perimeter C, the Euler number E and the pixel value variance H of each extremum region, wherein the pixel value variance H is computed from the grayscale image I_gray according to the following formula:
wherein x denotes a pixel; I_gray(x) denotes the gray value of pixel x; a denotes the color interval containing the largest number of pixels in the extremum region; b denotes the color interval containing the second largest number of pixels in the extremum region; n_a denotes the number of pixels of the extremum region in color interval a; n_b denotes the number of pixels of the extremum region in color interval b; R_a denotes the set of pixels of the extremum region in color interval a; R_b denotes the set of pixels of the extremum region in color interval b; μ_a denotes the mean of the pixel values in color interval a within the extremum region; and μ_b denotes the mean of the pixel values in color interval b within the extremum region;
redundant extremum regions are filtered out according to the area S, perimeter C, Euler number E and pixel value variance H of each extremum region; the regions remaining after filtering are the character candidate regions, wherein the filtering conditions are as follows:
wherein S_0 denotes the threshold for the area S of the extremum region; C_0 denotes the threshold for the perimeter of the extremum region; E_0 denotes the threshold for the Euler number of the extremum region; and H_0 denotes the threshold for the pixel value variance of the extremum region.
The specific method for calculating the MSSH characteristics is as follows:
acquiring a stroke pixel pair and a stroke line segment of a character candidate area;
calculating the symmetrical feature description value of a certain stroke pixel pair in the character candidate region on the gray value and gradient attributes;
calculating the symmetrical characteristic description of all stroke line segments in the character candidate area on stroke width values, stroke sequence value distribution and low-frequency mode attributes;
connecting the characteristic values of different symmetric attributes to form MSSH characteristics, wherein the specific formula is as follows:
F_m(e_i) = [ F_j | j = V, G_m, G_o, Sw, Md, Pa ]
wherein F_m(e_i) denotes the MSSH feature vector; [ ] denotes the vector concatenation operation; e_i denotes the i-th character candidate region; F_j denotes the feature vector corresponding to symmetry attribute j; j denotes the specific type of symmetry attribute; V denotes the gray value; G_m denotes the gradient magnitude attribute; G_o denotes the gradient direction attribute; Sw denotes the stroke width value; Md denotes the stroke sequence value distribution; and Pa denotes the low-frequency mode attribute.
The specific method for acquiring the stroke pixel pairs and the stroke line segments of the character candidate area comprises the following steps:
outputting an edge image by using a Canny edge detection operator;
calculating the gradient direction of a certain pixel point p on the stroke edge image;
following the ray r determined by the gradient direction until the ray meets another stroke edge pixel point q;
the stroke pixel pair is defined as { p, q }, and the stroke line segment is defined as the distance of ray r between pixel points p and q.
The specific method for calculating the deep convolution characteristics is as follows:
adjusting the size of the character candidate area to 64 x 64 pixel values;
constructing a convolutional neural network model comprising three stages;
the first-stage construction method comprises the following steps:
in the first stage, two convolutional layers and a max-pooling layer are used in sequence, wherein each convolutional layer uses 32 convolution kernels of size 3 × 3 with a displacement offset (stride) of 1 pixel, and the kernels are convolved with the character candidate region according to the following formula:
g(a, b, k) = Σ_{m ∈ {−1,0,1}} Σ_{n ∈ {−1,0,1}} e_i(a + m, b + n) · h_k(m, n)
wherein g(a, b, k) denotes the value at row a, column b of the character candidate region after the k-th convolution; e_i(a + m, b + n) denotes the pixel value at row (a + m), column (b + n) of the i-th character candidate region; m and n denote the row and column offsets of the pixel and take values in {−1, 0, 1}; and h_k denotes the k-th convolution kernel; after each convolutional layer, an activation value is computed with a nonlinear activation function according to the following formula:
f(a,b,k)=max(0,g(a,b,k))
wherein f(a, b, k) denotes the activation value at row a, column b of the character candidate region after the k-th convolution, and max() denotes the maximum-value function;
the activation value is then transmitted to a maximum pooling layer, which takes 2 pixels as a stride and takes the maximum value in a 2 × 2 spatial neighborhood as an output value;
the architecture of the second stage is the same as that of the first stage;
the third stage uses three convolutional layers, a max-pooling layer and a fully connected layer in sequence, wherein the fully connected layer flattens the output of the max-pooling layer into a one-dimensional vector as its input and produces a 128-dimensional output, which can be expressed as:
F d =W·X+B
wherein: f d For the generated 128-dimensional depth convolution characteristics, X is a one-dimensional vector obtained after connecting the output of the maximum pooling layer, W is a weight matrix, and B is an offset vector;
the convolutional neural network model is trained and tested: the unknown parameters h_k, W and B are determined by training, and the F_d generated during testing is taken as the deep convolutional feature of the character candidate region.
The method for acquiring the fusion characteristics comprises the following steps:
the weights ω_d of the trained convolutional neural network model are used as the initial fusion weights of the deep convolutional feature F_d;
for the MSSH feature F_m, a logistic regression model is used to predict its initial fusion weight ω_m and to reduce its dimensionality; the process can be expressed by the following formula:
wherein the result is the MSSH feature after dimensionality reduction; e_i denotes the i-th character candidate region; the function f_τ() denotes the logistic regression model; and D denotes a small data set used to train the initial feature weights;
the generated fused feature F_s can be expressed by the following formula:
wherein the function f_μ() denotes the self-coding network, and the dimension-reduced MSSH feature has the same dimensionality as F_d.
In the fusion training process, when the verification error rate stops decreasing, the joint training process of the self-coding network is finished.
The specific method for merging character areas is as follows:
assuming the set of character regions is S, the center point c_i of each character region s_i ∈ S is calculated;
for any two character regions s_i, s_j ∈ S, if the Euclidean distance between the two center points is less than a threshold F, a straight line l_{i,j} is connected between the two center points;
the angle α between each straight line and the horizontal is calculated, and the mode α_mode of all the angles is taken; the straight lines whose angles lie within the interval [α_mode − π/6, α_mode + π/6] are kept, and the remaining straight lines are removed;
and combining character areas connected by straight lines to obtain a final character area.
Compared with the prior art, the invention has the following beneficial effects:
(1) The character candidate regions are described with both MSSH features and deep convolutional features. The MSSH features are based on edge images and are highly robust to low resolution, image rotation, affine deformation, and multi-language, multi-font variation. The deep convolutional features require no manual intervention, strongly describe the appearance of the character candidate region, change little under low resolution, image rotation and illumination changes, and are therefore also highly robust.
(2) The self-coding network used by the invention requires no manual intervention and automatically fuses the MSSH features and the deep convolutional features; the generated fused feature combines the advantages of both features and is highly robust to low resolution, image rotation, affine deformation and complex backgrounds.
(3) The method detects text in natural scenes efficiently, has low computational complexity, and completes the text detection process quickly.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of the computation of the deep convolution feature of FIG. 1;
FIG. 3 is a flow chart of feature fusion in FIG. 1;
FIG. 4 is a diagram of a text to be detected;
FIG. 5 is a picture of the character candidate region from FIG. 4 filtered by the extremum region;
FIG. 6 is the character region from FIG. 5 after feature fusion;
FIG. 7 is a view showing the text region obtained by merging the character regions shown in FIG. 6.
Detailed Description
The invention provides a character detection method which obtains character candidate regions by extracting and filtering extremum regions, further screens character regions from the candidate regions through the fusion of MSSH features and deep convolutional features, and finally obtains the text region by merging the character regions. The detection method is highly robust and efficient, and can complete the text detection task quickly.
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
As shown in fig. 1, which is a flow chart of the present invention, the method of the present invention specifically includes the following steps:
the method comprises the following steps: inputting a character picture to be detected, and extracting an extreme value area of the character picture to be detected;
First, the input RGB color image is converted into a grayscale image I_gray, a red-component (R-value) map I_R, a green-component (G-value) map I_G, and a blue-component (B-value) map I_B.
Second, extremum regions A_R, A_G and A_B are obtained from I_R, I_G and I_B respectively. An extremum region is a region in which the pixel values on its outer boundary are strictly larger than the pixel values inside the region. Taking the R-value map I_R as an example, the extremum region A_R can be defined as:
A_R = { p | I_R(q) − I_R(p) > θ, ∀ q ∈ ∂A_R }
where I_R(p) and I_R(q) denote the values of pixels p and q in I_R, θ denotes the threshold of the extremum region, and ∂A_R denotes the set of pixels adjacent to but not belonging to A_R;
Then, the area S, the perimeter C, the Euler number E and the pixel value variance H of each extremum region are calculated, where the pixel value variance H is computed from the grayscale image I_gray according to the following formula:
where x denotes a pixel, I_gray(x) denotes the gray value of pixel x, a and b denote respectively the color interval containing the largest number of pixels in the extremum region and the color interval containing the second largest number, n_a and n_b denote the numbers of pixels in color intervals a and b, R_a and R_b denote the sets of pixels of the extremum region in color intervals a and b, and μ_a and μ_b denote the means of the pixel values in color intervals a and b within the extremum region.
Step two: filtering the extreme value region to obtain a character candidate region;
Redundant extremum regions are filtered out according to the area S, perimeter C, Euler number E and pixel value variance H of each extremum region; the regions remaining after filtering are the character candidate regions. The filtering conditions are as follows:
wherein S_0, C_0, E_0 and H_0 are thresholds derived statistically from a large number of character and non-character regions. S_0, the threshold for the area S of the extremum region, takes a value in the interval [80, 120]; C_0, the threshold for the perimeter, takes a value in [30, 50]; E_0, the threshold for the Euler number, takes a value in [0, 1]; and H_0, the threshold for the pixel value variance, takes a value in [100, 200].
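As a concrete illustration of steps one and two, the sketch below extracts per-channel extremal regions with OpenCV's MSER detector as a stand-in for the threshold-based definition above, and then applies the four geometric filters. The histogram binning used for the variance H, the direction of each threshold test, and the exact threshold values are assumptions, not details fixed by the patent.

```python
# Minimal sketch of steps one and two: per-channel MSER regions as a stand-in
# for the extremum-region definition, then area/perimeter/Euler/variance filters.
import cv2
import numpy as np
from skimage import measure

S0, C0, E0, H0 = 100, 40, 0, 150   # assumed values inside the quoted intervals

def candidate_regions(image_bgr):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    candidates = []
    for channel in cv2.split(image_bgr):              # B, G, R channels
        mser = cv2.MSER_create()
        regions, _ = mser.detectRegions(channel)      # each region is an array of (x, y) points
        for pts in regions:
            mask = np.zeros(gray.shape, np.uint8)
            mask[pts[:, 1], pts[:, 0]] = 1
            props = measure.regionprops(measure.label(mask))[0]
            S, C, E = props.area, props.perimeter, props.euler_number
            # variance H over the two dominant gray-level bins (assumed 8-bin histogram)
            vals = gray[mask.astype(bool)]
            hist, edges = np.histogram(vals, bins=8)
            a, b = np.argsort(hist)[-2:]
            in_a = vals[(vals >= edges[a]) & (vals < edges[a + 1])]
            in_b = vals[(vals >= edges[b]) & (vals < edges[b + 1])]
            H = np.var(np.concatenate([in_a, in_b])) if in_a.size + in_b.size else 0.0
            if S > S0 and C > C0 and E >= E0 and H < H0:   # assumed direction of the tests
                candidates.append(pts)
    return candidates
```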
FIG. 4 shows an input text image to be detected, and FIG. 5 shows the character candidate regions obtained from FIG. 4 after extremum region filtering.
Step three: calculating MSSH characteristics and depth convolution characteristics;
the specific method for calculating the MSSH characteristics is as follows:
acquiring a stroke pixel pair and a stroke line segment of a character candidate area;
Stroke pixel pairs and stroke line segments of the character candidate region are obtained by the SWT (Stroke Width Transform) algorithm, as follows:
(1) Outputting an edge image by using a Canny edge detection operator;
(2) Calculating the gradient direction of a certain pixel p on the stroke edge image;
(3) Following the ray r determined by the gradient direction until the ray meets another stroke edge pixel point q;
(4) The stroke pixel pair is defined as { p, q }, and the stroke line segment is defined as the distance of ray r between pixel points p and q.
The Canny edge detection algorithm has the following steps:
(1) Converting the character candidate area into a gray scale map;
(2) Performing Gaussian filtering on the obtained gray level image;
(3) Calculating the amplitude and direction of the gradient;
(4) Carrying out non-maximum suppression on the gradient amplitude;
(5) Edges are detected and connected using a dual threshold algorithm.
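The stroke-pixel-pair search described above (Canny edges, then following the gradient ray until the opposite edge is hit) can be sketched in a few lines of Python. This is a simplified, one-directional version; a full SWT also checks that the gradient at q roughly opposes the gradient at p. The Canny thresholds, blur kernel and maximum ray length are assumed values.

```python
# Simplified stroke-pixel-pair search: Canny edges + walk along the gradient ray.
import cv2
import numpy as np

def stroke_pixel_pairs(region_gray, max_len=50):
    region_gray = cv2.GaussianBlur(region_gray, (5, 5), 0)
    edges = cv2.Canny(region_gray, 100, 200)
    gx = cv2.Sobel(region_gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(region_gray, cv2.CV_32F, 0, 1, ksize=3)
    h, w = edges.shape
    pairs = []                                   # ((p_row, p_col), (q_row, q_col), stroke width)
    ys, xs = np.nonzero(edges)
    for y, x in zip(ys, xs):
        norm = np.hypot(gx[y, x], gy[y, x])
        if norm < 1e-6:
            continue
        dx, dy = gx[y, x] / norm, gy[y, x] / norm    # unit gradient direction at p
        for t in range(1, max_len):                  # follow the ray r from p
            qx, qy = int(round(x + dx * t)), int(round(y + dy * t))
            if not (0 <= qx < w and 0 <= qy < h):
                break
            if edges[qy, qx]:                        # met another stroke edge pixel q
                pairs.append(((y, x), (qy, qx), float(np.hypot(qx - x, qy - y))))
                break
    return pairs
```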
Calculating the symmetrical feature description value of a certain stroke pixel pair in the character candidate region on the gray value and gradient attributes;
assuming { p, q } is a certain stroke pixel pair in the character candidate region, the calculation of the symmetric attribute description value based on the stroke pixel pair is as follows:
(1) The feature value F_j(p,q)_1 of the stroke pixel pair {p, q} on the gray value and gradient magnitude attributes is calculated by the following formula:
F_j(p,q)_1 = f_h(|I_j(p) − I_j(q)|),  j ∈ {V, G_m}
wherein I_j(p) denotes the value of pixel p on symmetry attribute j; I_j(q) denotes the value of pixel q on symmetry attribute j; j denotes the specific type of symmetry attribute; {V, G_m} denote the gray value and gradient magnitude attributes respectively; and the function f_h() denotes a histogram statistics operation.
(2) The feature value F_j(p,q)_2 of the stroke pixel pair {p, q} on the gradient direction attribute is calculated by the following formula:
F_j(p,q)_2 = f_h(cos⟨I_j(p), I_j(q)⟩),  j = G_o
wherein G_o denotes the gradient direction attribute, cos⟨⟩ denotes the inverse cosine function, and the function f_h() denotes a histogram statistics operation.
Calculating the symmetrical feature description of all stroke line segments in the character candidate region on stroke width values, stroke sequence value distribution and low-frequency mode attributes;
assuming that s represents a set of stroked line segments within a character candidate region, the calculation of the symmetry-attribute-describing value based on the set of stroked line segments is as follows:
(1) The feature value F_j(s) of the stroke line segment set s on attribute j is calculated by the following formula:
F_j(s) = f_h(f_ξ(s, j)),  j ∈ {Sw, Md, Pa}
wherein the function f_h() denotes a histogram statistics operation; {Sw, Md, Pa} denote the class of symmetry attributes comprising the stroke width value Sw, the stroke sequence value distribution Md and the low-frequency mode attribute Pa; and the function f_ξ(s, j) can be defined as:
wherein || || denotes the Euclidean distance; D_s and M_s denote respectively the gray value variance and mean of the pixels contained in the stroke line segment set s; the Haar wavelet transform is applied; k denotes the wavelet transform level; n_l denotes the highest scale level; and ω_k are predefined weight parameters. Here n_l is 1; ω_k is 0.1 when k = 0, 0.3 when k = 1, and 0.5 when k = 2.
(2) The symmetric feature description values of a given character candidate region on the stroke width value, stroke sequence value distribution and low-frequency mode attributes are then scaled proportionally to the range [0, 1].
Connecting the characteristic values of different symmetric attributes to form MSSH characteristics, wherein the specific formula is as follows:
F_m(e_i) = [ F_j | j = V, G_m, G_o, Sw, Md, Pa ]
wherein F_m(e_i) denotes the MSSH feature vector; [ ] denotes the vector concatenation operation; e_i denotes the i-th character candidate region; F_j denotes the feature vector corresponding to symmetry attribute j; j denotes the specific type of symmetry attribute; V denotes the gray value; G_m denotes the gradient magnitude attribute; G_o denotes the gradient direction attribute; Sw denotes the stroke width value; Md denotes the stroke sequence value distribution; and Pa denotes the low-frequency mode attribute.
As shown in fig. 2, it is a flowchart for calculating depth convolution features, and the method for calculating depth convolution features is as follows:
adjusting the size of the character candidate area to 64 x 64 pixel values;
constructing a convolutional neural network model comprising three stages;
the first-stage construction method comprises the following steps:
In the first stage, two convolutional layers and a max-pooling layer are used in sequence. Each convolutional layer uses 32 convolution kernels of size 3 × 3 with a displacement offset (stride) of 1 pixel, and the kernels are convolved with the character candidate region according to the following formula:
g(a, b, k) = Σ_{m ∈ {−1,0,1}} Σ_{n ∈ {−1,0,1}} e_i(a + m, b + n) · h_k(m, n)
wherein g(a, b, k) denotes the value at row a, column b of the character candidate region after the k-th convolution; e_i(a + m, b + n) denotes the pixel value at row (a + m), column (b + n) of the i-th character candidate region; m and n denote the row and column offsets of the pixel and take values in {−1, 0, 1}; and h_k denotes the k-th convolution kernel. After each convolutional layer, an activation value is computed with a nonlinear activation function according to the following formula:
f(a,b,k)=max(0,g(a,b,k))
wherein f(a, b, k) denotes the activation value at row a, column b of the character candidate region after the k-th convolution, and max() denotes the maximum-value function;
the activation value is then transmitted to a maximum pooling layer, which takes 2 pixels as a stride and takes the maximum value in a 2 × 2 spatial neighborhood as an output value;
the structure of the second stage is the same as that of the first stage;
The third stage uses three convolutional layers, a max-pooling layer and a fully connected layer in sequence; the fully connected layer flattens the output of the max-pooling layer into a one-dimensional vector as its input and produces a 128-dimensional output, which can be expressed as:
F d =W·X+B
wherein: f d For the generated 128-dimensional depth convolution characteristics, X is a one-dimensional vector obtained after connecting the outputs of the maximum pooling layers, W is a weight matrix, and B is an offset vector;
the model is used in two processes, namely a training process and a testing process. Wherein the training process is used to determine the unknown parameter h k W and B, the test process is used for generating the deep convolution characteristic F of the character candidate area d
In the training process, each character candidate region for training is given a label. When the label is 0, the character candidate area is not the character area; when the label is 1, it indicates that the character candidate region is a character region. Deep convolution feature F d Will be connected to the two-dimensional label vector by full connection, with values of 0 and 1, respectively. In the training process, when the label value predicted by the neural model for the character candidate area is not changed any more, the training is finished. The result h at the end of training k W and b are each independently h k And W and b are fixed values.
During the test, F generated by convolution neural network in two, two and three stages d Will be the depth convolution characteristic of the character candidate region.
The output of the maximum pooling layer in the third stage is a 128-dimensional deep convolution feature F d This feature will be connected to the fully-connected layer when the convolutional neural network is trained. The full connection layer will output whether the character candidate area is a text or non-text label.
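A PyTorch sketch of the three-stage network described above is given below. The 64 × 64 input, the 3 × 3 kernels with stride 1, the 2 × 2 max-pooling, and the 128-dimensional output F_d = W·X + B follow the description; the input channel count, the kernel counts of the second and third stages, the use of padding, and the separate two-way classification head are assumptions where the text is silent.

```python
# Three-stage CNN sketch producing the 128-d deep convolutional feature F_d.
import torch
import torch.nn as nn

class DeepConvFeature(nn.Module):
    def __init__(self, in_ch=3):
        super().__init__()
        def conv(i, o):  # 3x3 kernel, stride 1, ReLU activation f = max(0, g)
            return nn.Sequential(nn.Conv2d(i, o, 3, stride=1, padding=1), nn.ReLU(inplace=True))
        self.stage1 = nn.Sequential(conv(in_ch, 32), conv(32, 32), nn.MaxPool2d(2, 2))
        self.stage2 = nn.Sequential(conv(32, 32), conv(32, 32), nn.MaxPool2d(2, 2))
        self.stage3 = nn.Sequential(conv(32, 32), conv(32, 32), conv(32, 32), nn.MaxPool2d(2, 2))
        self.fc = nn.Linear(32 * 8 * 8, 128)      # F_d = W·X + B, 128-dimensional
        self.classifier = nn.Linear(128, 2)       # text / non-text head used only for training

    def forward(self, x):                          # x: (N, in_ch, 64, 64)
        x = self.stage3(self.stage2(self.stage1(x)))
        f_d = self.fc(torch.flatten(x, 1))         # deep convolutional feature
        return f_d, self.classifier(f_d)

# usage
model = DeepConvFeature()
f_d, logits = model(torch.randn(4, 3, 64, 64))
print(f_d.shape, logits.shape)                     # torch.Size([4, 128]) torch.Size([4, 2])
```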
Step four: fusing MSSH characteristics and deep convolution characteristics through a self-coding neural network to obtain fused characteristics;
as shown in fig. 3, is a flow chart of feature fusion, comprising the following steps:
First, in the fusion process, the weights ω_d of the trained convolutional neural network model are used as the initial fusion weights of the deep convolutional feature F_d.
Then, for the MSSH feature F_m, a logistic regression model is used to predict its initial fusion weight ω_m and to reduce its dimensionality; the process can be expressed by the following formula:
wherein the function f_τ() denotes the logistic regression model, the result is the MSSH feature after dimensionality reduction, and D denotes a small data set used to train the initial feature weights.
Finally, the MSSH feature and the deep convolutional feature are fused by the self-coding network to generate the fused feature F_s, which can be expressed by the following formula:
wherein the function f_μ() denotes the self-coding network, and the dimension-reduced MSSH feature has the same dimensionality as F_d.
In the fusion training process, when the verification error rate stops decreasing, the joint training process of the self-coding network is finished.
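One possible reading of the fusion step is sketched below: the dimension-reduced MSSH feature and the deep feature are weighted, concatenated and passed through a small self-coding (autoencoder) network whose bottleneck is the fused feature F_s. The concatenation, the reconstruction objective and all layer sizes are assumptions beyond what the text states.

```python
# Self-coding (autoencoder) fusion sketch: bottleneck output is the fused feature F_s.
import torch
import torch.nn as nn

class FusionAutoencoder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(inplace=True))
        self.decoder = nn.Linear(dim, 2 * dim)      # used only for the reconstruction loss

    def forward(self, mssh_reduced, f_d, w_m=1.0, w_d=1.0):
        x = torch.cat([w_m * mssh_reduced, w_d * f_d], dim=1)   # weighted inputs
        f_s = self.encoder(x)                                    # fused feature F_s
        return f_s, self.decoder(f_s)                            # reconstruction for training

# usage: joint training would stop once the validation error stops decreasing
fusion = FusionAutoencoder()
m, d = torch.randn(4, 128), torch.randn(4, 128)     # reduced MSSH features and F_d
f_s, recon = fusion(m, d)
loss = nn.functional.mse_loss(recon, torch.cat([m, d], dim=1))   # reconstruction objective
print(f_s.shape, round(loss.item(), 3))
```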
Step five: further screening out a character region from the character candidate region according to the fusion characteristics;
and inputting the fusion characteristics of the character candidate region into a pre-trained logistic regression classifier, and judging whether the character candidate region is a real character region or not.
The training steps of the logistic regression classifier are as follows:
(1) The ICDAR 2013 scene text detection data set, a general scene text detection benchmark, is taken; the fused features of all candidate character regions of the data set are calculated according to the above steps and used as the training set.
(2) The training set is fed into a logistic regression algorithm to train a binary classifier.
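A sketch of the classifier training and use follows, with scikit-learn's LogisticRegression standing in for the logistic regression algorithm; the random arrays are placeholders for the fused features and labels that would actually be computed from the ICDAR 2013 scene data set.

```python
# Binary text / non-text classification of candidate regions from fused features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 128))          # placeholder fused features of training candidates
y_train = rng.integers(0, 2, size=200)         # 1 = character region, 0 = non-character

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

X_test = rng.normal(size=(5, 128))             # fused features of new candidate regions
is_text = clf.predict(X_test)                  # 1 keeps the candidate as a character region
print(is_text)
```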
FIG. 6 shows the character regions obtained from FIG. 5 after screening with the fused features.
Step six: and merging all the character areas to obtain a final character area.
First, let S be the set of character regions; the center point c_i of each character region s_i ∈ S is calculated.
Second, for any two character regions s_i, s_j ∈ S, if the Euclidean distance between the center points c_i and c_j is less than the threshold F, a straight line l_{i,j} is connected between c_i and c_j; preferably, F is 5.
Then, the angle α between each straight line l and the horizontal is calculated, and the mode α_mode of all the angles is taken. The straight lines whose angles lie within the interval [α_mode − π/6, α_mode + π/6] are kept, and the remaining straight lines are removed. FIG. 7 shows the text region obtained by merging the character regions of FIG. 6.
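Step six can be sketched as follows: centers closer than F are linked, lines whose angle is within π/6 of the modal angle are kept, and the linked regions are merged with a union-find. F = 5 follows the preferred value above; the coarse angle binning used to take the mode and the rectangle-based merging are assumptions.

```python
# Merging character regions connected by near-modal-angle lines between close centers.
import math
from collections import Counter

def merge_regions(boxes, F=5.0):
    """boxes: list of (x, y, w, h) character regions; returns merged (x0, y0, x1, y1) boxes."""
    rects = [(x, y, x + w, y + h) for x, y, w, h in boxes]
    centers = [(x + w / 2.0, y + h / 2.0) for x, y, w, h in boxes]
    lines = []
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            (xi, yi), (xj, yj) = centers[i], centers[j]
            if math.hypot(xi - xj, yi - yj) < F:
                lines.append((i, j, math.atan2(yj - yi, xj - xi) % math.pi))
    if not lines:
        return rects
    bin_w = math.pi / 36                       # assumed 5-degree bins for the angle mode
    counts = Counter(int(a / bin_w) for _, _, a in lines)
    alpha_mode = (counts.most_common(1)[0][0] + 0.5) * bin_w
    kept = [(i, j) for i, j, a in lines if abs(a - alpha_mode) <= math.pi / 6]
    parent = list(range(len(boxes)))           # union-find over regions linked by kept lines
    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k
    for i, j in kept:
        parent[find(i)] = find(j)
    groups = {}
    for idx, r in enumerate(rects):
        groups.setdefault(find(idx), []).append(r)
    return [(min(r[0] for r in g), min(r[1] for r in g),
             max(r[2] for r in g), max(r[3] for r in g)) for g in groups.values()]

# usage: the two nearby characters merge, the distant one stays separate
print(merge_regions([(0, 0, 4, 6), (3, 0, 4, 6), (50, 50, 4, 6)]))
```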
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (9)

1. A character detection method is characterized by comprising the following steps:
extracting an extreme value area of the character picture to be detected, and filtering the extreme value area to obtain a character candidate area;
calculating MSSH characteristics and depth convolution characteristics, and fusing the MSSH characteristics and the depth convolution characteristics through a self-coding neural network to obtain fused characteristics;
further screening out a character region from the character candidate region according to the fusion characteristic;
and merging all the character areas to obtain a final character area.
2. The text detection method according to claim 1, wherein the specific method for extracting the extremum region is as follows:
converting the character picture to be detected into a grayscale image I_gray, an R-value map I_R, a G-value map I_G and a B-value map I_B;
obtaining extremum regions from I_R, I_G and I_B respectively, specifically as follows:
the extremum region A_R of the R-value map I_R is defined as:
wherein I_R(p) denotes the value of pixel p in the R-value map; I_R(q) denotes the value of pixel q in the R-value map; θ denotes the threshold of the extremum region; and ∂A_R denotes the set of pixels adjacent to but not belonging to the extremum region A_R;
the extremum region A_G of the G-value map I_G is defined as:
wherein I_G(p) denotes the value of pixel p in the G-value map; I_G(q) denotes the value of pixel q in the G-value map; θ denotes the threshold of the extremum region; and ∂A_G denotes the set of pixels adjacent to but not belonging to the extremum region A_G;
the extremum region A_B of the B-value map I_B is defined as:
wherein I_B(p) denotes the value of pixel p in the B-value map; I_B(q) denotes the value of pixel q in the B-value map; θ denotes the threshold of the extremum region; and ∂A_B denotes the set of pixels adjacent to but not belonging to the extremum region A_B.
3. The character detection method of claim 2, wherein the character candidate regions are obtained by:
calculating the area S, the perimeter C, the Euler number E and the pixel value variance H of each extremum region, wherein the pixel value variance H is computed from the grayscale image I_gray according to the following formula:
wherein x denotes a pixel; I_gray(x) denotes the gray value of pixel x; a denotes the color interval containing the largest number of pixels in the extremum region; b denotes the color interval containing the second largest number of pixels in the extremum region; n_a denotes the number of pixels of the extremum region in color interval a; n_b denotes the number of pixels of the extremum region in color interval b; R_a denotes the set of pixels of the extremum region in color interval a; R_b denotes the set of pixels of the extremum region in color interval b; μ_a denotes the mean of the pixel values in color interval a within the extremum region; and μ_b denotes the mean of the pixel values in color interval b within the extremum region;
redundant extremum regions are filtered out according to the area S, perimeter C, Euler number E and pixel value variance H of each extremum region; the regions remaining after filtering are the character candidate regions, wherein the filtering conditions are as follows:
wherein S_0 denotes the threshold for the area S of the extremum region; C_0 denotes the threshold for the perimeter of the extremum region; E_0 denotes the threshold for the Euler number of the extremum region; and H_0 denotes the threshold for the pixel value variance of the extremum region.
4. The text detection method of claim 1, wherein the specific method for calculating the MSSH features is as follows:
acquiring a stroke pixel pair and a stroke line segment of a character candidate area;
calculating the symmetrical feature description value of a certain stroke pixel pair in the character candidate region on the gray value and gradient attribute;
calculating the symmetrical feature description of all stroke line segments in the character candidate region on stroke width values, stroke sequence value distribution and low-frequency mode attributes;
connecting the characteristic values of different symmetric attributes to form MSSH characteristics, wherein the specific formula is as follows:
F_m(e_i) = [ F_j | j = V, G_m, G_o, Sw, Md, Pa ]
wherein F_m(e_i) denotes the MSSH feature vector; [ ] denotes the vector concatenation operation; e_i denotes the i-th character candidate region; F_j denotes the feature vector corresponding to symmetry attribute j; j denotes the specific type of symmetry attribute; V denotes the gray value; G_m denotes the gradient magnitude attribute; G_o denotes the gradient direction attribute; Sw denotes the stroke width value; Md denotes the stroke sequence value distribution; and Pa denotes the low-frequency mode attribute.
5. The text detection method of claim 4, wherein the specific method for obtaining the stroke pixel pairs and the stroke line segments of the character candidate area is as follows:
outputting an edge image by using a Canny edge detection operator;
calculating the gradient direction of a certain pixel point p on the stroke edge image;
following the ray r determined by the gradient direction until the ray meets another stroke edge pixel point q;
the stroke pixel pair is defined as { p, q }, and the stroke line segment is defined as the distance of ray r between pixel points p and q.
6. The text detection method of claim 4, wherein the specific method for calculating the deep convolution features is as follows:
adjusting the size of the character candidate area to 64 x 64 pixel values;
constructing a convolutional neural network model comprising three stages;
the first-stage construction method comprises the following steps:
in the first stage, two convolutional layers and a max-pooling layer are used in sequence, wherein each convolutional layer uses 32 convolution kernels of size 3 × 3 with a displacement offset of 1 pixel, and the kernels are convolved with the character candidate region according to the following formula:
wherein g(a, b, k) denotes the value at row a, column b of the character candidate region after the k-th convolution; e_i(a + m, b + n) denotes the pixel value at row (a + m), column (b + n) of the i-th character candidate region; m and n denote the row and column offsets of the pixel and take values in {−1, 0, 1}; and h_k denotes the k-th convolution kernel; after each convolutional layer, an activation value is computed with a nonlinear activation function according to the following formula:
f(a,b,k)=max(0,g(a,b,k))
wherein f(a, b, k) denotes the activation value at row a, column b of the character candidate region after the k-th convolution, and max() denotes the maximum-value function;
the activation value is then transmitted to a maximum pooling layer, which takes 2 pixels as a stride and takes the maximum value in a 2 × 2 spatial neighborhood as an output value;
the architecture of the second stage is the same as that of the first stage;
the third stage uses three convolutional layers, a max-pooling layer and a fully connected layer in sequence, wherein the fully connected layer flattens the output of the max-pooling layer into a one-dimensional vector as its input and produces a 128-dimensional output, which can be expressed as:
F d =W·X+B
wherein: f d For the generated 128-dimensional depth convolution characteristics, X is a one-dimensional vector obtained after connecting the output of the maximum pooling layer, W is a weight matrix, and B is an offset vector;
training and testing the convolution neural network model, and determining the unknown parameter h through training k W and B, F generated by testing d As a deep convolution feature of the character candidate region.
7. The text detection method according to claim 6, wherein the method for obtaining the fusion feature comprises:
using the weights ω_d of the trained convolutional neural network model as the initial fusion weights of the deep convolutional feature F_d;
for the MSSH feature F_m, predicting its initial fusion weight ω_m with a logistic regression model and reducing its dimensionality, the process being expressed by the following formula:
wherein the result denotes the MSSH feature after dimensionality reduction; e_i denotes the i-th character candidate region; the function f_τ() denotes the logistic regression model; and D denotes a small data set used to train the initial feature weights;
generating the fused feature F_s, which can be expressed by the following formula:
wherein the function f_μ() denotes the self-coding network, and the dimension-reduced MSSH feature has the same dimensionality as F_d.
8. The text detection method of claim 7, wherein in the fusion training process, when the verification error rate stops decreasing, the joint training process of the self-coding network is ended.
9. The character detection method of claim 1, wherein the specific method for merging character areas is as follows:
assuming the set of character regions is S, calculating the center point c_i of each character region s_i ∈ S;
for any two character regions s_i, s_j ∈ S, if the Euclidean distance between the two center points is less than a threshold F, connecting a straight line l_{i,j} between the two center points;
calculating the angle α between each straight line and the horizontal, taking the mode α_mode of all the angles, keeping the straight lines whose angles lie within the interval [α_mode − π/6, α_mode + π/6], and removing the remaining straight lines;
and combining character areas connected by straight lines to obtain a final character area.
CN201711267804.8A 2017-12-05 2017-12-05 A kind of character detecting method Pending CN108038486A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711267804.8A CN108038486A (en) 2017-12-05 2017-12-05 A kind of character detecting method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711267804.8A CN108038486A (en) 2017-12-05 2017-12-05 A kind of character detecting method

Publications (1)

Publication Number Publication Date
CN108038486A true CN108038486A (en) 2018-05-15

Family

ID=62095092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711267804.8A Pending CN108038486A (en) 2017-12-05 2017-12-05 A kind of character detecting method

Country Status (1)

Country Link
CN (1) CN108038486A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805131A (en) * 2018-05-22 2018-11-13 北京旷视科技有限公司 Text line detection method, apparatus and system
CN108961218A (en) * 2018-06-11 2018-12-07 无锡维胜威信息科技有限公司 Solar power silicon platelet spends extracting method
CN109409224A (en) * 2018-09-21 2019-03-01 河海大学 A kind of method of natural scene fire defector
CN110188622A (en) * 2019-05-09 2019-08-30 新华三信息安全技术有限公司 A kind of text location method, apparatus and electronic equipment
CN110909728A (en) * 2019-12-03 2020-03-24 中国太平洋保险(集团)股份有限公司 Control algorithm and device for multilingual policy automatic identification
CN112926497A (en) * 2021-03-20 2021-06-08 杭州知存智能科技有限公司 Face recognition living body detection method and device based on multi-channel data feature fusion
CN113807351A (en) * 2021-09-18 2021-12-17 京东鲲鹏(江苏)科技有限公司 Scene character detection method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066972A (en) * 2017-04-17 2017-08-18 武汉理工大学 Natural scene Method for text detection based on multichannel extremal region

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066972A (en) * 2017-04-17 2017-08-18 武汉理工大学 Natural scene Method for text detection based on multichannel extremal region

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YIRUI WU et al.: "A Robust Symmetry-based Method for Scene/Video Text Detection Through Neural Network", 《2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805131A (en) * 2018-05-22 2018-11-13 北京旷视科技有限公司 Text line detection method, apparatus and system
CN108961218A (en) * 2018-06-11 2018-12-07 无锡维胜威信息科技有限公司 Solar power silicon platelet spends extracting method
CN108961218B (en) * 2018-06-11 2021-07-02 无锡维胜威信息科技有限公司 Solar silicon wafer crystal flower extraction method
CN109409224A (en) * 2018-09-21 2019-03-01 河海大学 A kind of method of natural scene fire defector
CN109409224B (en) * 2018-09-21 2023-09-05 河海大学 Method for detecting flame in natural scene
CN110188622A (en) * 2019-05-09 2019-08-30 新华三信息安全技术有限公司 A kind of text location method, apparatus and electronic equipment
CN110188622B (en) * 2019-05-09 2021-08-06 新华三信息安全技术有限公司 Character positioning method and device and electronic equipment
CN110909728A (en) * 2019-12-03 2020-03-24 中国太平洋保险(集团)股份有限公司 Control algorithm and device for multilingual policy automatic identification
CN112926497A (en) * 2021-03-20 2021-06-08 杭州知存智能科技有限公司 Face recognition living body detection method and device based on multi-channel data feature fusion
CN113807351A (en) * 2021-09-18 2021-12-17 京东鲲鹏(江苏)科技有限公司 Scene character detection method and device
CN113807351B (en) * 2021-09-18 2024-01-16 京东鲲鹏(江苏)科技有限公司 Scene text detection method and device

Similar Documents

Publication Publication Date Title
CN108038486A (en) A kind of character detecting method
CN106845478B (en) A kind of secondary licence plate recognition method and device of character confidence level
CN112884064B (en) Target detection and identification method based on neural network
Jiang et al. A deep learning approach for fast detection and classification of concrete damage
WO2020062433A1 (en) Neural network model training method and method for detecting universal grounding wire
CN111680690B (en) Character recognition method and device
CN112990077B (en) Face action unit identification method and device based on joint learning and optical flow estimation
CN111401384A (en) Transformer equipment defect image matching method
CN107506765B (en) License plate inclination correction method based on neural network
CN111709980A (en) Multi-scale image registration method and device based on deep learning
CN111860439A (en) Unmanned aerial vehicle inspection image defect detection method, system and equipment
CN112330593A (en) Building surface crack detection method based on deep learning network
Dong et al. Infrared image colorization using a s-shape network
CN114155527A (en) Scene text recognition method and device
US20220366682A1 (en) Computer-implemented arrangements for processing image having article of interest
CN110751154B (en) Complex environment multi-shape text detection method based on pixel-level segmentation
CN107273870A (en) The pedestrian position detection method of integrating context information under a kind of monitoring scene
CN116279592A (en) Method for dividing travelable area of unmanned logistics vehicle
CN114444565B (en) Image tampering detection method, terminal equipment and storage medium
CN112446292B (en) 2D image salient object detection method and system
Majumder et al. A tale of a deep learning approach to image forgery detection
Khan et al. Lrdnet: lightweight lidar aided cascaded feature pools for free road space detection
Barodi et al. An enhanced artificial intelligence-based approach applied to vehicular traffic signs detection and road safety enhancement
CN111753714B (en) Multidirectional natural scene text detection method based on character segmentation
CN117392419A (en) Drug picture similarity comparison method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180515)