CN105574513B - Character detecting method and device - Google Patents
- Publication number
- CN105574513B (application CN201510970839.2A)
- Authority
- CN
- China
- Prior art keywords
- image
- character area
- detected
- probability graph
- mask
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/20—Scenes; Scene-specific elements in augmented reality scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/63—Scene text, e.g. street names
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/26—Techniques for post-processing, e.g. correcting the recognition result
- G06V30/262—Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
- G06V30/274—Syntactic or semantic context, e.g. balancing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a text detection method and device. The text detection method includes: receiving an image to be detected; generating a text-region probability map of the full image to be detected via a semantic prediction model, wherein the text-region probability map uses different pixel values to distinguish the text regions of the image to be detected from its non-text regions, and the semantic prediction model is a neural network; and performing a segmentation operation on the text-region probability map to determine the text regions. The above text detection method and device can detect text of different languages, orientations, colors, fonts and sizes while effectively suppressing interference from complex backgrounds, and therefore have a wide range of application. In addition, the text detection method and device are robust and can cope with interference factors such as image noise, image blur, complex backgrounds and non-uniform illumination.
Description
Technical field
The present invention relates to the field of image processing, and in particular to a text detection method and device.
Background art
With the wide adoption of smartphones and the rapid development of the mobile Internet, acquiring, retrieving and sharing information through the camera of a mobile terminal such as a mobile phone has gradually become a way of life. Camera-based applications place greater emphasis on understanding the photographed scene. Generally, when text coexists with other objects in a scene, users pay more attention to the text information; correctly recognizing the text in an image therefore allows a deeper understanding of the user's shooting intent. This involves text detection technology, i.e., identifying the text regions in a captured image.
Text detection, as an important basic technology, has enormous application value and broad application prospects, especially text detection in natural scene images. For example, text detection for natural scene images can be directly applied to fields such as augmented reality, geo-location, human-computer interaction, robot navigation, autonomous vehicles and industrial automation.
However, most images to be detected contain relatively complex backgrounds, and their quality may be affected by factors such as noise, blur and non-uniform illumination. In addition, text is diverse: for example, the text in a natural scene image may have different colors, sizes, fonts, orientations, and so on. All these factors bring great difficulty and challenges to text detection. For the above reasons, existing text detection methods easily produce false alarms, i.e., they mistakenly judge non-text components of the background as text. Existing text detection methods also have shortcomings in adaptability: for example, most methods can only detect horizontal text and are helpless against tilted or rotated text, and some methods can only be applied to Chinese and cannot be directly generalized to text of other languages (such as English, Russian or Korean). Moreover, when an image contains severe noise, blur or non-uniform illumination, existing text detection methods often make mistakes. In summary, existing text detection methods and systems have defects in accuracy, scope of application and other respects.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a text detection method and device that solve the above problems at least in part.
According to one aspect of the invention, there is provided a text detection method, including:
receiving a plurality of sample images and annotation information of the sample images; generating mask maps of the sample images according to the sample images and their annotation information; training a neural network with the sample images and the mask maps to obtain a semantic prediction model; receiving an image to be detected; generating, via the semantic prediction model, a text-region probability map of the full image to be detected, wherein the text-region probability map uses different pixel values to distinguish the text regions of the image to be detected from its non-text regions; and performing a segmentation operation on the text-region probability map to determine the text regions.
According to a further aspect of the invention, a text detection device is also provided, including a training module, a semantic analysis module and a segmentation module. The training module is configured to receive a plurality of sample images and their annotation information, generate mask maps of the sample images according to the sample images and the annotation information, and train a neural network with the sample images and the mask maps to obtain a semantic prediction model. The semantic analysis module is configured to receive an image to be detected and use the semantic prediction model to generate a text-region probability map of the full image to be detected, wherein the text-region probability map uses different pixel values to distinguish the text regions of the image to be detected from its non-text regions. The segmentation module is configured to perform a segmentation operation on the text-region probability map to determine the text regions.
The above text detection method and device support text detection directly on the full image to be detected, unlike algorithms based on simple thresholding, sliding windows or connected components. They can detect text of different languages, orientations, colors, fonts and sizes while effectively suppressing interference from complex backgrounds, and thus have a wide range of application. In addition, the text detection method and device are robust and can cope with interference factors such as image noise, image blur, complex backgrounds and non-uniform illumination.
The above is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and practiced according to the content of the specification, and in order that the above and other objects, features and advantages of the present invention may become more apparent, specific embodiments of the present invention are set forth below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art by reading the following detailed description of the preferred embodiments. The accompanying drawings are only for the purpose of showing the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, identical parts are denoted by the same reference numerals. In the drawings:
Fig. 1a and Fig. 1b schematically illustrate an image to be detected and the corresponding detected image, respectively, according to an embodiment of the present invention;
Fig. 2 schematically illustrates a flowchart of a text detection method according to an embodiment of the present invention;
Fig. 3a and Fig. 3b, Fig. 4a and Fig. 4b, Fig. 5a and Fig. 5b, and Fig. 6a and Fig. 6b schematically illustrate, respectively, full images to be detected according to embodiments of the present invention and the corresponding generated text-region probability maps;
Fig. 7 schematically illustrates a flowchart of a method of obtaining an image to be detected according to an embodiment of the present invention;
Fig. 8 schematically illustrates a flowchart of a method of performing a segmentation operation on a text-region probability map according to an embodiment of the present invention;
Fig. 9 schematically illustrates a flowchart of a method of training a neural network according to an embodiment of the present invention;
Figs. 10a, 10b, 10c and 10d respectively illustrate sample images with annotation information according to an embodiment of the present invention;
Figs. 11a and 11b respectively illustrate a sample image with annotation information according to an embodiment of the present invention and its corresponding mask map;
Fig. 12 schematically illustrates a schematic diagram of a fully convolutional neural network according to an embodiment of the present invention;
Fig. 13 schematically illustrates a schematic block diagram of a text detection device according to an embodiment of the present invention;
Fig. 14 schematically illustrates a schematic block diagram of a text detection device according to another embodiment of the present invention; and
Fig. 15 schematically illustrates a schematic block diagram of a text detection system according to an embodiment of the present invention.
Detailed description of embodiments
Exemplary embodiments of the present disclosure are described more fully below with reference to the accompanying drawings. Although exemplary embodiments of the disclosure are shown in the drawings, it should be understood that the disclosure may be embodied in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present invention will be understood more thoroughly, and so that the scope of the present disclosure can be fully conveyed to those skilled in the art.
To identify the text regions in an image automatically and more reliably, the invention provides a text detection method. Fig. 1a and Fig. 1b schematically illustrate an image to be detected and the corresponding detected image, respectively, according to an embodiment of the invention. Fig. 2 shows a flowchart of a text detection method 200 according to an embodiment of the invention. As shown in Fig. 2, the method 200 includes steps S210 to S230.
In step S210, an image to be detected is received. The image to be detected may be an original image, or an image obtained after preprocessing an original image. In one embodiment of the invention, the image to be detected may be obtained by preprocessing a collected original image. The preprocessing method is described in detail below in conjunction with the relevant drawings.
In step S220, a text-region probability map of the full image to be detected is generated via a semantic prediction model, wherein the text-region probability map uses different pixel values to distinguish the text regions of the image to be detected from its non-text regions. According to one embodiment of the invention, a text region is a region of the image that contains text. Taking Fig. 1a and Fig. 1b as an example, the regions inside the two black quadrilaterals in Fig. 1b are text regions: the first text region contains the text "I am growing up", and the second contains "please do not step on me".
In one embodiment, the text-region probability map uses different pixel values to represent different probabilities, thereby distinguishing the text regions of the image to be detected from its non-text regions. In one embodiment, the higher the pixel value, the higher the probability that the corresponding pixel belongs to a text region; the lower the pixel value, the lower that probability. For example, a black pixel with value 0 indicates that the probability of the pixel belonging to a text region is 0, while a white pixel with value 255 indicates that this probability is 100%.
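The pixel-value convention described above can be sketched as follows (a minimal illustration, assuming the model outputs per-pixel probabilities in [0, 1]; the function name is an assumption for illustration):

```python
import numpy as np

def probabilities_to_gray(prob_map):
    """Map per-pixel text probabilities in [0, 1] to gray values in [0, 255].

    A value of 0 means the pixel certainly belongs to a non-text region;
    255 means it certainly belongs to a text region.
    """
    prob_map = np.clip(prob_map, 0.0, 1.0)
    return np.round(prob_map * 255).astype(np.uint8)

# A tiny 2x2 probability map: background, uncertain, and text pixels.
probs = np.array([[0.0, 0.5],
                  [1.0, 0.25]])
gray = probabilities_to_gray(probs)
print(gray)
# [[  0 128]
#  [255  64]]
```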
According to one embodiment of the invention, the text-region probability map is generated from the full image to be detected via the semantic prediction model. The semantic prediction model generates the text-region probability map according to the semantics of the image to be detected, predicting whether each pixel of the image belongs to a text region or a non-text region. Image semantics are high-level image features: although they are built on low-level features such as color, texture and shape, they are quite different from those low-level features. As the basic carrier of knowledge information, image semantics can convert complete image content into a text-like description that can be intuitively understood, and play a vital role in image understanding. The input of image understanding is image data and its output is knowledge; it belongs to the high-level content of the image research field. The semantic prediction model realizes image understanding: it can identify the text regions in an image directly according to the image semantics, which is quite different from models that segment images based on thresholds. Based on its understanding of the image to be detected, the semantic prediction model can generate a better text-region probability map according to the semantics of the image, predicting whether each pixel of the image belongs to a text region or a non-text region, so as to obtain more reasonable text regions.
The semantic prediction model can be obtained by training a neural network. A neural network can be used to estimate an unknown approximate function from a large number of inputs. Neural networks are capable of machine learning and have strong adaptability: a trained neural network can approximate an arbitrary function, "learning" it from given data. Neural networks are therefore very suitable for being trained as the semantic prediction model to identify the text regions in an image to be detected. Training the neural network to obtain the semantic prediction model is described in detail below in conjunction with Figs. 9 to 12.
Figs. 3a and 3b, 4a and 4b, 5a and 5b, and 6a and 6b show, respectively, full images to be detected according to embodiments of the invention and the corresponding text-region probability maps generated via the semantic prediction model. Figs. 3a, 4a, 5a and 6a are full images to be detected, each containing text regions: the text region in Fig. 3a contains Chinese; the text regions in Fig. 4a contain Chinese and English and, as shown in Fig. 4a, are non-horizontal; the text region in Fig. 5a contains Russian; and the text region in Fig. 6a contains Korean. As can be seen, the images of Figs. 3a, 4a, 5a and 6a have different, relatively complex backgrounds, and the text in these images is diverse, with different colors, fonts, languages and sizes. Figs. 3b, 4b, 5b and 6b respectively show the text-region probability maps generated from the full images to be detected of Figs. 3a, 4a, 5a and 6a via the semantic prediction model. The generated probability maps use different pixel values to represent different probabilities so as to distinguish the text regions of the image to be detected from its non-text regions. For example, text regions are filled with pixels of value 255, indicating the highest probability of belonging to a text region, while non-text regions (e.g., background regions) are filled with pixels of value 0, indicating the lowest probability, thereby distinguishing the text regions from the non-text regions of the image to be detected. Taking Fig. 4b as an example, its text-region probability map distinguishes the text regions from the non-text regions of the image in Fig. 4a: the two text regions of Fig. 4a, "no admittance without authorization" and "Authorized Personnel Only", are filled with pixels of value 255, yielding the probability map shown in Fig. 4b, which also completely and accurately reflects the orientation of the text regions of the image in Fig. 4a.
In step S230, a segmentation operation is performed on the text-region probability map generated in step S220 to determine the text regions. Since the pixel values of the probability map represent the probability that each pixel belongs to a text region, thereby distinguishing text regions from non-text regions, the probability map can be segmented according to low-level features (e.g., image gray levels).
For example, step S230 may obtain the text regions by performing a binarization operation on the text-region probability map. In the present invention, since the aim is to distinguish text regions from non-text regions (background regions), this purpose can be achieved with a binarization operation. Binarization is simple to implement, computationally cheap and fast.
The binarization operation may be a threshold segmentation operation. Optionally, the threshold T is an adjustable parameter. If a gray value of 255 indicates that the probability of belonging to a text region is 100% and a gray value of 0 indicates that this probability is 0, the threshold may be set to 128.
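As a sketch, the threshold segmentation described above might look like this (assuming a gray-level probability map held in a NumPy array and T = 128 as in the example; the function name is illustrative):

```python
import numpy as np

def threshold_segment(prob_map_gray, threshold=128):
    """Binarize a gray-level text-region probability map.

    Pixels at or above the threshold are marked 255 (text candidate),
    all others 0 (non-text / background).
    """
    binary = np.where(prob_map_gray >= threshold, 255, 0)
    return binary.astype(np.uint8)

gray = np.array([[0, 127, 128],
                 [255, 200, 50]], dtype=np.uint8)
print(threshold_segment(gray))
# [[  0   0 255]
#  [255 255   0]]
```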
The binarization operation may also be a segmentation operation based on region growing. Region growing aggregates pixels according to the similarity of pixels within the same object region. Specifically, starting from a seed region (for example, a pixel with a large value in the text-region probability map), adjacent pixels with similar properties (a small pixel-value difference from the current pixel) are merged into the current region so that the region grows step by step, until no more pixels can be merged.
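A minimal region-growing sketch under these assumptions (seed at the brightest pixel, 4-connectivity, a fixed similarity tolerance; all parameter choices here are illustrative, not the patent's):

```python
import numpy as np
from collections import deque

def region_grow(img, seed, tol=30):
    """Grow a region from `seed`, merging 4-connected neighbors whose
    value differs from the current pixel by at most `tol`.
    Returns a boolean mask of the grown region."""
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    mask[seed] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc]:
                if abs(int(img[nr, nc]) - int(img[r, c])) <= tol:
                    mask[nr, nc] = True
                    queue.append((nr, nc))
    return mask

img = np.array([[250, 240, 10],
                [245,  20,  5],
                [ 15,  10,  0]], dtype=np.uint8)
seed = tuple(np.unravel_index(np.argmax(img), img.shape))  # brightest pixel
print(region_grow(img, seed).astype(int))
# [[1 1 0]
#  [1 0 0]
#  [0 0 0]]
```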
In the image obtained after segmentation, regions with a small average pixel value can be regarded as non-text regions, and the other regions as text regions. Determining the text regions by binarization is described in detail below in conjunction with the relevant drawings.
Those skilled in the art will appreciate that the above method 200 is universal: it can be used for text detection in any image. The method 200 can perform text detection and recognition on document images, such as photographs of certificates and bills and scanned copies of paper documents. The method 200 can also perform text detection and recognition on natural scene images.
The above method 200 of the invention abandons detection based on sliding windows and detection based on connected components, and adopts a brand-new detection approach based on semantic segmentation. The method 200 realizes full-image prediction: both the input and the output are entire images rather than local regions or windows, so the contextual information in the image, particularly in natural scene images, can be better exploited, yielding more accurate text detection results.
The method 200 can handle images of different scenes and different quality. It can detect text of different colors, fonts and sizes while effectively suppressing interference from complex backgrounds. It can automatically predict the orientation of text lines and directly detect text of different orientations in an image. It is insensitive to the language of the text and can simultaneously detect text of different languages (such as Chinese, English and Korean). In addition, the method 200 is robust and can cope with interference factors such as noise, blur, complex backgrounds and non-uniform illumination.
Fig. 7 shows a flowchart of a method of obtaining the image to be detected according to an embodiment of the invention.
In step S710, an original image is received. In one embodiment, the original image may have complex background information, and the text regions it contains may be diverse; for example, the text regions may contain text of different colors, fonts, languages and sizes.
In step S720, the received original image is preprocessed to obtain the image to be detected. In one embodiment, the received original image may be size-normalized, i.e., the largest dimension of the original image (the greater of its height and width) is scaled to a preset size, which may be, for example, 480, 640, 800 or 960 pixels. The aspect ratio of the image to be detected obtained after the size normalization operation remains the same as that of the original image.
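The output dimensions of the size normalization in step S720 can be computed as follows (a minimal sketch assuming a preset size of 640; the function name is illustrative, and the actual resampling would be done by an image library):

```python
def normalized_size(height, width, preset=640):
    """Compute the output size for size normalization: scale the larger
    dimension to `preset` while keeping the aspect ratio unchanged."""
    scale = preset / max(height, width)
    return round(height * scale), round(width * scale)

# A 1200x900 image: the larger dimension (1200) is scaled to 640,
# and the 4:3 aspect ratio is preserved.
print(normalized_size(1200, 900))  # (640, 480)
```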
Fig. 8 schematically illustrates a flowchart of a method of performing a segmentation operation on a text-region probability map according to an embodiment of the invention.
In step S810, a binarization operation is performed on the text-region probability map of the image to be detected.
It will be appreciated that the text regions can be obtained directly from the result of the binarization operation. In the present invention, since the aim is to distinguish text regions from non-text regions (background regions), this purpose can be achieved with a binarization operation. Binarization is simple to implement, computationally cheap and fast.
The binarization operation may be a threshold segmentation operation. Optionally, the threshold T is an adjustable parameter. If a gray value of 255 indicates that the probability of belonging to a text region is 100% and a gray value of 0 indicates that this probability is 0, the threshold may be set to 128.
The binarization operation may also be a segmentation operation based on region growing. Region growing aggregates pixels according to the similarity of pixels within the same object region. Specifically, starting from a seed region (for example, a pixel with a large value in the text-region probability map), adjacent pixels with similar properties (a small pixel-value difference from the current pixel) are merged into the current region so that the region grows step by step, until no more pixels can be merged.
In the embodiment shown in Fig. 8, the method further includes steps S820 and S830 after the binarization operation.
In step S820, the contour of each connected region obtained by the binarization operation is determined. This step can be realized with any existing or future edge detection method, such as edge detection based on the Sobel or Canny operators.
In step S830, the text regions are determined by fitting the contour of each connected region to a quadrilateral. In one embodiment, the interior regions of all the quadrilaterals can be used as text regions. Specifically, assume that the set of all quadrilaterals is B, B = {b_k}, k = 1, 2, ..., Q, where b_k denotes a fitted quadrilateral, Q is the number of quadrilaterals, and k is an index. The set B is then output as the text detection result.
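The contour-to-quadrilateral fitting of steps S820 and S830 can be sketched in a simplified, axis-aligned form (a real implementation would typically fit rotated quadrilaterals, e.g. with OpenCV's findContours and minAreaRect; the helper names here are assumptions for illustration):

```python
import numpy as np
from collections import deque

def connected_components(binary):
    """Label 4-connected foreground (nonzero) components of a binary map."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    current = 0
    for r in range(h):
        for c in range(w):
            if binary[r, c] and not labels[r, c]:
                current += 1
                queue = deque([(r, c)])
                labels[r, c] = current
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and binary[ny, nx] and not labels[ny, nx]:
                            labels[ny, nx] = current
                            queue.append((ny, nx))
    return labels, current

def fit_quadrilaterals(binary):
    """Return the set B of quadrilaterals (here: each region's bounding
    box given by its four (x, y) corner vertices)."""
    labels, n = connected_components(binary)
    B = []
    for k in range(1, n + 1):
        ys, xs = np.nonzero(labels == k)
        y0, y1 = int(ys.min()), int(ys.max())
        x0, x1 = int(xs.min()), int(xs.max())
        B.append([(x0, y0), (x1, y0), (x1, y1), (x0, y1)])
    return B

binary = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 1],
                   [0, 0, 0, 1]], dtype=np.uint8)
print(fit_quadrilaterals(binary))
# [[(0, 0), (1, 0), (1, 1), (0, 1)], [(3, 1), (3, 1), (3, 2), (3, 2)]]
```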
A quadrilateral region can well enclose text of any orientation and language, and is simple to compute. As shown in the text-region probability map of Fig. 6b, various factors such as image noise and the shape of the text in the image may prevent the probability map from perfectly representing the probability that a pixel belongs to a text region. Fitting the text regions with quadrilateral regions further ensures that the text regions contain the complete text content, thereby ensuring the precision of the text detection.
Fig. 9 schematically illustrates a flowchart of a method of training a neural network to obtain a semantic prediction model according to an embodiment of the invention. The purpose of this method is to learn a semantic prediction model from sample images, such that the model can effectively distinguish the text regions of an image to be detected from its non-text regions.
A sample image is an image whose text regions are known. As described above, neural networks have a "learning" ability; a neural network can therefore be trained with a plurality of sample images to obtain a usable semantic prediction model. In this embodiment, the training method enables the semantic prediction model to generate a more accurate text-region probability map according to the semantics of the image to be detected, predicting whether each pixel of the image belongs to a text region or a non-text region, so that the detection results of the text detection method are more accurate.
Those skilled in the art will appreciate that for a text detection system, the semantic prediction model can be stored in the system in advance.
In step S910, a plurality of sample images and their annotation information are received.
In one embodiment, a large number of various images containing text, for example natural scene images, can be collected from different sources as sample images. It is desirable that the sample images be rich in variety and large in number in order to obtain a good semantic prediction model. In one embodiment, the number of sample images is no less than 1000.
Polygons can be used to annotate all the text regions in each sample image, thereby obtaining the annotation information of the sample image. The annotated text unit can be a text line or a word. The annotation information of the text regions in a sample image can be stored in the form of polygons (e.g., quadrilaterals). Specifically, in one embodiment, only the coordinates of the four vertices of each quadrilateral are stored. Storing the annotation information in quadrilateral form not only accommodates text of any orientation and language, but is also easy to compute with.
Figs. 10a, 10b, 10c and 10d respectively illustrate annotated sample images with annotation information according to an embodiment of the invention. As shown in these figures, quadrilaterals (the light quadrilaterals in the figures) can be used to annotate the text regions in the sample images, and this annotation is applicable to any font, language and text orientation.
In step S920, the mask map of each sample image is generated according to the sample image and its annotation information. Specifically, for a sample image I and the corresponding annotation information, a mask map of the same size as the sample image I is generated. In one embodiment, the mask map may include a binary mask map R. In the binary mask map R, different pixel values are used to distinguish the text regions of the sample image from its non-text regions. In one embodiment, for a sample image I, the text regions marked by the annotation information are filled with pixels of a first pixel value and the non-text regions are filled with pixels of a second pixel value, thereby generating the binary mask map R, where the first pixel value differs from the second pixel value so as to distinguish the text regions from the non-text regions. For example, in the binary mask map R, the pixel values of the annotated text regions (i.e., the interior regions of the annotated quadrilaterals) are filled with 255, while those of the non-text regions are filled with 0.
Figs. 11a and 11b respectively illustrate an annotated sample image with annotation information according to an embodiment of the invention and its corresponding mask map. As shown in Fig. 11a, the text parts of the original sample image (for example, "Haidian builds safety", "Haidian Middle St", "HAIDIANZHONGJIE", "Haidian South Road") are marked out with quadrilaterals, and the corresponding mask map shown in Fig. 11b is generated accordingly: the marked text parts are filled with pixels of value 255 and the non-text parts with pixels of value 0, yielding the mask map shown in Fig. 11b.
In step S930, a training set is constructed from the sample images and their mask maps, and a neural network is trained to obtain the semantic prediction model M. The original sample images and their corresponding mask maps form the training sample set S = {(I_i, R_i)}, i = 1, 2, ..., N, where I_i denotes an original sample image, R_i is the mask map corresponding to I_i, N is the number of sample images in S, and i is the index.
In one embodiment, the neural network may comprise a fully convolutional neural network. A fully convolutional network is a special kind of neural network whose distinguishing feature is that every layer with learnable parameters, from input to output, is a convolutional layer. Fully convolutional networks avoid complex early-stage image pre-processing and can take the original image directly as input, which makes them especially suitable for analysing images with complex backgrounds and can make the text detection result more accurate.
According to a specific embodiment of the invention, a fully convolutional network consisting of 13 layers may be used. Figure 12 shows a schematic diagram of this fully convolutional network.
Besides convolutional layers, the fully convolutional network also contains max-pooling layers. The max-pooling layers separate runs of consecutive convolutional layers; they effectively reduce the amount of computation while strengthening the robustness of the network.
The input of the fully convolutional network is the raw image data. As shown in Figure 12, the network comprises a first and a second convolutional layer, each of which may have 64 filters of size 3x3. The second convolutional layer is followed by a first max-pooling layer (maxpool layer). Next come a third and a fourth convolutional layer, each of which may have 128 filters of size 3x3; the fourth convolutional layer is followed by a second max-pooling layer. Next come a fifth, a sixth and a seventh convolutional layer with 256 filters of size 3x3 each; the seventh convolutional layer is followed by a third max-pooling layer. Next come an eighth, a ninth and a tenth convolutional layer, each of which may have 512 filters of size 3x3; the tenth convolutional layer is followed by a fourth max-pooling layer. Finally come an eleventh, a twelfth and a thirteenth convolutional layer, each of which may have 512 filters of size 3x3.
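The layer sequence above can be written down as a simple specification table. The sketch below is our own illustration, not code from the patent; it records the 13 convolutional layers and 4 max-pooling layers and checks the resulting downsampling factor, assuming (as is conventional) that each max-pooling layer halves the spatial resolution.

```python
# Layer spec for the 13-conv-layer fully convolutional network of Figure 12.
# Each entry is ("conv", out_channels) with 3x3 filters, or ("maxpool",).
LAYERS = [
    ("conv", 64), ("conv", 64), ("maxpool",),
    ("conv", 128), ("conv", 128), ("maxpool",),
    ("conv", 256), ("conv", 256), ("conv", 256), ("maxpool",),
    ("conv", 512), ("conv", 512), ("conv", 512), ("maxpool",),
    ("conv", 512), ("conv", 512), ("conv", 512),
]

def summarize(layers):
    """Count conv and pooling layers and derive the spatial downsampling."""
    n_conv = sum(1 for layer in layers if layer[0] == "conv")
    n_pool = sum(1 for layer in layers if layer[0] == "maxpool")
    downsample = 2 ** n_pool  # each 2x2 max-pool halves the spatial size
    return n_conv, n_pool, downsample
```

With this spec, `summarize(LAYERS)` gives 13 convolutional layers, 4 pooling layers and an overall downsampling factor of 16; the channel progression 64-128-256-512 mirrors the convolutional stack of VGG-style networks.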
During training, one sample image and its corresponding mask map are fed into the fully convolutional network at a time. The initial learning rate may be 0.00000001, and every 10,000 iterations the learning rate is reduced to 1/10 of its previous value. After 100,000 iterations, the training process may terminate. The fully convolutional network obtained when training ends is the desired semantic prediction model. With the trained semantic prediction model, a text-region probability map of the full image to be detected can be generated from the semantics of that image, thereby predicting the text regions in the image to be detected.
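The training schedule just described can be captured in a few lines. The following sketch is ours (the function name and the convention of counting from iteration 0 are assumptions); it computes the learning rate at a given iteration under the stated step decay.

```python
def learning_rate(iteration, base_lr=1e-8, step=10000, gamma=0.1):
    """Step-decay schedule: start at base_lr and multiply by gamma (1/10)
    after every `step` (10,000) iterations."""
    return base_lr * gamma ** (iteration // step)

MAX_ITER = 100000  # per the text, training may terminate here
```

For instance, the learning rate is 1e-8 for iterations 0 through 9,999, drops to 1e-9 at iteration 10,000, and reaches 1e-17 just before the 100,000-iteration stopping point.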
Those skilled in the art will appreciate that although a 13-layer fully convolutional network is used as an example above, the number of layers of the fully convolutional network may be any number between 6 and 19; this range balances the accuracy of the result against the amount of computation. Moreover, the filter counts and sizes given above are merely examples rather than limitations: the number of filters could also be, say, 100, 500 or 1000, and the filter size could also be 1x1 or 5x5.
According to a further aspect of the invention, a text detection apparatus is also provided. Figure 13 shows a schematic block diagram of a text detection apparatus 1300 according to an embodiment of the invention. As shown in Figure 13, the text detection apparatus 1300 comprises a semantic analysis module 1330 and a segmentation module 1340. According to an embodiment of the present invention, the semantic analysis module 1330 further comprises a semantic prediction model 1350.
The semantic analysis module 1330 receives an image to be detected and uses the semantic prediction model 1350 to generate a text-region probability map of the full image to be detected. The semantic prediction model generates the text-region probability map from the semantics of the image to be detected, so as to predict whether each pixel in the image belongs to a text region or a non-text region. The text-region probability map uses different pixel values to represent different probabilities, thereby distinguishing the text regions of the image to be detected from its non-text regions.
In one embodiment, the image to be detected may be an original image, or an image obtained after pre-processing an original image.
In one embodiment, the semantic prediction model 1350 may be obtained by training a neural network. Obtaining the semantic prediction model 1350 by training a neural network is described in detail below with reference to Figure 14.
The text-region probability map is described with reference to Figures 3a and 3b, 4a and 4b, 5a and 5b, and 6a and 6b. These figure pairs respectively show, according to embodiments of the invention, the full image of an image to be detected and the corresponding text-region probability map generated by the semantic prediction model 1350. Figures 3a, 4a, 5a and 6a may be full images of images to be detected, each containing text regions; Figures 3b, 4b, 5b and 6b show the text-region probability maps generated after the full images of Figures 3a, 4a, 5a and 6a pass through the semantic prediction model 1350. The generated text-region probability map uses different pixel values to represent different probabilities, distinguishing the text regions of the image to be detected from its non-text regions. For example, a text region filled with pixel value 255 indicates the highest probability of belonging to a text region, while a non-text region (for example a background region) filled with pixel value 0 indicates the lowest probability, thereby distinguishing the text regions from the non-text regions in the image to be detected. Taking Figure 4b as an example, its text-region probability map uses different pixel values to distinguish the text regions from the non-text regions of the image to be detected in Figure 4a. For instance, the two text regions in Figure 4a, "No unauthorized admittance" and "Authorized Personnel Only", are filled with pixel value 255, yielding the text-region probability map shown in Figure 4b. Note also that the probability map of Figure 4b completely and accurately reflects the orientation of the text regions in the original image of Figure 4a.
The segmentation module 1340 performs a segmentation operation on the text-region probability map to determine the text regions. Because the value of each pixel in the text-region probability map represents the probability that the pixel belongs to a text region, the probability map can be segmented according to low-level features (such as image grayscale).
For example, the segmentation module 1340 may obtain the text regions by performing a binarization operation on the text-region probability map. Since the goal in the present invention is to distinguish text regions from non-text regions (background regions), a binarization operation serves this purpose; it is simple to implement, computationally cheap and fast.
The binarization operation may be a threshold segmentation operation. Optionally, the threshold T is an adjustable parameter. If grayscale value 255 represents a probability of 100% of belonging to a text region and grayscale value 0 represents a probability of 0, then the threshold may be set to 128.
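A minimal sketch of this threshold segmentation, assuming an 8-bit probability map stored as nested lists (the function name and the choice of `>=` at the threshold are our own assumptions):

```python
def binarize(prob_map, threshold=128):
    """Threshold the text-region probability map: grayscale 255 means
    P(text) = 100% and 0 means P(text) = 0, so 128 is the midpoint."""
    return [[255 if value >= threshold else 0 for value in row]
            for row in prob_map]
```

For example, `binarize([[0, 127, 128, 255]])` returns `[[0, 0, 255, 255]]`.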
The binarization operation may also be a segmentation operation based on region growing. Region growing methods aggregate pixels according to the similarity of pixels within the same object region. Specifically, starting from an initial region (for example, a pixel with a relatively large value in the text-region probability map), neighbouring pixels with similar properties (i.e. whose pixel values differ little from that of the current pixel) are merged into the current region, gradually growing the region until no more pixels can be merged.
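The growth procedure above can be sketched as a breadth-first traversal. In this illustration (ours, not the patent's), a hypothetical `tolerance` parameter stands in for "the pixel-value difference is small", and 4-connectivity is assumed:

```python
from collections import deque

def grow_region(prob_map, seed, tolerance=10):
    """Grow a region from `seed` (a (row, col) pair) by absorbing 4-connected
    neighbours whose value differs from the current pixel by <= tolerance."""
    h, w = len(prob_map), len(prob_map[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in region
                    and abs(prob_map[ny][nx] - prob_map[y][x]) <= tolerance):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region
```

Seeded at a high-probability pixel, the loop stops exactly when no mergeable neighbour remains, matching the stopping condition in the text.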
After the binarization operation, the segmentation module 1340 may further determine the contour of each connected region obtained by the binarization. This can be done with any existing or future edge-detection method, for example various methods based on operators such as Sobel or Canny. The segmentation module 1340 may further fit the contour of each connected region to a quadrilateral to determine the text region. In one embodiment, the interior regions of all the quadrilaterals serve as the text regions. Specifically, suppose the set of all fitted quadrilaterals is B = {b_k}, k = 1, 2, ..., Q, where b_k denotes a fitted quadrilateral, Q is the number of quadrilaterals, and k is the index. The set B is then output as the result of the text detection.
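As a simplified illustration of the fitting step: the patent permits arbitrary quadrilaterals (a full implementation might use OpenCV's `cv2.minAreaRect`), but for brevity the sketch below approximates the fitted quadrilateral b_k by the axis-aligned bounding rectangle of a connected region's pixel set.

```python
def bounding_quad(pixels):
    """Fit a quadrilateral b_k to one connected region, here simplified to
    the region's axis-aligned bounding rectangle (four (x, y) vertices,
    listed clockwise from the top-left)."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    y0, y1, x0, x1 = min(ys), max(ys), min(xs), max(xs)
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
```

Applying this to every connected region produced by the binarization yields the output set B = {b_1, ..., b_Q}.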
A region enclosed by a quadrilateral can well contain text of any orientation and language, and is simple to compute. As the text-region probability map of Figure 6b shows, various factors such as noise in the image or the shapes of the characters may prevent the probability map from representing well the probability that a pixel belongs to a text region. Fitting text regions with quadrilateral regions further ensures that a text region contains the complete text content, thereby guaranteeing the precision of text detection.
In one embodiment, in the image obtained by the segmentation module 1340 after segmentation, regions with a smaller average pixel value may be regarded as non-text regions, and the other regions as text regions.
Figure 14 shows a schematic block diagram of a text detection apparatus 1400 according to another embodiment of the present invention. The semantic analysis module 1330 in the text detection apparatus 1400 is similar to the semantic analysis module 1330 in the text detection apparatus 1300, and the segmentation module 1340 in the text detection apparatus 1400 is similar to the segmentation module 1340 in the text detection apparatus 1300; for brevity, they are not described again here.
Compared with the text detection apparatus 1300, the text detection apparatus 1400 adds an image pre-processing module 1410 and a training module 1420.
According to an embodiment of the invention, the image pre-processing module 1410 receives an original image. In one embodiment, the original image may have complex background information and may contain diverse text regions, for example text information of different colours, fonts, languages and sizes.
The image pre-processing module 1410 pre-processes the received original image. In one embodiment, the image pre-processing module 1410 may perform size normalization on the received original image, i.e. scale the larger dimension of the original image (the greater of its height and width) to a preset size; the preset size may be, for example, 480, 640, 800 or 960 pixels. Moreover, the aspect ratio of the image obtained after pre-processing is kept identical to the aspect ratio of the original image.
After pre-processing, the image pre-processing module 1410 obtains the image to be detected and outputs its full image to the semantic analysis module 1330 for processing. As described above, the image to be detected has the preset size, and its aspect ratio is identical to the aspect ratio of the original image.
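The size normalization can be sketched as follows. The function name is ours, and the rounding behaviour is an assumption, since the patent does not specify how fractional pixel counts are handled:

```python
def normalized_size(width, height, target_long_side=640):
    """Scale the image so its larger dimension equals the preset size
    (e.g. 480, 640, 800 or 960 pixels), preserving the aspect ratio."""
    scale = target_long_side / max(width, height)
    return round(width * scale), round(height * scale)
```

For example, a 1280x960 original normalized to a preset size of 640 becomes 640x480, keeping the 4:3 aspect ratio of the original.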
According to an embodiment of the present invention, the training module 1420 trains a neural network with multiple sample images to obtain the semantic prediction model 1350, a model which can effectively distinguish the text regions of an image to be detected from its non-text regions.
In one embodiment, the training module 1420 may collect, from diverse sources, a large number of images containing text of various kinds as sample images, and receive annotation information for the sample images. The sample images are, for example, natural scene images. A rich variety and large number of sample images is desirable in order to obtain a good semantic prediction model. In one embodiment, the number of sample images is no fewer than 1000.
All the text regions in each sample image may be annotated with polygons in the sample image. The basic text unit of annotation may be a text line or a word. The annotation information for a text region in a sample image may be preserved in the form of a polygon (for example a quadrilateral). Specifically, in one embodiment, only the coordinates of the four vertices of the quadrilateral need be preserved. Preserving annotation information in quadrilateral form not only accommodates text of any orientation and language, but is also easy to compute with.
Figures 10a, 10b, 10c and 10d each show an annotated sample image with annotation information according to an embodiment of the present invention. As these figures show, quadrilaterals (the light quadrilaterals in the figures) can be used to annotate the text regions in a sample image, and such annotated regions accommodate any font, language and text orientation.
The training module 1420 is further configured to generate the mask map of each sample image from the sample image and its annotation information. In one embodiment, the mask map comprises a binary mask map. Specifically, for a sample image I and its corresponding annotation information, the training module 1420 generates a mask map of the same size as I, for example a binary mask map R. The binary mask map R uses different pixel values to distinguish the text regions of the sample image from its non-text regions. In one embodiment, for the sample image I, the annotated text regions are filled with pixels of a first pixel value and the non-text regions are filled with pixels of a second pixel value, thereby generating the mask map; the first and second pixel values differ so as to distinguish the text regions from the non-text regions. For example, the annotated text regions (i.e. the interior regions marked by the quadrilaterals) are filled with pixel value 255, and the non-text regions are filled with pixel value 0.
The training module 1420 is further configured to construct a training set from the sample images and their mask maps and to train the neural network to obtain the semantic prediction model 1350. Specifically, the original sample images and their corresponding mask maps form the training sample set S = {(I_i, R_i)}, i = 1, 2, ..., N, where I_i denotes an original sample image, R_i is the mask map corresponding to I_i, N is the number of sample images in S, and i is the index.
In one embodiment, the neural network may be a fully convolutional neural network. A fully convolutional network is a special kind of neural network whose distinguishing feature is that every layer with learnable parameters, from input to output, is a convolutional layer. Fully convolutional networks avoid complex early-stage image pre-processing and can take the original image directly as input, which makes them especially suitable for analysing images with complex backgrounds and can make the text detection result more accurate.
The training module 1420 feeds the training sample set S into the fully convolutional network for training, to obtain the semantic prediction model 1350. According to a specific embodiment of the invention, a fully convolutional network consisting of 13 layers may be used. Figure 12 shows a schematic diagram of this fully convolutional network.
Besides convolutional layers, the fully convolutional network also contains max-pooling layers. The max-pooling layers separate runs of consecutive convolutional layers; they effectively reduce the amount of computation while strengthening the robustness of the network.
The input of the fully convolutional network is the raw image data. As shown in Figure 12, the network comprises a first and a second convolutional layer, each of which may have 64 filters of size 3x3. The second convolutional layer is followed by a first max-pooling layer. Next come a third and a fourth convolutional layer, each of which may have 128 filters of size 3x3; the fourth convolutional layer is followed by a second max-pooling layer. Next come a fifth, a sixth and a seventh convolutional layer with 256 filters of size 3x3 each; the seventh convolutional layer is followed by a third max-pooling layer. Next come an eighth, a ninth and a tenth convolutional layer, each of which may have 512 filters of size 3x3; the tenth convolutional layer is followed by a fourth max-pooling layer. Finally come an eleventh, a twelfth and a thirteenth convolutional layer, each of which may have 512 filters of size 3x3.
During training, one sample image and its corresponding mask map are fed into the fully convolutional network at a time. The initial learning rate may be 0.00000001, and every 10,000 iterations the learning rate is reduced to 1/10 of its previous value. After 100,000 iterations, the training process may terminate. The fully convolutional network obtained when training ends is the desired semantic prediction model. With the trained semantic prediction model, a text-region probability map can be generated from the semantics of an image to be detected, thereby predicting the text regions in the image to be detected.
Those skilled in the art will appreciate that although a 13-layer fully convolutional network is used as an example above, the number of layers of the fully convolutional network may be any number between 6 and 19; this range balances the accuracy of the result against the amount of computation. Moreover, the filter counts and sizes given above are merely examples rather than limitations: the number of filters could also be, say, 100, 500 or 1000, and the filter size could also be 1x1 or 5x5.
The semantic prediction model 1350, obtained by the training module 1420 by training a neural network with multiple sample images, can effectively distinguish the text regions of an image to be detected from its non-text regions.
Figure 15 shows a schematic block diagram of a text detection system 1500 according to an embodiment of the present invention. As shown in Figure 15, the text detection system 1500 comprises a processor 1510, a memory 1520, and program instructions 1530 stored in the memory 1520.
When run by the processor 1510, the program instructions 1530 can implement the functions of the functional modules of the text detection apparatus according to embodiments of the present invention, and/or can perform the steps of the character detection method according to embodiments of the present invention.
Specifically, when the program instructions 1530 are run by the processor 1510, the following steps are performed: receiving an image to be detected; generating, via a semantic prediction model, a text-region probability map of the full image to be detected, wherein the text-region probability map uses different pixel values to distinguish the text regions of the image to be detected from its non-text regions; and performing a segmentation operation on the text-region probability map to determine the text regions. The semantic prediction model predicts, from the semantics of the image, whether each pixel in the image to be detected belongs to a text region or a non-text region.
In addition, when the program instructions 1530 are run by the processor 1510, the following steps are also performed: receiving an original image; and pre-processing the original image to obtain the image to be detected, wherein the image to be detected has a preset size and its aspect ratio is identical to the aspect ratio of the original image.
In addition, when the program instructions 1530 are run by the processor 1510, the performed step of segmenting the text-region probability map to determine the text regions comprises: performing a binarization operation on the text-region probability map to determine the text regions.
In addition, when the program instructions 1530 are run by the processor 1510, the performed step of binarizing the text-region probability map to determine the text regions comprises: determining the contour of each connected region obtained by the binarization; and fitting the contour to a quadrilateral, wherein the interior region of the quadrilateral is the text region.
In addition, when the program instructions 1530 are run by the processor 1510, the following step is also performed: training a neural network with multiple sample images to obtain the semantic prediction model.
In addition, when the program instructions 1530 are run by the processor 1510, the performed step of training a neural network with multiple sample images to obtain the semantic prediction model comprises: receiving the sample images and their annotation information; generating the mask map of each sample image from the sample image and its annotation information; and training the neural network with the sample images and the mask maps to obtain the semantic prediction model.
In addition, in the step of training a neural network with multiple sample images to obtain the semantic prediction model, performed when the program instructions 1530 are run by the processor 1510, the mask map comprises a binary mask map, and the binary mask map uses different pixel values to distinguish the text regions of the sample image from its non-text regions.
In addition, in the step of training a neural network with multiple sample images to obtain the semantic prediction model, performed when the program instructions 1530 are run by the processor 1510, the neural network comprises a fully convolutional neural network.
In addition, in the step of training a neural network with multiple sample images to obtain the semantic prediction model, performed when the program instructions 1530 are run by the processor 1510, the number of layers of the fully convolutional network comprises any number between 6 and 19.
In addition, according to an embodiment of the present invention, a storage medium is also provided, on which program instructions are stored; when run by a computer or a processor, the program instructions perform the corresponding steps of the character detection method of embodiments of the present invention, and implement the corresponding modules of the text detection apparatus according to embodiments of the present invention. The storage medium may comprise, for example, the memory card of a smartphone, the storage unit of a tablet computer, the hard disk of a personal computer, read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact-disc read-only memory (CD-ROM), USB memory, or any combination of the above storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media; for example, one computer-readable storage medium may contain computer-readable program code for training the neural network to obtain the semantic prediction model, while another computer-readable storage medium contains computer-readable program code for performing the text detection.
In one embodiment, the computer program instructions, when run by a computer, can implement the functional modules of the text detection apparatus according to embodiments of the present invention, and/or can perform the text detection method according to embodiments of the present invention.
In one embodiment, the computer program instructions, when run by a computer, perform the following steps: receiving an image to be detected; generating, via a semantic prediction model, a text-region probability map of the full image to be detected, wherein the text-region probability map uses different pixel values to distinguish the text regions of the image to be detected from its non-text regions; and performing a segmentation operation on the text-region probability map to determine the text regions. The semantic prediction model predicts, from the semantics of the image, whether each pixel in the image to be detected belongs to a text region or a non-text region.
In addition, the computer program instructions, when run by a computer, also perform the following steps: receiving an original image; and pre-processing the original image to obtain the image to be detected, wherein the image to be detected has a preset size and its aspect ratio is identical to the aspect ratio of the original image.
In addition, when the computer program instructions are run by a computer, the performed step of segmenting the text-region probability map to determine the text regions comprises: performing a binarization operation on the text-region probability map to determine the text regions.
In addition, when the computer program instructions are run by a computer, the performed step of binarizing the text-region probability map to determine the text regions comprises: determining the contour of each connected region obtained by the binarization; and fitting the contour to a quadrilateral, wherein the interior region of the quadrilateral is the text region.
In addition, when the computer program instructions are run by a computer, the following step is also performed: training a neural network with multiple sample images to obtain the semantic prediction model.
In addition, when the computer program instructions are run by a computer, the performed step of training a neural network with multiple sample images to obtain the semantic prediction model comprises: receiving the sample images and their annotation information; generating the mask map of each sample image from the sample image and its annotation information; and training the neural network with the sample images and the mask maps to obtain the semantic prediction model.
In addition, in the step of training a neural network with multiple sample images to obtain the semantic prediction model, performed when the computer program instructions are run by a computer, the mask map comprises a binary mask map, and the binary mask map uses different pixel values to distinguish the text regions of the sample image from its non-text regions.
In addition, in the step of training a neural network with multiple sample images to obtain the semantic prediction model, performed when the computer program instructions are run by a computer, the neural network comprises a fully convolutional neural network.
In addition, in the step of training a neural network with multiple sample images to obtain the semantic prediction model, performed when the computer program instructions are run by a computer, the number of layers of the fully convolutional network comprises any number between 6 and 19.
Having read the detailed description of the character detection method above, a person of ordinary skill in the art will understand the structure, implementation and advantages of the above-described text detection apparatus and system, so they are not repeated here.
In the specification provided here, numerous specific details are set forth. It will be appreciated, however, that embodiments of the invention may be practised without these specific details. In some instances, well-known methods, structures and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be appreciated that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, in the description of exemplary embodiments of the invention above, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into that description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that, except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or apparatus so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, an equivalent or a similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include certain features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some modules of a text detection device according to embodiments of the present invention. The present invention may also be implemented as a device program (for example, a computer program or a computer program product) for performing part or all of the method described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-described embodiments illustrate rather than limit the invention, and those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a device claim enumerating several units, several of these units may be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any ordering. These words may be interpreted as names.
Claims (16)
1. A character detection method, comprising:
receiving a plurality of sample images and annotation information of the sample images;
generating a mask map of each sample image according to the sample image and the annotation information of the sample image;
training a neural network using the sample images and the mask maps, to obtain a semantic prediction model;
receiving an image to be detected;
generating, via the semantic prediction model, a character area probability map of the full image to be detected, wherein the character area probability map uses different pixel values to distinguish character areas of the image to be detected from non-character areas of the image to be detected; and
performing a segmentation operation on the character area probability map, to determine the character areas.
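For illustration only (this sketch is not part of the claims): the mask map of claim 1 can be built from box-style annotations by marking annotated text pixels with a foreground value. The helper `make_mask` and the `(x0, y0, x1, y1)` box format are assumptions of this sketch; the patent does not fix a particular annotation format.

```python
def make_mask(height, width, boxes, fg=255, bg=0):
    """Return a binary mask map in which pixels inside any annotated
    text box take the foreground value and all other pixels the
    background value."""
    mask = [[bg] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:  # half-open box: x in [x0, x1), y in [y0, y1)
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = fg
    return mask

# A 4x6 sample image with one annotated text box.
mask = make_mask(4, 6, [(1, 1, 4, 3)])
```

In practice the mask map would be rasterized at the sample image's resolution and paired with that image as the training target for the neural network.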
2. The method of claim 1, further comprising:
receiving an original image; and
pre-processing the original image, to obtain the image to be detected,
wherein the image to be detected has a preset size, and the aspect ratio of the image to be detected is identical to the aspect ratio of the original image.
3. The method of claim 1, wherein performing the segmentation operation on the character area probability map to determine the character areas comprises:
performing a binarization operation on the character area probability map, to determine the character areas.
4. The method of claim 3, wherein performing the binarization operation on the character area probability map to determine the character areas comprises:
determining the contour of each connected region obtained by the binarization operation; and
fitting the contour to a quadrangle, wherein the region inside the quadrangle is a character area.
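For illustration only: the binarization and connected-region steps of claims 3 and 4 can be sketched in pure Python as thresholding followed by a breadth-first search over foreground pixels. This sketch fits an axis-aligned bounding quadrangle per region; the claim's contour-to-quadrangle fitting would more generally allow rotated quadrangles (e.g. via a minimum-area-rectangle fit). The function name `text_boxes` and the threshold value are assumptions.

```python
from collections import deque

def text_boxes(prob, threshold=0.5):
    """Binarize a character area probability map and fit an axis-aligned
    quadrangle (x0, y0, x1, y1) around each 4-connected foreground region."""
    h, w = len(prob), len(prob[0])
    binary = [[prob[y][x] >= threshold for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] and not seen[sy][sx]:
                # BFS over one connected region, tracking its extents.
                q = deque([(sy, sx)])
                seen[sy][sx] = True
                y0 = y1 = sy
                x0 = x1 = sx
                while q:
                    y, x = q.popleft()
                    y0, y1 = min(y0, y), max(y1, y)
                    x0, x1 = min(x0, x), max(x1, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes

# A toy probability map with two separate high-probability text regions.
prob = [[0.9, 0.9, 0.0, 0.0, 0.0],
        [0.9, 0.9, 0.0, 0.8, 0.8],
        [0.0, 0.0, 0.0, 0.8, 0.8]]
boxes = text_boxes(prob)
```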
5. The method of claim 1, wherein the mask map comprises a binary mask map, and the binary mask map uses different pixel values to distinguish the character areas of the sample image from its non-character areas.
6. The method of claim 1, wherein the neural network comprises a fully convolutional neural network.
7. The method of claim 6, wherein the number of layers of the fully convolutional neural network is any number between 6 and 19.
8. The method of any one of claims 1 to 7, wherein the semantic prediction model is used to predict, according to the semantics of the image to be detected, whether each pixel in the image to be detected belongs to a character area or to a non-character area.
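For illustration only: claim 8's per-pixel prediction can be sketched as a two-class softmax turning raw per-pixel network scores into the character area probability map of claim 1. The assumption that the network emits a (non-text, text) score pair per pixel is this sketch's, not the patent's; the claims do not specify the output head.

```python
import math

def probability_map(scores):
    """Per-pixel two-class softmax. scores[y][x] is a pair
    (non_text_score, text_score); returns, per pixel, the probability
    that the pixel belongs to a character area."""
    out = []
    for row in scores:
        out_row = []
        for s_bg, s_fg in row:
            m = max(s_bg, s_fg)  # subtract the max for numerical stability
            e_bg = math.exp(s_bg - m)
            e_fg = math.exp(s_fg - m)
            out_row.append(e_fg / (e_bg + e_fg))
        out.append(out_row)
    return out

# Equal scores give probability 0.5; a strongly positive text score
# pushes the pixel toward the character area.
pm = probability_map([[(0.0, 0.0), (0.0, 100.0)]])
```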
9. A text detection device, comprising:
a training module, for receiving a plurality of sample images and annotation information of the sample images, generating a mask map of each sample image according to the sample image and the annotation information of the sample image, and training a neural network using the sample images and the mask maps, to obtain a semantic prediction model;
a semantic analysis module, connected to the training module, for receiving an image to be detected and using the semantic prediction model to generate a character area probability map of the full image to be detected, wherein the character area probability map uses different pixel values to distinguish character areas of the image to be detected from non-character areas of the image to be detected; and
a segmentation module, for performing a segmentation operation on the character area probability map, to determine the character areas.
10. The text detection device of claim 9, further comprising:
an image pre-processing module, for receiving an original image and pre-processing the original image, to obtain the image to be detected,
wherein the image to be detected has a preset size, and the aspect ratio of the image to be detected is identical to the aspect ratio of the original image.
11. The text detection device of claim 9, wherein the segmentation module is further configured to perform a binarization operation on the character area probability map, to determine the character areas.
12. The text detection device of claim 11, wherein the segmentation module is further configured to determine the contour of each connected region obtained by the binarization operation, and to fit the contour to a quadrangle, wherein the region inside the quadrangle is a character area.
13. The text detection device of claim 9, wherein the mask map comprises a binary mask map, and the binary mask map uses different pixel values to distinguish the character areas of the sample image from its non-character areas.
14. The text detection device of claim 9, wherein the neural network comprises a fully convolutional neural network.
15. The text detection device of claim 14, wherein the number of layers of the fully convolutional neural network is any number between 6 and 19.
16. The text detection device of any one of claims 9 to 15, wherein the semantic prediction model is used to predict, according to the semantics of the image to be detected, whether each pixel in the image to be detected belongs to a character area or to a non-character area.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510970839.2A CN105574513B (en) | 2015-12-22 | 2015-12-22 | Character detecting method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105574513A CN105574513A (en) | 2016-05-11 |
CN105574513B true CN105574513B (en) | 2017-11-24 |
Family
ID=55884621
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510970839.2A Active CN105574513B (en) | 2015-12-22 | 2015-12-22 | Character detecting method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105574513B (en) |
Families Citing this family (48)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106295629B (en) * | 2016-07-15 | 2018-06-15 | 北京市商汤科技开发有限公司 | structured text detection method and system |
CN107633527B (en) * | 2016-07-19 | 2020-07-07 | 北京图森未来科技有限公司 | Target tracking method and device based on full convolution neural network |
CN108108731B (en) * | 2016-11-25 | 2021-02-05 | 中移(杭州)信息技术有限公司 | Text detection method and device based on synthetic data |
CN106778928B (en) * | 2016-12-21 | 2020-08-04 | 广州华多网络科技有限公司 | Image processing method and device |
CN106897732B (en) * | 2017-01-06 | 2019-10-08 | 华中科技大学 | It is a kind of based on connection text section natural picture in multi-direction Method for text detection |
CN107025457B (en) * | 2017-03-29 | 2022-03-08 | 腾讯科技(深圳)有限公司 | Image processing method and device |
US10262236B2 (en) * | 2017-05-02 | 2019-04-16 | General Electric Company | Neural network training image generation system |
WO2018232592A1 (en) * | 2017-06-20 | 2018-12-27 | Microsoft Technology Licensing, Llc. | Fully convolutional instance-aware semantic segmentation |
CN109389116B (en) * | 2017-08-14 | 2022-02-08 | 阿里巴巴(中国)有限公司 | Character detection method and device |
CN109410211A (en) * | 2017-08-18 | 2019-03-01 | 北京猎户星空科技有限公司 | The dividing method and device of target object in a kind of image |
CN107886093B (en) * | 2017-11-07 | 2021-07-06 | 广东工业大学 | Character detection method, system, equipment and computer storage medium |
CN108305262A (en) * | 2017-11-22 | 2018-07-20 | 腾讯科技(深圳)有限公司 | File scanning method, device and equipment |
CN109961553A (en) * | 2017-12-26 | 2019-07-02 | 航天信息股份有限公司 | Invoice number recognition methods, device and tax administration self-service terminal system |
CN108229575A (en) * | 2018-01-19 | 2018-06-29 | 百度在线网络技术(北京)有限公司 | For detecting the method and apparatus of target |
CN108197623A (en) * | 2018-01-19 | 2018-06-22 | 百度在线网络技术(北京)有限公司 | For detecting the method and apparatus of target |
CN108427950B (en) * | 2018-02-01 | 2021-02-19 | 北京捷通华声科技股份有限公司 | Character line detection method and device |
CN108304814B (en) * | 2018-02-08 | 2020-07-14 | 海南云江科技有限公司 | Method for constructing character type detection model and computing equipment |
CN108446621A (en) * | 2018-03-14 | 2018-08-24 | 平安科技(深圳)有限公司 | Bank slip recognition method, server and computer readable storage medium |
CN108717542B (en) * | 2018-04-23 | 2020-09-15 | 北京小米移动软件有限公司 | Method and device for recognizing character area and computer readable storage medium |
CN109102037B (en) * | 2018-06-04 | 2024-03-05 | 平安科技(深圳)有限公司 | Chinese model training and Chinese image recognition method, device, equipment and medium |
CN108921158A (en) * | 2018-06-14 | 2018-11-30 | 众安信息技术服务有限公司 | Method for correcting image, device and computer readable storage medium |
CN108989793A (en) * | 2018-07-20 | 2018-12-11 | 深圳市华星光电技术有限公司 | A kind of detection method and detection device of text pixel |
CN109040824B (en) * | 2018-08-28 | 2020-07-28 | 百度在线网络技术(北京)有限公司 | Video processing method and device, electronic equipment and readable storage medium |
KR102211763B1 (en) * | 2018-09-21 | 2021-02-03 | 네이버 주식회사 | Apparatus, method and system for detecting character |
CN109492638A (en) * | 2018-11-07 | 2019-03-19 | 北京旷视科技有限公司 | Method for text detection, device and electronic equipment |
CN112789623A (en) * | 2018-11-16 | 2021-05-11 | 北京比特大陆科技有限公司 | Text detection method, device and storage medium |
CN111259878A (en) * | 2018-11-30 | 2020-06-09 | 中移(杭州)信息技术有限公司 | Method and equipment for detecting text |
CN109685055B (en) * | 2018-12-26 | 2021-11-12 | 北京金山数字娱乐科技有限公司 | Method and device for detecting text area in image |
CN110119742B (en) * | 2019-04-25 | 2023-07-07 | 添维信息科技(天津)有限公司 | Container number identification method and device and mobile terminal |
CN110059685B (en) * | 2019-04-26 | 2022-10-21 | 腾讯科技(深圳)有限公司 | Character area detection method, device and storage medium |
CN110110777A (en) * | 2019-04-28 | 2019-08-09 | 网易有道信息技术(北京)有限公司 | Image processing method and training method and device, medium and calculating equipment |
CN112001406B (en) * | 2019-05-27 | 2023-09-08 | 杭州海康威视数字技术股份有限公司 | Text region detection method and device |
CN110458162B (en) * | 2019-07-25 | 2023-06-23 | 上海兑观信息科技技术有限公司 | Method for intelligently extracting image text information |
CN111753836A (en) * | 2019-08-27 | 2020-10-09 | 北京京东尚科信息技术有限公司 | Character recognition method and device, computer readable medium and electronic equipment |
CN110503103B (en) * | 2019-08-28 | 2023-04-07 | 上海海事大学 | Character segmentation method in text line based on full convolution neural network |
CN110503159B (en) * | 2019-08-28 | 2022-10-11 | 北京达佳互联信息技术有限公司 | Character recognition method, device, equipment and medium |
CN110807454B (en) * | 2019-09-19 | 2024-05-14 | 平安科技(深圳)有限公司 | Text positioning method, device, equipment and storage medium based on image segmentation |
DE102019134387A1 (en) * | 2019-12-13 | 2021-06-17 | Beckhoff Automation Gmbh | Process for real-time optical character recognition in an automation system and automation system |
CN111242120B (en) * | 2020-01-03 | 2022-07-29 | 中国科学技术大学 | Character detection method and system |
CN113496223A (en) * | 2020-03-19 | 2021-10-12 | 顺丰科技有限公司 | Method and device for establishing text region detection model |
CN111626283B (en) * | 2020-05-20 | 2022-12-13 | 北京字节跳动网络技术有限公司 | Character extraction method and device and electronic equipment |
CN111723815B (en) * | 2020-06-23 | 2023-06-30 | 中国工商银行股份有限公司 | Model training method, image processing device, computer system and medium |
CN111753727B (en) * | 2020-06-24 | 2023-06-23 | 北京百度网讯科技有限公司 | Method, apparatus, device and readable storage medium for extracting structured information |
CN111767921A (en) * | 2020-06-30 | 2020-10-13 | 上海媒智科技有限公司 | Express bill positioning and correcting method and device |
CN114078108B (en) * | 2020-08-11 | 2023-12-22 | 北京阅影科技有限公司 | Method and device for processing abnormal region in image, and method and device for dividing image |
CN112801911B (en) * | 2021-02-08 | 2024-03-26 | 苏州长嘴鱼软件有限公司 | Method and device for removing text noise in natural image and storage medium |
CN114067192A (en) * | 2022-01-07 | 2022-02-18 | 北京许先网科技发展有限公司 | Character recognition method and system |
CN114495129B (en) * | 2022-04-18 | 2022-09-09 | 阿里巴巴(中国)有限公司 | Character detection model pre-training method and device |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100269102B1 (en) * | 1994-06-24 | 2000-10-16 | 윤종용 | Numeric character recognition with neural network |
CN103745213A (en) * | 2014-02-28 | 2014-04-23 | 中国人民解放军63680部队 | Optical character recognition method based on LVQ neural network |
CN104899586B (en) * | 2014-03-03 | 2018-10-12 | 阿里巴巴集团控股有限公司 | Method and device is identified to the word content for including in image |
- 2015-12-22: CN application CN201510970839.2A filed, granted as patent CN105574513B (status: Active)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105574513B (en) | Character detecting method and device | |
CN109800736B (en) | Road extraction method based on remote sensing image and deep learning | |
Wang et al. | Tire defect detection using fully convolutional network | |
Ke et al. | A review of methods for automatic individual tree-crown detection and delineation from passive remote sensing | |
CN108154105B (en) | Underwater biological detection and identification method and device, server and terminal equipment | |
CN113887459B (en) | Open-pit mining area stope change area detection method based on improved Unet + | |
CN111986099A (en) | Tillage monitoring method and system based on convolutional neural network with residual error correction fused | |
CN109977191B (en) | Problem map detection method, device, electronic equipment and medium | |
Qu et al. | A pedestrian detection method based on yolov3 model and image enhanced by retinex | |
Wang et al. | Extraction of coastal raft cultivation area with heterogeneous water background by thresholding object-based visually salient NDVI from high spatial resolution imagery | |
CN113989662A (en) | Remote sensing image fine-grained target identification method based on self-supervision mechanism | |
Ghorai et al. | Extracting shoreline from satellite imagery for GIS analysis | |
CN108108731A (en) | Method for text detection and device based on generated data | |
Xiao et al. | Treetop detection using convolutional neural networks trained through automatically generated pseudo labels | |
Yue et al. | Texture extraction for object-oriented classification of high spatial resolution remotely sensed images using a semivariogram | |
CN103946865B (en) | Method and apparatus for contributing to the text in detection image | |
CN114399480A (en) | Method and device for detecting severity of vegetable leaf disease | |
CN113887472A (en) | Remote sensing image cloud detection method based on cascade color and texture feature attention | |
CN110991430A (en) | Ground feature identification and coverage rate calculation method and system based on remote sensing image | |
CN110570442A (en) | Contour detection method under complex background, terminal device and storage medium | |
CN115019181B (en) | Remote sensing image rotating target detection method, electronic equipment and storage medium | |
CN111881706B (en) | Living body detection, image classification and model training method, device, equipment and medium | |
CN111860465A (en) | Remote sensing image extraction method, device, equipment and storage medium based on super pixels | |
Dong et al. | A cloud detection method for GaoFen-6 wide field of view imagery based on the spectrum and variance of superpixels | |
CN112651351B (en) | Data processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: No. 313, Block A, No. 2 South Road, Academy of Sciences, Haidian District, Beijing 100190
Applicant after: MEGVII INC.; Beijing Maigewei Technology Co., Ltd.
Address before: No. 313, Block A, No. 2 South Road, Academy of Sciences, Haidian District, Beijing 100190
Applicant before: MEGVII INC.; Beijing Aperture Science and Technology Ltd.
CB02 | Change of applicant information | ||
GR01 | Patent grant | ||
GR01 | Patent grant |