CN105574513A - Character detection method and device - Google Patents

Character detection method and device

Info

Publication number
CN105574513A
CN105574513A CN201510970839.2A
Authority
CN
China
Prior art keywords
image
character area
detected
sample image
probability graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510970839.2A
Other languages
Chinese (zh)
Other versions
CN105574513B (en)
Inventor
姚聪 (Yao Cong)
周舒畅 (Zhou Shuchang)
周昕宇 (Zhou Xinyu)
印奇 (Yin Qi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Beijing Aperture Science and Technology Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Beijing Aperture Science and Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd, Beijing Aperture Science and Technology Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN201510970839.2A priority Critical patent/CN105574513B/en
Publication of CN105574513A publication Critical patent/CN105574513A/en
Application granted granted Critical
Publication of CN105574513B publication Critical patent/CN105574513B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V 20/63 Scene text, e.g. street names
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/26 Techniques for post-processing, e.g. correcting the recognition result
    • G06V 30/262 Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
    • G06V 30/274 Syntactic or semantic context, e.g. balancing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Abstract

The invention discloses a character detection method and device. The character detection method comprises: receiving an image to be detected; generating, via a semantic prediction model, a character-area probability map of the full image to be detected, wherein the probability map uses different pixel values to distinguish the character areas of the image from its non-character areas; and performing a segmentation operation on the probability map to determine the character areas. The disclosed method can detect characters of various languages, orientations, colors, fonts and sizes while effectively suppressing interference from complex backgrounds. In addition, the character detection method and device are highly robust and effectively resist interference from image noise, image blur, complex image backgrounds and non-uniform illumination.

Description

Character detection method and device
Technical field
The present invention relates to the field of image processing, and in particular to a character detection method and device.
Background technology
With the widespread adoption of smartphones and the rapid development of the mobile Internet, acquiring, retrieving and sharing information through the cameras of mobile terminals such as mobile phones has gradually become a way of life. Camera-based applications increasingly emphasize understanding of the photographed scene. In a scene where text coexists with other objects, the user usually pays more attention to the textual information in the scene, so correctly recognizing the text in an image gives a deeper understanding of the user's intent in taking the photograph. This involves text detection technology to identify the character areas in the captured image.
As an important basic technology, text detection, particularly text detection in natural-scene images, has great practical value and broad application prospects. For example, text detection for natural-scene images can be applied directly to fields such as augmented reality, geo-location, human-computer interaction, robot navigation, autonomous vehicles and industrial automation.
However, most images to be detected contain relatively complex backgrounds, and their quality may be affected by factors such as noise, blur and non-uniform illumination. In addition, text is diverse: for example, the text in a natural-scene image may have different colors, sizes, fonts and orientations. All of these factors pose great difficulties and challenges for text detection. For these reasons, existing character detection methods readily produce false alarms, that is, non-text components of the background are mistakenly determined to be text. Existing methods are also deficient in adaptability; for example, most methods can only detect horizontal text and are helpless against tilted or rotated text. As another example, some methods can only be applied to Chinese and cannot be directly generalized to other languages (such as English, Russian or Korean). Moreover, when an image suffers from severe noise, blur or non-uniform illumination, existing methods often make mistakes. In short, existing character detection methods and systems are defective in both accuracy and scope of application.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a character detection method and device that at least partially solve the above problems.
According to one aspect of the invention, a character detection method is provided, comprising:
receiving an image to be detected; generating, via a semantic prediction model, a character-area probability map of the full image to be detected, wherein the probability map uses different pixel values to distinguish the character areas of the image to be detected from its non-character areas; and
performing a segmentation operation on the character-area probability map to determine the character areas.
According to another aspect of the invention, a text detection device is also provided, comprising a semantic analysis module and a segmentation module. The semantic analysis module is configured to receive an image to be detected and to use a semantic prediction model to generate a character-area probability map of the full image, wherein the probability map uses different pixel values to distinguish the character areas of the image to be detected from its non-character areas. The segmentation module is configured to perform a segmentation operation on the probability map to determine the character areas.
The above character detection method and device support performing text detection directly on the full image to be detected, unlike algorithms based on simple threshold segmentation, sliding windows or connected components. They can detect text of different languages, orientations, colors, fonts and sizes while effectively suppressing interference from complex backgrounds, and thus have a wide range of application. In addition, the character detection method and device are highly robust and can cope with interference from factors such as image noise, image blur, complex image backgrounds and non-uniform illumination.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented according to the content of the specification, and in order that the above and other objects, features and advantages of the present invention may become more apparent, specific embodiments of the present invention are set forth below.
Accompanying drawing explanation
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The accompanying drawings are only for the purpose of illustrating the preferred embodiments and are not to be regarded as limiting the invention. Throughout the drawings, identical parts are denoted by identical reference symbols. In the drawings:
Fig. 1a and Fig. 1b schematically illustrate an image to be detected and the detected image, respectively, according to an embodiment of the invention;
Fig. 2 schematically illustrates a flowchart of a character detection method according to an embodiment of the invention;
Fig. 3a and Fig. 3b, Fig. 4a and Fig. 4b, Fig. 5a and Fig. 5b, and Fig. 6a and Fig. 6b schematically illustrate full images to be detected and the corresponding generated character-area probability maps, respectively, according to embodiments of the invention;
Fig. 7 schematically illustrates a flowchart of a method for obtaining the image to be detected according to an embodiment of the invention;
Fig. 8 schematically illustrates a flowchart of a method for performing a segmentation operation on the character-area probability map according to an embodiment of the invention;
Fig. 9 schematically illustrates a flowchart of a method for training a neural network according to an embodiment of the invention;
Figure 10a, Figure 10b, Figure 10c and Figure 10d respectively illustrate sample images with annotation information according to an embodiment of the invention;
Figure 11a and Figure 11b respectively illustrate a sample image with annotation information and its corresponding mask map according to an embodiment of the invention;
Figure 12 schematically illustrates a schematic diagram of a fully convolutional neural network according to an embodiment of the invention;
Figure 13 schematically illustrates a schematic block diagram of a text detection device according to an embodiment of the invention;
Figure 14 schematically illustrates a schematic block diagram of a text detection device according to another embodiment of the invention; and
Figure 15 schematically illustrates a schematic block diagram of a text detection system according to an embodiment of the invention.
Embodiment
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
In order to recognize the character areas in an image automatically and more reasonably, the present invention provides a character detection method. Fig. 1a and Fig. 1b schematically illustrate an image to be detected and the detected image, respectively, according to an embodiment of the invention. Fig. 2 shows a flowchart of a character detection method 200 according to an embodiment of the invention. As shown in Fig. 2, the method 200 comprises steps S210 to S230.
In step S210, an image to be detected is received. The image to be detected may be an original image, or an image obtained after preprocessing an original image. In one embodiment of the invention, the image to be detected may be obtained by preprocessing a collected original image. The preprocessing method is described in detail below with reference to the specific drawings.
In step S220, a character-area probability map of the full image to be detected is generated via a semantic prediction model, wherein the probability map uses different pixel values to distinguish the character areas of the image to be detected from its non-character areas. According to one embodiment of the invention, a character area refers to a region of an image that contains text. Taking Fig. 1a and Fig. 1b as an example, the regions inside the two black quadrilaterals in Fig. 1b are character areas: the first character area contains the text "I am growing", and the second contains the text "please do not step on me".
In one embodiment, the character-area probability map uses different pixel values to represent different probabilities, so as to distinguish the character areas of the image to be detected from its non-character areas. In one embodiment, a higher pixel value indicates a higher probability that the pixel belongs to a character area, and a lower pixel value indicates a lower probability. For example, a black pixel with value 0 indicates that the probability that the pixel belongs to a character area is 0, and a white pixel with value 255 indicates that the probability is 100%.
According to one embodiment of the invention, the character-area probability map is generated from the full image to be detected via a semantic prediction model. The semantic prediction model generates the probability map according to the semantics of the image to be detected, so as to predict whether each pixel of the image belongs to a character area or a non-character area. Image semantics are high-level image features; although they build on low-level features such as color, texture and shape, they differ significantly from them. As a basic carrier of knowledge information, image semantics convert complete image content into an intuitively understandable, text-like representation, and play a vital role in image understanding. Image understanding takes image data as input and outputs knowledge; it belongs to the high-level content of image research. The semantic prediction model realizes image understanding: it recognizes character areas directly from image semantics, which is significantly different from models based on threshold segmentation of the raw image. Based on its understanding of the image to be detected, the semantic prediction model generates a better character-area probability map according to the image's semantics, predicting which pixels belong to character areas and which do not, and thereby yielding more reasonable character areas.
The semantic prediction model can be obtained by training a neural network. A neural network can estimate an otherwise unknown function from a large number of inputs; it is capable of machine learning and is highly adaptive. A trained neural network can approximate an arbitrary function and can "learn" from given data. A trained neural network is therefore well suited to serve as the semantic prediction model that identifies the character areas in the image to be detected. Training a neural network to obtain the semantic prediction model is described in detail below with reference to Fig. 9 to Figure 12.
Fig. 3a and Fig. 3b, Fig. 4a and Fig. 4b, Fig. 5a and Fig. 5b, and Fig. 6a and Fig. 6b show full images to be detected and the corresponding character-area probability maps generated via the semantic prediction model, respectively, according to embodiments of the invention. Fig. 3a, Fig. 4a, Fig. 5a and Fig. 6a are full images to be detected, each containing character areas: the character area in Fig. 3a contains Chinese; the character areas in Fig. 4a contain both Chinese and English and, as shown in Fig. 4a, are not horizontal; the character area in Fig. 5a contains Russian; and the character area in Fig. 6a contains Korean. It can further be seen that the images of Fig. 3a, Fig. 4a, Fig. 5a and Fig. 6a have different and relatively complex backgrounds, and that the text in these images is diverse, with different colors, fonts, languages and sizes. Fig. 3b, Fig. 4b, Fig. 5b and Fig. 6b show the character-area probability maps generated by the semantic prediction model from the full images of Fig. 3a, Fig. 4a, Fig. 5a and Fig. 6a, respectively. The generated probability map uses different pixel values to represent different probabilities, distinguishing the character areas of the image to be detected from its non-character areas. For example, the character areas are filled with pixels of value 255, indicating the highest probability of belonging to a character area, while the non-character regions (for example, the background) are filled with pixels of value 0, indicating the lowest probability. Taking the probability map of Fig. 4b as an example, different pixel values distinguish the character areas of the image of Fig. 4a from its non-character regions: the two character areas, "No unauthorized admittance" and "Authorized Personnel Only", are filled with pixels of value 255, yielding the probability map shown in Fig. 4b, which also shows the orientation of the character areas completely and accurately.
In step S230, a segmentation operation is performed on the character-area probability map generated in step S220 to determine the character areas. Because the pixel values of the probability map represent the probability that each pixel belongs to a character area, thereby distinguishing character areas from non-character areas, the probability map can be segmented according to low-level features (such as grayscale).
For example, step S230 may obtain the character areas by performing a binarization operation on the probability map. In the present invention, since the goal is to distinguish character areas from non-character (background) areas, a binarization operation achieves this goal; it is simple to implement, computationally cheap and fast.
The binarization operation may be a threshold segmentation operation. Optionally, the threshold T is an adjustable parameter. If a gray value of 255 indicates a 100% probability of belonging to a character area and a gray value of 0 indicates a probability of 0, the threshold may be set to 128.
The binarization operation may also be a segmentation operation based on region growing. Region growing aggregates pixels according to the similarity of pixels within the same object region. Specifically, starting from an initial region (for example, pixels with larger values in the probability map), adjacent pixels with similar properties (whose values differ only slightly from that of the current pixel) are merged into the current region, gradually growing the region until no more pixels can be merged.
Regions of the segmented image with a small average pixel value can be regarded as non-character areas, and the other regions as character areas. Determining the character areas by binarization is described in detail below with reference to the specific drawings.
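As a minimal sketch of the threshold-segmentation variant of the binarization operation, the following Python snippet binarizes a small probability map using the 0-255 convention and the midpoint threshold of 128 mentioned above (the function name and the toy values are illustrative assumptions, not part of the patent):

```python
def binarize(prob_map, threshold=128):
    """Threshold segmentation of a character-area probability map.

    prob_map is a 2-D list of values in [0, 255], where 255 means the pixel
    almost certainly lies in a character area and 0 means it almost certainly
    does not. threshold is the adjustable parameter T; 128 is the midpoint
    suggested in the text."""
    return [[255 if v >= threshold else 0 for v in row] for row in prob_map]

prob = [[  0,  40, 200, 255],
        [ 10, 130, 250, 240],
        [  5,  20,  60,  90]]
print(binarize(prob))
# [[0, 0, 255, 255], [0, 255, 255, 255], [0, 0, 0, 0]]
```

After binarization, pixels with value 255 form the candidate character regions and pixels with value 0 form the background, matching the two-class output the segmentation step requires.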
Those of ordinary skill in the art will appreciate that the above method 200 is general: it can be used for text detection in any image. The method 200 can perform text detection and recognition on document images, such as photographs of certificates and bills or scanned copies of paper documents. The method 200 can also perform text detection and recognition on natural-scene images.
The above method 200 abandons detection based on sliding windows and detection based on connected components in favor of a new detection approach based on semantic segmentation. The method 200 performs full-image prediction: both the input and the output are entire images rather than local regions or windows, so it can make better use of the contextual information in the image, particularly in natural-scene images, and thereby obtain more accurate text detection results.
The method 200 can process images of different scenes and different quality. While effectively suppressing interference from complex backgrounds, it can detect text of different colors, fonts and sizes. It can automatically predict the orientation of text lines and directly detect text of different orientations in an image. It is insensitive to the language of the text and can simultaneously detect text of different languages (such as Chinese, English and Korean). In addition, the method 200 is highly robust and can cope with interference from noise, blur, complex backgrounds, non-uniform illumination and other factors.
Fig. 7 shows a flowchart of a method for obtaining the image to be detected according to an embodiment of the invention.
In step S710, an original image is received. In one embodiment, the original image may have complex background information, and the character areas it contains may be diverse; for example, a character area may contain textual information of different colors, fonts, languages and sizes.
In step S720, the received original image is preprocessed to obtain the image to be detected. In one embodiment, the received original image may be size-normalized by scaling its largest dimension (that is, the greater of its height and width) to a preset size, which may be, for example, 480, 640, 800 or 960 pixels. The aspect ratio of the image to be detected obtained after the size normalization remains the same as that of the original image.
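The size normalization just described can be sketched as follows. The function name, the default preset of 640 and rounding to the nearest pixel are assumptions for illustration; the text only specifies that the larger dimension is scaled to the preset size while the aspect ratio is preserved:

```python
def normalize_size(width, height, preset=640):
    """Scale an image's dimensions so that the larger one equals `preset`
    (one of the preset sizes such as 480, 640, 800 or 960 pixels) while
    keeping the original aspect ratio. Returns the new (width, height)."""
    scale = preset / max(width, height)
    return round(width * scale), round(height * scale)

print(normalize_size(1920, 1080))  # (640, 360)
```

Note that 640/360 preserves the original 16:9 aspect ratio, as required by the preprocessing step.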
Fig. 8 schematically illustrates a flowchart of a method for performing a segmentation operation on the character-area probability map according to an embodiment of the invention.
In step S810, a binarization operation is performed on the character-area probability map of the image to be detected.
It will be appreciated that the character areas can be obtained directly from the result of the binarization operation. In the present invention, since the goal is to distinguish character areas from non-character (background) areas, a binarization operation achieves this goal; it is simple to implement, computationally cheap and fast.
The binarization operation may be a threshold segmentation operation. Optionally, the threshold T is an adjustable parameter. If a gray value of 255 indicates a 100% probability of belonging to a character area and a gray value of 0 indicates a probability of 0, the threshold may be set to 128.
The binarization operation may also be a segmentation operation based on region growing. Region growing aggregates pixels according to the similarity of pixels within the same object region. Specifically, starting from an initial region (for example, pixels with larger values in the probability map), adjacent pixels with similar properties (whose values differ only slightly from that of the current pixel) are merged into the current region, gradually growing the region until no more pixels can be merged.
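The region-growing variant can be sketched in Python as follows. The breadth-first traversal, 4-connectivity and tolerance of 30 are illustrative assumptions; the patent leaves these choices open:

```python
from collections import deque

def region_grow(img, seed, tol=30):
    """Grow a region from `seed` over 4-connected neighbors whose pixel
    values differ from the current pixel's value by at most `tol` (the
    'similar property' criterion described above). Returns the set of
    (row, col) positions merged into the region."""
    h, w = len(img), len(img[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and (nr, nc) not in region
                    and abs(img[nr][nc] - img[r][c]) <= tol):
                region.add((nr, nc))   # merge the similar neighbor
                queue.append((nr, nc))
    return region

probs = [[250, 240,  10],
         [230,  20,   5],
         [ 15,  10,   0]]
print(sorted(region_grow(probs, (0, 0))))  # [(0, 0), (0, 1), (1, 0)]
```

Starting from the high-probability pixel at (0, 0), growth stops at the sharp drop in probability, so the grown region corresponds to the candidate character area.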
In the embodiment shown in Fig. 8, step S820 and step S830 follow the binarization operation.
In step S820, the contour of each connected region obtained by the binarization operation is determined. This step can be implemented with any existing or future edge detection method, for example methods based on operators such as Sobel or Canny.
In step S830, the contour of each connected region is fitted with a quadrilateral to determine the character areas. In one embodiment, the interior of every quadrilateral serves as a character area. Specifically, let B be the set of all quadrilaterals, B = {b_k | k = 1, 2, ..., Q}, where b_k is a quadrilateral obtained by fitting, Q is the number of quadrilaterals and k is an index. The set B is then output as the text detection result.
The region enclosed by a quadrilateral can well cover text of any orientation and language, and is simple to compute. For example, as in the probability map of Fig. 6b, noise in the image, the shapes of the characters and other factors may prevent the probability map from ideally representing the probability that each pixel belongs to a character area. Fitting the character areas with quadrilateral regions further ensures that the entire text content is contained within the character areas, thereby ensuring the precision of text detection.
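Steps S820 and S830 can be sketched together as follows. This is a simplified pure-Python illustration: connected regions of the binarized map are found by flood fill, and an axis-aligned bounding box stands in for the general fitted quadrilateral b_k (a real implementation would fit rotated quadrilaterals so that tilted text lines are covered tightly; all names are assumptions):

```python
def component_quads(binary):
    """Find 4-connected components of foreground (255) pixels in a
    binarized probability map and fit each with a quadrilateral, here
    simplified to the four corners of the axis-aligned bounding box."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    quads = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] == 255 and not seen[r][c]:
                stack, comp = [(r, c)], []
                seen[r][c] = True
                while stack:                      # flood fill one component
                    cr, cc = stack.pop()
                    comp.append((cr, cc))
                    for nr, nc in ((cr+1, cc), (cr-1, cc), (cr, cc+1), (cr, cc-1)):
                        if 0 <= nr < h and 0 <= nc < w \
                                and binary[nr][nc] == 255 and not seen[nr][nc]:
                            seen[nr][nc] = True
                            stack.append((nr, nc))
                rs = [p[0] for p in comp]
                cs = [p[1] for p in comp]
                r0, r1, c0, c1 = min(rs), max(rs), min(cs), max(cs)
                # four corner vertices, clockwise from top-left
                quads.append([(r0, c0), (r0, c1), (r1, c1), (r1, c0)])
    return quads

img = [[255, 255, 0,   0],
       [255, 255, 0, 255],
       [  0,   0, 0, 255]]
print(component_quads(img))
# [[(0, 0), (0, 1), (1, 1), (1, 0)], [(1, 3), (1, 3), (2, 3), (2, 3)]]
```

The returned list of quadrilaterals plays the role of the set B that is output as the text detection result.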
Fig. 9 schematically illustrates a flowchart of a method for training a neural network to obtain the semantic prediction model according to an embodiment of the invention. The object of the method is to learn a semantic prediction model from sample images, such that the model can effectively distinguish the character areas of an image to be detected from its non-character areas.
A sample image is an image whose character areas are known. As mentioned above, a neural network has the ability to "learn", and a usable semantic prediction model can be obtained by training a neural network on multiple sample images. In this embodiment, this training method enables the semantic prediction model to generate a more accurate character-area probability map from the semantics of the image to be detected, predicting whether each pixel of the image belongs to a character area or a non-character area, and thereby making the detection results of the character detection method more accurate.
Those of ordinary skill in the art will appreciate that, for a text detection system, this semantic prediction model can be pre-stored within the system.
In step S910, multiple sample images and their annotation information are received.
In one embodiment, a large variety of images containing text, such as natural-scene images, can be collected from different sources as sample images. The sample images are expected to be numerous and of many kinds, so that an ideal semantic prediction model can be obtained. In one embodiment, the number of sample images is no fewer than 1000.
Polygons can be used to annotate all the character areas in each sample image, thereby obtaining the annotation information of the sample image. The annotated text unit may be a text line or a word. The annotation information of the character areas in a sample image can be stored in the form of polygons (for example, quadrilaterals); in particular, in one embodiment, only the coordinates of the four vertices of each quadrilateral need be stored. Storing the annotation information as quadrilaterals not only accommodates text of any orientation and language, but is also convenient for computation.
Figure 10a, Figure 10b, Figure 10c and Figure 10d respectively illustrate annotated sample images with annotation information according to an embodiment of the invention. As shown in these figures, the character areas in a sample image can be annotated with quadrilaterals (the light quadrilaterals in the figures), and such annotated regions are applicable to any font, language and text orientation.
In step S920, a mask map of each sample image is generated from the sample image and its annotation information. Specifically, for a sample image I and its corresponding annotation information, a mask map of the same size as I is generated. In one embodiment, the mask map may be a binary mask map R, in which different pixel values distinguish the character areas of the sample image from its non-character areas. In one embodiment, for a sample image I, the character areas marked by the annotation information are filled with pixels having a first pixel value, and the non-character regions are filled with pixels having a second pixel value, thereby generating the binary mask map R, where the first and second pixel values differ so as to distinguish the character areas from the non-character areas. For example, in the binary mask map R, the annotated character areas (that is, the interiors of the annotation quadrilaterals) are filled with the pixel value 255, and the non-character areas are filled with the pixel value 0.
Figure 11a and Figure 11b respectively illustrate an annotated sample image with annotation information and its corresponding mask map according to an embodiment of the invention. As shown in Figure 11a, quadrilaterals mark out the text portions of the original sample image (for example, "Haidian construction security", "Haidian Middle St", "HAIDIANZHONGJIE" and "Haidian South Road"), and the mask map shown in Figure 11b is generated accordingly: the marked text portions are filled with pixels of value 255 and the non-text portions with pixels of value 0, yielding the mask map shown in Figure 11b.
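The mask-map construction of step S920 can be sketched as follows. A simple even-odd scanline test rasterizes an annotation quadrilateral, filling the character area with 255 and the background with 0 as described above. The function names, the pixel-center sampling and the toy quadrilateral are illustrative assumptions:

```python
def _inside(x, y, poly):
    """Even-odd ray-casting test for a point against a polygon given as a
    list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):               # edge crosses the scanline
            xin = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < xin:
                inside = not inside
    return inside

def rasterize_quad(width, height, quad):
    """Generate a binary mask of the given size: pixels whose centers fall
    inside the annotation quadrilateral get value 255, the rest get 0."""
    return [[255 if _inside(c + 0.5, r + 0.5, quad) else 0
             for c in range(width)] for r in range(height)]

# annotation quadrilateral stored as its four vertex coordinates
mask = rasterize_quad(4, 3, [(0, 0), (3, 0), (3, 2), (0, 2)])
print(mask)
# [[255, 255, 255, 0], [255, 255, 255, 0], [0, 0, 0, 0]]
```

In a full pipeline, every annotation quadrilateral of a sample image would be rasterized into the same mask, producing the binary mask map R paired with the image for training.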
In step S930, a training set is built from the sample images and their mask maps, and a neural network is trained to obtain the semantic prediction model M. The original sample images and their corresponding mask maps form the training sample set S = {(I_i, R_i)}, i = 1, 2, …, N, where I_i denotes an original sample image, R_i is the mask map corresponding to I_i, N is the number of sample images in S, and i is an index.
In one embodiment, the neural network may comprise a fully convolutional network. A fully convolutional network is a special class of neural network whose characteristic is that every layer with learnable parameters, from input to output, is a convolutional layer. A fully convolutional network avoids complex early-stage preprocessing of the image and can take the original image directly as input; it is especially suited to analyzing images with complex backgrounds and can make the text detection results more accurate.
According to a specific embodiment of the invention, a fully convolutional network composed of 13 layers may be adopted. Figure 12 shows a schematic diagram of this network.
In addition to convolutional layers, this fully convolutional network also includes max-pooling layers. The max-pooling layers separate runs of consecutive convolutional layers; they effectively reduce the amount of computation while strengthening the robustness of the network.
The input to this fully convolutional network is raw image data. As shown in Figure 12, the network comprises a first and a second convolutional layer, in which the number of filters may be 64 and the filter size may be 3x3. The second convolutional layer is followed by a first max-pooling layer. Next come a third and a fourth convolutional layer, in which the number of filters may be 128 and the filter size may be 3x3; the fourth convolutional layer is followed by a second max-pooling layer. Next come a fifth, sixth, and seventh convolutional layer, with 256 filters of size 3x3; the seventh convolutional layer is followed by a third max-pooling layer. Next come an eighth, ninth, and tenth convolutional layer, in which the number of filters may be 512 and the filter size may be 3x3; the tenth convolutional layer is followed by a fourth max-pooling layer. Finally come an eleventh, twelfth, and thirteenth convolutional layer, in which the number of filters may be 512 and the filter size may be 3x3.
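The 13-layer stack above can be written down as a simple layer specification. The sketch below also computes the spatial size after the four pooling stages; the 2x2 pooling window with stride 2 is an assumption, since the text only says "max-pooling layer".

```python
# The 13 convolutional layers and 4 max-pooling layers described above,
# as a spec list: ("conv", filter_count) or ("pool", None).

layers = [
    ("conv", 64), ("conv", 64), ("pool", None),
    ("conv", 128), ("conv", 128), ("pool", None),
    ("conv", 256), ("conv", 256), ("conv", 256), ("pool", None),
    ("conv", 512), ("conv", 512), ("conv", 512), ("pool", None),
    ("conv", 512), ("conv", 512), ("conv", 512),
]

def spatial_size(input_size, layers):
    """3x3 convs with padding keep the size; each assumed 2x2 pool halves it."""
    size = input_size
    for kind, _ in layers:
        if kind == "pool":
            size //= 2
    return size

n_conv = sum(1 for kind, _ in layers if kind == "conv")
out = spatial_size(640, layers)  # e.g. a 640-pixel input side
```

With four pooling stages, the feature map is 1/16 of the input side length, which is why the model's output must later be interpreted as a probability map over the full image.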
During training, the sample images and their corresponding mask maps are input to the fully convolutional network. The initial learning rate may be 0.00000001, and after every 10,000 iterations the learning rate is reduced to 1/10 of its previous value. After 100,000 iterations the training process may stop; the fully convolutional network obtained when training stops is the desired semantic prediction model. Via the trained semantic prediction model, a character-area probability map of the full image to be detected can be generated from the semantics of the image, thereby predicting the character areas in the image to be detected.
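The staircase learning-rate schedule described above (start at 1e-8, divide by 10 every 10,000 iterations, stop after 100,000) can be sketched as a small function; the function name is illustrative.

```python
# Learning rate in effect at a given 0-based iteration, per the schedule
# in the text: base 1e-8, divided by 10 every 10,000 iterations.

def learning_rate(iteration, base_lr=1e-8, drop_every=10_000, factor=10.0):
    """Staircase decay: base_lr / factor**(number of completed drops)."""
    return base_lr / factor ** (iteration // drop_every)

TOTAL_ITERATIONS = 100_000  # training stops here
schedule = [learning_rate(i) for i in (0, 9_999, 10_000, 99_999)]
```

By the final iteration the rate has been divided by 10 nine times, so the last stage trains at 1e-17 before the process stops.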
Those of ordinary skill in the art will appreciate that, although a 13-layer fully convolutional network is illustrated above, the number of layers of the fully convolutional network may be any number between 6 and 19 inclusive; this range balances accuracy of results against the amount of computation. Likewise, the filter counts and sizes given above are merely examples and not limiting: the number of filters could also be 100, 500, or 1000, and the filter size could also be 1x1 or 5x5.
According to a further aspect of the invention, a text detection device is also provided. Figure 13 shows a schematic block diagram of a text detection device 1300 according to an embodiment of the invention. As shown in Figure 13, the text detection device 1300 comprises a semantic analysis module 1330 and a segmentation module 1340. According to one embodiment of the invention, the semantic analysis module 1330 further comprises a semantic prediction model 1350.
The semantic analysis module 1330 receives the image to be detected and uses the semantic prediction model 1350 to generate a character-area probability map of the full image to be detected. The semantic prediction model generates the probability map from the semantics of the image to be detected, so as to predict whether each pixel in the image belongs to a character area or a non-text region. The probability map uses different pixel values to represent different probabilities, distinguishing the character areas of the image to be detected from its non-text regions.
In one embodiment, the image to be detected may be an original image, or an image obtained by preprocessing an original image.
In one embodiment, the semantic prediction model 1350 may be obtained by training a neural network. Training the neural network to obtain the semantic prediction model 1350 is described in detail below in connection with Figure 14.
The character-area probability map is described with reference to Figures 3a and 3b, 4a and 4b, 5a and 5b, and 6a and 6b. Each pair respectively shows, according to an embodiment of the invention, a full image to be detected and the corresponding character-area probability map generated via the semantic prediction model 1350. Figures 3a, 4a, 5a, and 6a may be full images to be detected that contain character areas; Figures 3b, 4b, 5b, and 6b show the probability maps generated after these images pass through the semantic prediction model 1350. A generated probability map uses different pixel values to represent different probabilities, so as to distinguish the character areas of the image to be detected from its non-text regions. For example, character areas are filled with pixel value 255, indicating the highest probability of belonging to a character area, while non-text regions (e.g., background) are filled with pixel value 0, indicating the lowest probability. Taking the probability map of Figure 4b as an example, different pixel values distinguish the character areas of the image in Figure 4a from its non-text regions: the two character areas in Figure 4a, "unauthorized No Admittance" and "AuthorizedPersonnelOnly", are filled with pixel value 255, yielding the probability map shown in Figure 4b. Moreover, the probability map of Figure 4b completely and accurately reflects the orientation of the character areas in the original image of Figure 4a.
The segmentation module 1340 performs a segmentation operation on the character-area probability map to determine the character areas. Because the value of a pixel in the probability map represents the probability that the pixel belongs to a character area, the probability map can be segmented according to low-level features (e.g., image gray level).
For example, the segmentation module 1340 may obtain the character areas by performing a binarization operation on the character-area probability map. Since the goal in the present invention is to distinguish character areas from non-text (background) regions, a binarization operation achieves this purpose; it is simple to implement, computationally cheap, and fast.
The binarization operation may be a thresholding operation. Optionally, the threshold T is an adjustable parameter. If gray value 255 represents a probability of 100% of belonging to a character area and gray value 0 represents a probability of 0, the threshold may be set to 128.
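The thresholding operation above can be sketched in a few lines. T = 128 follows the example in the text; the function name is an assumption.

```python
# Threshold binarization: pixels at or above T become character-area
# candidates (255), the rest become background (0).

def binarize(prob_map, threshold=128):
    """Threshold a grayscale probability map into a binary character mask."""
    return [[255 if v >= threshold else 0 for v in row] for v_row in [None] for row in prob_map]

# A tiny 2x3 probability map
binary = binarize([[0, 130, 255], [127, 128, 40]])
```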
The binarization operation may also be a segmentation operation based on region growing. Region growing aggregates pixels according to the similarity of pixels within the same object area. Specifically, starting from an initial region (e.g., pixels with larger values in the probability map), neighboring pixels with similar properties (i.e., values differing little from that of the current pixel) are merged into the current region, which thereby grows step by step until no further pixels can be merged.
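A region-growing pass of the kind described can be sketched with a breadth-first flood fill. The seed choice, 4-connectivity, and tolerance value are assumptions for illustration; the text only describes the general scheme.

```python
# Region growing: start from a high-probability seed and absorb 4-connected
# neighbours whose value differs from the current pixel's by at most `tol`.

from collections import deque

def grow_region(prob_map, seed, tol=30):
    """Return the set of (row, col) pixels reachable from seed by growing."""
    rows, cols = len(prob_map), len(prob_map[0])
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region
                    and abs(prob_map[nr][nc] - prob_map[r][c]) <= tol):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

prob = [[250, 240, 10],
        [245, 230, 5],
        [0,   0,   0]]
region = grow_region(prob, (0, 0))  # grows over the bright top-left block
```

Growth stops exactly where the condition in the text fails: the low-valued pixels differ from their bright neighbours by more than the tolerance, so they are never merged.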
After the binarization operation, the segmentation module 1340 may further determine the contour of each connected region obtained by the binarization. This can be done with any existing or future edge-detection method, for example methods based on the Sobel or Canny operators. The segmentation module 1340 may then fit the contour of each connected region with a quadrilateral to determine the character areas. In one embodiment, the interior of every quadrilateral serves as a character area. Specifically, let B = {b_k}, k = 1, 2, …, Q, be the set of all fitted quadrilaterals, where b_k denotes a fitted quadrilateral, Q is the number of quadrilaterals, and k is an index. The set B is then output as the text detection result.
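The connected-region and fitting step can be sketched as follows. The text allows arbitrary quadrilaterals; fitting each component with its axis-aligned bounding box, as below, is a simplifying assumption, and the function names are illustrative.

```python
# Label 4-connected foreground components in a binary map, then fit each
# with the simplest quadrilateral: its axis-aligned bounding box.

from collections import deque

def fit_quads(binary, fg=255):
    """Return one (top, left, bottom, right) box per connected fg region."""
    rows, cols = len(binary), len(binary[0])
    seen = set()
    quads = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == fg and (r, c) not in seen:
                # flood-fill one component, tracking its extreme coordinates
                queue = deque([(r, c)])
                seen.add((r, c))
                top = bottom = r
                left = right = c
                while queue:
                    cr, cc = queue.popleft()
                    top, bottom = min(top, cr), max(bottom, cr)
                    left, right = min(left, cc), max(right, cc)
                    for nr, nc in ((cr - 1, cc), (cr + 1, cc),
                                   (cr, cc - 1), (cr, cc + 1)):
                        if (0 <= nr < rows and 0 <= nc < cols
                                and binary[nr][nc] == fg
                                and (nr, nc) not in seen):
                            seen.add((nr, nc))
                            queue.append((nr, nc))
                quads.append((top, left, bottom, right))
    return quads

demo = [[255, 255, 0, 0],
        [255, 255, 0, 0],
        [0,   0,   0, 255]]
boxes = fit_quads(demo)  # the set B of fitted regions, one per component
```

In a full implementation each box (or a tighter rotated quadrilateral, for slanted text) would become one element b_k of the output set B.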
The region enclosed by a quadrilateral can well contain text of any orientation or language, and the fitting is computationally simple. For example, in the probability map of Figure 6b, factors such as image noise or the shapes of the characters may prevent the probability map from representing, as well as desired, the probability that each pixel belongs to a character area. Fitting the character areas with quadrilateral regions further ensures that the whole text content is contained within the character areas, thereby ensuring the precision of text detection.
In one embodiment, regions with a smaller average pixel value in the image obtained after segmentation by the segmentation module 1340 may be regarded as non-text regions, and the other regions as character areas.
Figure 14 shows a schematic block diagram of a text detection device 1400 according to another embodiment of the invention. The semantic analysis module 1330 in the text detection device 1400 is similar to the semantic analysis module 1330 in the text detection device 1300, and the segmentation module 1340 in the text detection device 1400 is similar to the segmentation module 1340 in the text detection device 1300; for brevity, they are not described again here.
Compared with the text detection device 1300, the text detection device 1400 adds an image preprocessing module 1410 and a training module 1420.
According to an embodiment of the invention, the image preprocessing module 1410 receives an original image. In one embodiment, the original image may have complex background information and may contain diverse character areas, for example text of different colors, fonts, languages, and sizes.
The image preprocessing module 1410 preprocesses the received original image. In one embodiment, the image preprocessing module 1410 may normalize the size of the original image by scaling its largest dimension (e.g., the greater of its height and width) to a preset size, which may be 480, 640, 800, or 960 pixels, etc. Furthermore, the aspect ratio of the preprocessed image remains identical to the aspect ratio of the original image.
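The size-normalization computation above can be sketched in a few lines: scale the longer side to a preset size while preserving the aspect ratio. The rounding choice is an assumption; the text only requires that the aspect ratio be kept.

```python
# Aspect-ratio-preserving size normalization: the longer of (width, height)
# is scaled to `preset`, and the other side is scaled by the same factor.

def normalized_size(width, height, preset=640):
    """New (width, height) with the longer side scaled to `preset`."""
    scale = preset / max(width, height)
    return round(width * scale), round(height * scale)

size = normalized_size(1920, 1080, preset=640)  # landscape input
```

The same rule handles portrait images: whichever side is larger is the one mapped to the preset size.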
After preprocessing, the image preprocessing module 1410 obtains the image to be detected and outputs the full image to be detected to the semantic analysis module 1330 for processing. As described above, the image to be detected has the preset size, and its aspect ratio is identical to the aspect ratio of the original image.
According to one embodiment of the invention, the training module 1420 trains a neural network with multiple sample images to obtain the semantic prediction model 1350, a model that can effectively distinguish the character areas of an image to be detected from its non-text regions.
In one embodiment, the training module 1420 may collect, from various sources, a variety of images containing large amounts of text as sample images, and receive the annotation information of the sample images. The sample images are, for example, natural-scene images. The sample images are preferably diverse in kind and large in number, so as to obtain a good semantic prediction model; in one embodiment, the number of sample images is no fewer than 1000.
All character areas in each sample image may be marked with polygons in that sample image. The annotated text unit may be a text line or a word. The annotation information of a character area in a sample image may be stored in the form of a polygon (e.g., a quadrilateral); specifically, in one embodiment, only the coordinates of the quadrilateral's four vertices need be stored. Storing annotation information as quadrilaterals not only accommodates text of any orientation or language, but is also convenient to compute with.
Figures 10a, 10b, 10c, and 10d each illustrate a labeled sample image with annotation information according to an embodiment of the invention. As shown in these figures, the character areas in a sample image may be marked with quadrilaterals (the light quadrilaterals in the figures), and this form of annotation accommodates text of any font, language, and orientation.
The training module 1420 also generates a mask map of each sample image from the sample image and its annotation information. In one embodiment, the mask map comprises a binary mask map. Specifically, for a sample image I and its corresponding annotation information A, the training module 1420 generates a mask map of the same size as I, for example a binary mask map R, which uses different pixel values to distinguish the character areas of the sample image from its non-text regions. In one embodiment, for the sample image I, the annotated character areas are filled with pixels having a first pixel value and the non-text regions with pixels having a second, different pixel value, thereby generating the mask map in which character areas and non-text regions are distinguished. For example, the annotated character areas (i.e., the interiors of the marking quadrilaterals) are filled with the pixel value 255, while the non-character areas are filled with the pixel value 0.
The training module 1420 is further configured to build a training set from the sample images and their mask maps and to train the neural network to obtain the semantic prediction model 1350. Specifically, the original sample images and their corresponding mask maps form the training sample set S = {(I_i, R_i)}, i = 1, 2, …, N, where I_i denotes an original sample image, R_i is the mask map corresponding to I_i, N is the number of sample images in S, and i is an index.
In one embodiment, the neural network may be a fully convolutional network. A fully convolutional network is a special class of neural network whose characteristic is that every layer with learnable parameters, from input to output, is a convolutional layer. A fully convolutional network avoids complex early-stage preprocessing of the image and can take the original image directly as input; it is especially suited to analyzing images with complex backgrounds and can make the text detection results more accurate.
The training module 1420 inputs the training sample set S into the fully convolutional network for training, to obtain the semantic prediction model 1350. According to a specific embodiment of the invention, a fully convolutional network composed of 13 layers may be adopted; Figure 12 shows a schematic diagram of this network.
In addition to convolutional layers, this fully convolutional network also includes max-pooling layers. The max-pooling layers separate runs of consecutive convolutional layers; they effectively reduce the amount of computation while strengthening the robustness of the network.
The input to this fully convolutional network is raw image data. As shown in Figure 12, the network comprises a first and a second convolutional layer, in which the number of filters may be 64 and the filter size may be 3x3. The second convolutional layer is followed by a first max-pooling layer. Next come a third and a fourth convolutional layer, in which the number of filters may be 128 and the filter size may be 3x3; the fourth convolutional layer is followed by a second max-pooling layer. Next come a fifth, sixth, and seventh convolutional layer, with 256 filters of size 3x3; the seventh convolutional layer is followed by a third max-pooling layer. Next come an eighth, ninth, and tenth convolutional layer, in which the number of filters may be 512 and the filter size may be 3x3; the tenth convolutional layer is followed by a fourth max-pooling layer. Finally come an eleventh, twelfth, and thirteenth convolutional layer, in which the number of filters may be 512 and the filter size may be 3x3.
During training, the sample images and their corresponding mask maps are input to the fully convolutional network. The initial learning rate may be 0.00000001, and after every 10,000 iterations the learning rate is reduced to 1/10 of its previous value. After 100,000 iterations the training process may stop; the fully convolutional network obtained when training stops is the desired semantic prediction model. Via the trained semantic prediction model, a character-area probability map can be generated from the semantics of the image to be detected, thereby predicting the character areas in the image to be detected.
Those of ordinary skill in the art will appreciate that, although a 13-layer fully convolutional network is illustrated above, the number of layers of the fully convolutional network may be any number between 6 and 19 inclusive; this range balances accuracy of results against the amount of computation. Likewise, the filter counts and sizes given above are merely examples and not limiting: the number of filters could also be 100, 500, or 1000, and the filter size could also be 1x1 or 5x5.
The semantic prediction model 1350, obtained by the training module 1420 training the neural network with multiple sample images, can effectively distinguish the character areas of an image to be detected from its non-text regions.
Figure 15 shows a schematic block diagram of a text detection system 1500 according to an embodiment of the invention. As shown in Figure 15, the text detection system 1500 comprises a processor 1510, a memory 1520, and program instructions 1530 stored in the memory 1520.
When run by the processor 1510, the program instructions 1530 can implement the functions of the functional modules of the text detection device according to embodiments of the invention, and/or can perform the steps of the character detection method according to embodiments of the invention.
Specifically, when the program instructions 1530 are run by the processor 1510, the following steps are performed: receiving an image to be detected; generating, via a semantic prediction model, a character-area probability map of the full image to be detected, wherein the probability map uses different pixel values to distinguish the character areas of the image to be detected from its non-text regions; and performing a segmentation operation on the probability map to determine the character areas. The semantic prediction model predicts, from the semantics of the image, whether a pixel in the image to be detected belongs to a character area or a non-text region.
In addition, when the program instructions 1530 are run by the processor 1510, the following steps are also performed: receiving an original image; and preprocessing the original image to obtain the image to be detected, wherein the image to be detected has a preset size and the same aspect ratio as the original image.
In addition, the step, performed when the program instructions 1530 are run by the processor 1510, of performing a segmentation operation on the character-area probability map to determine the character areas comprises: performing a binarization operation on the character-area probability map to determine the character areas.
In addition, the step, performed when the program instructions 1530 are run by the processor 1510, of performing a binarization operation on the character-area probability map to determine the character areas comprises: determining the contour of each connected region obtained by the binarization operation; and fitting the contours with quadrilaterals, wherein the interiors of the quadrilaterals are the character areas.
In addition, when the program instructions 1530 are run by the processor 1510, the following step is also performed: training a neural network with multiple sample images to obtain the semantic prediction model.
In addition, the step, performed when the program instructions 1530 are run by the processor 1510, of training a neural network with multiple sample images to obtain the semantic prediction model comprises: receiving the sample images and the annotation information of the sample images; generating mask maps of the sample images from the sample images and their annotation information; and training the neural network with the sample images and the mask maps to obtain the semantic prediction model.
In addition, in the step, performed when the program instructions 1530 are run by the processor 1510, of training a neural network with multiple sample images to obtain the semantic prediction model, the mask maps comprise binary mask maps, and a binary mask map uses different pixel values to distinguish the character areas of a sample image from its non-text regions.
In addition, in the step, performed when the program instructions 1530 are run by the processor 1510, of training a neural network with multiple sample images to obtain the semantic prediction model, the neural network comprises a fully convolutional network.
In addition, in the step, performed when the program instructions 1530 are run by the processor 1510, of training a neural network with multiple sample images to obtain the semantic prediction model, the number of layers of the fully convolutional network may be any number between 6 and 19.
In addition, according to an embodiment of the invention, a storage medium is also provided, on which program instructions are stored. When run by a computer or processor, the program instructions perform the corresponding steps of the character detection method of the embodiments of the invention and implement the corresponding modules of the text detection device of the embodiments of the invention. The storage medium may include, for example, the memory card of a smartphone, the storage unit of a tablet computer, the hard disk of a personal computer, read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact-disc read-only memory (CD-ROM), USB storage, or any combination of the above storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media; for example, one computer-readable storage medium may contain computer-readable program code for training a neural network to obtain the semantic prediction model, and another may contain computer-readable program code for performing text detection.
In one embodiment, when run by a computer, the computer program instructions can implement the functional modules of the text detection device according to embodiments of the invention, and/or can perform the character detection method according to embodiments of the invention.
In one embodiment, when run by a computer, the computer program instructions perform the following steps: receiving an image to be detected; generating, via a semantic prediction model, a character-area probability map of the full image to be detected, wherein the probability map uses different pixel values to distinguish the character areas of the image to be detected from its non-text regions; and performing a segmentation operation on the probability map to determine the character areas. The semantic prediction model predicts, from the semantics of the image, whether a pixel in the image to be detected belongs to a character area or a non-text region.
In addition, when run by a computer, the computer program instructions also perform the following steps: receiving an original image; and preprocessing the original image to obtain the image to be detected, wherein the image to be detected has a preset size and the same aspect ratio as the original image.
In addition, the step, performed when the computer program instructions are run by a computer, of performing a segmentation operation on the character-area probability map to determine the character areas comprises: performing a binarization operation on the character-area probability map to determine the character areas.
In addition, the step, performed when the computer program instructions are run by a computer, of performing a binarization operation on the character-area probability map to determine the character areas comprises: determining the contour of each connected region obtained by the binarization operation; and fitting the contours with quadrilaterals, wherein the interiors of the quadrilaterals are the character areas.
In addition, when run by a computer, the computer program instructions also perform the following step: training a neural network with multiple sample images to obtain the semantic prediction model.
In addition, the step, performed when the computer program instructions are run by a computer, of training a neural network with multiple sample images to obtain the semantic prediction model comprises: receiving the sample images and the annotation information of the sample images; generating mask maps of the sample images from the sample images and their annotation information; and training the neural network with the sample images and the mask maps to obtain the semantic prediction model.
In addition, in the step, performed when the computer program instructions are run by a computer, of training a neural network with multiple sample images to obtain the semantic prediction model, the mask maps comprise binary mask maps, and a binary mask map uses different pixel values to distinguish the character areas of a sample image from its non-text regions.
In addition, in the step, performed when the computer program instructions are run by a computer, of training a neural network with multiple sample images to obtain the semantic prediction model, the neural network comprises a fully convolutional network.
In addition, in the step, performed when the computer program instructions are run by a computer, of training a neural network with multiple sample images to obtain the semantic prediction model, the number of layers of the fully convolutional network may be any number between 6 and 19.
Those of ordinary skill in the art, having read the above detailed description of the character detection method, can understand the structure, implementation, and advantages of the above text detection device and system, which are therefore not repeated here.
Numerous specific details are described in the specification provided herein. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention, the features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus the claims following the detailed description are hereby expressly incorporated into that description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that, except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will understand that, although some embodiments described herein include some features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some modules of the text detection device according to embodiments of the invention. The invention may also be implemented as device programs (e.g., computer programs and computer program products) for performing part or all of the methods described herein. Such programs implementing the invention may be stored on computer-readable media, or may take the form of one or more signals; such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.

Claims (20)

1. A character detection method, comprising:
receiving an image to be detected;
generating, via a semantic prediction model, a character region probability map of the full image to be detected, wherein the character region probability map uses different pixel values to distinguish the character regions of the image to be detected from the non-character regions of the image to be detected; and
performing a segmentation operation on the character region probability map to determine the character regions.
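As an illustration of the segmentation step recited in claim 1, the sketch below thresholds a character region probability map into a binary text/non-text mask. The 0.5 threshold and the toy probability values are assumptions chosen for illustration; the patent does not fix a threshold.

```python
import numpy as np

def segment_text_regions(prob_map, threshold=0.5):
    """Threshold a character-region probability map into a binary mask:
    pixels whose text probability exceeds `threshold` become 1 (text),
    all others 0 (non-text)."""
    return (prob_map > threshold).astype(np.uint8)

# Toy 4x6 probability map with a high-probability band in the middle,
# standing in for the output of the semantic prediction model.
prob = np.array([
    [0.1, 0.1, 0.2, 0.1, 0.1, 0.1],
    [0.2, 0.9, 0.8, 0.9, 0.7, 0.1],
    [0.1, 0.8, 0.9, 0.8, 0.9, 0.2],
    [0.1, 0.1, 0.1, 0.2, 0.1, 0.1],
])
mask = segment_text_regions(prob)
print(mask.sum())  # 8 pixels classified as text
```

The mask preserves the probability map's convention of using different pixel values (here 1 and 0) for character and non-character regions.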
2. the method for claim 1, also comprises:
Receive original image; And
Pre-service is carried out to described original image, to obtain described image to be detected,
Wherein, described image to be detected has pre-set dimension size, and the Aspect Ratio of described image to be detected is identical with the Aspect Ratio of described original image.
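One plausible reading of claim 2's preprocessing — a preset size with the original aspect ratio preserved — is to scale the longer side to a fixed length. The sketch below computes such a resize target; the value 512 is an illustrative preset, not a figure given in the patent.

```python
def preprocess_size(orig_w, orig_h, target_long_side=512):
    """Compute resize dimensions that keep the original aspect ratio
    while scaling the longer side to a preset length."""
    scale = target_long_side / max(orig_w, orig_h)
    return round(orig_w * scale), round(orig_h * scale)

print(preprocess_size(1024, 768))  # (512, 384): aspect ratio 4:3 is kept
```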
3. The method of claim 1, wherein performing a segmentation operation on the character region probability map to determine the character regions comprises:
performing a binarization operation on the character region probability map to determine the character regions.
4. The method of claim 3, wherein performing a binarization operation on the character region probability map to determine the character regions comprises:
determining the contour of each connected region obtained by the binarization operation; and
fitting each contour to a quadrilateral, wherein the interior region of the quadrilateral is a character region.
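Claim 4's two steps — finding each connected region produced by binarization and fitting it to a quadrilateral — can be sketched as below. For brevity this sketch labels 4-connected regions with a breadth-first search and fits each to its axis-aligned bounding box, the simplest quadrilateral; an implementation of the claim could instead fit rotated quadrilaterals, e.g. via OpenCV's cv2.findContours and cv2.minAreaRect (a tooling assumption, not something the patent specifies).

```python
from collections import deque

import numpy as np

def fit_quadrilaterals(mask):
    """Label 4-connected regions in a binary mask and fit each to a
    quadrilateral, returned as four (x, y) corners in clockwise order
    starting from the top-left."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    quads = []
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            # Breadth-first search over one connected region.
            queue = deque([(y, x)])
            seen[y, x] = True
            ys, xs = [], []
            while queue:
                cy, cx = queue.popleft()
                ys.append(cy)
                xs.append(cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            y0, y1, x0, x1 = min(ys), max(ys), min(xs), max(xs)
            quads.append([(x0, y0), (x1, y0), (x1, y1), (x0, y1)])
    return quads

# A binarized probability map with two separate text regions.
binary = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
], dtype=np.uint8)
print(fit_quadrilaterals(binary))  # one quadrilateral per connected region
```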
5. the method for claim 1, also comprises:
Utilize multiple sample image neural network training, to obtain described semantic forecast model.
6. method as claimed in claim 5, wherein, utilizes multiple sample image neural network training, comprises to obtain described semantic forecast model:
Receive the markup information of described sample image and described sample image;
The mask figure of described sample image is generated according to the markup information of described sample image and described sample image; And
Described sample image and described mask figure is utilized to train described neural network, to obtain described semantic forecast model.
7. method as claimed in claim 6, wherein, described mask figure comprises two-value mask figure, and described two-value mask figure uses different pixel values to distinguish the character area of described sample image and non-legible region.
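A two-value training mask of the kind claim 7 describes can be generated from annotation information as sketched below. The axis-aligned (x0, y0, x1, y1) box format is an assumption made for brevity; annotated text regions may in practice be arbitrary quadrilaterals, which would be rasterized instead.

```python
import numpy as np

def make_binary_mask(height, width, boxes):
    """Build a two-value mask map for one sample image: pixels inside
    any annotated text box get value 1, all other pixels value 0."""
    mask = np.zeros((height, width), dtype=np.uint8)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = 1
    return mask

# One annotated text box, 3 pixels wide and 2 pixels tall.
m = make_binary_mask(4, 6, [(1, 1, 4, 3)])
print(m.sum())  # 6 text pixels
```

During training, this mask serves as the pixel-wise target that the network's predicted probability map is compared against.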
8. The method of claim 5, wherein the neural network comprises a fully convolutional neural network.
9. The method of claim 8, wherein the number of layers of the fully convolutional neural network is any number between 6 and 19.
10. The method of any one of claims 1 to 9, wherein the semantic prediction model is used to predict, according to the semantics of the image to be detected, whether each pixel in the image to be detected belongs to a character region or to a non-character region.
11. A character detection device, comprising:
a semantic analysis module configured to receive an image to be detected, and to generate, using a semantic prediction model, a character region probability map of the full image to be detected, wherein the character region probability map uses different pixel values to distinguish the character regions of the image to be detected from the non-character regions of the image to be detected; and
a segmentation module configured to perform a segmentation operation on the character region probability map to determine the character regions.
12. The character detection device of claim 11, further comprising:
an image preprocessing module configured to receive an original image and to preprocess the original image to obtain the image to be detected,
wherein the image to be detected has a preset size, and the aspect ratio of the image to be detected is identical to the aspect ratio of the original image.
13. The character detection device of claim 11, wherein the segmentation module is further configured to perform a binarization operation on the character region probability map to determine the character regions.
14. The character detection device of claim 13, wherein the segmentation module is further configured to determine the contour of each connected region obtained by the binarization operation, and to fit each contour to a quadrilateral, wherein the interior region of the quadrilateral is a character region.
15. The character detection device of claim 11, further comprising:
a training module, connected to the semantic analysis module, configured to train a neural network with a plurality of sample images to obtain the semantic prediction model.
16. The character detection device of claim 15, wherein the training module is further configured to receive the sample images and annotation information of the sample images, to generate a mask map of each sample image according to the sample image and its annotation information, and to train the neural network with the sample images and the mask maps to obtain the semantic prediction model.
17. The character detection device of claim 16, wherein the mask map comprises a binary mask map, the binary mask map using different pixel values to distinguish the character regions of the sample image from the non-character regions.
18. The character detection device of claim 15, wherein the neural network comprises a fully convolutional neural network.
19. The character detection device of claim 18, wherein the number of layers of the fully convolutional neural network is any number between 6 and 19.
20. The character detection device of any one of claims 11 to 19, wherein the semantic prediction model is used to predict, according to the semantics of the image to be detected, whether each pixel in the image to be detected belongs to a character region or to a non-character region.
CN201510970839.2A 2015-12-22 2015-12-22 Character detecting method and device Active CN105574513B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510970839.2A CN105574513B (en) 2015-12-22 2015-12-22 Character detecting method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510970839.2A CN105574513B (en) 2015-12-22 2015-12-22 Character detecting method and device

Publications (2)

Publication Number Publication Date
CN105574513A true CN105574513A (en) 2016-05-11
CN105574513B CN105574513B (en) 2017-11-24

Family

ID=55884621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510970839.2A Active CN105574513B (en) 2015-12-22 2015-12-22 Character detecting method and device

Country Status (1)

Country Link
CN (1) CN105574513B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100269102B1 (en) * 1994-06-24 2000-10-16 윤종용 Numeric character recognition with neural network
CN103745213A (en) * 2014-02-28 2014-04-23 中国人民解放军63680部队 Optical character recognition method based on LVQ neural network
CN104899586A (en) * 2014-03-03 2015-09-09 阿里巴巴集团控股有限公司 Method for recognizing character contents included in image and device thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHOU Dongao: "Research and Implementation of Text Detection Technology in Images", National University of Defense Technology *
LI Ying et al.: "A New Method for Detecting Text Regions in Images", Journal of Xidian University (Natural Science Edition) *
BAO Shengli: "Research on a Chinese Character Recognition System Based on Multi-Algorithm Integration and Neural Networks", Sichuan University *

Cited By (67)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295629A (en) * 2016-07-15 2017-01-04 北京市商汤科技开发有限公司 Structured text detection method and system
US10937166B2 (en) 2016-07-15 2021-03-02 Beijing Sensetime Technology Development Co., Ltd. Methods and systems for structured text detection, and non-transitory computer-readable medium
CN106295629B (en) * 2016-07-15 2018-06-15 北京市商汤科技开发有限公司 structured text detection method and system
CN107633527B (en) * 2016-07-19 2020-07-07 北京图森未来科技有限公司 Target tracking method and device based on full convolution neural network
CN107633527A (en) * 2016-07-19 2018-01-26 北京图森未来科技有限公司 Target tracking method and device based on full convolutional neural networks
CN108108731A (en) * 2016-11-25 2018-06-01 中移(杭州)信息技术有限公司 Method for text detection and device based on generated data
CN106778928A (en) * 2016-12-21 2017-05-31 广州华多网络科技有限公司 Image processing method and device
CN106897732A (en) * 2017-01-06 2017-06-27 华中科技大学 Multi-direction Method for text detection in a kind of natural picture based on connection word section
CN107025457A (en) * 2017-03-29 2017-08-08 腾讯科技(深圳)有限公司 A kind of image processing method and device
CN107025457B (en) * 2017-03-29 2022-03-08 腾讯科技(深圳)有限公司 Image processing method and device
WO2018177237A1 (en) * 2017-03-29 2018-10-04 腾讯科技(深圳)有限公司 Image processing method and device, and storage medium
CN108830827A (en) * 2017-05-02 2018-11-16 通用电气公司 Neural metwork training image generation system
WO2018232592A1 (en) * 2017-06-20 2018-12-27 Microsoft Technology Licensing, Llc. Fully convolutional instance-aware semantic segmentation
CN109389116A (en) * 2017-08-14 2019-02-26 高德软件有限公司 A kind of character detection method and device
CN109410211A (en) * 2017-08-18 2019-03-01 北京猎户星空科技有限公司 The dividing method and device of target object in a kind of image
CN107886093A (en) * 2017-11-07 2018-04-06 广东工业大学 A kind of character detection method, system, equipment and computer-readable storage medium
CN108305262A (en) * 2017-11-22 2018-07-20 腾讯科技(深圳)有限公司 File scanning method, device and equipment
CN109961553A (en) * 2017-12-26 2019-07-02 航天信息股份有限公司 Invoice number recognition methods, device and tax administration self-service terminal system
CN108229575A (en) * 2018-01-19 2018-06-29 百度在线网络技术(北京)有限公司 For detecting the method and apparatus of target
CN108197623A (en) * 2018-01-19 2018-06-22 百度在线网络技术(北京)有限公司 For detecting the method and apparatus of target
CN108427950B (en) * 2018-02-01 2021-02-19 北京捷通华声科技股份有限公司 Character line detection method and device
CN108427950A (en) * 2018-02-01 2018-08-21 北京捷通华声科技股份有限公司 A kind of literal line detection method and device
CN108304814A (en) * 2018-02-08 2018-07-20 海南云江科技有限公司 A kind of construction method and computing device of literal type detection model
CN108304814B (en) * 2018-02-08 2020-07-14 海南云江科技有限公司 Method for constructing character type detection model and computing equipment
CN108446621A (en) * 2018-03-14 2018-08-24 平安科技(深圳)有限公司 Bank slip recognition method, server and computer readable storage medium
CN108717542A (en) * 2018-04-23 2018-10-30 北京小米移动软件有限公司 Identify the method, apparatus and computer readable storage medium of character area
CN108717542B (en) * 2018-04-23 2020-09-15 北京小米移动软件有限公司 Method and device for recognizing character area and computer readable storage medium
WO2019232853A1 (en) * 2018-06-04 2019-12-12 平安科技(深圳)有限公司 Chinese model training method, chinese image recognition method, device, apparatus and medium
CN108921158A (en) * 2018-06-14 2018-11-30 众安信息技术服务有限公司 Method for correcting image, device and computer readable storage medium
CN108989793A (en) * 2018-07-20 2018-12-11 深圳市华星光电技术有限公司 A kind of detection method and detection device of text pixel
CN109040824A (en) * 2018-08-28 2018-12-18 百度在线网络技术(北京)有限公司 Method for processing video frequency, device, electronic equipment and readable storage medium storing program for executing
JP2022501719A (en) * 2018-09-21 2022-01-06 ネイバー コーポレーションNAVER Corporation Character detection device, character detection method and character detection system
JP7198350B2 (en) 2018-09-21 2022-12-28 ネイバー コーポレーション CHARACTER DETECTION DEVICE, CHARACTER DETECTION METHOD AND CHARACTER DETECTION SYSTEM
CN109492638A (en) * 2018-11-07 2019-03-19 北京旷视科技有限公司 Method for text detection, device and electronic equipment
CN112789623A (en) * 2018-11-16 2021-05-11 北京比特大陆科技有限公司 Text detection method, device and storage medium
CN111259878A (en) * 2018-11-30 2020-06-09 中移(杭州)信息技术有限公司 Method and equipment for detecting text
CN109685055B (en) * 2018-12-26 2021-11-12 北京金山数字娱乐科技有限公司 Method and device for detecting text area in image
CN109685055A (en) * 2018-12-26 2019-04-26 北京金山数字娱乐科技有限公司 Text filed detection method and device in a kind of image
CN110119742B (en) * 2019-04-25 2023-07-07 添维信息科技(天津)有限公司 Container number identification method and device and mobile terminal
CN110119742A (en) * 2019-04-25 2019-08-13 添维信息科技(天津)有限公司 A kind of recognition methods of container number, device and mobile terminal
CN110059685B (en) * 2019-04-26 2022-10-21 腾讯科技(深圳)有限公司 Character area detection method, device and storage medium
CN110059685A (en) * 2019-04-26 2019-07-26 腾讯科技(深圳)有限公司 Word area detection method, apparatus and storage medium
CN110110777A (en) * 2019-04-28 2019-08-09 网易有道信息技术(北京)有限公司 Image processing method and training method and device, medium and calculating equipment
CN112001406B (en) * 2019-05-27 2023-09-08 杭州海康威视数字技术股份有限公司 Text region detection method and device
CN112001406A (en) * 2019-05-27 2020-11-27 杭州海康威视数字技术股份有限公司 Text region detection method and device
CN110458162A (en) * 2019-07-25 2019-11-15 上海兑观信息科技技术有限公司 A kind of method of intelligent extraction pictograph information
CN111753836A (en) * 2019-08-27 2020-10-09 北京京东尚科信息技术有限公司 Character recognition method and device, computer readable medium and electronic equipment
CN110503103A (en) * 2019-08-28 2019-11-26 上海海事大学 A kind of character cutting method in line of text based on full convolutional neural networks
CN110503103B (en) * 2019-08-28 2023-04-07 上海海事大学 Character segmentation method in text line based on full convolution neural network
CN110503159A (en) * 2019-08-28 2019-11-26 北京达佳互联信息技术有限公司 Character recognition method, device, equipment and medium
CN110807454A (en) * 2019-09-19 2020-02-18 平安科技(深圳)有限公司 Character positioning method, device and equipment based on image segmentation and storage medium
DE102019134387A1 (en) * 2019-12-13 2021-06-17 Beckhoff Automation Gmbh Process for real-time optical character recognition in an automation system and automation system
CN111242120A (en) * 2020-01-03 2020-06-05 中国科学技术大学 Character detection method and system
CN111242120B (en) * 2020-01-03 2022-07-29 中国科学技术大学 Character detection method and system
CN111626283B (en) * 2020-05-20 2022-12-13 北京字节跳动网络技术有限公司 Character extraction method and device and electronic equipment
CN111626283A (en) * 2020-05-20 2020-09-04 北京字节跳动网络技术有限公司 Character extraction method and device and electronic equipment
CN111723815A (en) * 2020-06-23 2020-09-29 中国工商银行股份有限公司 Model training method, image processing method, device, computer system, and medium
CN111753727B (en) * 2020-06-24 2023-06-23 北京百度网讯科技有限公司 Method, apparatus, device and readable storage medium for extracting structured information
CN111753727A (en) * 2020-06-24 2020-10-09 北京百度网讯科技有限公司 Method, device, equipment and readable storage medium for extracting structured information
CN111767921A (en) * 2020-06-30 2020-10-13 上海媒智科技有限公司 Express bill positioning and correcting method and device
CN114078108A (en) * 2020-08-11 2022-02-22 天津拓影科技有限公司 Method and device for processing abnormal area in image and method and device for image segmentation
CN114078108B (en) * 2020-08-11 2023-12-22 北京阅影科技有限公司 Method and device for processing abnormal region in image, and method and device for dividing image
CN112801911A (en) * 2021-02-08 2021-05-14 苏州长嘴鱼软件有限公司 Method and device for removing Chinese character noise in natural image and storage medium
CN112801911B (en) * 2021-02-08 2024-03-26 苏州长嘴鱼软件有限公司 Method and device for removing text noise in natural image and storage medium
CN114067192A (en) * 2022-01-07 2022-02-18 北京许先网科技发展有限公司 Character recognition method and system
CN114495129A (en) * 2022-04-18 2022-05-13 阿里巴巴(中国)有限公司 Character detection model pre-training method and device
CN114495129B (en) * 2022-04-18 2022-09-09 阿里巴巴(中国)有限公司 Character detection model pre-training method and device

Also Published As

Publication number Publication date
CN105574513B (en) 2017-11-24

Similar Documents

Publication Publication Date Title
CN105574513A (en) Character detection method and device
CN107545262B (en) Method and device for detecting text in natural scene image
CN108885699A (en) Character identifying method, device, storage medium and electronic equipment
CN111257341B (en) Underwater building crack detection method based on multi-scale features and stacked full convolution network
US9965695B1 (en) Document image binarization method based on content type separation
CN109685065B (en) Layout analysis method and system for automatically classifying test paper contents
CN111274957A (en) Webpage verification code identification method, device, terminal and computer storage medium
US20210081695A1 (en) Image processing method, apparatus, electronic device and computer readable storage medium
US11600088B2 (en) Utilizing machine learning and image filtering techniques to detect and analyze handwritten text
CN113191358B (en) Metal part surface text detection method and system
CN108268641A (en) Invoice information recognition methods and invoice information identification device, equipment and storage medium
CN110689134A (en) Method, apparatus, device and storage medium for performing machine learning process
CN111932577A (en) Text detection method, electronic device and computer readable medium
CN116071294A (en) Optical fiber surface defect detection method and device
Li et al. Gated auxiliary edge detection task for road extraction with weight-balanced loss
CN103824257A (en) Two-dimensional code image preprocessing method
CN116311214A (en) License plate recognition method and device
CN112052907A (en) Target detection method and device based on image edge information and storage medium
CN107886093B (en) Character detection method, system, equipment and computer storage medium
Choodowicz et al. Hybrid algorithm for the detection and recognition of railway signs
KR102026280B1 (en) Method and system for scene text detection using deep learning
CN112215229B (en) License plate recognition method and device based on lightweight network end-to-end
CN111402185A (en) Image detection method and device
CN112801960B (en) Image processing method and device, storage medium and electronic equipment
CN115063826A (en) Mobile terminal driver license identification method and system based on deep learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100190 Beijing, Haidian District Academy of Sciences, South Road, No. 2, block A, No. 313

Applicant after: MEGVII INC.

Applicant after: Beijing maigewei Technology Co., Ltd.

Address before: 100190 Beijing, Haidian District Academy of Sciences, South Road, No. 2, block A, No. 313

Applicant before: MEGVII INC.

Applicant before: Beijing aperture Science and Technology Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant