CN105574513B - Text detection method and device - Google Patents

Text detection method and device

Info

Publication number
CN105574513B
CN105574513B (application CN201510970839.2A)
Authority
CN
China
Prior art keywords
image
text region
detected
probability map
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510970839.2A
Other languages
Chinese (zh)
Other versions
CN105574513A (en)
Inventor
姚聪 (Yao Cong)
周舒畅 (Zhou Shuchang)
周昕宇 (Zhou Xinyu)
印奇 (Yin Qi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Beijing Maigewei Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Beijing Maigewei Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd, Beijing Maigewei Technology Co Ltd
Priority to CN201510970839.2A
Publication of CN105574513A
Application granted
Publication of CN105574513B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 10/00 Arrangements for image or video recognition or understanding
            • G06V 10/20 Image preprocessing
              • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
                • G06V 10/267 Segmentation by performing operations on regions, e.g. growing, shrinking or watersheds
          • G06V 20/00 Scenes; Scene-specific elements
            • G06V 20/20 Scene-specific elements in augmented reality scenes
            • G06V 20/60 Type of objects
              • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
                • G06V 20/63 Scene text, e.g. street names
          • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
            • G06V 30/10 Character recognition
              • G06V 30/26 Techniques for post-processing, e.g. correcting the recognition result
                • G06V 30/262 Post-processing using context analysis, e.g. lexical, syntactic or semantic context
                  • G06V 30/274 Syntactic or semantic context, e.g. balancing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a text detection method and device. The text detection method includes: receiving an image to be detected; generating, via a semantic prediction model, a text region probability map of the full image to be detected, wherein the probability map uses different pixel values to distinguish the text regions of the image to be detected from its non-text regions, and the semantic prediction model is a neural network; and performing a segmentation operation on the text region probability map to determine the text regions. The method and device can detect text of different languages, orientations, colors, fonts and sizes while effectively suppressing interference from complex backgrounds, and therefore have a wide range of application. In addition, they are robust, coping well with interference factors such as image noise, image blur, complex backgrounds and non-uniform illumination.

Description

Text detection method and device
Technical field
The present invention relates to the field of image processing, and in particular to a text detection method and device.
Background art
With the wide adoption of smartphones and the rapid development of the mobile Internet, acquiring, retrieving and sharing information through the camera of a mobile terminal such as a mobile phone has gradually become a way of life. Camera-based applications place increasing emphasis on understanding the photographed scene. In scenes where text coexists with other objects, users usually pay the most attention to the textual information, so correctly recognizing the text in a captured image gives a deeper understanding of the user's intent in taking the photograph. This involves text detection technology, that is, identifying the text regions in a captured image.
Text detection is an important basic technology with great application value and broad prospects, especially text detection in natural scene images. For example, text detection for natural scene images can be directly applied to fields such as augmented reality, geo-location, human-computer interaction, robot navigation, autonomous vehicles and industrial automation.
However, most images to be detected contain rather complex backgrounds, and their quality may be affected by factors such as noise, blur and non-uniform illumination. Moreover, text is diverse: the text in a natural scene image may differ in color, size, font, orientation and so on. All these factors pose great difficulties and challenges for text detection. For these reasons, existing text detection methods easily produce false alarms, that is, they mistakenly classify non-text components of the background as text. Existing methods also fall short in adaptability: most can only detect horizontal text and are helpless against tilted or rotated text, and some can only be applied to Chinese and cannot be directly generalized to other languages (such as English, Russian or Korean). When an image suffers from severe noise, blur or non-uniform illumination, existing methods again tend to make mistakes. In short, existing text detection methods and systems have defects in both accuracy and range of application.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a text detection method and device that at least partly solve them.
According to one aspect of the invention, a text detection method is provided, including:
receiving multiple sample images and annotation information of the sample images; generating mask images of the sample images according to the sample images and their annotation information; training a neural network with the sample images and the mask images to obtain a semantic prediction model; receiving an image to be detected; generating, via the semantic prediction model, a text region probability map of the full image to be detected, wherein the probability map uses different pixel values to distinguish the text regions of the image to be detected from its non-text regions; and performing a segmentation operation on the text region probability map to determine the text regions.
According to another aspect of the invention, a text detection device is also provided, including a training module, a semantic analysis module and a segmentation module. The training module is configured to receive multiple sample images and their annotation information, generate mask images of the sample images according to the sample images and the annotation information, and train a neural network with the sample images and the mask images to obtain a semantic prediction model. The semantic analysis module is configured to receive an image to be detected and use the semantic prediction model to generate a text region probability map of the full image to be detected, wherein the probability map uses different pixel values to distinguish the text regions of the image to be detected from its non-text regions. The segmentation module is configured to perform a segmentation operation on the text region probability map to determine the text regions.
The above text detection method and device support text detection directly on the full image to be detected, unlike algorithms based on simple thresholding, sliding windows or connected components. They can detect text of different languages, orientations, colors, fonts and sizes while effectively suppressing interference from complex backgrounds, giving a wide range of application. In addition, they are robust and cope well with interference factors such as image noise, image blur, complex backgrounds and non-uniform illumination.
The above is only an overview of the technical solution of the present invention. In order that the technical means of the invention may be understood more clearly and carried out according to the content of the specification, and in order to make the above and other objects, features and advantages of the invention more apparent, specific embodiments of the invention are set forth below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The accompanying drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, identical parts are denoted by identical reference numerals. In the drawings:
Fig. 1a and Fig. 1b schematically show, respectively, an image to be detected and the detected image according to an embodiment of the invention;
Fig. 2 schematically shows a flowchart of a text detection method according to an embodiment of the invention;
Fig. 3a and Fig. 3b, Fig. 4a and Fig. 4b, Fig. 5a and Fig. 5b, Fig. 6a and Fig. 6b schematically show full images to be detected according to embodiments of the invention and the corresponding generated text region probability maps;
Fig. 7 schematically shows a flowchart of a method for obtaining an image to be detected according to an embodiment of the invention;
Fig. 8 schematically shows a flowchart of a method for performing a segmentation operation on a text region probability map according to an embodiment of the invention;
Fig. 9 schematically shows a flowchart of a method for training a neural network according to an embodiment of the invention;
Figure 10a, Figure 10b, Figure 10c and Figure 10d respectively show sample images with annotation information according to an embodiment of the invention;
Figure 11a and Figure 11b respectively show a sample image with annotation information according to an embodiment of the invention and its corresponding mask image;
Figure 12 schematically shows a diagram of a fully convolutional neural network according to an embodiment of the invention;
Figure 13 schematically shows a schematic block diagram of a text detection device according to an embodiment of the invention;
Figure 14 schematically shows a schematic block diagram of a text detection device according to another embodiment of the invention; and
Figure 15 schematically shows a schematic block diagram of a text detection system according to an embodiment of the invention.
Detailed description of embodiments
Exemplary embodiments of the disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be realized in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
To identify text regions in images automatically and more reasonably, the invention provides a text detection method. Fig. 1a and Fig. 1b schematically show, respectively, an image to be detected and the detected image according to an embodiment of the invention. Fig. 2 shows a flowchart of a text detection method 200 according to an embodiment of the invention. As shown in Fig. 2, the method 200 includes steps S210 to S230.
In step S210, an image to be detected is received. The image to be detected may be an original image, or an image obtained by preprocessing an original image. In one embodiment of the invention, the image to be detected may be obtained by preprocessing a captured original image. The preprocessing method is described in detail below with reference to the relevant drawings.
In step S220, a text region probability map of the full image to be detected is generated via a semantic prediction model, wherein the probability map uses different pixel values to distinguish the text regions of the image to be detected from its non-text regions. According to one embodiment of the invention, a text region is a region of the image that contains text. Taking Fig. 1a and Fig. 1b as an example, the regions inside the two black quadrilaterals in Fig. 1b are text regions: the first text region contains the words "I am growing" and the second contains the words "please do not step on me" (both translated from the Chinese sign text).
In one embodiment, the text region probability map uses different pixel values to represent different probabilities, thereby distinguishing the text regions of the image to be detected from its non-text regions. In one embodiment, the higher the pixel value, the higher the probability that the pixel belongs to a text region, and the lower the pixel value, the lower that probability. For example, a black pixel with value 0 indicates that the probability of the pixel belonging to a text region is 0, while a white pixel with value 255 indicates that the probability is 100%.
According to one embodiment of the invention, the text region probability map of the full image to be detected is generated via a semantic prediction model. The semantic prediction model generates the probability map according to the semantics of the image to be detected, in order to predict whether each pixel of the image belongs to a text region or a non-text region. Image semantics are high-level features of an image: although they build on low-level features such as color, texture and shape, they are quite different from those low-level features. As the basic carrier of knowledge information, image semantics can convert complete image content into an intuitively understandable text-like representation, and they play a vital role in image understanding. The input of image understanding is image data and its output is knowledge; it belongs to the high-level content of the image research field. A semantic prediction model realizes image understanding: it can identify the text regions in an image directly from image semantics, which is fundamentally different from models that segment an image based on thresholds. Based on its understanding of the image to be detected, a semantic prediction model can generate a better text region probability map according to the image's semantics and thereby predict whether each pixel belongs to a text region or a non-text region, yielding more reasonable text regions.
The semantic prediction model can be obtained by training a neural network. A neural network can be used to estimate an unknown, generic approximate function from a large amount of input. Neural networks are capable of machine learning and have strong adaptability: a trained neural network can approximate an arbitrary function and can "learn" from given data. Neural networks are therefore well suited to being trained as semantic prediction models that identify the text regions in images to be detected. Training a neural network to obtain the semantic prediction model is described in detail below with reference to Fig. 9 to Figure 12.
Fig. 3a and Fig. 3b, Fig. 4a and Fig. 4b, Fig. 5a and Fig. 5b, Fig. 6a and Fig. 6b respectively show full images to be detected according to embodiments of the invention and the corresponding text region probability maps generated via the semantic prediction model. Fig. 3a, Fig. 4a, Fig. 5a and Fig. 6a are full images to be detected, each containing text regions: the text region in Fig. 3a contains Chinese; the text regions in Fig. 4a contain Chinese and English and, as shown in Fig. 4a, their orientation is non-horizontal; the text region in Fig. 5a contains Russian; and the text region in Fig. 6a contains Korean. As can be seen, the images in Fig. 3a, Fig. 4a, Fig. 5a and Fig. 6a have different, rather complex backgrounds, and the text in these images is diverse, differing in color, font, language, size and so on. Fig. 3b, Fig. 4b, Fig. 5b and Fig. 6b respectively show the text region probability maps generated from the full images of Fig. 3a, Fig. 4a, Fig. 5a and Fig. 6a by the semantic prediction model. The generated probability maps use different pixel values to represent different probabilities and thereby distinguish text regions from non-text regions: for example, text regions are filled with pixels of value 255, indicating the highest probability of belonging to a text region, and non-text regions (e.g., the background) are filled with pixels of value 0, indicating the lowest probability. Taking Fig. 4b as an example, its probability map distinguishes the text regions from the non-text regions of the image to be detected in Fig. 4a: the two text regions, "No admittance without authorization" and "Authorized Personnel Only", are filled with pixels of value 255, giving the probability map shown in Fig. 4b, which also completely and accurately reflects the orientation of the text regions in the image of Fig. 4a.
In step S230, a segmentation operation is performed on the text region probability map generated in step S220 to determine the text regions. Since the value of a pixel in the probability map represents the probability that the pixel belongs to a text region, thereby distinguishing text regions from non-text regions, the probability map can be segmented according to low-level features (such as image gray level).
For example, step S230 may obtain the text regions by binarizing the text region probability map. Since the aim in the present invention is to distinguish text regions from non-text regions (background), binarization serves this purpose well: it is simple to implement, computationally cheap and fast.
The binarization operation may be a threshold segmentation operation. Optionally, the threshold T is an adjustable parameter. If a gray value of 255 represents a probability of 100% of belonging to a text region, and a gray value of 0 represents a probability of 0, the threshold may be set to 128.
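As an illustration of this thresholding step, the following is a minimal sketch in Python, assuming the probability map is stored as an 8-bit grayscale image and using OpenCV (the function name and defaults are ours, not the patent's):

```python
import cv2
import numpy as np

def binarize_probability_map(prob_map: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Threshold a text region probability map (uint8, 0..255) into a binary
    image: 255 where a pixel likely belongs to a text region, 0 elsewhere."""
    _, binary = cv2.threshold(prob_map, threshold, 255, cv2.THRESH_BINARY)
    return binary
```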
The binarization operation may also be a segmentation operation based on region growing. Region growing aggregates pixels according to the similar properties of pixels within the same object region. Specifically, starting from an initial region (for example, a pixel with a large value in the probability map), adjacent pixels with similar properties (a small difference from the current pixel's value) are merged into the current region, so that the region grows step by step until no more pixels can be merged.
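A sketch of such region growing, assuming 4-connectivity and an adjustable tolerance `tol` on the pixel-value difference (both choices are ours; the patent does not fix them):

```python
from collections import deque
import numpy as np

def region_grow(prob_map: np.ndarray, seed: tuple, tol: int = 20) -> np.ndarray:
    """Grow a region from a high-probability seed pixel, absorbing each
    4-connected neighbour whose value differs from the current pixel's
    value by at most `tol`. Returns a boolean mask of the grown region."""
    h, w = prob_map.shape
    grown = np.zeros((h, w), dtype=bool)
    grown[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and not grown[ny, nx]
                    and abs(int(prob_map[ny, nx]) - int(prob_map[y, x])) <= tol):
                grown[ny, nx] = True
                queue.append((ny, nx))
    return grown
```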
In the image obtained after segmentation, regions with a small average pixel value may be regarded as non-text regions, and the other regions as text regions. The binarization operation used to determine the text regions is described in detail below with reference to the relevant drawings.
Those skilled in the art will appreciate that the above method 200 is universal: it can be used for text detection in any image. The method 200 can perform text detection and recognition on document images, such as photographs of certificates and bills or scanned copies of paper documents, and it can also perform text detection and recognition on natural scene images.
The above method 200 of the invention abandons detection based on sliding windows and detection based on connected components, and adopts a brand-new detection approach based on semantic segmentation. The method 200 performs full-image prediction: both its input and its output are entire images rather than local regions or windows, so it can make better use of the contextual information in an image, particularly in natural scene images, and thereby obtain more accurate text detection results.
The method 200 can handle images of different scenes and different quality. It can detect text of different colors, fonts and sizes while effectively suppressing interference from complex backgrounds. It can automatically predict the orientation of text lines and directly detect text of different orientations in an image. It is insensitive to the language of the text and can simultaneously detect text of different languages (such as Chinese, English and Korean). In addition, the method 200 is robust and copes well with interference factors such as noise, blur, complex backgrounds and non-uniform illumination.
Fig. 7 shows a flowchart of a method for obtaining the image to be detected according to an embodiment of the invention.
In step S710, an original image is received. In one embodiment, the original image may have complex background information, and the text regions it contains may be diverse, for example containing textual information of different colors, fonts, languages and sizes.
In step S720, the received original image is preprocessed to obtain the image to be detected. In one embodiment, the received original image may be size-normalized, that is, the largest dimension of the original image (the greater of its height and width) is scaled to a preset size, which may be, for example, 480, 640, 800 or 960 pixels. The aspect ratio of the image to be detected obtained after size normalization remains identical to the aspect ratio of the original image.
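A minimal sketch of this size normalization, assuming OpenCV and bilinear interpolation (the patent does not specify the interpolation method):

```python
import cv2

PRESET_SIZES = (480, 640, 800, 960)  # preset sizes mentioned in the description

def normalize_size(image, target: int = PRESET_SIZES[1]):
    """Scale the longer side of the image to `target` pixels while keeping
    the original aspect ratio unchanged."""
    h, w = image.shape[:2]
    scale = target / max(h, w)
    return cv2.resize(image, (round(w * scale), round(h * scale)),
                      interpolation=cv2.INTER_LINEAR)
```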
Fig. 8 schematically shows a flowchart of a method for performing a segmentation operation on the text region probability map according to an embodiment of the invention.
In step S810, a binarization operation is performed on the text region probability map of the image to be detected.
It will be appreciated that the text regions could be obtained directly from the result of the binarization. As noted above, since the aim is to distinguish text regions from non-text regions (background), binarization serves this purpose and is simple to implement, computationally cheap and fast. The binarization operation may be a threshold segmentation operation with an adjustable threshold T (for example, T = 128 when gray value 255 represents a 100% probability of belonging to a text region and gray value 0 represents a probability of 0), or a segmentation operation based on region growing, which merges adjacent pixels of similar value into the current region step by step until no more pixels can be merged, as described above for step S230.
In the embodiment shown in Fig. 8, the binarization operation is followed by steps S820 and S830.
In step S820, the contour of each connected region obtained by the binarization operation is determined. This step can be realized with any existing or future edge detection method, for example the various edge detection methods based on operators such as Sobel or Canny.
In step S830, the contour of each connected region is fitted with a quadrilateral to determine the text regions. In one embodiment, the interiors of all quadrilaterals are taken as text regions. Specifically, let B be the set of all fitted quadrilaterals, B = {b_k}, k = 1, 2, ..., Q, where b_k is a quadrilateral obtained by fitting, Q is the number of quadrilaterals, and k is an index. The set B is then output as the text detection result.
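The following sketch illustrates steps S820 and S830 with OpenCV, using contour extraction and a minimum-area rotated rectangle as the quadrilateral fit (the patent does not name a particular fitting algorithm, so this choice and the `min_area` noise filter are assumptions):

```python
import cv2
import numpy as np

def fit_text_quadrilaterals(binary: np.ndarray, min_area: float = 10.0):
    """Find the contour of each connected region in the binarized map and
    fit a quadrilateral around it. Returns the set B of quadrilaterals,
    each as a 4x2 array of corner coordinates."""
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    quads = []
    for contour in contours:
        if cv2.contourArea(contour) < min_area:
            continue  # skip tiny components that are likely noise
        rect = cv2.minAreaRect(contour)    # rotated rectangle (a quadrilateral)
        quads.append(cv2.boxPoints(rect))  # its four corner points
    return quads
```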
The region enclosed by a quadrilateral can well cover text of any orientation and language, and quadrilaterals are simple to compute. As shown by the text region probability map of Fig. 6b, various factors such as image noise and the shapes of the characters may prevent the probability map from representing perfectly the probability that each pixel belongs to a text region. Fitting the text regions with quadrilateral regions further ensures that the text regions contain the complete text content, thereby ensuring the precision of text detection.
Fig. 9 schematically shows a flowchart of a method for training a neural network to obtain the semantic prediction model according to an embodiment of the invention. The purpose of the method is to learn a semantic prediction model from sample images, such that the model can effectively distinguish the text regions of an image to be detected from its non-text regions.
A sample image is an image whose text regions are known. As described above, a neural network has the ability to "learn", so a usable semantic prediction model can be obtained by training the neural network with multiple sample images. In this embodiment, the training method enables the semantic prediction model to generate a more accurate text region probability map according to the semantics of an image to be detected, and thus to predict whether each pixel of the image belongs to a text region or a non-text region, making the detection results of the text detection method more accurate.
Those skilled in the art will appreciate that, for a text detection system, the semantic prediction model may be stored in the system in advance.
In step S910, multiple sample images and their annotation information are received.
In one embodiment, a large number of images containing text, for example natural scene images, may be collected from different sources as sample images. The sample images should preferably be varied and numerous in order to obtain a good semantic prediction model. In one embodiment, the number of sample images is no fewer than 1000.
All text regions in each sample image may be annotated with polygons, thereby obtaining the annotation information of the sample image. The basic annotation unit may be a text line or a word. The annotation information of the text regions in a sample image may be saved in the form of polygons (for example, quadrilaterals); specifically, in one embodiment, only the coordinates of the four vertices of each quadrilateral are saved. Saving the annotation information as quadrilaterals can not only accommodate text of any orientation and language, but is also easy to compute with.
Figure 10a, Figure 10b, Figure 10c and Figure 10d respectively show annotated sample images with annotation information according to an embodiment of the invention. As shown in these figures, quadrilaterals (the light quadrilaterals in the figures) can be used to annotate the text regions in the sample images, and this kind of annotated region suits any font, language and text orientation.
In step S920, the mask image of each sample image is generated according to the sample image and its annotation information. Specifically, for a sample image I and the corresponding annotation information, a mask image of the same size as I is generated. In one embodiment, the mask image may be a binary mask R. In the binary mask R, different pixel values distinguish the text regions of the sample image from its non-text regions. In one embodiment, for the sample image I, the text regions marked by the annotation information are filled with pixels of a first pixel value and the non-text regions are filled with pixels of a second pixel value, thereby generating the binary mask R; the first and second pixel values differ so as to distinguish text regions from non-text regions. For example, in the binary mask R, the pixel values of the annotated text regions (that is, the interiors of the annotated quadrilaterals) are filled with 255, and the pixel values of the non-text regions are filled with 0.
Figure 11a and Figure 11b respectively show an annotated sample image with annotation information according to an embodiment of the invention and its corresponding mask image. As shown in Figure 11a, quadrilaterals mark out the text portions of the original sample image (for example, a Chinese sign together with the street names "Haidian Middle St", "HAIDIANZHONGJIE" and "Haidian South Road"), and the mask image shown in Figure 11b is generated accordingly: the marked text portions are filled with pixels of value 255 and the non-text portions with pixels of value 0, yielding the mask image shown in Figure 11b.
In step S930, a training set is built from the sample images and their mask images, and the neural network is trained to obtain the semantic prediction model M. The original sample images and their corresponding mask images form the training sample set S = {(I_i, R_i)}, i = 1, 2, ..., N, where I_i is an original sample image, R_i is the mask image corresponding to I_i, N is the number of sample images in the training set S, and i is an index.
In one embodiment, the neural network may include a fully convolutional neural network. A fully convolutional network is a special kind of neural network characterized by the fact that, from input to output, every layer containing learnable parameters is a convolutional layer. Fully convolutional networks avoid complex early-stage image preprocessing and can take raw images directly as input; they are particularly suited to analyzing images with complex backgrounds and can make text detection results more accurate.
According to a specific embodiment of the invention, a fully convolutional network composed of 13 layers may be used. Figure 12 shows a diagram of this fully convolutional network.
Besides convolutional layers, the fully convolutional network also contains max-pooling layers. The max-pooling layers separate runs of consecutive convolutional layers; they effectively reduce the amount of computation while strengthening the robustness of the network.
The input of the fully convolutional network is raw image data. As shown in Figure 12, the network contains a first and a second convolutional layer, each with 64 filters of size 3x3; the second convolutional layer is followed by a first max-pooling layer. Next come a third and a fourth convolutional layer, each with 128 filters of size 3x3; the fourth convolutional layer is followed by a second max-pooling layer. Next come a fifth, a sixth and a seventh convolutional layer, each with 256 filters of size 3x3; the seventh convolutional layer is followed by a third max-pooling layer. Next come an eighth, a ninth and a tenth convolutional layer, each with 512 filters of size 3x3; the tenth convolutional layer is followed by a fourth max-pooling layer. Finally come an eleventh, a twelfth and a thirteenth convolutional layer, each with 512 filters of size 3x3.
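The layer arrangement just described (convolutional blocks of 2, 2, 3, 3 and 3 layers with four interleaved max pools, a VGG-style layout) can be sketched in PyTorch as follows; the ReLU activations, the 1x1 prediction head and the bilinear upsampling back to input resolution are our assumptions, since the description specifies only the convolutional and pooling layers:

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs, pool=True):
    """`n_convs` 3x3 convolutions with `out_ch` filters, optionally
    followed by a 2x2 max-pooling layer."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return layers

class TextFCN(nn.Module):
    """13 convolutional layers: 2x64, pool, 2x128, pool, 3x256, pool,
    3x512, pool, 3x512; a 1x1 head turns the features into a per-pixel
    text score, upsampled 16x to undo the four pooling stages."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            *conv_block(3, 64, 2),                 # conv 1-2
            *conv_block(64, 128, 2),               # conv 3-4
            *conv_block(128, 256, 3),              # conv 5-7
            *conv_block(256, 512, 3),              # conv 8-10
            *conv_block(512, 512, 3, pool=False),  # conv 11-13, no pool
        )
        self.head = nn.Conv2d(512, 1, kernel_size=1)
        self.upsample = nn.Upsample(scale_factor=16, mode='bilinear',
                                    align_corners=False)

    def forward(self, x):
        return self.upsample(self.head(self.features(x))).sigmoid()
```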
In the training process, one sample image and its corresponding mask image are input to the fully convolutional network at a time. The initial learning rate may be 0.00000001, and every 10,000 iterations the learning rate is reduced to 1/10 of its previous value. After 100,000 iterations, the training process may end. The fully convolutional network obtained at the end of training is the desired semantic prediction model. Via the trained semantic prediction model, the text region probability map of a full image to be detected can be generated according to the image's semantics, thereby predicting the text regions in the image.
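A sketch of this schedule, assuming the `TextFCN` above, an SGD optimizer and a binary cross-entropy loss against masks rescaled to 0/1 floats (the description fixes only the learning-rate schedule, not the optimizer or the loss):

```python
import torch

def train(model, dataset, total_iters=100_000, base_lr=1e-8):
    """Feed one (image, mask) pair per iteration; divide the learning
    rate by 10 every 10,000 iterations; stop after 100,000 iterations."""
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr)
    loss_fn = torch.nn.BCELoss()
    for step in range(total_iters):
        image, mask = dataset[step % len(dataset)]  # mask as floats in {0, 1}
        pred = model(image.unsqueeze(0))            # add a batch dimension
        loss = loss_fn(pred, mask.unsqueeze(0))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if (step + 1) % 10_000 == 0:
            for group in optimizer.param_groups:
                group['lr'] *= 0.1                  # decay to 1/10
```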
Those skilled in the art will appreciate that although the above explanation takes a 13-layer fully convolutional network as an example, the number of layers of the fully convolutional network may be any number between 6 and 19; this range balances the accuracy of the results against the amount of computation. In addition, the filter numbers and sizes given above are only examples, not limitations: for example, the number of filters could also be 100, 500 or 1000, and the filter size could also be 1x1 or 5x5.
According to another aspect of the invention, a text detection device is also provided. Figure 13 shows a schematic block diagram of a text detection device 1300 according to an embodiment of the invention. As shown in Figure 13, the text detection device 1300 includes a semantic analysis module 1330 and a segmentation module 1340. According to one embodiment of the invention, the semantic analysis module 1330 also includes a semantic prediction model 1350.
The semantic analysis module 1330 is configured to receive an image to be detected and to generate the text region probability map of the full image to be detected using the semantic prediction model 1350. The semantic prediction model generates the probability map according to the semantics of the image, in order to predict whether each pixel of the image to be detected belongs to a text region or a non-text region. The probability map uses different pixel values to represent different probabilities, thereby distinguishing the text regions of the image to be detected from its non-text regions.
In one embodiment, the image to be detected may be an original image, or an image obtained after preprocessing an original image.
In one embodiment, the semantic prediction model 1350 may be obtained by training a neural network, as described in detail below with reference to Figure 14.
The text region probability maps are described with reference to Fig. 3a and Fig. 3b, Fig. 4a and Fig. 4b, Fig. 5a and Fig. 5b, Fig. 6a and Fig. 6b, which respectively show full images to be detected according to embodiments of the invention and the corresponding text region probability maps generated via the semantic prediction model 1350. The generated probability maps use different pixel values to represent different probabilities and thereby distinguish text regions from non-text regions: for example, text regions are filled with pixels of value 255, indicating the highest probability of belonging to a text region, and non-text regions (e.g., the background) are filled with pixels of value 0, indicating the lowest probability. Taking Fig. 4b as an example, its probability map distinguishes the text regions from the non-text regions of the image in Fig. 4a: the two text regions, "No admittance without authorization" and "Authorized Personnel Only", are filled with pixels of value 255, giving the probability map shown in Fig. 4b, which also completely and accurately reflects the orientation of the text regions in the original image of Fig. 4a.
The segmentation module 1340 is configured to perform a segmentation operation on the text region probability map to determine the text regions. Since the value of a pixel in the probability map represents the probability that the pixel belongs to a text region, the probability map can be segmented according to low-level features (such as image gray level).
For example, the segmentation module 1340 may obtain the text regions by binarizing the text region probability map. As before, binarization distinguishes text regions from non-text regions (background) and is simple to implement, computationally cheap and fast. The binarization operation may be a threshold segmentation operation with an adjustable threshold T (for example, T = 128 when gray value 255 represents a 100% probability of belonging to a text region and gray value 0 represents a probability of 0), or a segmentation operation based on region growing, which merges adjacent pixels of similar value into the current region step by step until no more pixels can be merged.
After the binarization operation, the segmentation module 1340 may also be configured to determine the contour of each connected region obtained by the binarization; this can be realized with any existing or future edge detection method, for example those based on operators such as Sobel or Canny. The segmentation module 1340 may further be configured to fit the contour of each connected region with a quadrilateral to determine the text regions. In one embodiment, the interiors of all quadrilaterals are taken as text regions. Specifically, let B be the set of all fitted quadrilaterals, B = {b_k}, k = 1, 2, ..., Q, where b_k is a quadrilateral obtained by fitting, Q is the number of quadrilaterals, and k is an index; the set B is output as the text detection result.
The region enclosed by a quadrilateral can well cover text of any orientation and language, and quadrilaterals are simple to compute. As shown by the text region probability map of Fig. 6b, various factors such as image noise and the shapes of the characters may prevent the probability map from representing perfectly the probability that each pixel belongs to a text region; fitting the text regions with quadrilateral regions further ensures that the text regions contain the complete text content, thereby ensuring the precision of text detection.
In one embodiment, regions with a small average pixel value in the image obtained after segmentation by the segmentation module 1340 may be regarded as non-text regions, and the other regions as text regions.
Figure 14 shows a schematic block diagram of a text detection device 1400 according to another embodiment of the invention. The semantic analysis module 1330 and the segmentation module 1340 in the text detection device 1400 are similar to those in the text detection device 1300 and, for brevity, are not described again here.
Compared with the text detection device 1300, the text detection device 1400 adds an image preprocessing module 1410 and a training module 1420.
According to an embodiment of the invention, the image preprocessing module 1410 receives an original image. In one embodiment, the original image may have complex background information and may contain diverse text regions, for example textual information of different colors, fonts, languages and sizes.
The image preprocessing module 1410 preprocesses the received original image. In one embodiment, the image preprocessing module 1410 may size-normalize the received original image, that is, scale the largest dimension of the original image (the greater of its height and width) to a preset size, which may be, for example, 480, 640, 800 or 960 pixels. The aspect ratio of the preprocessed image remains identical to the aspect ratio of the original image.
After the preprocessing, the image preprocessing module 1410 obtains the image to be detected and outputs the full image to be detected to the semantic analysis module 1330 for processing. As described above, the image to be detected has the preset size, and its aspect ratio is identical to that of the original image.
According to one embodiment of the invention, the training module 1420 is configured to train a neural network with multiple sample images to obtain the semantic prediction model 1350, a model that can effectively distinguish the text regions of an image to be detected from its non-text regions.
In one embodiment, the training module 1420 may collect a large number of images containing text, for example natural scene images, from different sources as sample images, and receive the annotation information of the sample images. The sample images should preferably be varied and numerous in order to obtain a good semantic prediction model; in one embodiment, the number of sample images is no fewer than 1000.
All text regions in each sample image may be annotated with polygons. The basic annotation unit may be a text line or a word. The annotation information of the text regions in a sample image may be saved in the form of polygons (for example, quadrilaterals); specifically, in one embodiment, only the coordinates of the four vertices of each quadrilateral are saved. Saving the annotation information as quadrilaterals can not only accommodate text of any orientation and language, but is also easy to compute with.
Figure 10a, Figure 10b, Figure 10c and Figure 10d respectively show annotated sample images with annotation information according to an embodiment of the invention. As shown in these figures, quadrilaterals (the light quadrilaterals in the figures) can be used to annotate the text regions in the sample images, and this kind of annotated region suits any font, language and text orientation.
The training module 1420 is further configured to generate the mask image of each sample image according to the sample image and its annotation information. In one embodiment, the mask image includes a binary mask. Specifically, for a sample image I and the corresponding annotation information, the training module 1420 generates a mask image of the same size as I, for example a binary mask R, which uses different pixel values to distinguish the text regions of the sample image from its non-text regions. In one embodiment, for the sample image I, the annotated text regions are filled with pixels of a first pixel value and the non-text regions with pixels of a second pixel value, thereby generating the mask image; the first and second pixel values differ so as to distinguish text regions from non-text regions. For example, the pixel values of the annotated text regions (that is, the interiors of the annotated quadrilaterals) are filled with 255, and the pixel values of the non-text regions are filled with 0.
The training module 1420 is further configured to build a training set from the sample images and their mask images, and to train the neural network to obtain the semantic prediction model 1350. Specifically, the original sample images and their corresponding mask images form the training sample set S = {(I_i, R_i)}, i = 1, 2, ..., N, where I_i is an original sample image, R_i is the mask image corresponding to I_i, N is the number of sample images in the training set S, and i is an index.
In one embodiment, the neural network may be a fully convolutional neural network, a special kind of neural network in which, from input to output, every layer containing learnable parameters is a convolutional layer. Fully convolutional networks avoid complex early-stage image preprocessing and can take raw images directly as input; they are particularly suited to analyzing images with complex backgrounds and can make text detection results more accurate.
The training module 1420 inputs the training sample set S to the fully convolutional network for training, to obtain the semantic prediction model 1350. According to a specific embodiment of the invention, the 13-layer fully convolutional network shown in Figure 12 may be used.
Besides convolutional layers, the fully convolutional network also contains max-pooling layers, which separate runs of consecutive convolutional layers and effectively reduce the amount of computation while strengthening the robustness of the network.
The input of the fully convolutional network is raw image data. As shown in Figure 12 and described above, its layers are arranged as follows: two convolutional layers with 64 filters of size 3x3 followed by a first max-pooling layer; two convolutional layers with 128 filters of size 3x3 followed by a second max-pooling layer; three convolutional layers with 256 filters of size 3x3 followed by a third max-pooling layer; three convolutional layers with 512 filters of size 3x3 followed by a fourth max-pooling layer; and finally three convolutional layers with 512 filters of size 3x3.
In the training process, one sample image and its corresponding mask image are input to the fully convolutional network at a time; the initial learning rate may be 0.00000001, every 10,000 iterations the learning rate is reduced to 1/10 of its previous value, and after 100,000 iterations the training process may end. The fully convolutional network obtained at the end of training is the desired semantic prediction model, via which the text region probability map can be generated according to the semantics of an image to be detected, thereby predicting the text regions in the image.
Those skilled in the art will appreciate that although a 13-layer fully convolutional network is taken as the example above, the number of layers may be any number between 6 and 19, a range that balances the accuracy of the results against the amount of computation. The filter numbers and sizes given above are likewise only examples, not limitations: the number of filters could also be 100, 500 or 1000, and the filter size could also be 1x1 or 5x5.
The semantic prediction model 1350, obtained by the training module 1420 by training the neural network with multiple sample images, can effectively distinguish the text regions of an image to be detected from its non-text regions.
Figure 15 shows the schematic block diagram of text detection system 1500 according to embodiments of the present invention.As shown in figure 15, Text detection system 1500 includes processor 1510, memory 1520 and the programmed instruction stored in the memory 1520 1530。
Described program instruction 1530 can realize word according to embodiments of the present invention when the processor 1510 is run The function of each functional module of detection means, and/or character detecting method according to embodiments of the present invention can be performed Each step.
Specifically, when described program instruction 1530 is run by the processor 1510, following steps are performed:Receive to be checked Altimetric image;The character area probability graph of the full figure of described image to be detected is generated via semantic forecast model, wherein, the word Area probability figure distinguishes the character area of described image to be detected and the non-text of described image to be detected using different pixel values Block domain;And cutting operation is carried out to the character area probability graph, to determine the character area.Semantic forecast model is used Pixel in image to be detected described in the semantic forecast according to image belongs to character area and still falls within non-legible region.
In addition, when described program instruction 1530 is run by the processor 1510, following steps are also performed:Receive original Image;And the original image is pre-processed, to obtain described image to be detected, wherein, image to be detected tool There is pre-set dimension size, and the Aspect Ratio of described image to be detected is identical with the Aspect Ratio of the original image.
It is in addition, performed general to the character area when described program instruction 1530 is run by the processor 1510 The step of rate figure progress cutting operation is to determine the character area includes:Binaryzation behaviour is carried out to the character area probability graph Make, to determine the character area.
It is in addition, performed general to the character area when described program instruction 1530 is run by the processor 1510 The step of rate figure progress binarization operation is to determine the character area includes:Determine that the binarization operation is obtained each The profile of connected region;And by the contour fitting be quadrangle, wherein, the quadrangle interior zone is the literal field Domain.
In addition, when described program instruction 1530 is run by the processor 1510, following steps are also performed:Using multiple Sample image trains neutral net, to obtain the semantic forecast model.
In addition, when described program instruction 1530 is run by the processor 1510, the multiple sample graphs of performed utilization As training neutral net is included with obtaining the step of the semantic forecast model:Receive the sample image and the sample image Markup information;The mask figure of the sample image is generated according to the markup information of the sample image and the sample image; And the neutral net is trained using the sample image and the mask figure, to obtain the semantic forecast model.
In addition, execution is run using multiple sample images training god by the processor 1510 in described program instruction 1530 In the step of through network to obtain the semantic forecast model, the mask figure includes two-value mask figure, and the two-value is covered Film figure distinguishes the character area of the sample image and non-legible region using different pixel values.
In addition, in the step, performed when the program instructions 1530 are run by the processor 1510, of training a neural network using multiple sample images to obtain the semantic prediction model, the neural network includes a fully convolutional neural network.
In addition, in the step, performed when the program instructions 1530 are run by the processor 1510, of training a neural network using multiple sample images to obtain the semantic prediction model, the number of layers of the fully convolutional neural network is any number between 6 and 19.
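A minimal training sketch under these constraints; the six-convolutional-layer network shown here sits at the lower end of the stated 6-to-19 range, and the loss function, optimizer, learning rate, and random stand-in data are all assumptions:

    import torch
    import torch.nn as nn

    # A six-layer fully convolutional network trained to reproduce the
    # two-value mask maps, yielding the semantic prediction model.
    fcn = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 1, 1), nn.Sigmoid(),
    )
    optimizer = torch.optim.SGD(fcn.parameters(), lr=0.01)
    loss_fn = nn.BCELoss()

    images = torch.rand(4, 3, 128, 128)                  # stand-in sample images
    masks = (torch.rand(4, 1, 128, 128) > 0.5).float()   # stand-in mask maps
    for _ in range(10):
        optimizer.zero_grad()
        loss = loss_fn(fcn(images), masks)
        loss.backward()
        optimizer.step()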
In addition, according to an embodiment of the present invention, a storage medium is also provided, on which program instructions are stored. When run by a computer or processor, the program instructions are used to perform the corresponding steps of the character detection method of the embodiments of the present invention, and to implement the corresponding modules in the text detection apparatus according to embodiments of the present invention. The storage medium may include, for example, the storage card of a smart phone, the memory unit of a tablet computer, the hard disk of a personal computer, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, or any combination of the above storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media; for example, one computer-readable storage medium may contain computer-readable program code for training the neural network to obtain the semantic prediction model, and another computer-readable storage medium may contain computer-readable program code for performing text detection.
In one embodiment, when run by a computer, the computer program instructions can implement each functional module of the text detection apparatus according to embodiments of the present invention, and/or can perform the text detection method according to embodiments of the present invention.
In one embodiment, when run by a computer, the computer program instructions perform the following steps: receiving an image to be detected; generating, via the semantic prediction model, a character area probability map of the full image to be detected, wherein the character area probability map uses different pixel values to distinguish the character areas of the image to be detected from the non-character areas of the image to be detected; and performing a segmentation operation on the character area probability map, to determine the character areas. The semantic prediction model is used to predict, according to the semantics of the image, whether a pixel in the image to be detected belongs to a character area or to a non-character area.
In addition, when run by a computer, the computer program instructions also perform the following steps: receiving an original image; and preprocessing the original image, to obtain the image to be detected, wherein the image to be detected has a preset size, and the aspect ratio of the image to be detected is identical to the aspect ratio of the original image.
In addition, when the computer program instructions are run by a computer, the performed step of segmenting the character area probability map to determine the character areas includes: performing a binarization operation on the character area probability map, to determine the character areas.
In addition, when the computer program instructions are run by a computer, the performed step of binarizing the character area probability map to determine the character areas includes: determining the profile of each connected region obtained by the binarization operation; and fitting the profile to a quadrilateral, wherein the interior of the quadrilateral is the character area.
In addition, when the computer program instructions are run by a computer, the following steps are also performed: training a neural network using multiple sample images, to obtain the semantic prediction model.
In addition, when the computer program instructions are run by a computer, the performed step of training a neural network using multiple sample images to obtain the semantic prediction model includes: receiving the sample images and the annotation information of the sample images; generating mask maps of the sample images according to the sample images and the annotation information of the sample images; and training the neural network using the sample images and the mask maps, to obtain the semantic prediction model.
In addition, in the step, performed when the computer program instructions are run by a computer, of training a neural network using multiple sample images to obtain the semantic prediction model, the mask map includes a two-value mask map, and the two-value mask map uses different pixel values to distinguish the character areas of the sample image from the non-character regions.
In addition, in the step, performed when the computer program instructions are run by a computer, of training a neural network using multiple sample images to obtain the semantic prediction model, the neural network includes a fully convolutional neural network.
In addition, in the step, performed when the computer program instructions are run by a computer, of training a neural network using multiple sample images to obtain the semantic prediction model, the number of layers of the fully convolutional neural network is any number between 6 and 19.
By reading the detailed description of the character detection method above, those of ordinary skill in the art can understand the structure, implementation, and advantages of the above text detection apparatus and system, which are therefore not repeated here.
In the specification provided here, numerous specific details are set forth. It is to be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be appreciated that, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following this detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that, except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or apparatus so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will appreciate that, while some embodiments described herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some modules in the text detection apparatus according to embodiments of the present invention. The present invention may also be implemented as device programs (for example, computer programs and computer program products) for performing part or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names.

Claims (16)

1. A character detection method, comprising:
receiving multiple sample images and annotation information of the sample images;
generating mask maps of the sample images according to the sample images and the annotation information of the sample images;
training a neural network using the sample images and the mask maps, to obtain a semantic prediction model;
receiving an image to be detected;
generating, via the semantic prediction model, a character area probability map of the full image to be detected, wherein the character area probability map uses different pixel values to distinguish the character areas of the image to be detected from the non-character areas of the image to be detected; and
performing a segmentation operation on the character area probability map, to determine the character areas.
2. The method of claim 1, further comprising:
receiving an original image; and
preprocessing the original image, to obtain the image to be detected,
wherein the image to be detected has a preset size, and the aspect ratio of the image to be detected is identical to the aspect ratio of the original image.
3. The method of claim 1, wherein performing a segmentation operation on the character area probability map to determine the character areas comprises:
performing a binarization operation on the character area probability map, to determine the character areas.
4. The method of claim 3, wherein performing a binarization operation on the character area probability map to determine the character areas comprises:
determining the profile of each connected region obtained by the binarization operation; and
fitting the profile to a quadrilateral, wherein the interior of the quadrilateral is the character area.
5. The method of claim 1, wherein the mask map comprises a two-value mask map, and the two-value mask map uses different pixel values to distinguish the character areas of the sample image from the non-character regions.
6. The method of claim 1, wherein the neural network comprises a fully convolutional neural network.
7. The method of claim 6, wherein the number of layers of the fully convolutional neural network is any number between 6 and 19.
8. The method of any one of claims 1 to 7, wherein the semantic prediction model is used to predict, according to the semantics of the image to be detected, whether a pixel in the image to be detected belongs to a character area or to a non-character area.
9. A text detection apparatus, comprising:
a training module, configured to receive multiple sample images and annotation information of the sample images, generate mask maps of the sample images according to the sample images and the annotation information of the sample images, and train a neural network using the sample images and the mask maps, to obtain a semantic prediction model;
a semantic analysis module, connected to the training module and configured to receive an image to be detected and to use the semantic prediction model to generate a character area probability map of the full image to be detected, wherein the character area probability map uses different pixel values to distinguish the character areas of the image to be detected from the non-character areas of the image to be detected; and
a segmentation module, configured to perform a segmentation operation on the character area probability map, to determine the character areas.
10. The text detection apparatus of claim 9, further comprising:
an image preprocessing module, configured to receive an original image and to preprocess the original image, to obtain the image to be detected,
wherein the image to be detected has a preset size, and the aspect ratio of the image to be detected is identical to the aspect ratio of the original image.
11. The text detection apparatus of claim 9, wherein the segmentation module is further configured to perform a binarization operation on the character area probability map, to determine the character areas.
12. The text detection apparatus of claim 11, wherein the segmentation module is further configured to determine the profile of each connected region obtained by the binarization operation, and to fit the profile to a quadrilateral, wherein the interior of the quadrilateral is the character area.
13. The text detection apparatus of claim 9, wherein the mask map comprises a two-value mask map, and the two-value mask map uses different pixel values to distinguish the character areas of the sample image from the non-character regions.
14. The text detection apparatus of claim 9, wherein the neural network comprises a fully convolutional neural network.
15. The text detection apparatus of claim 14, wherein the number of layers of the fully convolutional neural network is any number between 6 and 19.
16. The text detection apparatus of any one of claims 9 to 15, wherein the semantic prediction model is used to predict, according to the semantics of the image to be detected, whether a pixel in the image to be detected belongs to a character area or to a non-character area.
CN201510970839.2A 2015-12-22 2015-12-22 Character detecting method and device Active CN105574513B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510970839.2A CN105574513B (en) 2015-12-22 2015-12-22 Character detecting method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510970839.2A CN105574513B (en) 2015-12-22 2015-12-22 Character detecting method and device

Publications (2)

Publication Number Publication Date
CN105574513A CN105574513A (en) 2016-05-11
CN105574513B true CN105574513B (en) 2017-11-24

Family

ID=55884621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510970839.2A Active CN105574513B (en) 2015-12-22 2015-12-22 Character detecting method and device

Country Status (1)

Country Link
CN (1) CN105574513B (en)

Families Citing this family (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295629B (en) * 2016-07-15 2018-06-15 北京市商汤科技开发有限公司 structured text detection method and system
CN107633527B (en) * 2016-07-19 2020-07-07 北京图森未来科技有限公司 Target tracking method and device based on full convolution neural network
CN108108731B (en) * 2016-11-25 2021-02-05 中移(杭州)信息技术有限公司 Text detection method and device based on synthetic data
CN106778928B (en) * 2016-12-21 2020-08-04 广州华多网络科技有限公司 Image processing method and device
CN106897732B (en) * 2017-01-06 2019-10-08 华中科技大学 It is a kind of based on connection text section natural picture in multi-direction Method for text detection
CN107025457B (en) * 2017-03-29 2022-03-08 腾讯科技(深圳)有限公司 Image processing method and device
US10262236B2 (en) * 2017-05-02 2019-04-16 General Electric Company Neural network training image generation system
WO2018232592A1 (en) * 2017-06-20 2018-12-27 Microsoft Technology Licensing, Llc. Fully convolutional instance-aware semantic segmentation
CN109389116B (en) * 2017-08-14 2022-02-08 阿里巴巴(中国)有限公司 Character detection method and device
CN109410211A (en) * 2017-08-18 2019-03-01 北京猎户星空科技有限公司 The dividing method and device of target object in a kind of image
CN107886093B (en) * 2017-11-07 2021-07-06 广东工业大学 Character detection method, system, equipment and computer storage medium
CN108305262A (en) * 2017-11-22 2018-07-20 腾讯科技(深圳)有限公司 File scanning method, device and equipment
CN109961553A (en) * 2017-12-26 2019-07-02 航天信息股份有限公司 Invoice number recognition methods, device and tax administration self-service terminal system
CN108229575A (en) * 2018-01-19 2018-06-29 百度在线网络技术(北京)有限公司 For detecting the method and apparatus of target
CN108197623A (en) * 2018-01-19 2018-06-22 百度在线网络技术(北京)有限公司 For detecting the method and apparatus of target
CN108427950B (en) * 2018-02-01 2021-02-19 北京捷通华声科技股份有限公司 Character line detection method and device
CN108304814B (en) * 2018-02-08 2020-07-14 海南云江科技有限公司 Method for constructing character type detection model and computing equipment
CN108446621A (en) * 2018-03-14 2018-08-24 平安科技(深圳)有限公司 Bank slip recognition method, server and computer readable storage medium
CN108717542B (en) * 2018-04-23 2020-09-15 北京小米移动软件有限公司 Method and device for recognizing character area and computer readable storage medium
CN109102037B (en) * 2018-06-04 2024-03-05 平安科技(深圳)有限公司 Chinese model training and Chinese image recognition method, device, equipment and medium
CN108921158A (en) * 2018-06-14 2018-11-30 众安信息技术服务有限公司 Method for correcting image, device and computer readable storage medium
CN108989793A (en) * 2018-07-20 2018-12-11 深圳市华星光电技术有限公司 A kind of detection method and detection device of text pixel
CN109040824B (en) * 2018-08-28 2020-07-28 百度在线网络技术(北京)有限公司 Video processing method and device, electronic equipment and readable storage medium
KR102211763B1 (en) * 2018-09-21 2021-02-03 네이버 주식회사 Apparatus, method and system for detecting character
CN109492638A (en) * 2018-11-07 2019-03-19 北京旷视科技有限公司 Method for text detection, device and electronic equipment
CN112789623A (en) * 2018-11-16 2021-05-11 北京比特大陆科技有限公司 Text detection method, device and storage medium
CN111259878A (en) * 2018-11-30 2020-06-09 中移(杭州)信息技术有限公司 Method and equipment for detecting text
CN109685055B (en) * 2018-12-26 2021-11-12 北京金山数字娱乐科技有限公司 Method and device for detecting text area in image
CN110119742B (en) * 2019-04-25 2023-07-07 添维信息科技(天津)有限公司 Container number identification method and device and mobile terminal
CN110059685B (en) * 2019-04-26 2022-10-21 腾讯科技(深圳)有限公司 Character area detection method, device and storage medium
CN110110777A (en) * 2019-04-28 2019-08-09 网易有道信息技术(北京)有限公司 Image processing method and training method and device, medium and calculating equipment
CN112001406B (en) * 2019-05-27 2023-09-08 杭州海康威视数字技术股份有限公司 Text region detection method and device
CN110458162B (en) * 2019-07-25 2023-06-23 上海兑观信息科技技术有限公司 Method for intelligently extracting image text information
CN111753836A (en) * 2019-08-27 2020-10-09 北京京东尚科信息技术有限公司 Character recognition method and device, computer readable medium and electronic equipment
CN110503103B (en) * 2019-08-28 2023-04-07 上海海事大学 Character segmentation method in text line based on full convolution neural network
CN110503159B (en) * 2019-08-28 2022-10-11 北京达佳互联信息技术有限公司 Character recognition method, device, equipment and medium
CN110807454B (en) * 2019-09-19 2024-05-14 平安科技(深圳)有限公司 Text positioning method, device, equipment and storage medium based on image segmentation
DE102019134387A1 (en) * 2019-12-13 2021-06-17 Beckhoff Automation Gmbh Process for real-time optical character recognition in an automation system and automation system
CN111242120B (en) * 2020-01-03 2022-07-29 中国科学技术大学 Character detection method and system
CN113496223A (en) * 2020-03-19 2021-10-12 顺丰科技有限公司 Method and device for establishing text region detection model
CN111626283B (en) * 2020-05-20 2022-12-13 北京字节跳动网络技术有限公司 Character extraction method and device and electronic equipment
CN111723815B (en) * 2020-06-23 2023-06-30 中国工商银行股份有限公司 Model training method, image processing device, computer system and medium
CN111753727B (en) * 2020-06-24 2023-06-23 北京百度网讯科技有限公司 Method, apparatus, device and readable storage medium for extracting structured information
CN111767921A (en) * 2020-06-30 2020-10-13 上海媒智科技有限公司 Express bill positioning and correcting method and device
CN114078108B (en) * 2020-08-11 2023-12-22 北京阅影科技有限公司 Method and device for processing abnormal region in image, and method and device for dividing image
CN112801911B (en) * 2021-02-08 2024-03-26 苏州长嘴鱼软件有限公司 Method and device for removing text noise in natural image and storage medium
CN114067192A (en) * 2022-01-07 2022-02-18 北京许先网科技发展有限公司 Character recognition method and system
CN114495129B (en) * 2022-04-18 2022-09-09 阿里巴巴(中国)有限公司 Character detection model pre-training method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100269102B1 (en) * 1994-06-24 2000-10-16 윤종용 Numeric character recognition with neural network
CN103745213A (en) * 2014-02-28 2014-04-23 中国人民解放军63680部队 Optical character recognition method based on LVQ neural network
CN104899586B (en) * 2014-03-03 2018-10-12 阿里巴巴集团控股有限公司 Method and device is identified to the word content for including in image

Also Published As

Publication number Publication date
CN105574513A (en) 2016-05-11


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100190 Beijing, Haidian District Academy of Sciences, South Road, No. 2, block A, No. 313

Applicant after: MEGVII INC.

Applicant after: Beijing maigewei Technology Co., Ltd.

Address before: 100190 Beijing, Haidian District Academy of Sciences, South Road, No. 2, block A, No. 313

Applicant before: MEGVII INC.

Applicant before: Beijing aperture Science and Technology Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant