CN106796647B - Scene text detecting system and method - Google Patents

Scene text detecting system and method

Info

Publication number
CN106796647B
CN106796647B
Authority
CN
China
Prior art keywords
text
text component
confidence score
component
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201480081759.5A
Other languages
Chinese (zh)
Other versions
CN106796647A (en)
Inventor
汤晓鸥
黄韡林
乔宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Publication of CN106796647A
Application granted
Publication of CN106796647B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A scene text detection system is disclosed. The system may include a maximally stable extremal region (MSER) detector, a trained convolutional neural network (CNN) classifier, a selector and a constructor. The MSER detector may be configured to generate a set of text components from an image, wherein the generated text components are arranged in an MSER tree structure. The trained CNN classifier may be configured to assign a component confidence score to each text component in the set of text components. The selector may be configured to select, from the set of text components, the text components having higher component confidence scores among the assigned component confidence scores. The constructor is configured to construct a final text using the selected text components. A scene text detection method is also disclosed.

Description

Scene text detecting system and method
Technical field
The present invention relates generally to the field of image processing, and more specifically to a scene text detection system and a scene text detection method.
Background technology
In recent years, with the rapid development and popularization of high-performance mobile and wearable devices, scene text detection and localization has received increasing attention because of its many potential applications. Text in an image usually carries important semantic information, so the detection and recognition of text is extremely important for fully understanding the image.
The challenges of scene text detection come from the extreme diversity of text patterns, highly complex background information, and severe real-world interference. For example, text appearing in a natural image can have a very small size or a low contrast relative to the background color, and even regular text can be distorted by strong illumination, occlusion, or blur. In addition, a large amount of noise and text-like outliers (such as windows, leaves, and bricks) can be present in the image background and frequently cause false alarms in the detection process.
Recent methods for scene text detection mainly fall into two groups: sliding-window based methods and connected-component based methods. Sliding-window based methods detect text information by sliding sub-windows over all positions of the image at multiple scales. Text and non-text information is then distinguished by a trained classifier, which usually relies on hand-designed low-level features extracted from the windows, such as SIFT and histograms of oriented gradients. The main challenges are the design of local features that can handle the large variation of text, and the high computational demand of scanning a large number of windows, which for an image with N pixels can grow on the order of N².
Connected-component based methods first separate text and non-text pixels by running a fast low-level filter, and then group text pixels with similar properties (for example, intensity, stroke width, or color) to construct candidate text components. The stroke width transform (SWT) and maximally stable extremal regions (MSER) are two representative low-level filters that have recently achieved great success in scene text detection.
MSER generally produces a large number of non-text components, which causes a high degree of ambiguity between text and non-text among the MSER components. Robustly separating the two has become the key issue for improving the performance of MSER-based methods. Although many methods have been proposed to handle this problem, most current MSER filtering methods focus on developing low-level features (such as heuristic or geometric properties) to filter out non-text components. These low-level features are not robust or discriminative enough to distinguish true text from text-like outliers that often share similar heuristic or geometric properties with true text.
Summary of the invention
According to an embodiment of the present application, a scene text detection system is disclosed. The system includes: a maximally stable extremal region (MSER) detector configured to generate a set of text components from an image, wherein the generated text components are arranged in an MSER tree structure; a convolutional neural network (CNN) classifier comprising two convolutional layers, at least one average pooling layer and a support vector machine (SVM) classifier, wherein each convolutional layer is followed by an average pooling layer and has a plurality of filters, the CNN classifier being configured to assign a component confidence score to each text component in the set of text components; a selector configured to select, from the set of text components, the text components having higher component confidence scores among the assigned component confidence scores; and a constructor configured to construct a final text using the selected text components, wherein the filters of the first of the two convolutional layers are configured to be learned by unsupervised K-means from a set of image patches extracted from a predetermined training set so as to generate responses, and the filters of the second of the two convolutional layers are configured to be learned, based on the generated responses, by back-propagating the SVM classification error generated by the SVM classifier, so as to obtain the component confidence scores of the text components.
According to an embodiment of the present application, the CNN classifier is trained using a predetermined training set so as to assign the confidence scores.
According to an embodiment of the present application, the selector further includes: a calibrator configured to identify, based on the assigned component confidence scores and the MSER tree structure, the erroneously connected text components among the selected text components; and a splitter configured to split the erroneously connected text components into text components with higher component confidence scores.
According to an embodiment of the present application, the splitter further includes: a resizing unit configured to resize the identified erroneously connected text components to a predetermined size; a scanner configured to scan the resized text components with a sliding window so as to obtain a one-dimensional array of component confidence scores; and a recognition unit configured to identify, based on the one-dimensional array, the peak positions of an erroneously connected text component, so as to split the erroneously connected text component into text components with higher component confidence scores.
According to an embodiment of the present application, the conditions for identifying an erroneously connected text component include: the width-to-height aspect ratio of the text component is greater than 2; the text component has a positive confidence score; and the text component is a leaf node of the MSER tree structure, or has a confidence score larger than all of its child nodes in the MSER tree structure.
According to an embodiment of the present application, the constructor further includes: a pairing unit configured to pair two text components among the selected text components that have similar geometric and heuristic properties; and a merging unit configured to sequentially merge pairs that share a common component and have similar orientations, so as to construct the final text.
According to an embodiment of the present application, a scene text detection method is disclosed. The method includes: generating a set of text components from an image, wherein the generated text components are arranged in a maximally stable extremal region (MSER) tree structure; assigning, by a trained convolutional neural network (CNN) classifier, a component confidence score to each text component in the set of text components; selecting, from the set of text components, the text components having higher component confidence scores among the assigned component confidence scores; and constructing a final text using the selected text components, wherein the CNN classifier comprises two convolutional layers, at least one average pooling layer and a support vector machine (SVM) classifier, each convolutional layer being followed by an average pooling layer and having a plurality of filters, and the CNN classifier is trained by the following operations: extracting a set of image patches from a predetermined training set; learning the filters of the first of the two convolutional layers by unsupervised K-means on the set of image patches so as to generate responses; and learning the filters of the second of the two convolutional layers, according to the generated responses, by back-propagating the SVM classification error generated by the SVM classifier, so as to obtain the component confidence scores of the text components.
According to an embodiment of the present application, generating the set of text components from an image includes: generating the set of text components from the image by using a maximally stable extremal region detector.
According to an embodiment of the present application, the method further includes: training the CNN classifier using a predetermined training set so as to assign the component confidence scores.
According to an embodiment of the present application, assigning the component confidence score to each text component in the set of text components includes: identifying, based on the assigned component confidence scores and the MSER tree structure, the erroneously connected text components among the selected text components; and splitting the erroneously connected text components into text components with higher component confidence scores.
According to an embodiment of the present application, splitting the erroneously connected text components into text components with higher component confidence scores further includes: resizing the identified erroneously connected text components to a predetermined size; scanning the resized text components with a sliding window so as to obtain a one-dimensional array of component confidence scores; and identifying, based on the one-dimensional array, the peak positions of the erroneously connected text component, so as to split it, based on the peak positions, into text components with higher confidence scores.
According to an embodiment of the present application, the conditions for identifying an erroneously connected text component include: the width-to-height aspect ratio of the text component is greater than 2; the text component has a positive confidence score; and the text component is a leaf node of the MSER tree structure, or has a confidence score larger than all of its child nodes in the MSER tree structure.
According to an embodiment of the present application, constructing the final text using the selected text components further includes: pairing two text components among the selected text components that have similar geometric and heuristic properties; and sequentially merging text pairs that share a common component and have similar orientations, so as to construct the final text.
Description of the drawings
Exemplary, non-limiting embodiments of the present invention are described below with reference to the attached drawings. The drawings are illustrative and generally not drawn to exact scale. The same or similar elements in different figures are referenced with identical reference numerals.
Fig. 1 is a schematic diagram showing a scene text detection system consistent with embodiments of the present application.
Fig. 2 is a schematic diagram showing the scene text detection system consistent with some disclosed embodiments when implemented in software.
Fig. 3 is a schematic diagram showing a convolutional neural network classifier consistent with some disclosed embodiments.
Fig. 4 is a schematic diagram showing the selector of the scene text detection system consistent with some disclosed embodiments.
Fig. 5 is a schematic diagram showing the splitter of the selector consistent with some disclosed embodiments.
Fig. 6 is a schematic flow diagram showing a scene text detection method consistent with some disclosed embodiments.
Fig. 7 is a schematic flow diagram showing a process of selecting text components consistent with some disclosed embodiments.
Detailed description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. Where appropriate, the same reference numerals refer to the same or similar parts throughout the drawings. Fig. 1 is a schematic diagram showing an exemplary scene text detection system 1000 consistent with some disclosed embodiments.
Referring to Fig. 1, when the system 1000 is implemented in hardware, it may include a maximally stable extremal region (MSER) detector 100, a convolutional neural network (CNN) classifier 200, a selector 300 and a constructor 400.
It will be appreciated that the system 1000 may be implemented using certain hardware, software, or a combination thereof. In addition, embodiments of the present invention may be adapted to a computer program product embodied on one or more computer readable storage media (including, but not limited to, disk storage, CD-ROM, optical memory and the like) containing computer program code. Fig. 2 is a schematic diagram showing the scene text detection system 1000 consistent with some disclosed embodiments when implemented in software.
When implemented in software, the system 1000 may include a general-purpose computer, a computer cluster, a mainstream computer, a computing device dedicated to providing online content, or a computer network comprising a group of computers operating in a centralized or distributed fashion. As shown in Fig. 2, the system 1000 may include one or more processors (processors 102, 104, 106, etc.), a memory 112, a storage device 116, and a bus facilitating the exchange of information among the various devices of the system 1000. The processors 102 to 106 may include a central processing unit ("CPU"), a graphics processing unit ("GPU"), or other suitable information processing devices. Depending on the type of hardware used, the processors 102 to 106 may include one or more printed circuit boards and/or one or more microprocessor chips. The processors 102 to 106 may execute sequences of computer program instructions to perform the various methods explained in greater detail below.
The memory 112 may include, among other things, random access memory ("RAM") and read-only memory ("ROM"). Computer program instructions may be stored in, and accessed and read from, the memory 112 for execution by one or more of the processors 102 to 106. For example, the memory 112 may store one or more software applications. In addition, the memory 112 may store an entire software application or only a part of a software application executable by one or more of the processors 102 to 106. It should be noted that although only one block is shown in Fig. 2, the memory 112 may include multiple physical units installed on a central computing device or on different computing devices.
In the embodiment shown in Fig. 1, the MSER detector may be configured to generate a set of text components from an image, with the generated text components arranged in an MSER tree structure. MSER restricts extremal regions to connected components of an image whose pixels have an intensity contrast with respect to their boundary pixels. The intensity contrast is measured by increasing intensity values, and it controls the area of a region. A low contrast value will generate a large number of low-level regions separated by small intensity differences between pixels. As the contrast value increases, low-level regions accumulate at the current level or merge with other lower-level regions to construct higher-level regions, so that an extremal region tree can be constructed when the maximum contrast is reached. If the variation of an extremal region is smaller than that of its parent node and its child nodes, the extremal region is defined as an MSER. An MSER can therefore be regarded as a special extremal region whose size remains unchanged over a certain range of thresholds.
In an embodiment, each individual character of the text in an image can be detected by the MSER detector as an extremal region or MSER. Two notable advantages have made MSER detectors highly successful in scene text detection. First, the MSER detector is a high-speed detector that can be computed in time linear in the number of pixels in the image. Second, it is a powerful detector with a strong ability to handle low-quality text (such as low contrast, low resolution and blur). With this ability, MSER can detect most scene text in natural images.
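For illustration only, such candidate components could be extracted with OpenCV's MSER implementation as sketched below; the patent does not prescribe any particular library, and the use of the library defaults here is an assumption.

```python
# Minimal sketch (assumption: OpenCV is used; the patent does not require it).
import cv2

def extract_mser_components(image_path):
    """Return MSER regions (pixel lists) and their bounding boxes for one image."""
    gray = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2GRAY)
    mser = cv2.MSER_create()                   # default delta / area parameters
    regions, bboxes = mser.detectRegions(gray)
    return regions, bboxes
```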
According to an embodiment, the CNN classifier 200 may be configured to assign a component confidence score to each text component in the set of text components. As shown in Fig. 3, the CNN classifier 200 may include at least one convolutional layer, at least one average pooling layer and a support vector machine (SVM) classifier. Each convolutional layer is followed by an average pooling layer and has a plurality of filters. For example, as shown in Fig. 3, the CNN classifier includes two convolutional layers, with the second layer stacked on top of the first. The numbers of filters in the two layers are 96 and 64, respectively.
In an embodiment, the CNN classifier is trained using a predetermined training set so as to assign the text component confidence scores. When training the CNN classifier, the filters of the first of the two convolutional layers are learned by unsupervised K-means from a set of image patches extracted from the predetermined training set so as to generate responses, and the filters of the second convolutional layer are learned, based on the generated responses, by back-propagating the SVM classification error generated by the SVM classifier, so as to obtain the confidence scores of the text components. For example, during the training process shown in Fig. 3, the extracted image patches have a fixed size of 32 × 32. The first layer is trained in an unsupervised manner using a variant of K-means to learn a set of filters D ∈ R^(k×N1) from a set of 8 × 8 patches, where k is the dimensionality of the patch used for convolution (here 8 × 8 = 64) and N1 is the number of filters in the first layer (96). The response r of the first layer is computed as:
r = max{0, |D^T x| − θ}        (1)
where x ∈ R^k is the input vector of an 8 × 8 patch and θ = 0.5. The resulting first-layer response map has size 25 × 25 × 96. Average pooling with a window size of 5 × 5 is then applied to the response map to obtain a reduced map of size 5 × 5 × 96.
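Purely as an illustrative sketch (the variable names and the simple window extraction are assumptions, not taken from the patent), the first-layer response of equation (1) and the subsequent 5 × 5 average pooling can be written as:

```python
import numpy as np

def first_layer_response(patch32, D, theta=0.5):
    """Response of equation (1) for one 32 x 32 grayscale patch.

    patch32 : (32, 32) array, an input text-component patch.
    D       : (64, 96) array, K-means dictionary of 8 x 8 filters (k = 64, N1 = 96).
    Returns a (25, 25, 96) response map.
    """
    k, n1 = D.shape
    resp = np.zeros((25, 25, n1))
    for i in range(25):                              # 32 - 8 + 1 = 25 valid positions
        for j in range(25):
            x = patch32[i:i + 8, j:j + 8].reshape(k)               # 8 x 8 sub-window
            resp[i, j] = np.maximum(0.0, np.abs(D.T @ x) - theta)  # equation (1)
    return resp

def average_pool(resp, win=5):
    """Non-overlapping 5 x 5 average pooling: (25, 25, 96) -> (5, 5, 96)."""
    h, w, c = resp.shape
    return resp.reshape(h // win, win, w // win, win, c).mean(axis=(1, 3))
```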
In an embodiment, the filters of the second convolutional layer are learned by back-propagating the SVM classification error generated by the SVM classifier, based on the generated responses, so as to obtain the component confidence scores of the text components. The final output of the two layers is a 64-dimensional feature vector, which is fed to the SVM classifier to obtain the final confidence score of a text component. The parameters in the second layer are fully connected and are trained by back-propagating the SVM classification error.
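A rough sketch of this joint training is given below for illustration only; the ReLU activation, learning rate, regularization and update rule are assumptions and are not prescribed by the patent.

```python
import numpy as np

def train_second_layer(pooled, labels, dim2=64, lr=1e-3, reg=1e-4, epochs=20, seed=0):
    """Jointly train the fully connected second layer (64 outputs) and a linear SVM
    by back-propagating the hinge (SVM) classification error.

    pooled : (n, 5*5*96) array of flattened first-layer pooled maps.
    labels : (n,) array with values +1 (text) or -1 (non-text).
    Returns (W2, w, b); a component confidence score is w @ feature + b.
    """
    rng = np.random.default_rng(seed)
    n, d = pooled.shape
    W2 = rng.normal(0.0, 0.01, size=(d, dim2))   # second-layer (fully connected) filters
    w, b = np.zeros(dim2), 0.0                   # linear SVM weights and bias
    for _ in range(epochs):
        for i in rng.permutation(n):
            x, y = pooled[i], labels[i]
            f = np.maximum(0.0, x @ W2)          # 64-d feature (ReLU is an assumption)
            score = f @ w + b                    # component confidence score
            if 1.0 - y * score > 0.0:            # hinge loss is active for this sample
                gf = -y * w                      # gradient of the loss w.r.t. the feature
                W2 -= lr * (np.outer(x, gf * (f > 0)) + reg * W2)
                w -= lr * (-y * f + reg * w)
                b -= lr * (-y)
    return W2, w, b
```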
Fig. 4 shows the selector 300 of the scene text detection system 1000 consistent with some disclosed embodiments. As shown, the selector 300 may include a calibrator 310 and a splitter 320. In an embodiment, the calibrator 310 may be configured to identify, based on the assigned component confidence scores and the MSER tree structure, the erroneously connected text components among the selected text components. The splitter 320 may be configured to split the erroneously connected text components into text components with higher confidence scores.
In the embodiment shown in Fig. 5, the splitter may further include a resizing unit 321, a scanner 322 and a recognition unit 323. The resizing unit 321 may be configured to resize the identified erroneously connected text components to a predetermined size. The scanner 322 may be configured to scan the resized text components with a sliding window so as to obtain a one-dimensional array of component confidence scores. The recognition unit 323 may be configured to identify, based on the one-dimensional array, the peak positions of an erroneously connected text component, so as to split it into text components with higher confidence scores.
In an embodiment, erroneously connected components have three notable characteristics. First, they often have a large aspect ratio, the width of the bounding box being much larger than its height. Second, unlike other non-text components such as long horizontal lines or bars, which are usually scored with negative confidence values by the CNN classifier 200, erroneously connected components do contain some text information, although the evidence is not very strong because the CNN classifier is trained on single-character components. Third, components at high levels of the MSER tree usually include multiple text characters, for example the component at the root of the tree. Most of these components are correctly separated by their child-node components, which usually have confidence scores higher than those of their parent nodes.
Therefore, the conditions defining an erroneously connected text component include: 1) the width-to-height aspect ratio of the text component is greater than 2; 2) the text component has a positive confidence score; and 3) the text component is a leaf node of the MSER tree structure, or has a confidence score larger than all of its child nodes in the MSER tree structure. An exemplary procedure for searching for and splitting erroneously connected components is sketched below.
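The following is only an illustrative sketch consistent with the description above, not the exemplary algorithm of the patent itself; the window width, step size and non-maximum-suppression radius are assumptions.

```python
import numpy as np

def split_erroneously_connected(component_img, score_fn,
                                height=32, win=32, step=4, nms_radius=8):
    """Split one erroneously connected component into single-character candidates.

    component_img : 2-D grayscale array of the component's bounding box.
    score_fn      : callable mapping a (height, win) patch to a confidence score,
                    e.g. the trained CNN/SVM classifier described above.
    Returns a list of (start_column, end_column) intervals in the resized image.
    """
    # 1) Resize the component to a predetermined height (nearest-neighbour indexing).
    h, w = component_img.shape
    new_w = max(win, int(round(w * height / h)))
    rows = np.linspace(0, h - 1, height).astype(int)
    cols = np.linspace(0, w - 1, new_w).astype(int)
    resized = component_img[np.ix_(rows, cols)]

    # 2) Slide a window to obtain a one-dimensional array of confidence scores.
    positions = list(range(0, new_w - win + 1, step))
    scores = np.array([score_fn(resized[:, p:p + win]) for p in positions])

    # 3) Non-maximum suppression on the 1-D score array to estimate character peaks.
    peaks, taken = [], np.zeros(len(scores), dtype=bool)
    radius = max(1, nms_radius // step)
    for i in np.argsort(-scores):
        if scores[i] <= 0 or taken[i]:
            continue
        peaks.append(positions[i])
        taken[max(0, i - radius):i + radius + 1] = True

    # 4) Each surviving peak becomes one single-character component candidate.
    return [(p, p + win) for p in sorted(peaks)]
```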
According to an embodiment, the constructor 400 may further include a pairing unit and a merging unit (not shown). The pairing unit may be configured to pair two text components among the selected text components that have similar geometric and heuristic properties. The merging unit may be configured to sequentially merge pairs that share a common component and have similar orientations, so as to construct the final text.
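As a toy illustration of this construction step (the geometric thresholds are assumptions, and an explicit orientation test is omitted for brevity), pairs can be formed from bounding-box geometry and then chained into text-line groups:

```python
def pair_components(boxes, max_height_ratio=1.5, max_gap_factor=2.0):
    """Pair components given as (x, y, w, h) boxes with similar geometry
    that are horizontally close and vertically aligned."""
    pairs = []
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            xi, yi, wi, hi = boxes[i]
            xj, yj, wj, hj = boxes[j]
            similar_height = max(hi, hj) / min(hi, hj) <= max_height_ratio
            close = abs((xi + wi / 2) - (xj + wj / 2)) <= max_gap_factor * max(wi, wj)
            aligned = abs(yi - yj) <= 0.5 * min(hi, hj)
            if similar_height and close and aligned:
                pairs.append((i, j))
    return pairs

def merge_pairs(pairs):
    """Sequentially merge pairs sharing a common component into text-line groups."""
    groups = [set(p) for p in pairs]
    merged = True
    while merged:
        merged = False
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                if groups[a] & groups[b]:        # the two pairs share a component
                    groups[a] |= groups[b]
                    del groups[b]
                    merged = True
                    break
            if merged:
                break
    return groups
```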
Fig. 6 is the schematic flow diagram for showing to meet the scene text detection method 2000 of some open embodiments.Hereafter may be used Method 2000 is described in detail with reference to figure 6.
At step S210, text component set is generated from image.In embodiment, by using maximum stable extremal Region (MSER) detector generates text component set from image.The text component of generation is ranked into MSER tree structures.
At step S220, each text component for ingredient confidence score being assigned in text component set.For example, by Ingredient confidence score is assigned to each text component by trained convolutional neural networks (CNN) grader.In embodiment, Convolutional neural networks grader is trained using predetermined training set, to be distributed into a point confidence score.
According to embodiment, convolutional neural networks grader includes at least one convolutional layer, at least one average pond and branch Vector machine (SVM) grader is held, and is wherein average pond layer after each of convolutional layer and there are multiple filters. For example, convolutional neural networks grader may include two convolutional layers.During training process, the figure from predetermined training set is extracted As set of blocks.Then, the filter of the first convolutional layer of two convolutional layers is come by using non-supervisory K mean values according to image block collection Conjunction learnt with generate response, and the filter of the second convolutional layer of two convolutional layers by backpropagation by svm classifier The svm classifier error that device generates is learnt based on the response generated, to obtain the confidence score of text component.
At step S230, there is the higher ingredient in distributed ingredient confidence score from selection in text component set The text component of confidence score.Here is the possibility mode of text component of the selection with higher confidence score.For example, such as Fig. 7 It is shown, based on the ingredient confidence score and MSER tree structures distributed, demarcates and make mistake among selected text component The text component of connection.The condition of text component for calibrating incorrect link includes:1) width of text component Aspect ratio is more than 2;2) text component has positive confidence score;And 3) text component MSER tree structures end node or With the confidence score bigger than all child nodes in MSER tree structure.
If at the text component for belonging to incorrect link, it is adjusted to predetermined size.Adjust the text after size Ingredient is scanned by (for example) sliding window, to obtain the one-dimensional array of ingredient confidence score.For example, non-maximum value is inhibited (NMS) method is applied to the one-dimensional array of ingredient confidence score, to estimate multiple character positions.Mistake is identified based on one-dimensional array The peak position of the text component of misconnection has higher confidence to be divided into the text component of incorrect link based on peak position The text component of score.
At step S240, final text is constructed using selected text component.When constructing final text, by institute Two text components pairing that there is similar geometry and inspire property of the text component of selection, and will have identical component and Be similarly oriented in order with construct final text to merge, to construct final text.
By the scene text detecting system and method for the application, efficiently utilize the deep learning model of large capacity with Solve two main problems of the current MSER methods for text detection.In addition, the system of the application can be with relatively strong steady Property and high resolving ability, with by be incorporated to MSER detectors and trained CNN graders distinguish text with it is a large amount of non- Text component.Sliding window model is combined together with CNN graders, is correctly positioned with to further increase MSER detectors and is chosen The ability of war text component.The present processes make much progress than current method in standard basis data set.
Although preferred examples of the present invention have been described, those skilled in the art may make changes or modifications to these examples once the basic inventive concept is understood. The appended claims are intended to cover the preferred examples as well as all changes or modifications falling within the scope of the present invention.
Obviously, those skilled in the art may make changes or modifications to the present invention without departing from the spirit and scope of the present invention. Therefore, if these changes or modifications fall within the scope of the claims and their equivalents, they also fall within the scope of the present invention.

Claims (13)

1. A scene text detection system, comprising:
a maximally stable extremal region detector configured to generate a set of text components from an image, wherein the generated text components are arranged in a maximally stable extremal region tree structure;
a convolutional neural network classifier comprising two convolutional layers, at least one average pooling layer and a support vector machine classifier, wherein each convolutional layer is followed by an average pooling layer and has a plurality of filters, the convolutional neural network classifier being configured to assign a component confidence score to each text component in the set of text components;
a selector configured to select, from the set of text components, the text components having higher component confidence scores among the assigned component confidence scores; and
a constructor configured to construct a final text using the selected text components,
wherein the filters of the first of the two convolutional layers are configured to be learned by unsupervised K-means from a set of image patches extracted from a predetermined training set so as to generate responses, and the filters of the second of the two convolutional layers are configured to be learned, based on the generated responses, by back-propagating the support vector machine classification error generated by the support vector machine classifier, so as to obtain the component confidence scores of the text components.
2. The scene text detection system according to claim 1, wherein the convolutional neural network classifier is trained using a predetermined training set so as to assign the confidence scores.
3. The scene text detection system according to claim 1, wherein the selector further comprises:
a calibrator configured to identify, based on the assigned component confidence scores and the maximally stable extremal region tree structure, the erroneously connected text components among the selected text components; and
a splitter configured to split the erroneously connected text components into text components with higher component confidence scores.
4. The scene text detection system according to claim 3, wherein the splitter further comprises:
a resizing unit configured to resize the identified erroneously connected text components to a predetermined size;
a scanner configured to scan the resized text components so as to obtain, by means of a sliding window, a one-dimensional array of component confidence scores; and
a recognition unit configured to identify, based on the one-dimensional array, the peak positions of an erroneously connected text component, so as to split the erroneously connected text component into text components with higher component confidence scores.
5. The scene text detection system according to claim 3, wherein the conditions for identifying an erroneously connected text component include:
the width-to-height aspect ratio of the text component is greater than 2;
the text component has a positive confidence score; and
the text component is a leaf node of the maximally stable extremal region tree structure, or has a confidence score larger than all of its child nodes in the maximally stable extremal region tree structure.
6. The scene text detection system according to claim 1, wherein the constructor further comprises:
a pairing unit configured to pair two text components among the selected text components that have similar geometric and heuristic properties; and
a merging unit configured to sequentially merge pairs that share a common component and have similar orientations, so as to construct the final text.
7. A scene text detection method, comprising:
generating a set of text components from an image, wherein the generated text components are arranged in a maximally stable extremal region tree structure;
assigning, by a trained convolutional neural network classifier, a component confidence score to each text component in the set of text components;
selecting, from the set of text components, the text components having higher component confidence scores among the assigned component confidence scores; and
constructing a final text using the selected text components,
wherein the convolutional neural network classifier comprises two convolutional layers, at least one average pooling layer and a support vector machine classifier, each convolutional layer being followed by an average pooling layer and having a plurality of filters, and the convolutional neural network classifier is trained by the following operations: extracting a set of image patches from a predetermined training set; learning the filters of the first of the two convolutional layers by unsupervised K-means on the set of image patches so as to generate responses; and learning the filters of the second of the two convolutional layers, according to the generated responses, by back-propagating the support vector machine classification error generated by the support vector machine classifier, so as to obtain the component confidence scores of the text components.
8. The scene text detection method according to claim 7, wherein generating the set of text components from an image comprises:
generating the set of text components from the image by using a maximally stable extremal region detector.
9. The scene text detection method according to claim 7, further comprising:
training the convolutional neural network classifier using a predetermined training set so as to assign the component confidence scores.
10. The scene text detection method according to claim 7, wherein assigning the component confidence score to each text component in the set of text components comprises:
identifying, based on the assigned component confidence scores and the maximally stable extremal region tree structure, the erroneously connected text components among the selected text components; and
splitting the erroneously connected text components into text components with higher component confidence scores.
11. The scene text detection method according to claim 10, wherein splitting the erroneously connected text components into text components with higher component confidence scores further comprises:
resizing the identified erroneously connected text components to a predetermined size;
scanning the resized text components so as to obtain, by means of a sliding window, a one-dimensional array of component confidence scores; and
identifying, based on the one-dimensional array, the peak positions of the erroneously connected text component, so as to split the erroneously connected text component, based on the peak positions, into text components with higher confidence scores.
12. The scene text detection method according to claim 10, wherein the conditions for identifying an erroneously connected text component include:
the width-to-height aspect ratio of the text component is greater than 2;
the text component has a positive confidence score; and
the text component is a leaf node of the maximally stable extremal region tree structure, or has a confidence score larger than all of its child nodes in the maximally stable extremal region tree structure.
13. The scene text detection method according to claim 7, wherein constructing the final text using the selected text components further comprises:
pairing two text components among the selected text components that have similar geometric and heuristic properties; and
sequentially merging text pairs that share a common component and have similar orientations, so as to construct the final text.
CN201480081759.5A 2014-09-05 2014-09-05 Scene text detecting system and method Active CN106796647B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/000830 WO2016033710A1 (en) 2014-09-05 2014-09-05 Scene text detection system and method

Publications (2)

Publication Number Publication Date
CN106796647A (en) 2017-05-31
CN106796647B (en) 2018-09-14

Family

ID=55438963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201480081759.5A Active CN106796647B (en) 2014-09-05 2014-09-05 Scene text detecting system and method

Country Status (2)

Country Link
CN (1) CN106796647B (en)
WO (1) WO2016033710A1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10032110B2 (en) 2016-12-13 2018-07-24 Google Llc Performing average pooling in hardware
US10037490B2 (en) 2016-12-13 2018-07-31 Google Llc Performing average pooling in hardware
CN107704509B (en) * 2017-08-31 2021-11-02 北京联合大学 Reordering method combining stable region and deep learning
CN110135446B (en) * 2018-02-09 2021-01-22 北京世纪好未来教育科技有限公司 Text detection method and computer storage medium
FR3079056A1 (en) * 2018-03-19 2019-09-20 Stmicroelectronics (Rousset) Sas METHOD FOR CONTROLLING SCENES DETECTION BY AN APPARATUS, FOR EXAMPLE A WIRELESS COMMUNICATION APPARATUS, AND APPARATUS THEREFOR
CN109086663B (en) * 2018-06-27 2021-11-05 大连理工大学 Natural scene text detection method based on scale self-adaption of convolutional neural network
CN109816022A (en) * 2019-01-29 2019-05-28 重庆市地理信息中心 A kind of image-recognizing method based on three decisions and CNN
CN110348280A (en) * 2019-03-21 2019-10-18 贵州工业职业技术学院 Water book character recognition method based on CNN artificial neural
WO2020218111A1 (en) * 2019-04-24 2020-10-29 富士フイルム株式会社 Learning method and device, program, learned model, and text generation device
CN110321893A (en) * 2019-06-27 2019-10-11 电子科技大学 A kind of scene text identification network focusing enhancing
CN110516554A (en) * 2019-07-31 2019-11-29 杭州电子科技大学 A kind of more scene multi-font Chinese text detection recognition methods
CN112183523A (en) * 2020-12-02 2021-01-05 北京云测信息技术有限公司 Text detection method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136523A (en) * 2012-11-29 2013-06-05 浙江大学 Arbitrary direction text line detection method in natural image

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102201053B (en) * 2010-12-10 2013-07-24 上海合合信息科技发展有限公司 Method for cutting edge of text image
US9916538B2 (en) * 2012-09-15 2018-03-13 Z Advanced Computing, Inc. Method and system for feature detection
US9262699B2 (en) * 2012-07-19 2016-02-16 Qualcomm Incorporated Method of handling complex variants of words through prefix-tree based decoding for Devanagiri OCR

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136523A (en) * 2012-11-29 2013-06-05 浙江大学 Arbitrary direction text line detection method in natural image

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on video text detection algorithms based on maximally stable extremal regions; 陈丽娇; China Master's Theses Full-text Database; 2012-10-15; abstract, pages 16-18 and 33-34 *
Research on end-to-end English text recognition for natural scenes; 廖威敏; China Master's Theses Full-text Database; 2014-08-15; abstract, pages 20-21, 37-48 and 57-58 *

Also Published As

Publication number Publication date
WO2016033710A1 (en) 2016-03-10
CN106796647A (en) 2017-05-31

Similar Documents

Publication Publication Date Title
CN106796647B (en) Scene text detecting system and method
Yan et al. A fast uyghur text detector for complex background images
WO2020221298A1 (en) Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus
US10229347B2 (en) Systems and methods for identifying a target object in an image
US10013636B2 (en) Image object category recognition method and device
Mathias et al. Handling occlusions with franken-classifiers
US8737739B2 (en) Active segmentation for groups of images
CN110287328B (en) Text classification method, device and equipment and computer readable storage medium
US9367766B2 (en) Text line detection in images
US9201879B2 (en) Method, apparatus and system for generating a feature vector
Zamberletti et al. Text localization based on fast feature pyramids and multi-resolution maximally stable extremal regions
KR101114135B1 (en) Low resolution ocr for camera acquired documents
CN107835113A (en) Abnormal user detection method in a kind of social networks based on network mapping
JP6188976B2 (en) Method, apparatus and computer-readable recording medium for detecting text contained in an image
JP6897749B2 (en) Learning methods, learning systems, and learning programs
CN104933420B (en) A kind of scene image recognition methods and scene image identify equipment
CN109871803A (en) Robot winding detection method and device
CN101334786A (en) Formulae neighborhood based data dimensionality reduction method
Zhu et al. Deep residual text detection network for scene text
Kalyoncu et al. GTCLC: leaf classification method using multiple descriptors
CN113420669A (en) Document layout analysis method and system based on multi-scale training and cascade detection
CN111626250A (en) Line dividing method and device for text image, computer equipment and readable storage medium
Kim et al. A rule-based method for table detection in website images
CN116091414A (en) Cardiovascular image recognition method and system based on deep learning
Xu et al. Robust seed localization and growing with deep convolutional features for scene text detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant