CN106570497A - Text detection method and device for scene image - Google Patents
- Publication number
- CN106570497A (publication); application CN201610878795.5A
- Authority
- CN
- China
- Prior art keywords
- text
- frame
- text candidates
- scene image
- candidates
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a text detection method for a scene image. The method comprises: obtaining a scene image and extracting convolution features of the scene image through a convolutional neural network model; feeding the convolution features into a recurrent neural network model to generate a text candidate frame sequence; and post-processing the text candidate frame sequence to obtain a text line region. According to the technical scheme of the invention, the convolution features are trained through the recurrent neural network model, so the contextual text information of the convolution features is exploited and the robustness of text detection is improved. The method is not limited to a single-language classifier and can be applied to text detection in multiple languages, and no complex prior conditions need to be set manually, so detection stability is improved across different scenes. By reusing the computation of overlapping regions, computational efficiency is effectively improved, and the end-to-end model simplifies the calculation and processing procedure.
Description
Technical field
The invention belongs to the field of image detection, and more particularly relates to a text detection method and device for a scene image.
Background technology
Effectively recognizing text in a scene image can bring great convenience to people's lives. For example, by recognizing contents such as the license plate in an image, relevant vehicle information can be automatically retrieved according to the plate number. Accurately detecting and locating text regions in an image is the basis and premise of unconstrained natural scene text recognition.
At present, text detection methods generally include methods based on connected components and methods based on sliding windows.
A text detection method based on connected components uses a fast method, such as MSER (Maximally Stable Extremal Regions) or SWT (Stroke Width Transform), to separate text pixels from non-text pixels, and then greedily groups the text pixels into strokes or character candidates using low-level features such as gray value, color or gradient. Text detection methods based on connected components cannot effectively detect text that does not form connected components.
A text detection method based on sliding windows densely slides a window over the image and applies a detection algorithm (using hand-designed low-level features or a CNN, convolutional neural network) at each position of the sliding window. Although sliding-window methods do not suffer from the problem of unconnected text, they need to handle the multi-scale problem: sliding windows of several scales usually have to be slid over the image separately, which increases the computational cost of text detection.
Because current text detection methods are usually based on single-character classifiers acting on sliding-window candidate frames, the robustness of character classification is affected when the scene is complex, for example under natural conditions such as illumination, shadow and occlusion; such methods cannot process an image containing text in multiple languages in a single pass, and their detection is less stable across different scenes.
Content of the invention
An object of the invention is to provide a text detection method for a scene image, so as to solve the problems of prior-art text detection methods that the robustness of character classification is poor, that an image containing multi-language text cannot be processed in a single pass, and that detection is unstable across different scenes.
In a first aspect, an embodiment of the present invention provides a text detection method for a scene image, the method comprising:
obtaining a scene image, and extracting convolution features of the scene image through a convolutional neural network model;
feeding the convolution features of the scene image into a recurrent neural network model to generate a text candidate frame sequence;
post-processing the text candidate frame sequence to obtain a text line region.
With reference to the first aspect, in a first possible implementation of the first aspect, the step of extracting the convolution features of the scene image through the convolutional neural network model includes:
performing a convolutional neural network operation on the scene image through a VGG convolutional neural network to obtain the convolutional layers of the scene image;
obtaining, using a predetermined sliding window, the convolution features of the last convolutional layer in a specific region of the scene image.
With reference to the first aspect, in a second possible implementation of the first aspect, the step of feeding the convolution features of the scene image into the recurrent neural network model to generate the text candidate frame sequence includes:
feeding the convolution features into the recurrent neural network model row by row, taking each convolution feature as a time-frame input of a long short-term memory model for training, to obtain text candidate anchor frames of fixed width;
regressing, detecting and connecting the upper and lower edges of the fixed-width text candidate anchor frames to generate the text candidate frame sequence.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the step of regressing, detecting and connecting the upper and lower edges of the fixed-width text candidate anchor frames to generate the text candidate frame sequence includes:
obtaining supervision information of the text candidate anchor frames, the supervision information including: the score value that a text candidate anchor frame is text, the first offset distance of the text candidate anchor frame from the upper end of its nearest text line boundary, and the second offset distance of the text candidate anchor frame from the lower end of its nearest text line boundary;
according to the supervision information of the text candidate anchor frames, selecting the text candidate anchor frames whose score value is greater than a predetermined value, and generating the text candidate frame sequence in combination with the first offset distance and the second offset distance.
With reference to the first aspect, or to the first, second or third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the step of post-processing the text candidate frame sequence to obtain the text line region includes:
according to the height difference and horizontal distance of the text candidate frame sequence, selecting the horizontal distance between text boxes at the horizontal edges and the text candidate frames at the vertical edges, to generate the text line region.
In a second aspect, an embodiment of the present invention provides a text detection device for a scene image, the device comprising:
a convolution feature acquiring unit, configured to obtain a scene image and extract convolution features of the scene image through a convolutional neural network model;
a text candidate frame generating unit, configured to feed the convolution features of the scene image into a recurrent neural network model to generate a text candidate frame sequence;
a text line region acquiring unit, configured to post-process the text candidate frame sequence to obtain a text line region.
With reference to the second aspect, in a first possible implementation of the second aspect, the convolution feature acquiring unit includes:
a convolutional layer obtaining subunit, configured to perform a convolutional neural network operation on the scene image through a VGG convolutional neural network to obtain the convolutional layers of the scene image;
a convolution feature sliding subunit, configured to obtain, using a predetermined sliding window, the convolution features of the last convolutional layer in a specific region of the scene image.
With reference to the second aspect, in a second possible implementation of the second aspect, the text candidate frame generating unit includes:
a text candidate anchor frame training subunit, configured to feed the convolution features into a bidirectional long short-term memory model, taking each convolution feature as a time-frame input of the long short-term memory model for training, to obtain text candidate anchor frames of fixed width;
a text candidate frame detecting subunit, configured to regress, detect and connect the upper and lower edges of the fixed-width text candidate anchor frames to generate the text candidate frame sequence.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the text candidate frame detecting subunit includes:
a supervision information acquiring module, configured to obtain supervision information of the text candidate anchor frames, the supervision information including: the score value that a text candidate anchor frame is text, the first offset distance of the text candidate anchor frame from the upper end of its nearest text line boundary, and the second offset distance of the text candidate anchor frame from the lower end of its nearest text line boundary;
a selecting and comparing module, configured to select, according to the supervision information of the text candidate anchor frames, the text candidate anchor frames whose score value is greater than a predetermined value, and to generate the text candidate frame sequence in combination with the first offset distance and the second offset distance.
With reference to the second aspect, or to the first, second or third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the text line region acquiring unit is specifically configured to:
according to the height difference and horizontal distance of the text candidate frame sequence, select the horizontal distance between text boxes at the horizontal edges and the text candidate frames at the vertical edges, to generate the text line region.
In the present invention, the convolution features of the scene image are extracted by a convolutional neural network, the convolution features are trained by a recurrent neural network to obtain a text candidate frame sequence, and the text candidate frame sequence is post-processed to generate a text line region. Because the method trains the convolution features through the recurrent neural network model, the contextual text information of the convolution features can be exploited, which helps improve the robustness of text detection; the method is not limited to a single-language classifier and adapts to the detection requirements of multi-language text; and no complex prior conditions need to be set manually, which helps improve detection stability across different scenes.
Description of the drawings
Fig. 1 is a flowchart of the text detection method for a scene image provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the network structure for text detection in a scene image provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the detection process provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the text detection device for a scene image provided by an embodiment of the present invention.
Specific embodiments
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain the present invention and are not intended to limit it.
The purpose of the embodiments of the present invention is to provide a text detection method and device for a scene image, so as to solve the following problems of prior-art text detection methods for scene images: they are typically based on single-character classifiers acting on sliding-window candidate frames and cannot fully exploit context and sequence information; when the scene is complex, for example under natural conditions such as illumination, shadow and occlusion, the single-character classifier is not robust; most current classifiers are monolingual and cannot process multi-language text in one detection pass; most methods require troublesome post-processing and many manually set prior conditions, so the detector is not stable in different or complex scenes; in addition, most text detection methods have very complicated pipelines with manual intervention, long running times and limited practical value. The present invention improves on the above drawbacks and proposes an end-to-end text detection method for scene images that can detect multi-language text with high detection efficiency. The method is described in detail below.
Fig. 1 shows the implementation flow of the text detection method for a scene image provided by an embodiment of the present invention, described in detail as follows.
In step S101, a scene image is obtained, and convolution features of the scene image are extracted through a convolutional neural network model.
Specifically, the scene image in the embodiment of the present invention can be a dynamic video image or an acquired static photo. For a dynamic video image, frames of the video can be extracted and processed.
The convolutional neural network is used to extract dense convolution features. As a preferred embodiment of the present invention, a convolutional neural network with the VGG architecture can be selected to extract the convolution features.
In a preferred embodiment, the step of extracting the convolution features of the scene image through the convolutional neural network model includes:
performing a convolutional neural network operation on the scene image through a VGG convolutional neural network to obtain the convolutional layers of the scene image;
obtaining, using a predetermined sliding window, the convolution features of the last convolutional layer in a specific region of the scene image.
The specific region can be the image region where text has been preliminarily identified by the VGG model.
The VGG convolutional neural network can use the VGG16 architecture, and the predetermined sliding window can be a 3×3 sliding window. As shown in Fig. 2, multiple convolutional layers are generated from VGG16 [10], and a sliding-window operation can be performed on the last convolutional layer (for example, the conv5 feature map). A 3×3 sliding window can be selected, and the convolution feature corresponding to each sliding-window operation can be obtained.
The VGG architecture and GoogLeNet are two typical kinds of image classification model; the common characteristic of their structures is that both are deep learning models. Unlike GoogLeNet, VGG inherits some of the frameworks of LeNet and AlexNet and is especially similar to the AlexNet framework. VGG can have five or more groups of convolutional layers, two fully connected image-feature layers and one fully connected classification layer, so that, like AlexNet, it can be regarded as eight parts in total, with the five leading convolutional groups differing in their per-layer configuration.
In step S102, the convolution features of the scene image are fed into the recurrent neural network model to generate the text candidate frame sequence.
The convolution features of the scene image are fed into the recurrent neural network model for training in the order in which they were obtained. The recurrent neural network model can be a bidirectional long short-term memory model; LSTM (Long Short-Term Memory) is a kind of time-recurrent neural network.
Specifically, the step of feeding the convolution features of the scene image into the recurrent neural network model to generate the text candidate frame sequence can include:
feeding the convolution features into the recurrent neural network model row by row, where the recurrent neural network model can be a bidirectional long short-term memory model, and taking each convolution feature as a time-frame input of the long short-term memory model for training, to obtain text candidate anchor frames of fixed width;
regressing, detecting and connecting the upper and lower edges of the fixed-width text candidate anchor frames to generate the text candidate frame sequence.
Specifically, a text candidate anchor frame in the embodiment of the present invention refers to a frame that encloses a certain region according to a fixed width; the region inside a text candidate anchor frame does not necessarily belong to a text region. The text candidate frame sequence represents the region composed of one or more text candidate anchor frames that, after detection of the text candidate anchor frames, have been filtered as meeting the text requirement. The text line region mentioned later refers to the accurate text region obtained after the text candidate frames have been refined by post-processing.
After the last convolutional layer of the convolutional neural network has been obtained and the convolution features have been extracted by the sliding window, the convolution features are input row by row into the bidirectional long short-term memory model LSTM of the recurrent neural network, in the order in which they were extracted by the sliding window. In this way, as shown in Fig. 2, every row of the feature image of the last convolutional layer serves as a time sequence of the bidirectional LSTM and is input into it in order, and each point on the feature image corresponds to a time frame of the bidirectional LSTM. With this design, when the network judges the current frame it can use the information of the previous and following frames, making the judgment more accurate.
As shown in Fig. 2 there are 256 outputs in the two-way shot and long term memory modelses LSTM layers, and may be coupled to one
It is individual to have 512 full connections for exporting.Can connect three output layers behind this connection.These three output layers can respectively predict text
This candidate anchor frame is the probability (can be represented by score value Score) of the text candidates anchor frame with text, on vertical direction
Vertical coordinate (Vertical coordinate) and horizontal level skew (Side-refinement), and export it is fixed wide
The sequence of the text candidates frame of degree.Can by the color of text candidates frame represent text candidate frame be text probability
(Score) candidate frame of the fraction more than certain threshold value, is only depicted in second figure of Fig. 2 and Fig. 3.
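The following sketch illustrates, under the same assumptions as the previous snippet, how each row of the last convolutional feature map could be fed as a sequence into a bidirectional LSTM with 256 outputs, followed by the 512-output fully connected layer and three output heads (score, vertical coordinate, side-refinement). The number of anchors per position (k = 10) and any layer size other than 256 and 512 are assumed values, not taken from the patent.

```python
# Minimal sketch of step S102: rows of conv5 windows -> bidirectional LSTM -> three prediction heads.
import torch
import torch.nn as nn

class TextProposalHead(nn.Module):
    def __init__(self, in_dim=512 * 9, hidden=128, k=10):
        super().__init__()
        self.blstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)  # 2*128 = 256 outputs
        self.fc = nn.Linear(2 * hidden, 512)          # fully connected layer with 512 outputs
        self.score = nn.Linear(512, 2 * k)            # text / non-text score per anchor
        self.vertical = nn.Linear(512, 2 * k)         # vertical coordinate regression per anchor
        self.side = nn.Linear(512, k)                 # horizontal side-refinement per anchor

    def forward(self, windows):                       # windows: (N, H, W, in_dim) sliding-window features
        n, h, w, d = windows.shape
        rows = windows.reshape(n * h, w, d)           # every feature-map row becomes one LSTM sequence
        seq, _ = self.blstm(rows)                     # (N*H, W, 256): each column is one time frame
        feat = self.fc(seq).reshape(n, h, w, 512)
        return self.score(feat), self.vertical(feat), self.side(feat)

head = TextProposalHead()
scores, vertical, side = head(torch.randn(1, 37, 56, 512 * 9))
print(scores.shape, vertical.shape, side.shape)       # (1, 37, 56, 20) (1, 37, 56, 20) (1, 37, 56, 10)
```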
Specifically, the step of regressing, detecting and connecting the upper and lower edges of the fixed-width text candidate anchor frames to generate the text candidate frame sequence can include:
obtaining supervision information of the text candidate anchor frames, the supervision information including: the score value that a text candidate anchor frame is text, the first offset distance of a text candidate anchor frame from the upper end of its nearest text line boundary, and the second offset distance of a text candidate anchor frame from the lower end of its nearest text line boundary;
according to the supervision information of the text candidate anchor frames, selecting the text candidate anchor frames whose score value is greater than a predetermined value, and generating the text candidate frame sequence in combination with the first offset distance and the second offset distance.
During training, supervision information is added to the text candidate frames so that the network can converge to the desired result. The supervision information can include:
(1) the classification supervision information of whether each text candidate anchor frame is text;
(2) the offset of each text candidate anchor frame from the upper end of its nearest text line boundary;
(3) the second offset distance of each text candidate anchor frame from the lower end of its nearest text line boundary.
A Softmax function can be used as the classification loss function, and for regression the SmoothL1Loss [12] function can be used as the loss function.
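A minimal sketch of these two loss terms in PyTorch is given below; the anchor counts, tensor shapes and the way anchors are matched to ground truth are placeholders, and only the choice of Softmax cross-entropy for classification and Smooth-L1 for regression follows the text above.

```python
# Minimal sketch of the training losses: Softmax classification + SmoothL1 regression.
import torch
import torch.nn.functional as F

pred_scores = torch.randn(200, 2)        # text / non-text logits for 200 sampled anchors (placeholder)
pred_offsets = torch.randn(200, 2)       # predicted offsets to the top and bottom of the nearest text line
gt_labels = torch.randint(0, 2, (200,))  # 1 = text anchor, 0 = background anchor (placeholder matching)
gt_offsets = torch.randn(200, 2)         # supervision offsets (only meaningful for text anchors)

cls_loss = F.cross_entropy(pred_scores, gt_labels)                # Softmax classification loss
pos = gt_labels == 1                                              # regression applies to text anchors only
reg_loss = F.smooth_l1_loss(pred_offsets[pos], gt_offsets[pos])   # SmoothL1 regression loss
loss = cls_loss + reg_loss
```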
During testing, according to the above supervision information, the following can be obtained:
(1) the probability (score) that each text candidate anchor frame is text;
(2) the offset of each text candidate anchor frame from the upper end of its nearest text line boundary;
(3) the offset of each text candidate anchor frame from the lower end of its nearest text line boundary.
With this information, those text candidate anchor frames whose score exceeds a preset value, for example 0.7, can be chosen, and the text candidate frame sequence is obtained by adding the corresponding offsets.
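The test-time selection described above could look roughly as follows; the 16-pixel anchor width (one column of the last convolutional layer) and the map layout are assumptions made for the example, while the 0.7 score threshold is the value mentioned in the text.

```python
# Minimal sketch of test-time selection: keep anchors with score > 0.7 and apply the vertical offsets.
import torch

def select_candidates(scores, top_offset, bottom_offset, stride=16, threshold=0.7):
    """scores, top_offset, bottom_offset: (H, W) maps for one anchor size (assumed layout)."""
    ys, xs = torch.nonzero(scores > threshold, as_tuple=True)   # anchors kept by the score threshold
    anchor_top = ys.float() * stride                             # coarse anchor position on the image
    anchor_bottom = (ys.float() + 1.0) * stride
    y1 = anchor_top + top_offset[ys, xs]                         # offset to the nearest text line top
    y2 = anchor_bottom + bottom_offset[ys, xs]                   # offset to the nearest text line bottom
    x1 = xs.float() * stride
    boxes = torch.stack([x1, y1, x1 + stride, y2], dim=1)        # fixed-width text candidate frames
    return boxes, scores[ys, xs]
```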
In step S103, according to the height difference and horizontal distance of the text candidate frame sequence, the horizontal distance between text boxes at the horizontal edges and the text candidate frames at the vertical edges are selected to generate the text line region.
A reachability graph can be constructed using some simple clues, and the final text line regions are then obtained by searching for connected components in this graph, which connects the text candidate frames. These simple clues include the horizontal distance between text boxes, the height difference between text boxes, and so on. After the text candidate frames have been connected into text line regions, the text candidate frames at the edges are chosen and the corresponding offsets are added, so as to efficiently complete the refinement of the horizontal coordinates of the text lines. As shown in Fig. 3, after an image is input, the text candidate frame sequence is generated and then refined to obtain the final detection result, that is, the text line region.
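A minimal sketch of this grouping step is shown below: candidate frames that are horizontally close and of similar height are treated as connected in a reachability graph, and each connected component is merged into one text line region. The concrete distance and height-ratio thresholds are assumed values, and the side-refinement of the edge frames is omitted.

```python
# Minimal sketch of step S103: group text candidate frames into text lines via connected components.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def group_text_lines(boxes: List[Box], max_dx: float = 50.0, min_h_ratio: float = 0.7) -> List[Box]:
    n = len(boxes)
    parent = list(range(n))

    def find(i):                           # union-find over candidate frames
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            dx = min(abs(boxes[i][2] - boxes[j][0]), abs(boxes[j][2] - boxes[i][0]))
            hi, hj = boxes[i][3] - boxes[i][1], boxes[j][3] - boxes[j][1]
            if dx < max_dx and min(hi, hj) / max(hi, hj) > min_h_ratio:   # simple clues: distance + height
                parent[find(i)] = find(j)

    lines = {}
    for i, b in enumerate(boxes):          # merge each connected component into one text line region
        r = find(i)
        x1, y1, x2, y2 = lines.get(r, b)
        lines[r] = (min(x1, b[0]), min(y1, b[1]), max(x2, b[2]), max(y2, b[3]))
    return list(lines.values())
```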
Because this method trains the convolution features through the recurrent neural network model, the contextual text information of the convolution features can be used, which helps improve the robustness of text detection; the method is not limited to a single-language classifier and adapts to the detection requirements of multi-language text; and no complex prior conditions need to be set manually, which helps improve detection stability across different scenes. In addition, the method can efficiently reuse the computation of overlapping regions, which effectively improves computational efficiency, and the end-to-end model simplifies the calculation and processing steps.
Fig. 4 is a schematic structural diagram of the text detection device for a scene image provided by an embodiment of the present invention, described in detail as follows.
The text detection device for a scene image in the embodiment of the present invention includes:
a convolution feature acquiring unit 401, configured to obtain a scene image and extract convolution features of the scene image through a convolutional neural network model;
a text candidate frame generating unit 402, configured to feed the convolution features of the scene image into a recurrent neural network model to generate a text candidate frame sequence;
a text line region acquiring unit 403, configured to post-process the text candidate frame sequence to obtain a text line region.
Preferably, the convolution feature acquiring unit includes:
a convolutional layer obtaining subunit, configured to perform a convolutional neural network operation on the scene image through a VGG convolutional neural network to obtain the convolutional layers of the scene image;
a convolution feature sliding subunit, configured to obtain, using a predetermined sliding window, the convolution features of the last convolutional layer in a specific region of the scene image.
Preferably, the text candidate frame generating unit includes:
a text candidate anchor frame training subunit, configured to feed the convolution features into a bidirectional long short-term memory model, taking each convolution feature as a time-frame input of the long short-term memory model for training, to obtain text candidate anchor frames of fixed width;
a text candidate frame detecting subunit, configured to regress, detect and connect the upper and lower edges of the fixed-width text candidate anchor frames to generate the text candidate frame sequence.
Preferably, the text candidate frame detecting subunit includes:
a supervision information acquiring module, configured to obtain supervision information of the text candidate anchor frames, the supervision information including: the score value that a text candidate anchor frame is text, the first offset distance of the text candidate anchor frame from the upper end of its nearest text line boundary, and the second offset distance of the text candidate anchor frame from the lower end of its nearest text line boundary;
a selecting and comparing module, configured to select, according to the supervision information of the text candidate anchor frames, the text candidate anchor frames whose score value is greater than a predetermined value, and to generate the text candidate frame sequence in combination with the first offset distance and the second offset distance.
Preferably, the text line region acquiring unit is specifically configured to:
according to the height difference and horizontal distance of the text candidate frame sequence, select the horizontal distance between text boxes at the horizontal edges and the text candidate frames at the vertical edges, to generate the text line region.
The text detection device for a scene image in the embodiment of the present invention corresponds to the text detection method for a scene image described above, and the description is not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the device embodiments described above are only schematic; the division of the units is only a division by logical function, and other divisions are possible in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a portable hard drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement and improvement made within the spirit and principle of the present invention shall be included within the protection scope of the present invention.
Claims (10)
1. A text detection method for a scene image, characterized in that the method comprises:
obtaining a scene image, and extracting convolution features of the scene image through a convolutional neural network model;
feeding the convolution features of the scene image into a recurrent neural network model to generate a text candidate frame sequence;
post-processing the text candidate frame sequence to obtain a text line region.
2. The method according to claim 1, characterized in that the step of extracting the convolution features of the scene image through the convolutional neural network model includes:
performing a convolutional neural network operation on the scene image through a VGG convolutional neural network to obtain the convolutional layers of the scene image;
obtaining, using a predetermined sliding window, the convolution features of the last convolutional layer in a specific region of the scene image.
3. The method according to claim 1, characterized in that the step of feeding the convolution features of the scene image into the recurrent neural network model to generate the text candidate frame sequence includes:
feeding the convolution features into the recurrent neural network model row by row, taking each convolution feature as a time-frame input of a long short-term memory model for training, to obtain text candidate anchor frames of fixed width;
regressing, detecting and connecting the upper and lower edges of the fixed-width text candidate anchor frames to generate the text candidate frame sequence.
4. The method according to claim 3, characterized in that the step of regressing, detecting and connecting the upper and lower edges of the fixed-width text candidate anchor frames to generate the text candidate frame sequence includes:
obtaining supervision information of the text candidate anchor frames, the supervision information including: the score value that a text candidate anchor frame is text, the first offset distance of the text candidate anchor frame from the upper end of its nearest text line boundary, and the second offset distance of the text candidate anchor frame from the lower end of its nearest text line boundary;
according to the supervision information of the text candidate anchor frames, selecting the text candidate anchor frames whose score value is greater than a predetermined value, and generating the text candidate frame sequence in combination with the first offset distance and the second offset distance.
5. The method according to any one of claims 1-4, characterized in that the step of post-processing the text candidate frame sequence to obtain the text line region includes:
according to the height difference and horizontal distance of the text candidate frame sequence, selecting the horizontal distance between text boxes at the horizontal edges and the text candidate frames at the vertical edges, to generate the text line region.
6. A text detection device for a scene image, characterized in that the device comprises:
a convolution feature acquiring unit, configured to obtain a scene image and extract convolution features of the scene image through a convolutional neural network model;
a text candidate frame generating unit, configured to feed the convolution features of the scene image into a recurrent neural network model to generate a text candidate frame sequence;
a text line region acquiring unit, configured to post-process the text candidate frame sequence to obtain a text line region.
7. The device according to claim 6, characterized in that the convolution feature acquiring unit includes:
a convolutional layer obtaining subunit, configured to perform a convolutional neural network operation on the scene image through a VGG convolutional neural network to obtain the convolutional layers of the scene image;
a convolution feature sliding subunit, configured to obtain, using a predetermined sliding window, the convolution features of the last convolutional layer in a specific region of the scene image.
8. The device according to claim 6, characterized in that the text candidate frame generating unit includes:
a text candidate anchor frame training subunit, configured to feed the convolution features into the recurrent neural network model row by row, taking each convolution feature as a time-frame input of a long short-term memory model for training, to obtain text candidate anchor frames of fixed width;
a text candidate frame detecting subunit, configured to regress, detect and connect the upper and lower edges of the fixed-width text candidate anchor frames to generate the text candidate frame sequence.
9. The device according to claim 8, characterized in that the text candidate frame detecting subunit includes:
a supervision information acquiring module, configured to obtain supervision information of the text candidate anchor frames, the supervision information including: the score value that a text candidate anchor frame is text, the first offset distance of the text candidate anchor frame from the upper end of its nearest text line boundary, and the second offset distance of the text candidate anchor frame from the lower end of its nearest text line boundary;
a selecting and comparing module, configured to select, according to the supervision information of the text candidate anchor frames, the text candidate anchor frames whose score value is greater than a predetermined value, and to generate the text candidate frame sequence in combination with the first offset distance and the second offset distance.
10. The device according to any one of claims 6-9, characterized in that the text line region acquiring unit is specifically configured to:
according to the height difference and horizontal distance of the text candidate frame sequence, select the horizontal distance between text boxes at the horizontal edges and the text candidate frames at the vertical edges, to generate the text line region.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610878795.5A CN106570497A (en) | 2016-10-08 | 2016-10-08 | Text detection method and device for scene image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610878795.5A CN106570497A (en) | 2016-10-08 | 2016-10-08 | Text detection method and device for scene image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106570497A true CN106570497A (en) | 2017-04-19 |
Family
ID=58532561
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610878795.5A Pending CN106570497A (en) | 2016-10-08 | 2016-10-08 | Text detection method and device for scene image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106570497A (en) |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107423756A (en) * | 2017-07-05 | 2017-12-01 | 武汉科恩斯医疗科技有限公司 | Nuclear magnetic resonance image sequence sorting technique based on depth convolutional neural networks combination shot and long term memory models |
CN107480726A (en) * | 2017-08-25 | 2017-12-15 | 电子科技大学 | A kind of Scene Semantics dividing method based on full convolution and shot and long term mnemon |
CN107545262A (en) * | 2017-07-31 | 2018-01-05 | 华为技术有限公司 | A kind of method and device that text is detected in natural scene image |
CN108154145A (en) * | 2018-01-24 | 2018-06-12 | 北京地平线机器人技术研发有限公司 | The method and apparatus for detecting the position of the text in natural scene image |
CN108304761A (en) * | 2017-09-25 | 2018-07-20 | 腾讯科技(深圳)有限公司 | Method for text detection, device, storage medium and computer equipment |
CN108564084A (en) * | 2018-05-08 | 2018-09-21 | 北京市商汤科技开发有限公司 | character detecting method, device, terminal and storage medium |
CN108694393A (en) * | 2018-05-30 | 2018-10-23 | 深圳市思迪信息技术股份有限公司 | A kind of certificate image text area extraction method based on depth convolution |
CN108734169A (en) * | 2018-05-21 | 2018-11-02 | 南京邮电大学 | One kind being based on the improved scene text extracting method of full convolutional network |
CN108764228A (en) * | 2018-05-28 | 2018-11-06 | 嘉兴善索智能科技有限公司 | Word object detection method in a kind of image |
CN108846379A (en) * | 2018-07-03 | 2018-11-20 | 南京览笛信息科技有限公司 | Face list recognition methods, system, terminal device and storage medium |
CN109344822A (en) * | 2018-09-03 | 2019-02-15 | 电子科技大学 | A kind of scene text detection method based on shot and long term memory network |
CN109685055A (en) * | 2018-12-26 | 2019-04-26 | 北京金山数字娱乐科技有限公司 | Text filed detection method and device in a kind of image |
CN109740585A (en) * | 2018-03-28 | 2019-05-10 | 北京字节跳动网络技术有限公司 | A kind of text positioning method and device |
CN109871843A (en) * | 2017-12-01 | 2019-06-11 | 北京搜狗科技发展有限公司 | Character identifying method and device, the device for character recognition |
CN109948615A (en) * | 2019-03-26 | 2019-06-28 | 中国科学技术大学 | Multi-language text detects identifying system |
CN109993040A (en) * | 2018-01-03 | 2019-07-09 | 北京世纪好未来教育科技有限公司 | Text recognition method and device |
CN110008900A (en) * | 2019-04-02 | 2019-07-12 | 北京市遥感信息研究所 | A kind of visible remote sensing image candidate target extracting method by region to target |
CN110084240A (en) * | 2019-04-24 | 2019-08-02 | 网易(杭州)网络有限公司 | A kind of Word Input system, method, medium and calculate equipment |
CN110110777A (en) * | 2019-04-28 | 2019-08-09 | 网易有道信息技术(北京)有限公司 | Image processing method and training method and device, medium and calculating equipment |
CN110245545A (en) * | 2018-09-26 | 2019-09-17 | 浙江大华技术股份有限公司 | A kind of character recognition method and device |
CN110276351A (en) * | 2019-06-28 | 2019-09-24 | 中国科学技术大学 | Multilingual scene text detection and recognition methods |
CN111127593A (en) * | 2018-10-30 | 2020-05-08 | 珠海金山办公软件有限公司 | Document content erasing method and device, electronic equipment and readable storage medium |
CN111222589A (en) * | 2018-11-27 | 2020-06-02 | 中国移动通信集团辽宁有限公司 | Image text recognition method, device, equipment and computer storage medium |
CN111612003A (en) * | 2019-02-22 | 2020-09-01 | 北京京东尚科信息技术有限公司 | Method and device for extracting text in picture |
CN111695377A (en) * | 2019-03-13 | 2020-09-22 | 杭州海康威视数字技术股份有限公司 | Text detection method and device and computer equipment |
CN112070784A (en) * | 2020-09-15 | 2020-12-11 | 桂林电子科技大学 | Perception edge detection method based on context enhancement network |
CN114677691A (en) * | 2022-04-06 | 2022-06-28 | 北京百度网讯科技有限公司 | Text recognition method and device, electronic equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105631426A (en) * | 2015-12-29 | 2016-06-01 | 中国科学院深圳先进技术研究院 | Image text detection method and device |
CN105654127A (en) * | 2015-12-30 | 2016-06-08 | 成都数联铭品科技有限公司 | End-to-end-based picture character sequence continuous recognition method |
CN105678292A (en) * | 2015-12-30 | 2016-06-15 | 成都数联铭品科技有限公司 | Complex optical text sequence identification system based on convolution and recurrent neural network |
- 2016-10-08: application CN201610878795.5A filed; publication CN106570497A, status Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105631426A (en) * | 2015-12-29 | 2016-06-01 | 中国科学院深圳先进技术研究院 | Image text detection method and device |
CN105654127A (en) * | 2015-12-30 | 2016-06-08 | 成都数联铭品科技有限公司 | End-to-end-based picture character sequence continuous recognition method |
CN105678292A (en) * | 2015-12-30 | 2016-06-15 | 成都数联铭品科技有限公司 | Complex optical text sequence identification system based on convolution and recurrent neural network |
Non-Patent Citations (1)
Title |
---|
ZHI TIAN et al.: "Detecting Text in Natural Image with Connectionist Text Proposal Network", Computer Vision - ECCV 2016 *
Cited By (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107423756A (en) * | 2017-07-05 | 2017-12-01 | 武汉科恩斯医疗科技有限公司 | Nuclear magnetic resonance image sequence sorting technique based on depth convolutional neural networks combination shot and long term memory models |
CN107545262A (en) * | 2017-07-31 | 2018-01-05 | 华为技术有限公司 | A kind of method and device that text is detected in natural scene image |
CN107480726A (en) * | 2017-08-25 | 2017-12-15 | 电子科技大学 | A kind of Scene Semantics dividing method based on full convolution and shot and long term mnemon |
US11030471B2 (en) | 2017-09-25 | 2021-06-08 | Tencent Technology (Shenzhen) Company Limited | Text detection method, storage medium, and computer device |
CN108304761A (en) * | 2017-09-25 | 2018-07-20 | 腾讯科技(深圳)有限公司 | Method for text detection, device, storage medium and computer equipment |
CN109871843B (en) * | 2017-12-01 | 2022-04-08 | 北京搜狗科技发展有限公司 | Character recognition method and device for character recognition |
CN109871843A (en) * | 2017-12-01 | 2019-06-11 | 北京搜狗科技发展有限公司 | Character identifying method and device, the device for character recognition |
CN109993040B (en) * | 2018-01-03 | 2021-07-30 | 北京世纪好未来教育科技有限公司 | Text recognition method and device |
CN109993040A (en) * | 2018-01-03 | 2019-07-09 | 北京世纪好未来教育科技有限公司 | Text recognition method and device |
CN108154145A (en) * | 2018-01-24 | 2018-06-12 | 北京地平线机器人技术研发有限公司 | The method and apparatus for detecting the position of the text in natural scene image |
CN108154145B (en) * | 2018-01-24 | 2020-05-19 | 北京地平线机器人技术研发有限公司 | Method and device for detecting position of text in natural scene image |
CN109740585A (en) * | 2018-03-28 | 2019-05-10 | 北京字节跳动网络技术有限公司 | A kind of text positioning method and device |
CN108564084A (en) * | 2018-05-08 | 2018-09-21 | 北京市商汤科技开发有限公司 | character detecting method, device, terminal and storage medium |
CN108734169A (en) * | 2018-05-21 | 2018-11-02 | 南京邮电大学 | One kind being based on the improved scene text extracting method of full convolutional network |
CN108764228A (en) * | 2018-05-28 | 2018-11-06 | 嘉兴善索智能科技有限公司 | Word object detection method in a kind of image |
CN108694393A (en) * | 2018-05-30 | 2018-10-23 | 深圳市思迪信息技术股份有限公司 | A kind of certificate image text area extraction method based on depth convolution |
CN108846379A (en) * | 2018-07-03 | 2018-11-20 | 南京览笛信息科技有限公司 | Face list recognition methods, system, terminal device and storage medium |
CN109344822A (en) * | 2018-09-03 | 2019-02-15 | 电子科技大学 | A kind of scene text detection method based on shot and long term memory network |
CN109344822B (en) * | 2018-09-03 | 2022-06-03 | 电子科技大学 | Scene text detection method based on long-term and short-term memory network |
CN110245545A (en) * | 2018-09-26 | 2019-09-17 | 浙江大华技术股份有限公司 | A kind of character recognition method and device |
CN111127593B (en) * | 2018-10-30 | 2023-10-31 | 珠海金山办公软件有限公司 | Document content erasing method and device, electronic equipment and readable storage medium |
CN111127593A (en) * | 2018-10-30 | 2020-05-08 | 珠海金山办公软件有限公司 | Document content erasing method and device, electronic equipment and readable storage medium |
CN111222589B (en) * | 2018-11-27 | 2023-07-18 | 中国移动通信集团辽宁有限公司 | Image text recognition method, device, equipment and computer storage medium |
CN111222589A (en) * | 2018-11-27 | 2020-06-02 | 中国移动通信集团辽宁有限公司 | Image text recognition method, device, equipment and computer storage medium |
CN109685055B (en) * | 2018-12-26 | 2021-11-12 | 北京金山数字娱乐科技有限公司 | Method and device for detecting text area in image |
CN109685055A (en) * | 2018-12-26 | 2019-04-26 | 北京金山数字娱乐科技有限公司 | Text filed detection method and device in a kind of image |
CN111612003B (en) * | 2019-02-22 | 2024-08-20 | 北京京东尚科信息技术有限公司 | Method and device for extracting text in picture |
CN111612003A (en) * | 2019-02-22 | 2020-09-01 | 北京京东尚科信息技术有限公司 | Method and device for extracting text in picture |
CN111695377A (en) * | 2019-03-13 | 2020-09-22 | 杭州海康威视数字技术股份有限公司 | Text detection method and device and computer equipment |
CN111695377B (en) * | 2019-03-13 | 2023-09-29 | 杭州海康威视数字技术股份有限公司 | Text detection method and device and computer equipment |
CN109948615A (en) * | 2019-03-26 | 2019-06-28 | 中国科学技术大学 | Multi-language text detects identifying system |
CN109948615B (en) * | 2019-03-26 | 2021-01-26 | 中国科学技术大学 | Multi-language text detection and recognition system |
CN110008900A (en) * | 2019-04-02 | 2019-07-12 | 北京市遥感信息研究所 | A kind of visible remote sensing image candidate target extracting method by region to target |
CN110008900B (en) * | 2019-04-02 | 2023-12-12 | 北京市遥感信息研究所 | Method for extracting candidate target from visible light remote sensing image from region to target |
CN110084240A (en) * | 2019-04-24 | 2019-08-02 | 网易(杭州)网络有限公司 | A kind of Word Input system, method, medium and calculate equipment |
CN110110777A (en) * | 2019-04-28 | 2019-08-09 | 网易有道信息技术(北京)有限公司 | Image processing method and training method and device, medium and calculating equipment |
CN110276351A (en) * | 2019-06-28 | 2019-09-24 | 中国科学技术大学 | Multilingual scene text detection and recognition methods |
CN110276351B (en) * | 2019-06-28 | 2022-09-06 | 中国科学技术大学 | Multi-language scene text detection and identification method |
CN112070784B (en) * | 2020-09-15 | 2022-07-01 | 桂林电子科技大学 | Perception edge detection method based on context enhancement network |
CN112070784A (en) * | 2020-09-15 | 2020-12-11 | 桂林电子科技大学 | Perception edge detection method based on context enhancement network |
CN114677691B (en) * | 2022-04-06 | 2023-10-03 | 北京百度网讯科技有限公司 | Text recognition method, device, electronic equipment and storage medium |
CN114677691A (en) * | 2022-04-06 | 2022-06-28 | 北京百度网讯科技有限公司 | Text recognition method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106570497A (en) | Text detection method and device for scene image | |
Nakamura et al. | Scene text eraser | |
Paris et al. | A topological approach to hierarchical segmentation using mean shift | |
US20180114071A1 (en) | Method for analysing media content | |
Tang et al. | Deeply-supervised recurrent convolutional neural network for saliency detection | |
CN110222686B (en) | Object detection method, object detection device, computer equipment and storage medium | |
US11853892B2 (en) | Learning to segment via cut-and-paste | |
WO2021115345A1 (en) | Image processing method and apparatus, computer device, and storage medium | |
CN107945153A (en) | A kind of road surface crack detection method based on deep learning | |
CN107871125A (en) | Architecture against regulations recognition methods, device and electronic equipment | |
CN112488229B (en) | Domain self-adaptive unsupervised target detection method based on feature separation and alignment | |
CN111401293B (en) | Gesture recognition method based on Head lightweight Mask scanning R-CNN | |
WO2020077940A1 (en) | Method and device for automatic identification of labels of image | |
CN103810503A (en) | Depth study based method for detecting salient regions in natural image | |
CN104008401A (en) | Method and device for image character recognition | |
Li et al. | Fast and effective text detection | |
CN110516676A (en) | A kind of bank's card number identifying system based on image procossing | |
Zhang et al. | Deep salient object detection by integrating multi-level cues | |
Karaoglu et al. | Con-text: text detection using background connectivity for fine-grained object classification | |
Zhang et al. | Multi-scale salient object detection with pyramid spatial pooling | |
CN105404682B (en) | A kind of book retrieval method based on digital image content | |
CN108280388A (en) | The method and apparatus and type of face detection method and device of training face detection model | |
CN109409224A (en) | A kind of method of natural scene fire defector | |
CN111652288B (en) | Improved SSD small target detection method based on dense feature pyramid | |
Gui et al. | A fast caption detection method for low quality video images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
RJ01 | Rejection of invention patent application after publication ||
Application publication date: 20170419 |