CN110263779A - Text region detection method and device, text detection method, computer-readable medium - Google Patents
Text region detection method and device, text detection method, computer-readable medium
- Publication number
- CN110263779A (application CN201810225220.2A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/63—Scene text, e.g. street names
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Character Discrimination (AREA)
Abstract
The present invention relates to the field of computer technology and provides a text region detection method and device, a text detection method, and a computer-readable medium. The text region detection method includes: extracting features of an original image to obtain a feature map; performing text region detection based on the feature map to obtain a plurality of text region fragments; obtaining sequence information of the plurality of text region fragments and forming them into a text region fragment sequence according to the sequence information; and aggregating the text region fragment sequence to obtain a text region. The invention reduces the false detection rate and missed detection rate of text detection and improves the accuracy and feasibility of text detection.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a text region detection method and device, a text detection method, and a computer-readable medium.
Background
With the rapid development of Internet technology and the rapid spread of smartphones, our lives have become more and more convenient. Users commonly photograph and upload materials (such as identity cards, business licenses, and profile pictures) with a digital camera, camera, or mobile phone so that an operator can verify the user's identity and qualifications. However, photographs taken in natural scenes have complex backgrounds and many environmental interference factors, the text in a picture is hard to distinguish from the background, and text in a photograph may also be partially occluded, all of which poses a great challenge to text detection. To recognize text in natural scene images, experts have designed many OCR (Optical Character Recognition) systems. These systems usually detect text in documents well but perform poorly on text in scene images, because scene text varies widely and the image background is relatively complex, so it is difficult to recognize directly with OCR software. Text localization is therefore the first step in understanding scene image text; only after the text has been located can its content be classified and recognized.
It should be noted that the information disclosed in the Background section above is only intended to enhance understanding of the background of the invention, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
The purpose of the present invention is to provide a text region detection method and device, a text detection method, and a computer-readable medium, thereby reducing the false detection rate and missed detection rate of text detection and improving its accuracy and feasibility.
Other features and advantages of the invention will become apparent from the following detailed description, or may be learned in part through practice of the invention.
According to a first aspect of the invention, a text region detection method is provided, characterized by comprising: extracting features of an original image to obtain a feature map; performing text region detection based on the feature map to obtain a plurality of text region fragments; obtaining sequence information of the plurality of text region fragments and forming the plurality of text region fragments into a text region fragment sequence according to the sequence information; and aggregating the text region fragment sequence to obtain a text region.
According to a second aspect of the invention, a text detection method is provided, characterized by comprising: obtaining the text region according to the above text region detection method; and recognizing the text information in the text region to obtain text.
According to a third aspect of the invention, a text region detection device is provided, characterized by comprising: a feature extraction module for extracting features of an original image to obtain a feature map; a fragment acquisition module for performing text region detection based on the feature map to obtain a plurality of text region fragments; a sequence information acquisition module for obtaining sequence information of the plurality of text region fragments and forming the plurality of text region fragments into a text region fragment sequence according to the sequence information; and an aggregation module for aggregating the text region fragment sequence to obtain a text region.
In some embodiments of the invention, based on the foregoing scheme, the feature extraction module includes: a convolution unit for performing multi-stage convolution on the original image through a residual network model to obtain the feature map.
In some embodiments of the invention, based on the foregoing scheme, the fragment acquisition module includes: an anchor setting unit for setting a group of anchors at each pixel of the feature map; and a feature extraction unit for extracting the image features corresponding to the anchors through a sliding window to generate the plurality of text region fragments.
In some embodiments of the invention, based on the foregoing scheme, the anchor setting unit includes: a width setting unit for setting a group of fixed-width anchors at each pixel of the feature map.
In some embodiments of the invention, based on the foregoing scheme, the sequence information acquisition module includes: a sequence information extraction unit for inputting the text region fragments into a long short-term memory (LSTM) model to obtain the sequence information of the text region fragments, and forming the plurality of text region fragments into the text region fragment sequence according to the sequence information.
In some embodiments of the invention, based on the foregoing scheme, the long short-term memory network is a bidirectional long short-term memory model, including a forward LSTM model and a backward LSTM model.
In some embodiments of the invention, based on the foregoing scheme, the sequence information extraction unit includes: a forward sequence extraction unit for learning the preceding-context information of each text region fragment through the forward LSTM model; a backward sequence extraction unit for learning the following-context information of each text region fragment through the backward LSTM model; and a merging unit for concatenating the learning results to form the text region fragment sequence.
In some embodiments of the invention, based on the foregoing scheme, the aggregation module includes: a selection unit for selecting a plurality of text region fragments in which the spacing between two adjacent text region fragments is less than a set distance and the vertical overlap ratio is greater than a set value; and a connection unit for connecting the selected text region fragments to obtain the text region.
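The selection and connection rule above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each fragment is an axis-aligned box `(x1, y1, x2, y2)`, defines the vertical overlap ratio as shared height over combined height, and uses hypothetical thresholds (`max_gap=50`, `min_overlap=0.7`), since the text only speaks of "a set distance" and "a set value".

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed layout


def vertical_overlap(a: Box, b: Box) -> float:
    """Shared vertical extent divided by combined vertical extent (assumed definition)."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    union = max(a[3], b[3]) - min(a[1], b[1])
    return max(inter, 0.0) / union if union > 0 else 0.0


def merge_fragments(fragments: List[Box],
                    max_gap: float = 50.0,       # hypothetical "set distance"
                    min_overlap: float = 0.7     # hypothetical "set value"
                    ) -> List[Box]:
    """Chain fragments left to right, then return one bounding box per chain."""
    chains: List[List[Box]] = []
    for frag in sorted(fragments, key=lambda f: f[0]):
        for chain in chains:
            prev = chain[-1]  # right-most fragment already in this chain
            close = frag[0] - prev[2] < max_gap
            aligned = vertical_overlap(prev, frag) > min_overlap
            if close and aligned:
                chain.append(frag)
                break
        else:
            chains.append([frag])  # no compatible chain: start a new text region
    return [(min(f[0] for f in c), min(f[1] for f in c),
             max(f[2] for f in c), max(f[3] for f in c)) for c in chains]
```

Fragments on the same line that sit close together end up in one region, while a fragment far to the right or on another line starts its own region.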
In some embodiments of the invention, based on the foregoing scheme, the text region detection device further includes: a sample acquisition module for obtaining training samples; a model training module for performing machine training on the residual network model and the bidirectional LSTM model according to the training samples and different learning rates; and a judging unit for ending the machine training when the loss function of the residual network model and the bidirectional LSTM model is minimized.
In some embodiments of the invention, based on the foregoing scheme, the text region detection device further includes: a feature mapping module for mapping the text region fragment sequence to a fully connected layer to predict the text confidence and text position of the text region fragments.
In some embodiments of the invention, based on the foregoing scheme, the text region detection device further includes: a correction module for correcting the boundary error of the text region according to the relative displacement between the predicted boundary of the text region and the ground-truth boundary of the text region.
In some embodiments of the invention, based on the foregoing scheme, the fragment acquisition module includes: a screening unit for performing non-maximum suppression on the text region fragments to obtain a plurality of text region fragments whose text confidence is greater than a set value.
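The screening step can be sketched with a standard greedy non-maximum suppression. This is a generic illustration under stated assumptions rather than the procedure of the invention: boxes are `(x1, y1, x2, y2)`, and both thresholds (`score_thresh=0.7` for the "set value" and `iou_thresh=0.3` for the suppression overlap) are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed layout


def box_iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def screen_fragments(boxes: Sequence[Box], scores: Sequence[float],
                     score_thresh: float = 0.7,   # hypothetical confidence threshold
                     iou_thresh: float = 0.3      # hypothetical suppression threshold
                     ) -> List[int]:
    """Return indices of kept fragments, highest confidence first."""
    # Drop low-confidence fragments, then visit the rest from best to worst.
    order = sorted((i for i, s in enumerate(scores) if s > score_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a fragment only if it does not heavily overlap a better one.
        if all(box_iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep
```

The function returns indices rather than boxes so that the caller can keep scores and boxes aligned.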
According to a fourth aspect of the invention, a computer-readable medium is provided, on which a computer program is stored; when executed by a processor, the program implements the text region detection method and the text detection method described in the above embodiments.
According to the text region detection method in this example embodiment, a server receives an original image and extracts features of the original image to obtain a feature map; performs text region detection based on the feature map to obtain a plurality of text region fragments; then obtains the sequence information of the plurality of text region fragments and forms them into a text region fragment sequence according to the sequence information; and finally aggregates the text region fragment sequence to obtain a text region. The text region detection method of the invention can reduce the missed detection rate and false detection rate of text detection, avoids the problem of text that contains a gap being wrongly split into two words, and improves the accuracy and feasibility of text detection.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory and do not limit the invention.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the specification, serve to explain the principles of the invention. The drawings described below are obviously only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an exemplary system architecture to which the verification method for network data requests, or the verification device for network data requests, of an embodiment of the invention can be applied;
Fig. 2 is a structural schematic diagram of a computer system of an electronic device suitable for implementing embodiments of the invention;
Fig. 3 is a flowchart of a text detection method in the related art;
Fig. 4 is a flowchart of a text region detection method in an embodiment of the invention;
Fig. 5 is a schematic framework diagram of a text detection model in an embodiment of the invention;
Fig. 6 is a flowchart of a method of forming a text region in an embodiment of the invention;
Fig. 7 is a flowchart of text detection on nameplate, identity card, business card, or advertisement pictures in an embodiment of the invention;
Fig. 8 is a flowchart of a method of text detection on a business card in an embodiment of the invention;
Fig. 9 is a structural schematic diagram of a text region detection device in an embodiment of the invention.
Detailed description of embodiments
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the invention will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of embodiments of the invention. Those skilled in the art will appreciate, however, that the technical solution of the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so on. In other cases, well-known methods, devices, implementations, or operations are not shown or described in detail, to avoid obscuring aspects of the invention.
The block diagrams shown in the drawings are only functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flowcharts shown in the drawings are merely illustrative; they need not include all contents and operations/steps, nor must the steps be executed in the order described. For example, some operations/steps can be decomposed and others can be merged or partially merged, so the order actually executed may change according to the actual situation.
Fig. 1 is a schematic diagram of an exemplary system architecture 100 to which the text region detection method and device and the text detection method of embodiments of the invention can be applied.
As shown in Fig. 1, the system architecture 100 may include a terminal device 101, a network 102, and a server 103. The network 102 is the medium that provides a communication link between the terminal device 101 and the server 103. The network 102 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are only illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs. For example, the server 103 may be a server cluster composed of multiple servers.
A user can use the terminal device 101 to interact with the server 103 through the network 102 to receive or send image data and the like. The terminal device 101 may be any of various electronic devices with a display screen, including but not limited to smartphones, tablet computers, portable computers, and desktop computers.
The server 103 may be a server that provides various services. For example, the server 103 receives an original image sent by the terminal device 101 and extracts features of the original image to obtain a feature map; performs text region detection based on the feature map to obtain a plurality of text region fragments; inputs the fragments into a long short-term memory model to obtain their sequence information and forms them into a text region fragment sequence according to that information; and finally aggregates the text region fragment sequence to obtain a text region. This can reduce the missed detection rate and false detection rate of text detection and improve its accuracy and feasibility.
Fig. 2 is a structural schematic diagram of a computer system of an electronic device suitable for implementing embodiments of the invention.
It should be noted that the computer system 200 of the electronic device shown in Fig. 2 is only an example and should not impose any limitation on the function or scope of use of embodiments of the invention.
As shown in Fig. 2, the computer system 200 includes a central processing unit (CPU) 201, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 202 or a program loaded from a storage section 208 into a random access memory (RAM) 203. The RAM 203 also stores various programs and data required for system operation. The CPU 201, ROM 202, and RAM 203 are connected to one another via a bus 204. An input/output (I/O) interface 205 is also connected to the bus 204.
The following components are connected to the I/O interface 205: an input section 206 including a keyboard, a mouse, and the like; an output section 207 including a cathode ray tube (CRT), a liquid crystal display (LCD), or the like, and a loudspeaker; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a LAN card or a modem. The communication section 209 performs communication processing via a network such as the Internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211, such as a magnetic disk, optical disc, magneto-optical disk, or semiconductor memory, is mounted on the drive 210 as needed so that a computer program read from it can be installed into the storage section 208 as needed.
In particular, according to embodiments of the invention, the processes described below with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network through the communication section 209 and/or installed from the removable medium 211. When the computer program is executed by the central processing unit (CPU) 201, the various functions defined in the system of the present application are performed.
It should be noted that the computer-readable medium described in the invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the invention, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, in the invention, may include a data signal propagated in baseband or as part of a carrier wave and carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the invention. In this regard, each box in a flowchart or block diagram may represent a module, program segment, or part of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in a block diagram or flowchart, and any combination of such boxes, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in embodiments of the invention may be implemented in software or in hardware, and the described units may also be provided in a processor. The names of these units do not, in some cases, limit the units themselves.
In another aspect, the present application also provides a computer-readable medium, which may be included in the electronic device described in the above embodiments or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the following embodiments. For example, the electronic device can implement the steps shown in Figs. 4 to 8.
In the related art in this field, text detection is generally performed with deep convolutional neural networks. Common deep convolutional neural network models include CNN, R-CNN, Fast RCNN, and Faster RCNN; text detection with the Faster RCNN model is described below as an example.
Fig. 3 is a flowchart of text detection with the Faster RCNN model. As shown in Fig. 3, in step S301, multi-stage convolution is performed on the input image to extract convolutional-layer features; in step S302, a further convolution operation is applied to the extracted features to extract global convolutional features, forming the global image features; in step S303, the extracted features are fed to a region proposal network, which performs coarse text classification and localization on them and generates candidate target regions; in step S304, the global image features and the candidate target regions are input together to a region-of-interest pooling layer, whose main purpose is to pool input features of varying dimension to the dimension that matches the subsequent fully connected layer; in step S305, target category prediction and target position regression are performed on the pooled features, where target category prediction predicts whether a feature is text and target position regression predicts where the text appears in the image.
However, text detection methods in the related art have the following drawbacks. (1) They do not account for the difference between text detection and general object detection. An object usually has a complete closed boundary, but text does not: the boundary of a text shape generally changes with the strokes, and a single word may contain gaps, whereas in a typical object detection task, objects in two regions separated by a gap are treated as two different objects. (2) General object detection algorithms measure detection success by the ratio of the overlap between the predicted text box and the pre-annotated text box to the union of the two, and a ratio greater than 0.5 is generally considered a correct detection, because the object category can be inferred from part of the object. This assumption is hard to satisfy in text detection: text is generally small, and parts of some characters are similar, so without seeing the whole text it is difficult to determine which category it belongs to. More refined text localization is therefore required.
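The overlap criterion in drawback (2) is commonly called intersection over union (IoU). The sketch below is illustrative only; the box layout `(x1, y1, x2, y2)` and the example coordinates are assumptions. It shows why a 0.5 threshold is too loose for text: a prediction covering only the left 60% of a word still passes.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed layout


def iou(pred: Box, truth: Box) -> float:
    """Overlap area of the two boxes divided by the area of their union."""
    iw = max(0.0, min(pred[2], truth[2]) - max(pred[0], truth[0]))
    ih = max(0.0, min(pred[3], truth[3]) - max(pred[1], truth[1]))
    inter = iw * ih
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_truth = (truth[2] - truth[0]) * (truth[3] - truth[1])
    union = area_pred + area_truth - inter
    return inter / union if union > 0 else 0.0


# Hypothetical word occupying a 100 x 20 box, and a prediction that
# stops 40 pixels short of the word's right edge.
truth_box = (0, 0, 100, 20)
pred_box = (0, 0, 60, 20)
score = iou(pred_box, truth_box)
print(score, score > 0.5)  # the truncated box counts as a "correct" detection
```

Under the object-detection convention this prediction is accepted, even though 40% of the word would be missing from whatever is passed on to recognition.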
To address the problems encountered in practical applications, an embodiment of the invention first provides a text region detection method that addresses the above problems. Referring to Fig. 4, the text region detection method is applicable to the electronic device described in the foregoing embodiments and includes at least the following steps:
Step S410: extracting features of an original image to obtain a feature map;
Step S420: performing text region detection based on the feature map to obtain a plurality of text region fragments;
Step S430: obtaining sequence information of the plurality of text region fragments, and forming the plurality of text region fragments into a text region fragment sequence according to the sequence information;
Step S440: aggregating the text region fragment sequence to obtain a text region.
According to the text region detection method in this example embodiment, the server 103 performs feature extraction on the original image sent by the terminal device 101 to obtain a feature map; performs text region detection based on the feature map to obtain a plurality of text region fragments; forms the plurality of text region fragments into a text region fragment sequence according to their sequence information; and finally aggregates the text region fragment sequence to obtain a text region. This can reduce the missed detection rate and false detection rate of text detection and improve its accuracy and feasibility.
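The data flow of steps S410 to S440 can be sketched as a simple composition of four stages. The function and parameter names here are hypothetical, and the four stages are passed in as callables because their concrete form (ResNet features, anchor-based detection, bidirectional LSTM ordering, fragment merging) is described elsewhere in the text; this is only an outline of how the outputs chain together.

```python
from typing import Any, Callable


def detect_text_regions(image: Any,
                        extract_features: Callable[[Any], Any],
                        detect_fragments: Callable[[Any], Any],
                        order_fragments: Callable[[Any], Any],
                        aggregate: Callable[[Any], Any]) -> Any:
    """Run the four stages in order; each stage consumes the previous output."""
    feature_map = extract_features(image)            # step S410
    fragments = detect_fragments(feature_map)        # step S420
    fragment_sequence = order_fragments(fragments)   # step S430
    return aggregate(fragment_sequence)              # step S440
```

Keeping the stages as separate callables mirrors the module split of the device (feature extraction, fragment acquisition, sequence information acquisition, aggregation) and lets each stage be replaced or tested independently.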
The text region detection method in this example embodiment is described in further detail below.
In step S410, features of an original image are extracted to obtain a feature map.
In this exemplary embodiment, Fig. 5 shows a framework diagram of text region detection. Referring to Fig. 5, the terminal device 101 sends an original image to the server 103, and after receiving the original image the server 103 performs feature extraction on it to obtain a feature map Res4f. The terminal device 101 can photograph an object containing text information with its built-in camera to obtain the original image, or can obtain an original image containing text information from other image acquisition devices (such as a video camera, DV camcorder, or camera). The original image may be an image carrying text information, such as a nameplate, identity document, business card, or billboard.
In this exemplary embodiment, after the server 103 receives the original image, it performs feature extraction on the original image with an image feature extractor. The image feature extractor may be VGG19, VGG16, ResNet50, ResNet101, Inception V3, Xception, or the like. For ease of understanding, the invention uses ResNet50 to extract features from the original image; as shown in Fig. 5, the feature map Res4f is obtained from the extracted features and constitutes the extracted global convolutional features.
In step S420, text region detection is performed based on the feature map to obtain a plurality of text region fragments.
In this exemplary embodiment, a sliding window of suitable size can be slid over the feature map Res4f with a certain stride to detect text regions and extract text region information, thereby obtaining a plurality of text region fragments. The window size can be set according to actual needs, for example 3 × 3 or 5 × 5; in the invention, a 3 × 3 sliding window is used to extract features from the feature map Res4f and obtain the text region fragments. Sliding the window over the feature map Res4f, rather than over the original image, avoids the repeated computation that sliding over the original image would entail; moreover, the feature map Res4f is many times smaller than the original image, which further reduces the time and computing resources consumed by the sliding-window operation.
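The sliding-window extraction can be sketched in plain Python as follows. This is an illustrative sketch under assumptions not stated in the text: the feature map is a nested list indexed `[row][column][channel]`, positions outside the map are zero-padded so that every position yields a window, and the stride defaults to 1.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [row][column][channel]; assumed layout


def sliding_windows(fmap: FeatureMap, k: int = 3, stride: int = 1) -> List[List[List[float]]]:
    """Return, for each window centre, the flattened k*k*C feature vector."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    pad = k // 2
    zero = [0.0] * c  # zero padding outside the map (an assumption)
    out = []
    for y in range(0, h, stride):
        row = []
        for x in range(0, w, stride):
            vec: List[float] = []
            for dy in range(-pad, pad + 1):
                for dx in range(-pad, pad + 1):
                    yy, xx = y + dy, x + dx
                    inside = 0 <= yy < h and 0 <= xx < w
                    vec.extend(fmap[yy][xx] if inside else zero)
            row.append(vec)
        out.append(row)
    return out


# Toy 4 x 5 feature map with 2 channels storing (row, column) at each position.
toy = [[[float(y), float(x)] for x in range(5)] for y in range(4)]
windows = sliding_windows(toy)
print(len(windows), len(windows[0]), len(windows[0][0]))  # rows, columns, 3*3*2
```

In practice this operation is a 3 × 3 convolution and would be run by a deep learning framework; the loop form is only meant to make the per-position 3 × 3 × C input explicit.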
In this exemplary embodiment, a group of anchors may be set at each pixel of the feature map Res4f, each group containing K anchors (K is a positive integer). These anchors cover, as far as possible, objects of unknown shape on the feature map Res4f. The image feature corresponding to each anchor is extracted by sliding the sliding window, and multiple text region fragments are generated. Text is harder to distinguish than other targets, and text detection is usually performed in units of words or text lines. When words or text lines are detected directly, if a character consists of separated segments, a part belonging to the same word or text line may be missed, or the word or text line may be detected as two words or text lines, which affects the subsequent text recognition result. Therefore, in order to handle targets of variable shape in real scenes while reducing the search space of model optimization, the present invention uses anchors of fixed width and variable height. For example, the anchor width may be set to 16 and the anchor height may take the values {7, 11, 18, 25, 35, 56, 67, 88, 100, 168, 278}. Within a small horizontal extent, text varies little and large gaps do not occur, so setting the anchor width to 16 can improve the accuracy and feasibility of text detection. Of course, the width and/or height of the anchors may also take other values, which are not described in detail here.
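The fixed-width, variable-height anchor layout described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `anchors_at` and the feature stride of 16 image pixels per feature-map pixel are not given in the text and are assumed here; only the width 16 and the list of heights come from the description.

```python
from typing import List, Tuple

# Values quoted in the description above.
ANCHOR_WIDTH = 16
ANCHOR_HEIGHTS = [7, 11, 18, 25, 35, 56, 67, 88, 100, 168, 278]

# Assumption: one feature-map pixel corresponds to a 16-pixel step in the image.
FEATURE_STRIDE = 16

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def anchors_at(row: int, col: int, stride: int = FEATURE_STRIDE) -> List[Box]:
    """Return the K fixed-width anchors centred on feature-map pixel (row, col)."""
    cx = col * stride + stride / 2.0
    cy = row * stride + stride / 2.0
    boxes = []
    for h in ANCHOR_HEIGHTS:
        # The horizontal extent is identical for every anchor; only the height varies.
        boxes.append((cx - ANCHOR_WIDTH / 2.0, cy - h / 2.0,
                      cx + ANCHOR_WIDTH / 2.0, cy + h / 2.0))
    return boxes
```

With these example values each feature-map pixel carries K = 11 anchors that share the same horizontal extent, so only the vertical position and height remain to be predicted.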
In this exemplary embodiment, for the feature map Res4f of a given original image, a W × H × C feature mapping is obtained, where W × H denotes the width and height of the feature map Res4f and C denotes the number of channels of the feature map Res4f. When a 3 × 3 sliding window is slid densely over the feature map Res4f, each window position uses a 3 × 3 × C convolutional feature to make a prediction. In each prediction, the horizontal position of the sliding window and the positions of the K anchors are fixed, so the predicted values on the feature map Res4f can be mapped to coordinates in the original image.
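The dense 3 × 3 sliding window can be illustrated with a small pure-Python sketch. The names `window_features` and `dense_windows` are hypothetical, the feature map is represented as nested lists indexed `[row][col][channel]`, and zero padding at the borders is an assumption, since the text does not say how borders are handled.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as fmap[row][col][channel]


def window_features(fmap: FeatureMap, row: int, col: int, size: int = 3) -> List[float]:
    """Flatten the size x size x C neighbourhood centred on (row, col).

    Positions outside the map are zero-padded (assumed border handling).
    """
    height, width, channels = len(fmap), len(fmap[0]), len(fmap[0][0])
    half = size // 2
    out: List[float] = []
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            if 0 <= r < height and 0 <= c < width:
                out.extend(fmap[r][c])
            else:
                out.extend([0.0] * channels)
    return out


def dense_windows(fmap: FeatureMap, size: int = 3) -> List[List[List[float]]]:
    """Slide the window with stride 1 so every feature-map pixel gets one vector."""
    return [[window_features(fmap, r, c, size) for c in range(len(fmap[0]))]
            for r in range(len(fmap))]
```

Each returned vector has 3 × 3 × C entries, which is the input used for one prediction at that window position.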
Further, at each position of the feature map Res4f, the detector outputs a text/non-text confidence score and the vertical coordinates of the K anchors. Non-maximum suppression may be applied to the text region fragments detected by the anchors, and only candidate regions whose text/non-text confidence is greater than a set value are retained. For example, candidate regions whose text/non-text confidence is greater than 0.7 may be retained; of course, other text/non-text confidence thresholds may also be set to obtain the text regions, which are not described in detail here.
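The confidence filtering and non-maximum suppression step might look like the sketch below. The 0.7 confidence threshold is the example from the text; the function names and the IoU threshold of 0.3 are assumptions introduced only for illustration. Filtering by confidence first gives the same result as suppressing first, because a low-scoring fragment can never suppress a higher-scoring one.

```python
from typing import List, Tuple

Fragment = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, score)


def iou(a: Fragment, b: Fragment) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def filter_fragments(fragments: List[Fragment],
                     score_threshold: float = 0.7,
                     iou_threshold: float = 0.3) -> List[Fragment]:
    """Keep confident fragments and drop those overlapping a higher-scoring one.

    iou_threshold is a hypothetical value; the text only gives the 0.7 score example.
    """
    candidates = sorted((f for f in fragments if f[4] > score_threshold),
                        key=lambda f: f[4], reverse=True)
    kept: List[Fragment] = []
    for cand in candidates:
        if all(iou(cand, k) <= iou_threshold for k in kept):
            kept.append(cand)
    return kept
```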
In step S430, sequence information of the multiple text region fragments is obtained, and the multiple text region fragments are formed into a text region fragment sequence according to the sequence information.
In this exemplary embodiment, because the text region fragments obtained in step S420 are predicted independently, false detections occur easily; for example, windows, bricks, fences, or leaves may be identified as text. Some low-confidence text fragments may also be lost, for example where the text is occluded by shadow or the text itself is faded and unclear. Combining global scene information can reduce such false detections and missed detections. Moreover, a word or sentence usually consists of multiple characters, so the multiple text region fragments have a strong sequential character. The multiple text region fragments may be input to a long short-term memory model to obtain the sequence information of the multiple text region fragments, and the multiple text region fragments are then formed into a text region fragment sequence according to the sequence information.
In this exemplary embodiment, the long short-term memory model may be a bidirectional long short-term memory model comprising a forward long short-term memory model and a backward long short-term memory model, which model the text information on the left and right sides as a sequence. The forward long short-term memory model learns the preceding context of each text region fragment, and the backward long short-term memory model learns the following context of each text region fragment; the learning results are then concatenated and merged to obtain the text region fragment sequence.
In this exemplary embodiment, the long short-term memory model recurrently updates its internal state $H_t$, as expressed in formula (1):
$$H_t = \varphi(H_{t-1}, X_t) \tag{1}$$
where $\varphi$ is the activation function, generally the Sigmoid function, whose expression is given in formula (2):
$$\varphi(x) = \frac{1}{1 + e^{-x}} \tag{2}$$
where $X_t \in R^{3 \times 3 \times C}$ is the convolutional feature produced by the t-th 3 × 3 sliding window on the feature map Res4f, and $H_{t-1}$ denotes the internal state of the long short-term memory model at time t-1.
In this exemplary embodiment, each long short-term memory model in the bidirectional long short-term memory model has 128 dimensions, so the internal state of the bidirectional long short-term memory model has 256 dimensions, that is, $H_t \in R^{256}$. Of course, the dimension of the long short-term memory model may be changed according to actual needs, for example to 512 or 1024 dimensions; the present invention does not specifically limit this. In addition, to improve the efficiency and precision of sequence information extraction, the depth of the long short-term memory model may be increased to 3 or 4 layers; the principle is the same as that of the bidirectional long short-term memory model and is not described in detail here.
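The idea of running a recurrence in both directions and concatenating the per-position states can be sketched as below. This is deliberately not a full long short-term memory cell: it is a single sigmoid recurrence on scalar inputs, and the weights `w_x` and `w_h` are hypothetical. It only shows how a forward pass carries the preceding context, a backward pass carries the following context, and the two are joined per fragment.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def recurrent_pass(xs: List[float], w_x: float, w_h: float) -> List[float]:
    """Simplified recurrence h_t = sigmoid(w_x * x_t + w_h * h_{t-1}), with h_0 = 0."""
    h = 0.0
    states = []
    for x in xs:
        h = sigmoid(w_x * x + w_h * h)
        states.append(h)
    return states


def bidirectional_states(xs: List[float], w_x: float = 1.0,
                         w_h: float = 0.5) -> List[List[float]]:
    """Concatenate forward (left-to-right) and backward (right-to-left) states."""
    forward = recurrent_pass(xs, w_x, w_h)
    # Run on the reversed sequence, then flip back so index t lines up again.
    backward = recurrent_pass(xs[::-1], w_x, w_h)[::-1]
    return [[f, b] for f, b in zip(forward, backward)]
```

In the embodiment each direction would produce a 128-dimensional state, giving a 256-dimensional concatenation; here each direction yields one number, so each position gets a 2-element state.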
In step S440, the text region fragment sequence is aggregated to obtain a text region.
In this exemplary embodiment, after the text region fragment sequence is obtained, it may be aggregated to form a text region. Fig. 6 shows a flowchart of the aggregation of the text region fragment sequence. As shown in Fig. 6, in step S601, a text region fragment is selected from the text region fragment sequence as a target candidate region $B_i$ (i is a positive integer); in step S602, the candidate region $B_j$ (j ≠ i, j is a positive integer) nearest to the target candidate region $B_i$ is found, where the distance between $B_j$ and $B_i$ is less than 50 and their vertical overlap ratio is greater than 0.7; in step S603, the above rule is applied recursively, and the qualifying neighboring candidate regions are linked into a text line.
Those skilled in the art will appreciate that the distance of less than 50 between candidate regions and the vertical overlap ratio of greater than 0.7 are merely illustrative; other suitable distances and vertical overlap ratios may also be chosen, which are not described in detail here.
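Steps S601 to S603 can be sketched as a simple chaining procedure. The thresholds 50 and 0.7 are the example values from the text. Everything else is an assumption made for illustration: the function names, measuring distance between left edges, linking only toward the right, and defining the vertical overlap ratio as the overlapping height divided by the smaller of the two heights.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def vertical_overlap(a: Box, b: Box) -> float:
    """Overlapping height divided by the smaller height (assumed definition)."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    if inter <= 0:
        return 0.0
    return inter / min(a[3] - a[1], b[3] - b[1])


def nearest_right_neighbor(i: int, boxes: List[Box], max_gap: float = 50.0,
                           min_overlap: float = 0.7) -> Optional[int]:
    """Index of the closest fragment to the right of boxes[i] meeting both rules."""
    best, best_gap = None, max_gap
    for j, other in enumerate(boxes):
        gap = other[0] - boxes[i][0]  # distance between left edges
        if j != i and 0 < gap < best_gap and vertical_overlap(boxes[i], other) > min_overlap:
            best, best_gap = j, gap
    return best


def group_into_lines(boxes: List[Box], max_gap: float = 50.0,
                     min_overlap: float = 0.7) -> List[List[int]]:
    """Follow neighbor links repeatedly so that chained fragments form one text line."""
    successor = [nearest_right_neighbor(i, boxes, max_gap, min_overlap)
                 for i in range(len(boxes))]
    has_predecessor = {j for j in successor if j is not None}
    assigned = set()
    lines: List[List[int]] = []
    for start in range(len(boxes)):
        if start in has_predecessor:
            continue  # not the leftmost fragment of its line
        chain, cur = [], start
        while cur is not None and cur not in assigned:
            chain.append(cur)
            assigned.add(cur)
            cur = successor[cur]
        lines.append(chain)
    return lines


def line_box(indices: List[int], boxes: List[Box]) -> Box:
    """Bounding box that encloses every fragment of one line."""
    return (min(boxes[i][0] for i in indices), min(boxes[i][1] for i in indices),
            max(boxes[i][2] for i in indices), max(boxes[i][3] for i in indices))
```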
The present invention uses anchors of fixed width and variable height to extract features from the feature map and generate text region fragments; the sequence information of the text region fragments is then extracted by a bidirectional long short-term memory model to form a text region fragment sequence; finally, the text region fragment sequence is aggregated to obtain a text region. Because the horizontal extent of each anchor is fixed, only the vertical direction needs to be regressed. Compared with existing text detection methods, which need to predict 4 coordinates of a target, this reduces the search space of model optimization and lowers the learning difficulty of the whole text detection model, making it easier to train a better model with the same time and computing resources.
It further, can be by text by the coordinate of text block in the detected text filed chip sequence of anchor point
The height of region fragment and its centre coordinate determine.The height and central point vertical component predicted value and true value of text block
Calculation formula is as follows:
Wherein, νcIndicate the predicted value of the regressive object of text block central point vertical component;cyIndicate text filed fragment
The vertical component of central point;Indicate the vertical component of corresponding anchor point centre coordinate; haIndicate the height of corresponding anchor point;νh
Indicate the predicted value of the height regressive object of text block;H indicates the height of text filed fragment;Indicate text block central point
The true value of the regressive object of vertical component;Indicate the true value of text block central point vertical component;Indicate the height of text block
The true value of regressive object;h*Indicate the true value of the height of text block.
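The relation between a fragment's vertical geometry and its regression targets can be illustrated with an encode/decode pair. This sketch assumes the parameterization of the Connectionist Text Proposal Network paper cited by this document (center offset normalized by anchor height, and the logarithm of the height ratio); the function names are hypothetical.

```python
import math
from typing import Tuple


def encode_vertical(cy: float, h: float, anchor_cy: float,
                    anchor_h: float) -> Tuple[float, float]:
    """Turn a fragment's center y and height into regression targets (v_c, v_h).

    Assumed form: v_c = (cy - anchor_cy) / anchor_h, v_h = log(h / anchor_h).
    """
    v_c = (cy - anchor_cy) / anchor_h
    v_h = math.log(h / anchor_h)
    return v_c, v_h


def decode_vertical(v_c: float, v_h: float, anchor_cy: float,
                    anchor_h: float) -> Tuple[float, float]:
    """Invert encode_vertical to recover the center y and height in pixels."""
    cy = v_c * anchor_h + anchor_cy
    h = math.exp(v_h) * anchor_h
    return cy, h
```

Normalizing by the anchor height keeps the targets on a similar scale for short and tall anchors, which is one common reason for choosing this kind of encoding.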
In this exemplary embodiment, the internal state $H_t$ of the long short-term memory model may be mapped to the subsequent fully connected layer and output layer to predict the text confidence and text position of the text region fragment sequence. As shown in Fig. 4, the mapping result may be used to predict 2K vertical coordinate offsets, 2K text confidence scores, and K horizontal boundary offsets of the text line, where K is the number of anchors at each pixel of the feature map Res4f.
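As a small worked example of the output sizes, the sketch below counts the values predicted at one feature-map position. The function name is hypothetical, and K = 11 is an assumption taken from the number of example anchor heights listed earlier in the description.

```python
from typing import Dict


def head_output_sizes(k: int) -> Dict[str, int]:
    """Number of values predicted per feature-map position for K anchors."""
    return {
        "vertical_coordinates": 2 * k,  # center offset and height per anchor
        "text_scores": 2 * k,           # text and non-text score per anchor
        "side_offsets": k,              # one horizontal boundary offset per anchor
    }
```

For K = 11 this gives 22 + 22 + 11 = 55 outputs per position.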
In this exemplary embodiment, the present invention integrates the long short-term memory model into the whole text detection model, so the long short-term memory model can be trained together with the image feature extractor. The model may be trained as follows. First, during image preprocessing, the original image may be scaled with its aspect ratio preserved so that its shortest side is 600, and the layers of the long short-term memory model may be initialized with random numbers drawn from a Gaussian distribution with mean 0 and standard deviation 0.1 (variance 0.01). The model is then optimized by stochastic gradient descent to minimize its loss function. During stochastic gradient descent, the momentum is 0.9, the weight decay is 0.0005, and each training batch contains 128 samples with a positive-to-negative sample ratio of 1:1. The initial learning rate is set to 0.001; after 90000 training iterations the learning rate may be reduced to 0.0001, and training then continues for another 10000 iterations. The objective function of model optimization is as follows:
$$L(s_i, v_j, o_k) = \frac{1}{N_s}\sum_i L_s^{cl}(s_i, s_i^*) + \frac{\theta_1}{N_v}\sum_j L_v^{re}(v_j, v_j^*) + \frac{\theta_2}{N_o}\sum_k L_o^{re}(o_k, o_k^*)$$
where $L(s_i, v_j, o_k)$ denotes the overall optimization objective function; $L_s^{cl}$, $L_v^{re}$ and $L_o^{re}$ denote the loss functions of the text classification, text localization, and boundary refinement tasks, respectively; $s_i$ denotes the probability that the i-th anchor is predicted to be text; $s_i^*$ denotes the true value of whether the i-th anchor is text; $v_j$ denotes the predicted vertical coordinate of the j-th anchor; $v_j^*$ denotes the true value of the vertical coordinate of the j-th anchor; $o_k$ denotes the predicted horizontal offset of the k-th boundary anchor relative to the boundary; $o_k^*$ denotes the true value of the horizontal offset of the k-th boundary anchor relative to the boundary; $\theta_1$ and $\theta_2$ denote the loss weights of the text localization task and the boundary refinement task, respectively; and $N_s$, $N_v$ and $N_o$ denote the numbers of anchors used in each training batch by the text classification, text localization, and boundary refinement tasks, respectively.
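The training schedule and the weighted objective described above can be sketched as follows. The step boundaries (90000 and 10000 iterations) and learning rates come from the text. The function names are hypothetical, the per-anchor losses are supplied by the caller because the text does not name the individual loss functions, and the loss weights have no defaults because the text gives no values for them.

```python
from typing import Sequence

TOTAL_ITERATIONS = 90_000 + 10_000  # 90000 iterations, then 10000 more


def learning_rate(iteration: int) -> float:
    """Step schedule: 0.001 for the first 90000 iterations, then 0.0001."""
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    return 1e-3 if iteration < 90_000 else 1e-4


def _mean(values: Sequence[float]) -> float:
    # An empty task (no anchors in the batch) contributes nothing.
    return sum(values) / len(values) if values else 0.0


def total_loss(cls_losses: Sequence[float], loc_losses: Sequence[float],
               side_losses: Sequence[float], *, theta1: float, theta2: float) -> float:
    """Average each task's per-anchor losses, then combine with the loss weights."""
    return (_mean(cls_losses)
            + theta1 * _mean(loc_losses)
            + theta2 * _mean(side_losses))
```

Dividing each sum by its own anchor count keeps a task with many anchors from dominating the objective simply because it has more terms.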
In this exemplary embodiment, after the text region fragment sequence is aggregated to obtain the text region, the boundary of the text region may be corrected. The anchor width is manually set to 16 or another fixed value, but not every text region has a width that is a multiple of 16 or of that fixed value, so the horizontal boundaries of the text region may be located inaccurately; the boundary of the text region therefore needs to be corrected.
Further, the boundary may be corrected by predicting the relative displacement between a boundary anchor and the true boundary of the text region. The horizontal offset is calculated as follows:
$$o = \frac{x_{side} - c_x^a}{w^a}, \qquad o^* = \frac{x_{side}^* - c_x^a}{w^a}$$
where $o$ denotes the predicted horizontal offset regression target; $x_{side}$ denotes the predicted left-side offset of the currently segmented text block relative to the original unsegmented text block; $c_x^a$ denotes the horizontal component of the center of the corresponding anchor; $w^a$ denotes the fixed width of the current anchor (for example, 16); $o^*$ denotes the true value of the regression target for the left-side offset of the currently segmented text block relative to the original unsegmented text block; and $x_{side}^*$ denotes the true value of the left-side offset of the currently segmented text block relative to the original unsegmented text block.
The right-side offset of the text region may also be corrected by the above correction method. After the above correction step, the predicted text region can locate the character region in the image relatively accurately.
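The boundary correction can be illustrated with a pair of helper functions. This sketch assumes the offset is the distance from the anchor center to the boundary, normalized by the fixed anchor width, as in the Connectionist Text Proposal Network paper cited by this document; the function names are hypothetical.

```python
def encode_side_offset(x_side: float, anchor_cx: float, anchor_w: float = 16.0) -> float:
    """Regression target for a horizontal boundary, relative to the anchor center.

    Assumed form: o = (x_side - anchor_cx) / anchor_w.
    """
    return (x_side - anchor_cx) / anchor_w


def decode_side_offset(offset: float, anchor_cx: float, anchor_w: float = 16.0) -> float:
    """Recover the boundary x coordinate in pixels from a predicted offset."""
    return offset * anchor_w + anchor_cx
```

For example, a left boundary at x = 37 seen from a boundary anchor centered at x = 40 encodes to -3/16, and decoding that offset returns 37, so the line boundary no longer has to fall on a multiple of the anchor width.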
In an embodiment of the present invention, a text detection method is also provided. Specifically, a text region is obtained according to the text region detection method of the present invention; the text information in the text region is then recognized to obtain the text.
The text detection method of the present invention can be used to detect text on objects such as nameplates, identity cards, business cards, and advertising pictures. Fig. 7 shows a flowchart of detecting text in a nameplate, identity card, business card, or advertising picture using the text detection model of the present invention, in which the text is recognized by detecting the top-left and bottom-right corner coordinates and the confidence of each piece of text in the image.
To further aid understanding, the present invention is illustrated by detecting text in a business card. Fig. 8 shows a flowchart of the method for text detection on a business card. As shown in Fig. 8, in step S801, the text line regions in the business card are detected. The detection method used is the text region detection method of the present invention, which determines the text line regions in the business card. In step S802, the text information in the text line regions is recognized to form the text. Specifically, after the text line regions are determined, the characters in each text line region are recognized one by one. For example, if a text line region contains N characters (N is a positive integer), the top-left and bottom-right corner coordinates and the confidence of the first character are detected to determine the first character, then those of the second character are detected to determine the second character, and the above steps are repeated until the N-th character has been detected; finally, the text is obtained from the recognized characters.
The device embodiments of the present invention are described below; they can be used to carry out the above text region detection method of the present invention. For details not disclosed in the device embodiments, refer to the embodiments of the above text region detection method of the present invention.
Fig. 9 shows a schematic structural diagram of a text region detection device. Referring to Fig. 9, the text region detection device 900 may include a feature extraction module 901, a fragment acquisition module 902, a sequence information acquisition module 903, and an aggregation module 904.
Specifically, the feature extraction module 901 is configured to extract features of an original image to obtain a feature map; the fragment acquisition module 902 is configured to perform text region detection based on the feature map to obtain multiple text region fragments; the sequence information acquisition module 903 is configured to obtain sequence information of the multiple text region fragments and to form the multiple text region fragments into a text region fragment sequence according to the sequence information; the aggregation module 904 is configured to aggregate the text region fragment sequence to obtain a text region.
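The way the four modules hand data to one another can be sketched as a small composition. The class name `TextRegionDetector` and its field names are hypothetical, and each module is represented only as a caller-supplied function, so this shows the data flow rather than any particular implementation of the modules.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class TextRegionDetector:
    """Chains four stages in the order of modules 901 to 904 (illustrative only)."""
    extract_features: Callable[[Any], Any]              # role of module 901
    detect_fragments: Callable[[Any], List[Any]]        # role of module 902
    order_fragments: Callable[[List[Any]], List[Any]]   # role of module 903
    aggregate: Callable[[List[Any]], List[Any]]         # role of module 904

    def __call__(self, image: Any) -> List[Any]:
        feature_map = self.extract_features(image)
        fragments = self.detect_fragments(feature_map)
        sequence = self.order_fragments(fragments)
        return self.aggregate(sequence)
```

Because each stage is an independent callable, two stages can be merged into one function or one stage split into several without changing the overall flow.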
In this exemplary embodiment, the feature extraction module 901 includes a convolution unit 9011 configured to perform multi-stage convolution on the original image by means of a residual network model to obtain the feature map.
In this exemplary embodiment, the fragment acquisition module 902 includes an anchor setting unit 9021 and a feature extraction unit 9022.
Specifically, the anchor setting unit 9021 is configured to set a group of anchors at each pixel of the feature map; the feature extraction unit 9022 is configured to extract the image features corresponding to the anchors by means of a sliding window and generate the multiple text region fragments.
Further, the anchor setting unit 9021 includes a width setting unit 90211 configured to set a group of fixed-width anchors at each pixel of the feature map.
In this exemplary embodiment, the sequence information acquisition module 903 includes a sequence information extraction unit 9023 configured to input the text region fragments to a long short-term memory model to obtain the sequence information of the text region fragments, and to form the multiple text region fragments into a text region fragment sequence according to the sequence information.
In this exemplary embodiment, the long short-term memory model is a bidirectional long short-term memory model comprising a forward long short-term memory model and a backward long short-term memory model.
Further, the sequence information extraction unit 9023 includes a forward sequence extraction unit 90231, a backward sequence extraction unit 90232, and a merging unit 90233.
Specifically, the forward sequence extraction unit 90231 is configured to learn the preceding context of each text region fragment by means of the forward long short-term memory model; the backward sequence extraction unit 90232 is configured to learn the following context of each text region fragment by means of the backward long short-term memory model; the merging unit 90233 is configured to concatenate and merge the learning results to form the text region fragment sequence.
In this exemplary embodiment, the aggregation module 904 includes a selection unit 9041 and a connection unit 9042.
Specifically, the selection unit 9041 is configured to select multiple text region fragments, where adjacent text region fragments are separated by less than a set distance and their vertical overlap ratio is greater than a set value; the connection unit 9042 is configured to connect the multiple text region fragments to obtain the text region.
In this exemplary embodiment, the text region detection device 900 further includes a sample acquisition module 905, a model training module 906, and a judgment module 907.
Specifically, the sample acquisition module 905 is configured to obtain training samples; the model training module 906 is configured to perform machine training on the residual network model and the bidirectional long short-term memory model according to the training samples and different learning rates; the judgment module 907 is configured to end the machine training when the loss function of the residual network model and the bidirectional long short-term memory model is minimized.
In this exemplary embodiment, the text region detection device 900 further includes a mapping module 908 configured to map the text region fragment sequence to a fully connected layer to predict the text confidence and text position of the text region fragments.
In this exemplary embodiment, the text region detection device 900 further includes a correction module 909 configured to correct the boundary error of the text region according to the relative displacement between the predicted boundary of the text region and the true boundary of the text region.
In this exemplary embodiment, the fragment acquisition module 902 further includes a screening unit 9024 configured to apply non-maximum suppression to the text region fragments to obtain multiple text region fragments whose text confidence is greater than a set value.
It should be noted that although several modules or units of the text region detection device are mentioned in the above detailed description, this division is not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more of the modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by multiple modules or units.
Other embodiments of the present invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of the present invention that follow the general principles of the present invention and include common knowledge or customary technical means in the art not disclosed by the present invention. The specification and examples are to be regarded as illustrative only, and the true scope and spirit of the present invention are indicated by the appended claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
Claims (14)
1. A text region detection method, characterized by comprising:
extracting features of an original image to obtain a feature map;
performing text region detection based on the feature map to obtain multiple text region fragments;
obtaining sequence information of the multiple text region fragments, and forming the multiple text region fragments into a text region fragment sequence according to the sequence information;
aggregating the text region fragment sequence to obtain a text region.
2. The text region detection method according to claim 1, characterized in that extracting features of an original image to obtain a feature map comprises:
performing multi-stage convolution on the original image by means of a residual network model to obtain the feature map.
3. The text region detection method according to claim 1, characterized in that performing text region detection based on the feature map to obtain multiple text region fragments comprises:
setting a group of anchors at each pixel of the feature map;
extracting the image features corresponding to the anchors by means of a sliding window, and generating the multiple text region fragments.
4. The text region detection method according to claim 3, characterized in that setting a group of anchors at each pixel of the feature map comprises:
setting a group of fixed-width anchors at each pixel of the feature map.
5. The text region detection method according to claim 1, characterized in that obtaining sequence information of the multiple text region fragments, and forming the multiple text region fragments into a text region fragment sequence according to the sequence information, comprises:
inputting the text region fragments to a long short-term memory model to obtain the sequence information of the text region fragments, and forming the multiple text region fragments into a text region fragment sequence according to the sequence information.
6. The text region detection method according to claim 5, characterized in that the long short-term memory model is a bidirectional long short-term memory model comprising a forward long short-term memory model and a backward long short-term memory model; and inputting the text region fragments to a long short-term memory model to obtain the sequence information of the text region fragments, and forming the multiple text region fragments into a text region fragment sequence according to the sequence information, comprises:
learning the preceding context of each text region fragment by means of the forward long short-term memory model;
learning the following context of each text region fragment by means of the backward long short-term memory model;
concatenating and merging the learning results to obtain the text region fragment sequence.
7. The text region detection method according to claim 1, characterized in that aggregating the text region fragment sequence to obtain the text region comprises:
selecting multiple text region fragments, wherein the spacing between two adjacent text region fragments is less than a set distance and their vertical overlap ratio is greater than a set value;
connecting the multiple text region fragments to obtain the text region.
8. The text region detection method according to claim 6, characterized in that the text region detection method further comprises:
obtaining training samples;
performing machine training on the residual network model and the bidirectional long short-term memory model according to the training samples and different learning rates;
ending the machine training when the loss function of the residual network model and the bidirectional long short-term memory model is minimized.
9. The text region detection method according to claim 1, characterized in that the text region detection method further comprises:
mapping the text region fragment sequence to a fully connected layer to predict the text confidence and text position of the text region fragments.
10. The text region detection method according to claim 1, characterized in that the text region detection method further comprises:
correcting the boundary error of the text region according to the relative displacement between the predicted boundary of the text region and the true boundary of the text region.
11. The text region detection method according to claim 1, characterized in that performing text region detection based on the feature map to obtain multiple text region fragments comprises:
applying non-maximum suppression to the text region fragments to obtain multiple text region fragments whose text confidence is greater than a set value.
12. A text detection method, characterized by comprising:
obtaining the text region according to the text region detection method of any one of claims 1-11;
recognizing the text information in the text region to obtain text.
13. A text region detection device, characterized by comprising:
a feature extraction module configured to extract features of an original image to obtain a feature map;
a fragment acquisition module configured to perform text region detection based on the feature map to obtain multiple text region fragments;
a sequence information acquisition module configured to obtain sequence information of the multiple text region fragments and to form the multiple text region fragments into a text region fragment sequence according to the sequence information;
an aggregation module configured to aggregate the text region fragment sequence to obtain a text region.
14. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the text region detection method of any one of claims 1-11 and the text detection method of claim 12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810225220.2A CN110263779A (en) | 2018-03-19 | 2018-03-19 | Text filed detection method and device, Method for text detection, computer-readable medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110263779A true CN110263779A (en) | 2019-09-20 |
Family
ID=67911871
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810225220.2A Withdrawn CN110263779A (en) | 2018-03-19 | 2018-03-19 | Text filed detection method and device, Method for text detection, computer-readable medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110263779A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111401428A (en) * | 2020-03-12 | 2020-07-10 | Oppo广东移动通信有限公司 | Image classification method and device, electronic equipment and storage medium |
CN111461132A (en) * | 2020-04-17 | 2020-07-28 | 支付宝(杭州)信息技术有限公司 | Method and device for assisting in labeling OCR image data |
CN113378815A (en) * | 2021-06-16 | 2021-09-10 | 南京信息工程大学 | Model for scene text positioning recognition and training and recognition method thereof |
-
2018
- 2018-03-19 CN CN201810225220.2A patent/CN110263779A/en not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
ZHI TIAN ET AL.: "Detecting Text in Natural Image with Connectionist Text Proposal Network", 《ECCV 2016》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111401428A (en) * | 2020-03-12 | 2020-07-10 | Oppo广东移动通信有限公司 | Image classification method and device, electronic equipment and storage medium |
CN111461132A (en) * | 2020-04-17 | 2020-07-28 | 支付宝(杭州)信息技术有限公司 | Method and device for assisting in labeling OCR image data |
CN111461132B (en) * | 2020-04-17 | 2022-05-10 | 支付宝(杭州)信息技术有限公司 | Method and device for assisting in labeling OCR image data |
CN113378815A (en) * | 2021-06-16 | 2021-09-10 | 南京信息工程大学 | Model for scene text positioning recognition and training and recognition method thereof |
CN113378815B (en) * | 2021-06-16 | 2023-11-24 | 南京信息工程大学 | Scene text positioning and identifying system and training and identifying method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20190920 |