CN109934181A - Text recognition method, device, equipment and computer-readable medium - Google Patents
- Publication number
- CN109934181A (application number CN201910204450.5A)
- Authority
- CN
- China
- Prior art keywords
- text
- image
- priori frame
- frame
- priori
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a text recognition method, apparatus, device and computer-readable medium, and relates to the field of computer technology. One specific embodiment of the method includes: locating a table in an image with structured text; identifying one or more candidate text boxes in the table using a text detection model, wherein the text detection model is based on a single shot multibox detector (SSD) model and is obtained using one or more feature maps and the prior boxes corresponding to those feature maps, each prior box being defined by a prior box scale and a prior box aspect ratio; and merging the one or more candidate text boxes in order to recognize the text in the merged text box. The embodiment can improve the accuracy and efficiency of text recognition in images with structured text.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a text recognition method, apparatus, device and computer-readable medium.
Background
In recent years, with the development of computer vision, text recognition based on optical character recognition (OCR) has been applied ever more widely, for example to license plate recognition, certificate recognition and traffic sign text recognition. Once the text has been recognized, high-speed text transcription can be completed without a keyboard.
In the course of implementing the present invention, the inventors found at least the following problem in the prior art:
The particular characteristics of an application scenario can help improve text recognition. In particular, for images in which the text is mainly filled into tables, that is, images with structured text such as bills and invoices, the structured organization of the text can be fully exploited to improve the recognition result.
Summary of the invention
In view of this, embodiments of the present invention provide a text recognition method, apparatus, device and computer-readable medium that can improve the accuracy and efficiency of text recognition in images with structured text.
To achieve the above object, according to a first aspect of the embodiments of the present invention, a text recognition method is provided, comprising: locating a table in an image with structured text; identifying one or more candidate text boxes in the table using a text detection model, wherein the text detection model is based on a single shot multibox detector (SSD) model and is obtained using one or more feature maps and the prior boxes corresponding to those feature maps, each prior box being defined by a prior box scale and a prior box aspect ratio; and merging the one or more candidate text boxes in order to recognize the text in the merged text box.
According to the first aspect of the embodiments of the present invention, before the one or more candidate text boxes in the table are identified using the text detection model, the method further comprises: calculating the length and width of the prior box from the length of a training image, the width of the training image, the prior box aspect ratio and the prior box scale; and inputting the training image into the SSD model, matching the prior boxes against the minimum bounding boxes of the text in the training image, and training to obtain the text detection model.
According to the first aspect of the embodiments of the present invention, the base network of the text detection model is the first A convolutional layers of the convolutional neural network VGG16, where A is an integer greater than or equal to 4 and less than or equal to 15.
According to the first aspect of the embodiments of the present invention, the prior box aspect ratio includes one or more of 3, 5, 7, 10 and 15.
According to the first aspect of the embodiments of the present invention, locating the table in the image with structured text comprises: determining the edges of the binarized image using an edge detection algorithm; and, based on the edges of the image, obtaining the table in the image by filtering with a Hough transform.
According to a second aspect of the embodiments of the present invention, a text recognition apparatus is provided, comprising: a table recognition module for locating a table in an image with structured text; a text box detection module for identifying one or more candidate text boxes in the table using a text detection model, wherein the text detection model is based on a single shot multibox detector (SSD) model and is obtained using one or more feature maps and the prior boxes corresponding to those feature maps, each prior box being defined by a prior box scale and a prior box aspect ratio; and a text box determination module for merging the one or more candidate text boxes in order to recognize the text in the merged text box.
According to the second aspect of the embodiments of the present invention, the apparatus further comprises a model training module. The model training module calculates the length and width of the prior box from the length of a training image, the width of the training image, the prior box aspect ratio and the prior box scale; it inputs the training image into the SSD model, matches the prior boxes against the minimum bounding boxes of the text in the training image, and trains to obtain the text detection model.
According to the second aspect of the embodiments of the present invention, the base network of the text detection model is the first A convolutional layers of the convolutional neural network VGG16, where A is an integer greater than or equal to 4 and less than or equal to 15.
According to the second aspect of the embodiments of the present invention, the prior box aspect ratio includes one or more of 3, 5, 7, 10 and 15.
According to the second aspect of the embodiments of the present invention, the table recognition module determines the edges of the binarized image using an edge detection algorithm and, based on the edges of the image, obtains the table in the image by filtering with a Hough transform.
According to a third aspect of the embodiments of the present invention, an electronic device for text recognition is provided, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described above.
According to a fourth aspect of the embodiments of the present invention, a computer-readable medium is provided, on which a computer program is stored, the program implementing the method described above when executed by a processor.
One embodiment of the above invention has the following advantages or beneficial effects. For an image with structured text, the table in the image is located first, before the text boxes enclosing the text are determined. Candidate text boxes in the table are then identified using prior boxes with specific scales and aspect ratios, and the text boxes that contain text are determined by merging the identified candidate text boxes. The text boxes in an image with structured text can therefore be determined quickly and effectively, which improves the accuracy and efficiency of text recognition for such images.
Further effects of the above non-conventional optional implementations are described below in connection with specific embodiments.
Brief description of the drawings
The drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:
Figs. 1A-1B are schematic diagrams of images with structured text according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main flow of a text recognition method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the table in an image with structured text according to an embodiment of the present invention;
Figs. 4A-4C are schematic diagrams of candidate text boxes according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a text box according to an embodiment of the present invention;
Fig. 6 is a structural schematic diagram of a text detection model according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of the main structure of a text recognition apparatus according to an embodiment of the present invention;
Fig. 8 is a diagram of an exemplary system architecture to which embodiments of the present invention can be applied;
Fig. 9 is a structural schematic diagram of a computer system suitable for implementing a server of an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are described below with reference to the drawings. They include various details of the embodiments to aid understanding, and these details should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
OCR is generally divided into two classes. One is text recognition in constrained scenes, such as recognition of printed text, identity cards, bills and bank card numbers. The other is text recognition in unconstrained scenes, such as recognition of license plates, billboards and traffic sign text.
In constrained scenes, the text recognition success rate is mainly affected by factors such as text composition, font and word length. In unconstrained scenes, it is mainly affected by factors such as illumination, shadow, occlusion, the diversity of backgrounds and fonts, and the complexity of size and layout.
In constrained scenes, the particular characteristics of the application scenario can be used to improve the accuracy and efficiency of text recognition. For example, for images with structured text such as bills and invoices, the structured organization of the text can be fully exploited to improve the recognition result.
Referring to Fig. 1A, Fig. 1A is a schematic diagram of an image with structured text according to an embodiment of the present invention. In this embodiment, structured text refers to text filled into tables. Fig. 1A contains 9 tables whose aspect ratios are not all identical, and each table is filled with corresponding text. The aspect ratio of a table is the ratio of the length of the table to its width, and it reflects the shape of the table.
In an image with structured text, both tables with small aspect ratios and tables with large aspect ratios need to be recognized. For tables with large aspect ratios, there is the technical problem of a low text recognition success rate.
It is worth noting that although Fig. 1A shows solid lines dividing the tables, a table may also exist implicitly. For example, in Fig. 1B the pieces of text are not separated by visible lines, yet the text can still be regarded as filled into "hidden" tables. Furthermore, the elements that divide a table may take forms other than solid lines, for example dashed lines or dash-dot lines.
Referring to Fig. 2, Fig. 2 is a schematic diagram of the main flow of a text recognition method according to an embodiment of the present invention. By means of the text detection model, the method is suitable for tables of different aspect ratios. It specifically includes the following steps:
S201: in an image with structured text, locate the table in the image.
An image with structured text usually exists as printed matter, such as an invoice printed on paper. Such an image can of course also exist as an electronic bill. Whether the image with structured text is on paper or in electronic form, the embodiments described herein can be used to perform text recognition on it.
As an example, the image with structured text can be obtained through an image capture device. The image capture device may be a scanner, a camera or similar equipment.
With continued reference to Fig. 1A, the image in Fig. 1A includes 9 tables. Herein, the tables and the text in them are referred to as structured text. To recognize structured text, these tables must first be located.
As mentioned above, in one embodiment of the present invention the table of an image with structured text is composed of multiple solid lines. Locating the solid lines locates the table in the image. It should be noted that a table may be divided by lines that physically exist, for example solid lines, dashed lines or dash-dot lines, or by virtual ("hidden") lines. In other words, the embodiments herein are applicable to the images shown in Fig. 1A and Fig. 1B: they can locate the tables divided by solid lines in Fig. 1A, or the tables divided by virtual lines in Fig. 1B.
It can be understood that an image generally includes a background and the object to be recognized, namely the structured text described herein. To distinguish the background from the structured text, the image with structured text is binarized to obtain the contours of the structured text. Binarization assigns one of two predefined values to each pixel in the image; for example, the binarized image can be represented by a matrix of 0s and 1s.
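As a rough illustration of the binarization step just described, the following sketch maps each grayscale pixel to 0 or 1 with a fixed global threshold. The threshold value of 128, the function name `binarize` and the nested-list image format are assumptions made for this example; the source does not say how the threshold is chosen.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel (0-255) to 0 or 1.

    The fixed threshold is an assumed value for illustration only;
    the source does not specify a thresholding rule.
    """
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in gray]


# A tiny 2 x 3 grayscale image: bright background, dark strokes.
gray_image = [
    [250, 248, 30],
    [245, 20, 25],
]
binary_image = binarize(gray_image)
```

In practice an adaptive threshold may suit scanned bills better than a global one, but that choice is outside what the source describes.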
After the image has been binarized, an edge detection algorithm is used to determine the edges of the image with structured text. The edge detection algorithm may be at least one of the following: a first-order edge detection algorithm, the Sobel edge detection algorithm, the Canny edge detection algorithm, a second-order edge detection algorithm and the Laplacian edge detection algorithm.
After the edges of the image with structured text have been detected, the table in the image is obtained by filtering with a Hough transform, which can identify geometric shapes in an image.
As an example, the tables involved in structured text are normally two-dimensional, so the horizontal lines and vertical lines of the table can be determined in the Hough transform in the following way.
Horizontal line: a line whose angle relative to the horizontal axis is within plus or minus 5 degrees and whose length is preferably greater than two thirds of the image width.
Vertical line: a line whose angle relative to the vertical axis is within plus or minus 5 degrees and whose length is preferably greater than two thirds of the image height.
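The filtering rule above can be sketched as a small classifier over line segments, such as those a Hough transform might return. The function name `classify_segment`, the endpoint-based segment format and the choice to treat the two-thirds length criterion as mandatory are assumptions of this example; the source only calls the length criterion preferable.

```python
import math


def classify_segment(x1, y1, x2, y2, image_width, image_height,
                     max_angle_deg=5.0, min_fraction=2.0 / 3.0):
    """Return 'horizontal', 'vertical' or None for one line segment."""
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        return None
    # Angle between the segment and the horizontal axis, folded into [0, 90].
    angle = math.degrees(math.atan2(abs(dy), abs(dx)))
    if angle <= max_angle_deg and length > min_fraction * image_width:
        return "horizontal"
    if 90.0 - angle <= max_angle_deg and length > min_fraction * image_height:
        return "vertical"
    return None


# Example: a nearly flat, long segment in a 1000 x 600 image.
kind = classify_segment(0, 100, 900, 110, image_width=1000, image_height=600)
```

Segments that are neither close to an axis nor long enough are discarded, which is how short strokes belonging to characters would be kept out of the table grid under these assumptions.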
Referring to Fig. 3, Fig. 3 is a schematic diagram of the table in an image with structured text according to an embodiment of the present invention. For the image shown in Fig. 1A, the table shown in Fig. 3 is obtained after binarization, edge detection and the Hough transform. It should be noted that locating the table in Fig. 3 can be understood as locating the table that is filled with text, that is, determining the specific position of the table within the image with structured text.
In the above embodiment, the table in an image with structured text can be located accurately by applying binarization, an edge detection algorithm and a Hough transform.
S202: identify one or more candidate text boxes in the table using a text detection model, wherein the text detection model is based on a single shot multibox detector model and is obtained using one or more feature maps and the prior boxes corresponding to those feature maps, each prior box being defined by a prior box scale and a prior box aspect ratio.
The single shot multibox detector (SSD) model is a deep neural network model. The SSD architecture is divided into two main parts. The first is the base network at the front end, for which a deep convolutional neural network for image classification is generally used, such as the Visual Geometry Group 16 (VGG16) network. The role of this part is to perform preliminary feature extraction on the target. The second is the multi-scale feature detection network at the back end, for which a group of cascaded convolutional neural networks is generally used; it performs further feature extraction, at different scales, on the feature maps generated by the front-end network.
In one embodiment of the present invention, the base network of the SSD can use some of the convolutional layers of VGG16, which has 16 layers. In the embodiments of the present invention, because the characters in an image with structured text occupy few pixels, generally only 20 to 30, the number of network layers can be reduced to speed up obtaining the text boxes filled with text. That is, the base network of the text detection model is a subset of the convolutional layers of VGG16, for example the first A convolutional layers of VGG16, where A is an integer greater than or equal to 4 and less than or equal to 15.
SSD uses multiple feature maps of different scales, and each feature map divides the image into a grid of a different size, such as 4 × 4 or 8 × 8. Each feature map has corresponding prior boxes. Assuming SSD uses m feature maps, the prior box scale S_k of the k-th feature map is calculated according to formula (1):

$$S_k = S_{\min} + \frac{S_{\max} - S_{\min}}{m - 1}\,(k - 1), \quad k \in [1, m] \tag{1}$$

where S_min is the minimum prior box scale of the feature maps, S_max is the maximum prior box scale of the feature maps, and S_k is the prior box scale of the k-th feature map. The prior box scale is the ratio of the prior box size to the image size. Prior boxes can be used to predict the text boxes in an image with structured text.
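The scale calculation can be sketched as below, assuming formula (1) is the linear interpolation between S_min and S_max used in the original SSD design. The default values 0.2 and 0.9 and the function name `prior_box_scales` are assumptions for illustration; the source does not state the values it uses.

```python
def prior_box_scales(m, s_min=0.2, s_max=0.9):
    """Return the prior box scales S_1..S_m for m feature maps.

    Assumes linear interpolation between s_min and s_max; the default
    bounds are illustrative values, not taken from the source.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if m == 1:
        return [s_min]
    step = (s_max - s_min) / (m - 1)
    return [s_min + step * (k - 1) for k in range(1, m + 1)]


# Scales for three feature maps, e.g. one per fused layer.
scales = prior_box_scales(3)
```

The first feature map gets the smallest scale and the last the largest, so finer grids are paired with smaller prior boxes.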
It should be noted that, in the training process of the text detection model of the embodiments of the present invention, a training image may contain no table, or may contain one or more tables. A training image is an image used to train the text detection model. The text in a training image is also called a positive sample of the training image, and it can be understood that the minimum bounding box of each positive sample is known.
When the text detection model of the embodiments of the present invention is built, the aspect ratios of the prior boxes used to predict structured text are set in advance according to the characteristics of structured text. It should be noted that the prior box aspect ratios used herein are the aspect ratios of structured text commonly seen in practice, that is, the aspect ratios of tables, which helps improve the detection success rate.
Specifically, from the prior box scale S_k and the preset prior box aspect ratio a_r, the length and the width of the prior box for the k-th feature map can be obtained. A prior box is therefore defined by its prior box scale and its prior box aspect ratio. Specifically, the length of the prior box and the width of the prior box are respectively:
[formula not reproduced in this text]
Assuming the training image has length H, width W and aspect ratio r, the length and the width of the prior box can be rewritten as:
[formula not reproduced in this text]
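Because the formulas for the prior box dimensions did not survive in this text, the following is only a sketch under the standard SSD convention, in which the relative long side is the scale times the square root of the aspect ratio and the relative short side is the scale divided by it. Whether the patent uses exactly this form, and how it maps length and width onto H and W, is an assumption; the function name `prior_box_size` is hypothetical.

```python
import math


def prior_box_size(scale, aspect_ratio, image_length, image_width):
    """Return (length, width) of a prior box in pixels.

    Assumed convention (standard SSD, not confirmed by the source):
      relative length = scale * sqrt(aspect_ratio)
      relative width  = scale / sqrt(aspect_ratio)
    Each relative size is multiplied by the matching image dimension.
    """
    rel_length = scale * math.sqrt(aspect_ratio)
    rel_width = scale / math.sqrt(aspect_ratio)
    return rel_length * image_length, rel_width * image_width


# A prior box with scale 0.1 and aspect ratio 5 on a 300 x 300 training image.
box_length, box_width = prior_box_size(0.1, 5, 300, 300)
```

Under this convention the box area depends only on the scale, while the aspect ratio stretches the box, which suits long table cells with ratios such as 10 or 15.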
When the text detection model of the embodiments of the present invention is trained, an objective loss function is minimized. The objective loss function L(x, c, l, g) is the weighted sum of the confidence loss L_conf(x, c) and the localization loss L_loc(x, l, g), specifically:

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha\, L_{loc}(x, l, g)\right) \tag{4}$$

where N is the number of prior boxes matched to ground-truth boxes, and a ground-truth box is a true text box in the training image. In formula (4), x is the position coordinates, c is the confidence that the recognized region is background or a text box, l is the location information of the prior box, g is the location information of the ground-truth box, and α is a weight coefficient whose initial value may be set to 1. α is finally obtained by minimizing the loss function.
The objective loss in formula (4) includes both the confidence loss and the localization loss. During training, reducing the value of the loss function improves the classification confidence of the text boxes and, at the same time, the reliability of their positions. Optimizing the objective loss function over many iterations steadily improves the prediction accuracy of the model, so that a text detection model with good performance is obtained.
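The way the two loss terms combine can be sketched as follows, assuming formula (4) has the usual SSD form in which the sum is normalized by the number of matched prior boxes N. The function name `total_loss` and the zero result when N is 0 follow the common SSD convention and are assumptions here; the individual loss values would come from the network and are passed in as plain numbers.

```python
def total_loss(conf_loss, loc_loss, num_matched, alpha=1.0):
    """Combine confidence and localization losses.

    Assumes L = (L_conf + alpha * L_loc) / N, with the loss defined
    as 0 when no prior box matches a ground-truth box.
    """
    if num_matched == 0:
        return 0.0
    return (conf_loss + alpha * loc_loss) / num_matched


# Summed losses over 4 matched prior boxes, with alpha at its initial value 1.
loss = total_loss(conf_loss=6.0, loc_loss=2.0, num_matched=4)
```

A larger alpha makes box position errors count more relative to classification errors.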
Once the length and width of the prior box have been obtained, the specific size of the prior box is determined. The training image is input into the SSD model, which outputs the text boxes of the training image. Using formula (4), the prior boxes and the known minimum bounding boxes (ground-truth boxes) of the text in the training image, α is finally obtained, and hence the text detection model. It can be understood that the text detection model is a model based on the SSD model and obtained using one or more feature maps and the prior boxes corresponding to the feature maps.
Because several prior box aspect ratios a_r close to the actual aspect ratios of text boxes are used for images with structured text, the text boxes obtained with the text detection model described herein are more accurate. As an example, the prior box aspect ratio a_r can be set to at least one of the following values: 3, 5, 7, 10 and 15.
In one embodiment of the present invention, after the text detection model has been obtained by training, one or more feature maps of different sizes are used in the text detection model, together with the aspect ratios of the prior boxes corresponding to the feature maps, to identify the candidate text boxes in the table located in S201.
S203: merge the one or more candidate text boxes in order to recognize the text in the merged text box.
To improve the accuracy of the predicted text box, the multiple candidate text boxes obtained with the text detection model are merged.
Figs. 4A to 4C are schematic diagrams of identified candidate text boxes according to an embodiment of the present invention. Specifically, as shown in the figures, the text box in Fig. 4A contains "Hebei Province Shijiazhuang City Hongqi District Happy", the text box in Fig. 4B contains "Province Shijiazhuang City Hongqi District Happy Bank", and the text box in Fig. 4C contains the same text cut off at both ends.
Clearly, none of the candidate text boxes in Figs. 4A to 4C contains the complete text. Fig. 4A is missing the right part of the complete text, Fig. 4B is missing the left part, and Fig. 4C is missing both the left part and the right part.
To further improve the detection success rate for images with structured text, the text boxes obtained for the structured text to be recognized can be merged, and the text in the merged text box is then recognized.
Referring to Fig. 5, Fig. 5 is a schematic diagram of a predicted text box according to an embodiment of the present invention. The text box in Fig. 5 is the region obtained by merging the three candidate text boxes in Figs. 4A to 4C. The text in each text box of Figs. 4A to 4C is incomplete, whereas the text in the merged text box of Fig. 5 is complete.
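One simple way to merge candidate text boxes is to take the union bounding box of every group of overlapping candidates. The source does not state its merge rule, so the overlap test, the `(x_min, y_min, x_max, y_max)` box format and the function names below are assumptions of this sketch.

```python
def boxes_overlap(a, b):
    """Boxes are (x_min, y_min, x_max, y_max); touching edges count as overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def merge_boxes(boxes):
    """Merge overlapping boxes into their union bounding boxes."""
    merged = []
    for box in boxes:
        box = tuple(box)
        absorbed = True
        while absorbed:
            absorbed = False
            for other in merged:
                if boxes_overlap(box, other):
                    # Grow the box to cover the overlapping one, then rescan,
                    # since the larger box may now overlap further boxes.
                    box = (min(box[0], other[0]), min(box[1], other[1]),
                           max(box[2], other[2]), max(box[3], other[3]))
                    merged.remove(other)
                    absorbed = True
                    break
        merged.append(box)
    return merged


# Three partial detections of one text line plus one unrelated box.
candidates = [(10, 10, 200, 40), (40, 12, 260, 42), (25, 11, 230, 41), (10, 100, 80, 130)]
text_boxes = merge_boxes(candidates)
```

With three partial candidates covering the left, right and middle of one line, the union spans the whole line, which mirrors how the merged box in Fig. 5 contains the complete text.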
Further, referring to Fig. 6, Fig. 6 is a structural schematic diagram of a text detection model according to an embodiment of the present invention. The base network of the text detection model is the first 5 convolutional layers of the convolutional neural network VGG16, namely conv1 to conv5. The image with structured text is input into the text detection model and passes through conv1 to conv5 in turn. The feature maps output by conv3, conv4 and conv5 are then fused to identify candidate text boxes, and finally the identified candidate text boxes (for example, the candidate text boxes in Figs. 4A-4C) are merged to obtain the final predicted text box (for example, the text box in Fig. 5).
Fig. 7 is a schematic diagram of the main structure of a text recognition apparatus according to an embodiment of the present invention. The text recognition apparatus can implement the text recognition method. As shown in Fig. 7, the text recognition apparatus specifically includes:
A table recognition module 701, for locating the table in an image with structured text;
A text box detection module 702, for identifying one or more candidate text boxes in the table using a text detection model, wherein the text detection model is based on a single shot multibox detector model and is obtained using one or more feature maps and the prior boxes corresponding to those feature maps, each prior box being defined by a prior box scale and a prior box aspect ratio;
A text box determination module 703, for merging the one or more candidate text boxes in order to recognize the text in the merged text box.
In one embodiment of the present invention, the text recognition apparatus further includes a model training module 704. The model training module 704 calculates the length and width of the prior box from the length of a training image, the width of the training image, the prior box aspect ratio and the prior box scale; it inputs the training image into the single shot multibox detector model, matches the prior boxes against the minimum bounding boxes of the text in the training image, and trains to obtain the text detection model.
In one embodiment of the present invention, the base network of the text detection model is the first A convolutional layers of the convolutional neural network VGG16, where A is an integer greater than or equal to 4 and less than or equal to 15.
In one embodiment of the present invention, the prior box aspect ratio includes one or more of 3, 5, 7, 10 and 15.
In one embodiment of the present invention, the table recognition module 701 determines the edges of the binarized image using an edge detection algorithm and, based on the edges of the image, obtains the table in the image by filtering with a Hough transform.
Fig. 8 shows an exemplary system architecture 800 to which the text recognition method and text recognition apparatus of the embodiments of the present invention can be applied.
As shown in Fig. 8, the system architecture 800 may include terminal devices 801, 802 and 803, a network 804 and a server 805. The network 804 is the medium that provides communication links between the terminal devices 801, 802, 803 and the server 805. The network 804 may include various connection types, such as wired links, wireless communication links or fiber optic cables.
A user can use the terminal devices 801, 802, 803 to interact with the server 805 through the network 804 in order to receive or send messages and the like. Specifically, various communication client applications can be installed on the terminal devices 801, 802, 803 to transmit images with structured text to the server 805, which performs the recognition.
The terminal devices 801, 802, 803 can be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers and desktop computers.
The server 805 can be a server that provides various services, for example a back-end management server (merely an example) that provides text recognition for images with structured text uploaded by users through the terminal devices 801, 802, 803. The back-end management server can analyze and otherwise process the received data, such as images, and feed the processing result (for example the recognized text, merely an example) back to the terminal device.
It should be noted that the text recognition method provided by the embodiments of the present invention is generally executed by the server 805, and accordingly the text recognition apparatus is generally located in the server 805.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 8 are merely illustrative. There may be any number of terminal devices, networks and servers, as required by the implementation.
Referring now to Fig. 9, it shows a structural schematic diagram of a computer system 900 suitable for implementing the server of the embodiments of the present invention. The computer system shown in Fig. 9 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.
As shown in Fig. 9, the computer system 900 includes a central processing unit (CPU) 901, which can perform various appropriate actions and processing according to a program stored in read-only memory (ROM) 902 or a program loaded from a storage section 908 into random access memory (RAM) 903. The RAM 903 also stores the various programs and data required for the operation of the system 900. The CPU 901, the ROM 902 and the RAM 903 are connected to one another by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse and the like; an output section 907 including a cathode ray tube (CRT), a liquid crystal display (LCD) or the like, and a speaker; a storage section 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a LAN card or a modem. The communication section 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on the drive 910 as needed, so that a computer program read from it can be installed into the storage section 908 as needed.
In particular, according to the disclosed embodiments of the present invention, the process described
above with reference to the flowchart may be implemented as a computer software program. For example,
the disclosed embodiments include a computer program product comprising a computer program carried on
a computer-readable medium, the computer program containing program code for executing the method
shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed
from a network through communication section 909, and/or installed from removable medium 911. When
the computer program is executed by the central processing unit (CPU) 901, the functions defined
above in the system of the present invention are performed.
It should be noted that the computer-readable medium described in the present invention may be a
computer-readable signal medium, a computer-readable storage medium, or any combination of the two.
A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic,
optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination
of the above. More specific examples of computer-readable storage media include, but are not limited
to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random
access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash
memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage
device, a magnetic storage device, or any suitable combination of the above. In the present
invention, a computer-readable storage medium may be any tangible medium that contains or stores a
program, where the program can be used by or in connection with an instruction execution system,
apparatus or device. A computer-readable signal medium, in turn, may include a data signal propagated
in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a
propagated data signal may take many forms, including but not limited to an electromagnetic signal,
an optical signal, or any suitable combination of the above. A computer-readable signal medium may
also be any computer-readable medium other than a computer-readable storage medium that can send,
propagate or transmit a program for use by or in connection with an instruction execution system,
apparatus or device. Program code contained on a computer-readable medium may be transmitted over any
suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable
combination of the above.
The flowcharts and block diagrams in the drawings illustrate the architecture, functions and
operations of possible implementations of systems, methods and computer program products according
to various embodiments of the present invention. In this regard, each block in a flowchart or block
diagram may represent a module, a program segment, or a portion of code, which contains one or more
executable instructions for implementing the specified logical function. It should also be noted
that, in some alternative implementations, the functions noted in the blocks may occur in an order
different from that noted in the drawings. For example, two blocks shown in succession may in fact be
executed substantially in parallel, or sometimes in the reverse order, depending on the functions
involved. It should also be noted that each block in the block diagrams or flowcharts, and
combinations of such blocks, can be implemented by a dedicated hardware-based system that performs
the specified functions or operations, or by a combination of dedicated hardware and computer
instructions.
The modules described in the embodiments of the present invention may be implemented in software or
in hardware. The described modules may also be provided in a processor; for example, a processor may
be described as including a sending module, an obtaining module, a determining module and a first
processing module. Under certain conditions, the names of these modules do not limit the modules
themselves; for example, the sending module may also be described as "a unit that sends a picture
acquisition request to the connected server".
In another aspect, the present invention also provides a computer-readable medium. The
computer-readable medium may be included in the device described in the above embodiments, or it may
exist separately without being assembled into that device. The computer-readable medium carries one
or more programs which, when executed by the device, cause the device to:
locate the table in an image with structured text;
identify one or more candidate text boxes in the table using a text detection model, wherein the
text detection model is a model obtained, on the basis of a Single Shot MultiBox Detector (SSD)
model, by using one or more feature maps and prior boxes corresponding to the feature maps, each
prior box being defined by a prior box scale and a prior box aspect ratio;
merge the one or more candidate text boxes, so as to recognize the text in the merged text box.
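The table-locating step listed first above can be illustrated with a small, dependency-free sketch. Claim 5 names binarization, edge detection and a Hough transform; the code below is only an assumption-laden simplification of that idea: it skips edge detection, treats dark pixels directly as edge points, and restricts the Hough accumulator to two angles (0° and 90°), which reduces the voting to per-column and per-row counts. The names `binarize`, `axis_aligned_hough`, `locate_table` and the `min_fraction` threshold are hypothetical and do not come from the patent.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale rows, 0 (black) .. 255 (white)


def binarize(image: Image, threshold: int = 128) -> Image:
    """Return 1 for dark (ink) pixels and 0 for light (background) pixels."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def axis_aligned_hough(binary: Image) -> Tuple[List[int], List[int]]:
    """Hough voting restricted to horizontal and vertical lines.

    For theta = 90 degrees the line parameter rho equals the row index y,
    and for theta = 0 degrees it equals the column index x, so each ink
    pixel casts one vote for its row and one vote for its column.
    """
    height, width = len(binary), len(binary[0])
    row_votes = [0] * height
    col_votes = [0] * width
    for y in range(height):
        for x in range(width):
            if binary[y][x]:
                row_votes[y] += 1
                col_votes[x] += 1
    return row_votes, col_votes


def locate_table(image: Image, min_fraction: float = 0.6) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) of the outermost ruling lines, or None.

    A row (column) counts as a ruling line when its votes cover at least
    min_fraction of the image width (height). This threshold is an
    illustrative assumption, not a value from the patent.
    """
    binary = binarize(image)
    row_votes, col_votes = axis_aligned_hough(binary)
    height, width = len(binary), len(binary[0])
    rows = [y for y, votes in enumerate(row_votes) if votes >= min_fraction * width]
    cols = [x for x, votes in enumerate(col_votes) if votes >= min_fraction * height]
    if len(rows) < 2 or len(cols) < 2:
        return None  # fewer than two lines per direction: no table frame found
    return cols[0], rows[0], cols[-1], rows[-1]
```

A full implementation would vote over many angles to tolerate skewed scans; the two-angle restriction is kept here only to make the voting idea visible in a few lines.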
According to the technical solution of the embodiments of the present invention, for an image with
structured text, the table in the image is located first, before the text boxes enclosing the text
are determined. Prior boxes with specific scales and aspect ratios are then used to identify
candidate text boxes in the table, and the identified candidate text boxes are merged to determine
the text boxes that contain text. In this way, the text boxes in an image with structured text can be
determined quickly and effectively, which improves the accuracy and efficiency of text recognition
for this kind of image.
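As an illustration of the merging step, the following sketch joins candidate boxes that lie on the same text line and sit close together horizontally. This section does not state the merging rule, so the criteria used here (vertical overlap ratio and a maximum horizontal gap), the thresholds `max_gap` and `min_overlap`, and the function names are all assumptions made for the example.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def vertical_overlap_ratio(a: Box, b: Box) -> float:
    """Overlap of the two vertical extents, relative to the shorter box."""
    overlap = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return max(0, overlap) / shorter if shorter > 0 else 0.0


def merge_candidates(boxes: List[Box], max_gap: int = 5, min_overlap: float = 0.7) -> List[Box]:
    """Greedily merge candidate boxes into larger text boxes.

    Boxes are visited from left to right. A box is absorbed into an already
    merged box when both share a text line (vertical overlap >= min_overlap)
    and the horizontal gap between them is at most max_gap pixels.
    """
    merged: List[Box] = []
    for box in sorted(boxes, key=lambda b: (b[0], b[1])):
        for i, current in enumerate(merged):
            same_line = vertical_overlap_ratio(current, box) >= min_overlap
            close = box[0] - current[2] <= max_gap
            if same_line and close:
                # Replace the merged box by the union of both rectangles.
                merged[i] = (
                    min(current[0], box[0]),
                    min(current[1], box[1]),
                    max(current[2], box[2]),
                    max(current[3], box[3]),
                )
                break
        else:
            merged.append(box)  # no suitable neighbour: start a new text box
    return merged
```

The single left-to-right pass keeps the sketch short; it does not re-check whether two already merged boxes have grown close enough to be joined, which a production version might need.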
The above specific embodiments do not limit the scope of protection of the present invention. Those
skilled in the art should understand that, depending on design requirements and other factors,
various modifications, combinations, sub-combinations and substitutions may occur. Any modification,
equivalent substitution or improvement made within the spirit and principles of the present invention
shall fall within the scope of protection of the present invention.
Claims (12)
1. A text recognition method, characterized by comprising:
locating the table in an image with structured text;
identifying one or more candidate text boxes in the table using a text detection model, wherein the
text detection model is a model obtained, on the basis of a Single Shot MultiBox Detector (SSD)
model, by using one or more feature maps and prior boxes corresponding to the feature maps, each
prior box being defined by a prior box scale and a prior box aspect ratio;
merging the one or more candidate text boxes, so as to recognize the text in the merged text box.
2. The text recognition method according to claim 1, characterized in that, before identifying the
one or more candidate text boxes in the table using the text detection model, the method further
comprises:
calculating the length and width of the prior box according to the length of a training image, the
width of the training image, the prior box aspect ratio and the prior box scale;
inputting the training image into the Single Shot MultiBox Detector model, matching the prior boxes
against the minimum bounding boxes of the text in the training image, and training to obtain the
text detection model.
3. The text recognition method according to claim 1 or claim 2, characterized in that the base
network of the text detection model is the first A convolutional layers of the convolutional neural
network VGG16, where A is an integer greater than or equal to 4 and less than or equal to 15.
4. The text recognition method according to claim 1, characterized in that the prior box aspect
ratio includes one or more of 3, 5, 7, 10 and 15.
5. The text recognition method according to claim 1, characterized in that locating the table in the
image with structured text comprises:
determining the edges of the binarized image using an edge detection algorithm;
obtaining the table in the image by filtering with a Hough transform, based on the edges of the
image.
6. A text recognition device, characterized by comprising:
a table recognition module, configured to locate the table in an image with structured text;
a text box detection module, configured to identify one or more candidate text boxes in the table
using a text detection model, wherein the text detection model is a model obtained, on the basis of
a Single Shot MultiBox Detector (SSD) model, by using one or more feature maps and prior boxes
corresponding to the feature maps, each prior box being defined by a prior box scale and a prior box
aspect ratio;
a text box determining module, configured to merge the one or more candidate text boxes, so as to
recognize the text in the merged text box.
7. The text recognition device according to claim 6, characterized in that the device further
comprises a model training module, wherein:
the model training module calculates the length and width of the prior box according to the length of
a training image, the width of the training image, the prior box aspect ratio and the prior box
scale;
the training image is input into the Single Shot MultiBox Detector model, the prior boxes are matched
against the minimum bounding boxes of the text in the training image, and the text detection model is
obtained by training.
8. The text recognition device according to claim 6, characterized in that the base network of the
text detection model is the first A convolutional layers of the convolutional neural network VGG16,
where A is an integer greater than or equal to 4 and less than or equal to 15.
9. The text recognition device according to claim 6, characterized in that the prior box aspect
ratio includes one or more of 3, 5, 7, 10 and 15.
10. The text recognition device according to claim 6, characterized in that:
the table recognition module determines the edges of the binarized image using an edge detection
algorithm;
the table recognition module obtains the table in the image by filtering with a Hough transform,
based on the edges of the image.
11. An electronic device for text recognition, characterized by comprising:
one or more processors;
a storage device, configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more
processors to implement the method according to any one of claims 1 to 5.
12. A computer-readable medium on which a computer program is stored, characterized in that the
program, when executed by a processor, implements the method according to any one of claims 1 to 5.
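Claims 2 and 7 derive the prior box dimensions from the training image size, the prior box scale and the prior box aspect ratio, but this section does not give the formula. The sketch below assumes the convention commonly used with SSD-style detectors (width = scale × √ratio, height = scale ÷ √ratio, each relative to the image dimension); the function names, the mapping of "length" to the horizontal extent, and the example scale of 0.1 are illustrative assumptions. Only the aspect ratio values 3, 5, 7, 10 and 15 come from claims 4 and 9.

```python
import math
from typing import List, Tuple

ASPECT_RATIOS = (3, 5, 7, 10, 15)  # values listed in claims 4 and 9


def prior_box_size(image_width: float, image_height: float,
                   scale: float, aspect_ratio: float) -> Tuple[float, float]:
    """Return (width, height) of one prior box in pixels.

    Assumed convention (common for SSD-style detectors, not confirmed by
    the patent text): scale is a fraction of the image size in (0, 1], and
    the aspect ratio stretches the box horizontally by sqrt(ratio) while
    shrinking it vertically by the same factor.
    """
    if not 0 < scale <= 1:
        raise ValueError("scale must be in (0, 1]")
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")
    root = math.sqrt(aspect_ratio)
    return image_width * scale * root, image_height * scale / root


def prior_box_sizes(image_width: float, image_height: float,
                    scale: float) -> List[Tuple[float, float]]:
    """Prior box sizes for every aspect ratio listed in the claims."""
    return [prior_box_size(image_width, image_height, scale, ratio)
            for ratio in ASPECT_RATIOS]
```

Large ratios such as 10 or 15 produce long, flat boxes, which fits table cells holding a single line of text better than the near-square boxes typical of general object detection.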
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910204450.5A CN109934181A (en) | 2019-03-18 | 2019-03-18 | Text recognition method, device, equipment and computer-readable medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109934181A (en) | 2019-06-25 |
Family
ID=66987541
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910204450.5A Pending CN109934181A (en) | 2019-03-18 | 2019-03-18 | Text recognition method, device, equipment and computer-readable medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109934181A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107688808A (en) * | 2017-08-07 | 2018-02-13 | 电子科技大学 | A kind of quickly natural scene Method for text detection |
CN107766809A (en) * | 2017-10-09 | 2018-03-06 | 平安科技(深圳)有限公司 | Electronic installation, billing information recognition methods and computer-readable recording medium |
CN107862303A (en) * | 2017-11-30 | 2018-03-30 | 平安科技(深圳)有限公司 | Information identifying method, electronic installation and the readable storage medium storing program for executing of form class diagram picture |
CN108154145A (en) * | 2018-01-24 | 2018-06-12 | 北京地平线机器人技术研发有限公司 | The method and apparatus for detecting the position of the text in natural scene image |
CN108921166A (en) * | 2018-06-22 | 2018-11-30 | 深源恒际科技有限公司 | Medical bill class text detection recognition method and system based on deep neural network |
Non-Patent Citations (2)
Title |
---|
JUNHWAN RYU et al.: "Chinese Character Boxes: Single Shot Detector Network for Chinese Character Detection", 《APPLIED SCIENCES》 * |
罗聪: "基于深度学习的自然场景文件检测与定位方法研究", 《中国优秀硕士论文全文数据库》 * |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110490190A (en) * | 2019-07-04 | 2019-11-22 | 贝壳技术有限公司 | A kind of structured image character recognition method and system |
CN110490190B (en) * | 2019-07-04 | 2021-10-26 | 贝壳技术有限公司 | Structured image character recognition method and system |
CN110458164A (en) * | 2019-08-07 | 2019-11-15 | 深圳市商汤科技有限公司 | Image processing method, device, equipment and computer readable storage medium |
CN111767769A (en) * | 2019-08-14 | 2020-10-13 | 北京京东尚科信息技术有限公司 | Text extraction method and device, electronic equipment and storage medium |
CN111539412B (en) * | 2020-04-21 | 2021-02-26 | 上海云从企业发展有限公司 | Image analysis method, system, device and medium based on OCR |
CN111539412A (en) * | 2020-04-21 | 2020-08-14 | 上海云从企业发展有限公司 | Image analysis method, system, device and medium based on OCR |
CN111626244A (en) * | 2020-05-29 | 2020-09-04 | 中国工商银行股份有限公司 | Image recognition method, image recognition device, electronic equipment and medium |
CN111626244B (en) * | 2020-05-29 | 2023-09-12 | 中国工商银行股份有限公司 | Image recognition method, device, electronic equipment and medium |
CN111695553B (en) * | 2020-06-05 | 2023-09-08 | 北京百度网讯科技有限公司 | Form identification method, device, equipment and medium |
CN111695553A (en) * | 2020-06-05 | 2020-09-22 | 北京百度网讯科技有限公司 | Form recognition method, device, equipment and medium |
CN112016481A (en) * | 2020-08-31 | 2020-12-01 | 民生科技有限责任公司 | Financial statement information detection and identification method based on OCR |
CN112016481B (en) * | 2020-08-31 | 2024-05-10 | 民生科技有限责任公司 | OCR-based financial statement information detection and recognition method |
CN112036321A (en) * | 2020-09-01 | 2020-12-04 | 南京工程学院 | Safety helmet detection method based on SSD-ROI cascaded neural network |
CN112926469A (en) * | 2021-03-04 | 2021-06-08 | 浪潮云信息技术股份公司 | Certificate identification method based on deep learning OCR and layout structure |
CN113723422A (en) * | 2021-09-08 | 2021-11-30 | 重庆紫光华山智安科技有限公司 | License plate information determination method, system, device and medium |
CN113723422B (en) * | 2021-09-08 | 2023-10-17 | 重庆紫光华山智安科技有限公司 | License plate information determining method, system, equipment and medium |
CN113903036A (en) * | 2021-11-10 | 2022-01-07 | 北京百度网讯科技有限公司 | Text recognition method and device, electronic equipment, medium and product |
CN113903036B (en) * | 2021-11-10 | 2023-11-03 | 北京百度网讯科技有限公司 | Text recognition method and device, electronic equipment, medium and product |
CN113936286A (en) * | 2021-11-29 | 2022-01-14 | 中国平安人寿保险股份有限公司 | Image text recognition method and device, computer equipment and storage medium |
CN114998906A (en) * | 2022-05-25 | 2022-09-02 | 北京百度网讯科技有限公司 | Text detection method, model training method, device, electronic equipment and medium |
CN114998906B (en) * | 2022-05-25 | 2023-08-08 | 北京百度网讯科技有限公司 | Text detection method, training method and device of model, electronic equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20190625 |