CN109934181A

CN109934181A - Text recognition method, device, equipment and computer-readable medium

Info

Publication number: CN109934181A
Application number: CN201910204450.5A
Authority: CN
Inventors: 安耀祖
Original assignee: Beijing Haiyi Tongzhan Information Technology Co Ltd
Current assignee: Beijing Haiyi Tongzhan Information Technology Co Ltd
Priority date: 2019-03-18
Filing date: 2019-03-18
Publication date: 2019-06-25

Abstract

The invention discloses a text recognition method, device, equipment and computer-readable medium, and relates to the field of computer technology. One specific embodiment of the method includes: locating a table in an image having structured text; identifying one or more candidate text boxes in the table using a text detection model, wherein the text detection model is a model based on a single shot multibox detector (SSD) model and obtained using one or more feature maps and prior boxes corresponding to the feature maps, each prior box being defined by a prior box scale and a prior box aspect ratio; and merging the one or more candidate text boxes so as to recognize the text in the merged text box. The embodiment can improve the accuracy and efficiency of text recognition in images having structured text.

Description

Text recognition method, device, equipment and computer-readable medium

Technical field

The present invention relates to the field of computer technology, and in particular to a text recognition method, device, equipment and computer-readable medium.

Background art

In recent years, with the development of computer vision technology, text recognition based on optical character recognition (OCR) has been applied in an increasingly wide range of fields, such as license plate recognition, certificate recognition and recognition of text on traffic signs. Once text has been recognized, high-speed text transcription can be completed without a keyboard.

In the course of implementing the present invention, the inventor found that the prior art has at least the following problem:

The particular characteristics of an application scenario can help improve text recognition. In particular, for images in which the text is mainly filled in tables, that is, images having structured text such as bills and invoices, the structured organization of the text can be fully exploited to improve recognition.

Summary of the invention

In view of this, embodiments of the present invention provide a text recognition method, device, equipment and computer-readable medium that can improve the accuracy and efficiency of text recognition in images having structured text.

To achieve the above object, according to one aspect of the embodiments of the present invention, a text recognition method is provided, comprising: locating a table in an image having structured text, the table being located before one or more text boxes in the table are identified in the text detection model using one or more feature maps and the aspect ratios of the prior boxes corresponding to the feature maps; identifying one or more candidate text boxes in the table using a text detection model, wherein the text detection model is a model based on a single shot multibox detector (SSD) model and obtained using one or more feature maps and prior boxes corresponding to the feature maps, each prior box being defined by a prior box scale and a prior box aspect ratio; and merging the one or more candidate text boxes so as to recognize the text in the merged text box.

According to one aspect of the embodiments of the present invention, before identifying the one or more candidate text boxes in the table using the text detection model, the method further comprises: calculating the length and width of the prior box according to the length of a training image, the width of the training image, the prior box aspect ratio and the prior box scale; and inputting the training image into the single shot multibox detector model and matching the prior boxes against the minimum bounding boxes of the text in the training image, so as to train and obtain the text detection model.

According to one aspect of the embodiments of the present invention, the base network of the text detection model is the first A convolutional layers of the convolutional neural network VGG16, where A is an integer greater than or equal to 4 and less than or equal to 15.

According to one aspect of the embodiments of the present invention, the prior box aspect ratio includes one or more of 3, 5, 7, 10 and 15.

According to one aspect of the embodiments of the present invention, locating the table in the image having structured text comprises: determining the edges of the binarized image using an edge detection algorithm; and, based on the edges of the image, obtaining the table in the image by filtering with the Hough transform.

According to a second aspect of the embodiments of the present invention, a text recognition device is provided, comprising: a table recognition module for locating a table in an image having structured text; a text box detection module for identifying one or more candidate text boxes in the table using a text detection model, wherein the text detection model is a model based on a single shot multibox detector (SSD) model and obtained using one or more feature maps and prior boxes corresponding to the feature maps, each prior box being defined by a prior box scale and a prior box aspect ratio; and a text box determining module for merging the one or more candidate text boxes so as to recognize the text in the merged text box.

According to the second aspect of the embodiments of the present invention, the device further includes a model training module: the model training module calculates the length and width of the prior box according to the length of a training image, the width of the training image, the prior box aspect ratio and the prior box scale; and inputs the training image into the single shot multibox detector model and matches the prior boxes against the minimum bounding boxes of the text in the training image, so as to train and obtain the text detection model.

According to the second aspect of the embodiments of the present invention, the base network of the text detection model is the first A convolutional layers of the convolutional neural network VGG16, where A is an integer greater than or equal to 4 and less than or equal to 15.

According to the second aspect of the embodiments of the present invention, the prior box aspect ratio includes one or more of 3, 5, 7, 10 and 15.

According to the second aspect of the embodiments of the present invention, the table recognition module determines the edges of the binarized image using an edge detection algorithm; and the table recognition module, based on the edges of the image, obtains the table in the image by filtering with the Hough transform.

According to a third aspect of the embodiments of the present invention, electronic equipment for text recognition is provided, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described above.

According to a fourth aspect of the embodiments of the present invention, a computer-readable medium is provided, on which a computer program is stored, the program implementing the method described above when executed by a processor.

One embodiment of the above invention has the following advantages or beneficial effects: for an image having structured text, the table in the image is located first, before the text boxes enclosing the text are determined; prior boxes with specific scales and aspect ratios are then used to identify candidate text boxes in the table; and the text boxes containing text are determined by merging the identified candidate text boxes. The text boxes in an image having structured text can thus be determined quickly and effectively, which improves the accuracy and efficiency of text recognition for such images.

Further effects of the above non-conventional optional implementations are described below in conjunction with specific embodiments.

Brief description of the drawings

The accompanying drawings are provided for a better understanding of the present invention and do not constitute an undue limitation on it. In the drawings:

Figs. 1A-1B are schematic diagrams of images having structured text according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the main flow of a text recognition method according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the table in an image having structured text according to an embodiment of the present invention;

Figs. 4A-4C are schematic diagrams of candidate text boxes according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of a text box according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a text detection model according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of the main structure of a text recognition device according to an embodiment of the present invention;

Fig. 8 is a diagram of an exemplary system architecture to which embodiments of the present invention can be applied;

Fig. 9 is a schematic structural diagram of a computer system suitable for implementing the server of an embodiment of the present invention.

Detailed description of the embodiments

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding; these should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.

OCR is generally divided into two categories. One is text recognition in constrained scenes, such as recognition of printed text, identity cards, bills and bank card numbers. The other is text recognition in unconstrained scenes, such as recognition of text on license plates, billboards and traffic signs.

In constrained scenes, the text recognition success rate is mainly affected by factors such as text composition, character font and word length. In unconstrained scenes, it is mainly affected by factors such as illumination, shadows, occlusion, the diversity of backgrounds and fonts, and the complexity of sizes and typesetting.

In constrained scenes, the particular characteristics of the application scenario can be used to improve the accuracy and efficiency of text recognition. For example, for images having structured text such as bills and invoices, the structured organization of the text can be fully exploited to improve recognition.

Referring to Fig. 1A, Fig. 1A is a schematic diagram of an image having structured text according to an embodiment of the present invention. In this embodiment, structured text refers to text that is filled in in tabular form. Fig. 1A includes 9 tables, whose aspect ratios are not all the same, and each table is filled with corresponding text. Here, the aspect ratio of a table is the ratio of the table's length to its width, and it reflects the shape of the table.

In an image having structured text, it is necessary to identify not only tables with small aspect ratios but also tables with large aspect ratios. For tables with large aspect ratios, there is the technical problem that the text recognition success rate is low.

It is worth noting that, although Fig. 1A shows solid lines dividing the tables, a table may also exist implicitly. For example, although the text in Fig. 1B is not separated by explicit lines, it can still be regarded as being filled in "hidden" tables. Furthermore, it should be understood that the elements dividing a table may take forms other than solid lines, for example dashed lines or dash-dot lines.

Referring to Fig. 2, Fig. 2 is a schematic diagram of the main flow of a text recognition method according to an embodiment of the present invention; by means of the text detection model, the method is applicable to tables with different aspect ratios. It specifically includes the following steps:

S201: locate the table in an image having structured text.

An image having structured text usually exists as printed matter, for example an invoice printed on paper. Such an image can of course also exist as an electronic bill. Whether the image having structured text is on paper or in electronic form, the embodiments described herein can be used to recognize its text.

As an example, an image having structured text can be obtained through an image capture device. The image capture device may be a scanner, a camera or similar equipment.

Continuing to refer to Fig. 1A, the image in Fig. 1A includes 9 tables. Herein, a table together with the text in it is referred to as structured text. To recognize structured text, these tables must first be located.

As mentioned above, in one embodiment of the present invention, the table in an image having structured text is composed of multiple solid lines. Locating the solid lines locates the table in the image. It should be noted that a table may be divided by lines that physically exist, for example solid lines, dashed lines or dash-dot lines, or by lines that exist only virtually ("hidden" lines). In other words, the embodiments of the present application are applicable to the images shown in Fig. 1A or Fig. 1B: they can locate the table divided by solid lines in Fig. 1A, or the table divided by virtual lines in Fig. 1B.

It will be understood that an image generally includes a background and an object to be recognized, namely the structured text described herein. To distinguish the background from the structured text, the image having structured text needs to be binarized so as to obtain the contour of the structured text. Binarization means assigning one of two predefined values to each pixel in the image; for example, the binarized image is represented by a matrix of 0s and 1s.
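The binarization step can be sketched as follows. This is only an illustration: the fixed threshold of 128, the choice of marking dark pixels as 1, and the function name `binarize` are assumptions, since the text does not say how the two values are assigned.

```python
def binarize(gray, threshold=128):
    """Turn a grayscale image (rows of 0-255 integers) into a 0/1 matrix.

    Assumption: dark pixels (ink, table lines) become 1 and light
    pixels (paper background) become 0.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


# A 2 x 2 image: dark, light / light, dark.
binary = binarize([[0, 200], [130, 127]])
```

In practice the threshold would more likely be chosen per image, for example with an adaptive or histogram-based method, rather than fixed.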

After the image has been binarized, the edges of the image having structured text are determined using an edge detection algorithm. The edge detection algorithm may be at least one of the following: a first-order edge detection algorithm, the Sobel edge detection algorithm, the Canny edge detection algorithm, a second-order edge detection algorithm and the Laplacian edge detection algorithm.
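As one instance of the algorithms listed above, a Sobel edge detector on the binarized matrix could look like the following sketch. The gradient threshold and the function name `sobel_edges` are assumptions, and border pixels are simply left at 0 for brevity.

```python
def sobel_edges(binary, threshold=1):
    """Mark pixels whose Sobel gradient magnitude reaches the threshold.

    `binary` is a 0/1 matrix; the result has the same shape. The sum
    |gx| + |gy| is used as a cheap stand-in for the gradient magnitude.
    """
    height, width = len(binary), len(binary[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (binary[y - 1][x + 1] + 2 * binary[y][x + 1] + binary[y + 1][x + 1]
                  - binary[y - 1][x - 1] - 2 * binary[y][x - 1] - binary[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (binary[y + 1][x - 1] + 2 * binary[y + 1][x] + binary[y + 1][x + 1]
                  - binary[y - 1][x - 1] - 2 * binary[y - 1][x] - binary[y - 1][x + 1])
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = 1
    return edges
```

A production pipeline would normally call an image library's Sobel or Canny routine instead of looping in Python; the loop is shown only to make the operator explicit.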

After the edges of the image having structured text have been detected, the table in the image is obtained by filtering with the Hough transform. The Hough transform can identify geometric shapes in an image.

As an example, the table involved in structured text is usually a two-dimensional table, so the horizontal and vertical lines of the table can be determined in the Hough transform in the following manner.

Horizontal line: a line whose angle relative to the horizontal axis is within plus or minus 5 degrees; preferably, its length is greater than two thirds of the image width.

Vertical line: a line whose angle relative to the vertical axis is within plus or minus 5 degrees; preferably, its length is greater than two thirds of the image height.
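The two filtering rules above can be sketched as a small function applied to line segments returned by a Hough transform. The segment format `(x1, y1, x2, y2)` and the function name `classify_table_lines` are assumptions; the sketch also treats the "preferably" length condition as mandatory, which is a simplification.

```python
import math


def classify_table_lines(segments, image_width, image_height,
                         max_angle_deg=5.0, min_fraction=2.0 / 3.0):
    """Split line segments into horizontal and vertical table lines.

    A segment is kept as horizontal if it lies within 5 degrees of the
    horizontal axis and is longer than 2/3 of the image width, and as
    vertical if it lies within 5 degrees of the vertical axis and is
    longer than 2/3 of the image height. Other segments are dropped.
    """
    horizontal, vertical = [], []
    for x1, y1, x2, y2 in segments:
        length = math.hypot(x2 - x1, y2 - y1)
        if length == 0:
            continue
        # Angle to the horizontal axis, folded into [0, 90] degrees.
        angle = math.degrees(math.atan2(abs(y2 - y1), abs(x2 - x1)))
        if angle <= max_angle_deg and length > min_fraction * image_width:
            horizontal.append((x1, y1, x2, y2))
        elif 90.0 - angle <= max_angle_deg and length > min_fraction * image_height:
            vertical.append((x1, y1, x2, y2))
    return horizontal, vertical


segments = [(0, 10, 290, 12), (50, 0, 51, 190), (0, 0, 100, 100), (0, 10, 100, 10)]
h_lines, v_lines = classify_table_lines(segments, image_width=300, image_height=200)
```

Here the diagonal segment and the short horizontal segment are discarded, leaving one horizontal and one vertical table line.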

Referring to Fig. 3, Fig. 3 is a schematic diagram of the table in an image having structured text according to an embodiment of the present invention. For the image shown in Fig. 1A, the table shown in Fig. 3 is obtained after binarization, edge detection and the Hough transform. Note that locating the table in Fig. 3 can be understood as locating the table that is filled with text, that is, determining the specific position of the table in the image having structured text.

In the above embodiment, the table in an image having structured text can be located accurately by means of binarization, edge detection and the Hough transform.

S202: identify one or more candidate text boxes in the table using a text detection model, wherein the text detection model is a model based on a single shot multibox detector model and obtained using one or more feature maps and prior boxes corresponding to the feature maps, each prior box being defined by a prior box scale and a prior box aspect ratio.

The single shot multibox detector (SSD) is a deep neural network model. The SSD architecture has two main parts. One is the base network at the front end, for which a deep convolutional neural network for image classification is generally used, for example the Visual Geometry Group 16 (VGG16) network; its role is to perform preliminary feature extraction on the target. The other is the multi-scale feature detection network at the back end, for which a group of cascaded convolutional neural networks is generally used; it performs further feature extraction, at different scales, on the feature maps generated by the front-end network.

In one embodiment of the present invention, the base network part of SSD may use some of the convolutional layers of VGG16, which comprises 16 layers. In the embodiments of the present invention, the characters in an image having structured text occupy few pixels, generally only 20 to 30, so the number of network layers can be reduced to speed up obtaining the text boxes filled with text. That is, the base network of the text detection model is a subset of the convolutional layers of VGG16, for example the first A convolutional layers of VGG16, where A is an integer greater than or equal to 4 and less than or equal to 15.

SSD uses multiple feature maps of different scales, and each feature map divides the image into a grid of a different scale, such as 4 × 4 or 8 × 8. Each feature map has corresponding prior boxes. Assuming SSD uses m feature maps, the prior box scale S_k of the k-th feature map is calculated according to formula (1).
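Formula (1) itself is not reproduced in this text. Assuming it follows the standard SSD scale rule, which is expressed in the same symbols m, k, S_min and S_max, it would read:

```latex
S_k = S_{\min} + \frac{S_{\max} - S_{\min}}{m - 1}\,(k - 1), \qquad k \in [1, m] \tag{1}
```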

Here, S_min is the minimum prior box scale of the feature maps, S_max is the maximum prior box scale of the feature maps, and S_k is the prior box scale of the k-th feature map. The prior box scale is the ratio of the prior box size to the image size. Prior boxes can be used to predict the text boxes in an image having structured text.
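The scale calculation can be sketched in a few lines, assuming formula (1) is the standard SSD linear rule that spaces the scales evenly between S_min and S_max. The function name `prior_box_scales`, the handling of m = 1, and the example values 0.2 and 0.9 (the defaults of the original SSD work) are assumptions; the text does not state which values this embodiment uses.

```python
def prior_box_scales(m, s_min, s_max):
    """Return the prior box scales S_1 .. S_m, evenly spaced from s_min to s_max.

    Assumption: with a single feature map (m == 1) the rule is undefined,
    so s_min is returned as the only scale.
    """
    if m == 1:
        return [s_min]
    step = (s_max - s_min) / (m - 1)
    return [s_min + step * (k - 1) for k in range(1, m + 1)]


# Illustrative values only: six feature maps, scales from 0.2 to 0.9.
scales = prior_box_scales(6, 0.2, 0.9)
```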

It should be noted that, in training the text detection model of the embodiment of the present invention, a training image may contain no table, or may contain one or more tables. A training image is an image used to train the text detection model; the text in a training image is also called the positive samples of the training image, and it will be understood that the minimum bounding boxes of the positive samples are known.

In building the text detection model of the embodiment of the present invention, the aspect ratios of the prior boxes used to predict structured text are set in advance according to the characteristics of structured text. It should be noted that the prior box aspect ratios used here are the aspect ratios of structured text commonly encountered in practice, that is, the aspect ratios of tables, which helps improve the detection success rate.

Specifically, from the prior box scale S_k and a preset prior box aspect ratio a_r, the length and the width of the k-th prior box can be obtained. A prior box is therefore defined by the prior box scale and the prior box aspect ratio. Specifically, the length and the width of the prior box are respectively:
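The expressions themselves are not reproduced in this text. Assuming the standard SSD prior box definition, with the aspect ratio a_r taken as length over width (as the aspect ratio of a table is defined earlier), they would read as follows; the symbols l_k^a and w_k^a for the length and width are hypothetical names:

```latex
l_k^{a} = S_k \sqrt{a_r}, \qquad w_k^{a} = \frac{S_k}{\sqrt{a_r}} \tag{2}
```

Under this assumption the ratio of length to width equals a_r and the product of length and width equals S_k squared, so the scale fixes the box area and the aspect ratio fixes its shape.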

Assuming the training image has length H, width W and aspect ratio r, the length and the width of the prior box can be rewritten as:

In training the text detection model of the embodiment of the present invention, a target loss function is solved at the same time. The target loss function L(x, c, l, g) is the sum of the confidence loss L_conf(x, c) and the location loss L_loc(x, l, g), specifically:
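The formula is not reproduced in this text. Assuming it is the standard SSD objective, which matches the symbols N and α explained for formula (4), it would read:

```latex
L(x, c, l, g) = \frac{1}{N}\left( L_{conf}(x, c) + \alpha\, L_{loc}(x, l, g) \right) \tag{4}
```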

Here, N is the number of prior boxes matched to ground-truth boxes, a ground-truth box being a true text box in the training image. In formula (4), x is the position coordinates, c is the confidence of the identified background or text box, l is the location information of the prior box, g is the location information of the ground-truth box, and α is a weight coefficient whose initial value may be set to 1. α is finally obtained by minimizing the loss function.

The target loss in formula (4) includes both the confidence loss and the location loss. During training, reducing the value of the loss function ensures that the location reliability of the text boxes improves together with their classification confidence. By optimizing the target loss function repeatedly, the prediction accuracy of the model is continuously improved, so that a text detection model with good performance is trained.

Once the length and the width of a prior box have been obtained, the specific size of the prior box is determined. The training image is input into the SSD model, which outputs the text boxes of the training image. Using formula (4), the prior boxes and the known minimum bounding boxes (ground-truth boxes) of the text in the training image, α is finally obtained, and the text detection model is obtained in turn. It will be understood that the text detection model is a model based on the SSD model and obtained using one or more feature maps and the prior boxes corresponding to the feature maps.

Because, for images having structured text, multiple prior box aspect ratios a_r close to the actual aspect ratios of text boxes are used, the text boxes obtained with the text detection model described herein are more accurate. As an example, the prior box aspect ratio a_r may be set to at least one of the following values: 3, 5, 7, 10 and 15.
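To show how these aspect ratios shape the prior boxes, the sketch below computes one prior box size per ratio for a given scale. It assumes the standard SSD definition (length = scale × sqrt(a_r), width = scale / sqrt(a_r)), expressed relative to the image size; the pixel-level expressions that depend on H, W and r are not recoverable from this text and are not modelled. The function name `prior_box_sizes` is hypothetical.

```python
import math

# Aspect ratios (length / width) listed in the text for structured text.
ASPECT_RATIOS = (3, 5, 7, 10, 15)


def prior_box_sizes(scale, aspect_ratios=ASPECT_RATIOS):
    """Return (length, width) pairs, relative to the image, for one scale.

    Each pair keeps the same area (scale squared) and has the given
    length-to-width ratio, so larger ratios give longer, flatter boxes.
    """
    return [(scale * math.sqrt(a), scale / math.sqrt(a)) for a in aspect_ratios]


sizes = prior_box_sizes(0.2)
for (length, width), ratio in zip(sizes, ASPECT_RATIOS):
    print(f"a_r={ratio:>2}: length={length:.3f}, width={width:.3f}")
```

Elongated priors of this kind are what allow a detector to match long, single-line table entries, which square or mildly rectangular priors cover poorly.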

In one embodiment of the present invention, after the text detection model has been trained, one or more feature maps of different sizes are used in the text detection model, in combination with the aspect ratios of the prior boxes corresponding to the feature maps, to identify the candidate text boxes in the table located in S201.

S203: merge the one or more candidate text boxes so as to recognize the text in the merged text box.

To improve the accuracy of the predicted text box, the multiple candidate text boxes obtained with the text detection model are merged.

Fig. 4A to Fig. 4C are schematic diagrams of the identified candidate text boxes according to an embodiment of the present invention. Specifically, as shown in the figures, where square brackets mark the part of the text that falls outside the box, the text box in Fig. 4A contains "Hebei Province, Shijiazhuang City, Hongqi District, Happy [Bank]"; the text box in Fig. 4B contains "[Hebei] Province, Shijiazhuang City, Hongqi District, Happy Bank"; and the text box in Fig. 4C contains "[He]bei Province, Shijiazhuang City, Hongqi District, Happy Ban[k]".

Clearly, none of the candidate text boxes in Fig. 4A to Fig. 4C contains the complete text: Fig. 4A lacks the right part of the complete text; Fig. 4B lacks the left part; and Fig. 4C lacks both the left and right parts.

To further improve the detection success rate for images of structured text, the candidate text boxes obtained for the structured text to be recognized can be merged, and the text in the merged text box is then recognized.

Referring to Fig. 5, Fig. 5 is a schematic diagram of the predicted text box according to an embodiment of the present invention. The text box in Fig. 5 is the region obtained by merging the three candidate text boxes in Fig. 4A to Fig. 4C. The text in each text box of Fig. 4A to Fig. 4C is incomplete, whereas the text in the merged text box of Fig. 5 is complete.
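The text does not state the merging rule. One plausible reading, sketched below, is to replace overlapping candidate boxes by their smallest enclosing box and repeat until nothing changes. The box format `(x_min, y_min, x_max, y_max)` and all function names are assumptions.

```python
def overlaps(a, b):
    """True if boxes a and b (x_min, y_min, x_max, y_max) intersect or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(a, b):
    """Smallest box enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_candidate_boxes(boxes):
    """Merge overlapping candidate boxes until no two remaining boxes overlap."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        result = []
        for box in merged:
            for i, kept in enumerate(result):
                if overlaps(box, kept):
                    result[i] = union(box, kept)
                    changed = True
                    break
            else:
                result.append(box)
        merged = result
    return merged


# Three partial boxes on one text line (as in Figs. 4A-4C) plus an unrelated box.
candidates = [(0, 0, 80, 20), (20, 0, 100, 20), (10, 0, 90, 20), (0, 50, 30, 70)]
text_boxes = merge_candidate_boxes(candidates)
```

With these inputs the three partial boxes collapse into the single box `(0, 0, 100, 20)` that spans the whole line, while the unrelated box is left unchanged. A real system might additionally require a minimum overlap before merging, so that boxes in adjacent table cells are not joined.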

Further, referring to Fig. 6, Fig. 6 is a schematic structural diagram of the text detection model according to an embodiment of the present invention. The base network of the text detection model is the first 5 convolutional layers of the convolutional neural network VGG16, namely conv1 to conv5. An image having structured text is input into the text detection model and passes through conv1 to conv5 in turn; the feature maps output by conv3, conv4 and conv5 are then fused to identify candidate text boxes; finally, the identified candidate text boxes (for example, the candidate text boxes in Figs. 4A-4C) are merged to obtain the final predicted text box (for example, the text box in Fig. 5).

Fig. 7 is a schematic diagram of the main structure of a text recognition device according to an embodiment of the present invention. The text recognition device can implement the text recognition method. As shown in Fig. 7, the text recognition device specifically includes:

a table recognition module 701 for locating a table in an image having structured text;

a text box detection module 702 for identifying one or more candidate text boxes in the table using a text detection model, wherein the text detection model is a model based on a single shot multibox detector model and obtained using one or more feature maps and prior boxes corresponding to the feature maps, each prior box being defined by a prior box scale and a prior box aspect ratio;

a text box determining module 703 for merging the one or more candidate text boxes so as to recognize the text in the merged text box.

In one embodiment of the present invention, the text recognition device further includes a model training module 704. The model training module 704 calculates the length and width of the prior box according to the length of a training image, the width of the training image, the prior box aspect ratio and the prior box scale; it then inputs the training image into the single shot multibox detector model and matches the prior boxes against the minimum bounding boxes of the text in the training image, so as to train and obtain the text detection model.

In one embodiment of the present invention, the base network of the text detection model is the first A convolutional layers of the convolutional neural network VGG16, where A is an integer greater than or equal to 4 and less than or equal to 15.

In one embodiment of the present invention, the prior box aspect ratio includes one or more of 3, 5, 7, 10 and 15.

In one embodiment of the present invention, the table recognition module 701 determines the edges of the binarized image using an edge detection algorithm and, based on the edges of the image, obtains the table in the image by filtering with the Hough transform.

Fig. 8 shows an exemplary system architecture 800 to which the text recognition method or the text recognition device of the embodiments of the present invention can be applied.

As shown in Fig. 8, the system architecture 800 may include terminal devices 801, 802, 803, a network 804 and a server 805. The network 804 is the medium that provides communication links between the terminal devices 801, 802, 803 and the server 805. The network 804 may include various connection types, such as wired or wireless communication links or fiber optic cables.

Users can use the terminal devices 801, 802, 803 to interact with the server 805 through the network 804 to receive or send messages and the like. Specifically, various communication client applications may be installed on the terminal devices 801, 802, 803 to transmit images having structured text to the server 805 for recognition.

The terminal devices 801, 802, 803 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers and desktop computers.

The server 805 may be a server providing various services, for example a back-end management server (merely an example) that provides text recognition for images having structured text uploaded by users through the terminal devices 801, 802, 803. The back-end management server can analyze and otherwise process the received data such as images, and feed the processing result (for example, the recognized text; merely an example) back to the terminal device.

It should be noted that the text recognition method provided by the embodiments of the present invention is generally executed by the server 805; accordingly, the text recognition device is generally located in the server 805.

It should be understood that the numbers of terminal devices, networks and servers in Fig. 8 are merely illustrative. There may be any number of terminal devices, networks and servers, as required by the implementation.

Referring now to Fig. 9, it shows a schematic structural diagram of a computer system 900 suitable for implementing the server of an embodiment of the present invention. The computer system shown in Fig. 9 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.

As shown in figure 9, computer system 900 includes central processing unit (CPU) 901, it can be read-only according to being stored in Program in memory (ROM) 902 or be loaded into the program in random access storage device (RAM) 903 from storage section 908 and Execute various movements appropriate and processing.In RAM 903, also it is stored with system 900 and operates required various programs and data. CPU 901, ROM 902 and RAM 903 are connected with each other by bus 904.Input/output (I/O) interface 905 is also connected to always Line 904.

I/O interface 905 is connected to lower component: the importation 906 including keyboard, mouse etc.；It is penetrated including such as cathode The output par, c 907 of spool (CRT), liquid crystal display (LCD) etc. and loudspeaker etc.；Storage section 908 including hard disk etc.； And the communications portion 909 of the network interface card including LAN card, modem etc..Communications portion 909 via such as because The network of spy's net executes communication process.Driver 910 is also connected to I/O interface 905 as needed.Detachable media 911, such as Disk, CD, magneto-optic disk, semiconductor memory etc. are mounted on as needed on driver 910, in order to read from thereon Computer program be mounted into storage section 908 as needed.

In particular, according to the disclosed embodiments of the present invention, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the disclosed embodiments include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. When the computer program is executed by the central processing unit (CPU) 901, the above-described functions defined in the system of the present invention are performed.

It should be noted that the computer-readable medium described in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, in the present invention, may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.

The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, a processor may be described as including a sending module, an obtaining module, a determining module, and a first processing module. The names of these modules do not, under certain circumstances, constitute a limitation on the modules themselves; for example, the sending module may also be described as "a unit that sends a picture acquisition request to the connected server".

In another aspect, the present invention further provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to:

Locate, in an image with structured text, a table in the image;

Identify one or more candidate text boxes in the table using a text detection model, wherein the text detection model is a model obtained, based on a single-shot multibox detector model, by using one or more feature maps and prior boxes corresponding to the feature maps, the prior boxes being defined by a prior box scale and a prior box aspect ratio;

Merge the one or more candidate text boxes to recognize the text in the merged text box.

According to the technical solution of the embodiments of the present invention, to determine the text boxes enclosing text in an image with structured text, the table in the image is first located; candidate text boxes in the table are then identified using prior boxes with specific scales and aspect ratios; and the text boxes containing text are determined by merging the identified candidate text boxes. As a result, the text boxes in an image with structured text can be determined quickly and effectively, which improves the accuracy and efficiency of text recognition for such images.
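
As an illustration of the merging step only, the following Python sketch merges candidate boxes that lie on the same text line. The merging rule (a vertical overlap ratio combined with a horizontal gap limit), the function names, and the thresholds `max_gap` and `min_overlap` are assumptions made for this example; the passage above does not specify them. Boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples in pixels.

```python
def vertical_overlap_ratio(a, b):
    """Overlap of the two boxes' vertical extents, relative to the shorter box."""
    overlap = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    shorter = min(a[3] - a[1], b[3] - b[1])
    return overlap / shorter if shorter > 0 else 0.0


def should_merge(a, b, max_gap, min_overlap):
    """Hypothetical rule: same text line and horizontally close (or overlapping)."""
    gap = max(a[0], b[0]) - min(a[2], b[2])  # negative when the boxes overlap in x
    return gap <= max_gap and vertical_overlap_ratio(a, b) >= min_overlap


def merge_boxes(boxes, max_gap=10.0, min_overlap=0.6):
    """Greedily merge candidate boxes until no two remaining boxes qualify."""
    merged = []
    for box in sorted(boxes):
        box = tuple(box)
        changed = True
        while changed:
            changed = False
            for i, other in enumerate(merged):
                if should_merge(box, other, max_gap, min_overlap):
                    # Replace the pair with its common bounding box and rescan.
                    box = (min(box[0], other[0]), min(box[1], other[1]),
                           max(box[2], other[2]), max(box[3], other[3]))
                    del merged[i]
                    changed = True
                    break
        merged.append(box)
    return merged


candidates = [(0, 0, 50, 20), (55, 2, 120, 22), (0, 100, 40, 120)]
print(merge_boxes(candidates))
```

In this example the first two candidates sit on one line with a 5-pixel gap and are fused into a single box, while the third, on a different line, is left unchanged.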

The above specific embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may occur depending on design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A text recognition method, characterized by comprising:

Locating, in an image with structured text, a table in the image;

Identifying one or more candidate text boxes in the table using a text detection model, wherein the text detection model is a model obtained, based on a single-shot multibox detector model, by using one or more feature maps and prior boxes corresponding to the feature maps, the prior boxes being defined by a prior box scale and a prior box aspect ratio;

Merging the one or more candidate text boxes to recognize the text in the merged text box.

2. The text recognition method according to claim 1, characterized in that, before the identifying of one or more candidate text boxes in the table using a text detection model, the method further comprises:

Calculating the length and width of the prior box according to the length of a training image, the width of the training image, the prior box aspect ratio, and the prior box scale;

Inputting the training image into the single-shot multibox detector model, matching the prior boxes against the minimum bounding boxes of the text in the training image, and training to obtain the text detection model.
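
By way of illustration only, the size calculation recited above might be sketched as follows in Python. The formula is an assumption: it follows the convention commonly used with single-shot multibox detectors, in which the scale is a fraction of the image size and the aspect ratio is width divided by height; the claim itself does not state the formula. The function names and the example scale are hypothetical, and "length" is read here as the horizontal image dimension.

```python
import math

# Aspect ratios listed in the claims; elongated boxes suit horizontal text.
ASPECT_RATIOS = (3, 5, 7, 10, 15)


def prior_box_size(image_width, image_height, scale, aspect_ratio):
    """Return (box_width, box_height) in pixels for one prior box.

    Assumed convention: width = scale * sqrt(ratio), height = scale / sqrt(ratio),
    both expressed as fractions of the image width and height respectively.
    """
    root = math.sqrt(aspect_ratio)
    box_width = scale * image_width * root
    box_height = scale * image_height / root
    return box_width, box_height


def prior_box_sizes(image_width, image_height, scale, ratios=ASPECT_RATIOS):
    """Sizes of all prior boxes generated at one scale, one per aspect ratio."""
    return [prior_box_size(image_width, image_height, scale, r) for r in ratios]


for ratio, (w, h) in zip(ASPECT_RATIOS, prior_box_sizes(512, 512, 0.2)):
    print(f"ratio {ratio:>2}: {w:.1f} x {h:.1f} px")
```

Under this convention the pixel aspect ratio of a box equals the stated ratio only for a square image; for other images it is the stated ratio multiplied by the image's width-to-height ratio.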

3. The text recognition method according to claim 1 or 2, characterized in that the base network of the text detection model is the first A convolutional layers of the convolutional neural network VGG16, where A is an integer greater than or equal to 4 and less than or equal to 15.

4. The text recognition method according to claim 1, characterized in that the prior box aspect ratio includes one or more of 3, 5, 7, 10, and 15.

5. The text recognition method according to claim 1, characterized in that the locating, in an image with structured text, of a table in the image comprises:

Determining the edges of the binarized image using an edge detection algorithm;

Obtaining the table in the image by filtering with a Hough transform, based on the edges of the image.
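
By way of illustration only, the two steps recited above can be sketched in pure Python on a small binarized image. Several simplifications are assumptions of this example and not part of the claim: the edge detector is a plain 4-neighbour boundary test, the Hough accumulator considers only horizontal and vertical lines (angles of 0 and 90 degrees), and the table is taken to be the bounding box of the lines that pass a hypothetical vote threshold `min_votes`.

```python
import math
from collections import Counter


def edge_pixels(binary):
    """Foreground pixels that touch the background or the image border."""
    height, width = len(binary), len(binary[0])
    edges = []
    for y in range(height):
        for x in range(width):
            if not binary[y][x]:
                continue
            neighbours = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            if any(not (0 <= ny < height and 0 <= nx < width) or not binary[ny][nx]
                   for ny, nx in neighbours):
                edges.append((x, y))
    return edges


def hough_lines(edges, thetas_deg=(0, 90), min_votes=5):
    """Vote in (rho, theta) space and keep lines with at least min_votes votes."""
    votes = Counter()
    for x, y in edges:
        for theta_deg in thetas_deg:
            theta = math.radians(theta_deg)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, theta_deg)] += 1
    return [line for line, count in votes.items() if count >= min_votes]


def table_bbox(lines):
    """Bounding box (x_min, y_min, x_max, y_max) spanned by the detected lines."""
    xs = [rho for rho, theta in lines if theta == 0]    # vertical lines: x = rho
    ys = [rho for rho, theta in lines if theta == 90]   # horizontal lines: y = rho
    if len(xs) < 2 or len(ys) < 2:
        return None  # not enough ruling lines to form a table
    return (min(xs), min(ys), max(xs), max(ys))


# A 12 x 12 binarized image containing one rectangular frame and one noise pixel.
image = [[0] * 12 for _ in range(12)]
for x in range(2, 10):
    image[3][x] = image[8][x] = 1
for y in range(3, 9):
    image[y][2] = image[y][9] = 1
image[11][11] = 1

print(table_bbox(hough_lines(edge_pixels(image))))
```

The vote threshold is what performs the filtering: long ruling lines collect many votes, whereas isolated noise and short strokes do not and are discarded.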

6. A text recognition device, characterized by comprising:

A table recognition module, configured to locate, in an image with structured text, a table in the image;

A text box detection module, configured to identify one or more candidate text boxes in the table using a text detection model, wherein the text detection model is a model obtained, based on a single-shot multibox detector model, by using one or more feature maps and prior boxes corresponding to the feature maps, the prior boxes being defined by a prior box scale and a prior box aspect ratio;

A text box determining module, configured to merge the one or more candidate text boxes to recognize the text in the merged text box.

7. The text recognition device according to claim 6, characterized in that the device further comprises a model training module:

The model training module calculates the length and width of the prior box according to the length of a training image, the width of the training image, the prior box aspect ratio, and the prior box scale；

8. The text recognition device according to claim 6, characterized in that the base network of the text detection model is the first A convolutional layers of the convolutional neural network VGG16, where A is an integer greater than or equal to 4 and less than or equal to 15.

9. The text recognition device according to claim 6, characterized in that the prior box aspect ratio includes one or more of 3, 5, 7, 10, and 15.

10. The text recognition device according to claim 6, characterized in that:

The table recognition module determines the edges of the binarized image using an edge detection algorithm;

The table recognition module obtains the table in the image by filtering with a Hough transform, based on the edges of the image.

11. An electronic device for text recognition, characterized by comprising:

One or more processors;

A storage device for storing one or more programs,

Wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 5.

12. A computer-readable medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 5.