CN110263779A

CN110263779A - Text region detection method and device, text detection method, and computer-readable medium

Info

Publication number: CN110263779A
Application number: CN201810225220.2A
Authority: CN
Inventors: 刘铭
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2018-03-19
Filing date: 2018-03-19
Publication date: 2019-09-20

Abstract

The present invention relates to the field of computer technology and provides a text region detection method and device, a text detection method, and a computer-readable medium. The text region detection method includes: extracting features of an original image to obtain a feature map; performing text region detection based on the feature map to obtain a plurality of text region fragments; obtaining sequence information of the text region fragments and forming the fragments into a text region fragment sequence according to the sequence information; and aggregating the text region fragment sequence to obtain a text region. The invention reduces the false detection rate and miss rate of text detection and improves its accuracy and reliability.

Description

Text region detection method and device, text detection method, and computer-readable medium

Technical field

The present invention relates to the field of computer technology, and in particular to a text region detection method and device, a text detection method, and a computer-readable medium.

Background

With the rapid development of Internet technology and the rapid spread of smartphones, daily life has become increasingly convenient. Users commonly photograph materials (such as identity cards, business licenses, and profile pictures) with a digital camera, webcam, or mobile phone and upload them so that an operator can verify the user's identity and qualifications. However, photographs taken in natural scenes have complex backgrounds and many sources of environmental interference, the text in a picture is hard to distinguish from the background, and text may also be partially occluded, all of which make text detection very challenging. To recognize text in natural scene images, experts have designed many OCR (Optical Character Recognition) systems. These systems usually detect text in documents well but perform poorly on text in scene images, because scene text varies widely and the image background is relatively complex, so it is difficult to recognize directly with OCR software. Text localization is therefore the first step in understanding scene text; only after the text has been localized can its content be classified and recognized.

It should be noted that the information disclosed in the Background section above is only intended to enhance understanding of the background of the invention, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.

Summary of the invention

The purpose of the present invention is to provide a text region detection method and device, a text detection method, and a computer-readable medium that reduce the false detection rate and miss rate of text detection and improve its accuracy and reliability.

Other features and advantages of the invention will become apparent from the following detailed description, or will be learned in part through practice of the invention.

According to a first aspect of the invention, a text region detection method is provided, characterized in that it comprises: extracting features of an original image to obtain a feature map; performing text region detection based on the feature map to obtain a plurality of text region fragments; obtaining sequence information of the plurality of text region fragments, and forming the plurality of text region fragments into a text region fragment sequence according to the sequence information; and aggregating the text region fragment sequence to obtain a text region.

According to a second aspect of the invention, a text detection method is provided, characterized in that it comprises: obtaining the text region according to the text region detection method described above; and recognizing the text information in the text region to obtain text.

According to a third aspect of the invention, a text region detection device is provided, characterized in that it comprises: a feature extraction module for extracting features of an original image to obtain a feature map; a fragment acquisition module for performing text region detection based on the feature map to obtain a plurality of text region fragments; a sequence information acquisition module for obtaining sequence information of the plurality of text region fragments and forming the plurality of text region fragments into a text region fragment sequence according to the sequence information; and an aggregation module for aggregating the text region fragment sequence to obtain a text region.

In some embodiments of the invention, based on the foregoing solution, the feature extraction module includes: a convolution unit for performing multi-stage convolution on the original image through a residual network model to obtain the feature map.

In some embodiments of the invention, based on the foregoing solution, the fragment acquisition module includes: an anchor setting unit for setting a group of anchors at each pixel of the feature map; and a feature extraction unit for extracting the image features corresponding to the anchors through a sliding window to generate a plurality of text region fragments.

In some embodiments of the invention, based on the foregoing solution, the anchor setting unit includes: a width setting unit for setting a group of fixed-width anchors at each pixel of the feature map.

In some embodiments of the invention, based on the foregoing solution, the sequence information acquisition module includes: a sequence information extraction unit for inputting the text region fragments into a long short-term memory (LSTM) model to obtain the sequence information of the text region fragments, and for forming the plurality of text region fragments into a text region fragment sequence according to the sequence information.

In some embodiments of the invention, based on the foregoing solution, the LSTM model is a bidirectional LSTM model, comprising a forward LSTM model and a backward LSTM model.

In some embodiments of the invention, based on the foregoing solution, the sequence information extraction unit includes: a forward sequence extraction unit for learning the preceding context of each text region fragment through the forward LSTM model; a backward sequence extraction unit for learning the following context of each text region fragment through the backward LSTM model; and a merging unit for concatenating the learning results to form the text region fragment sequence.

In some embodiments of the invention, based on the foregoing solution, the aggregation module includes: a selection unit for selecting a plurality of text region fragments in which the spacing between two adjacent text region fragments is less than a set distance and their vertical overlap ratio is greater than a set value; and a connection unit for connecting the selected text region fragments to obtain the text region.
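The selection and connection rule above can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the thresholds (`max_gap=50` pixels, `min_overlap=0.7`), and the definition of the vertical overlap ratio (overlap height divided by the smaller fragment height) are all assumptions chosen for illustration, since the text only states that a set distance and a set value exist.

```python
def vertical_overlap(a, b):
    """Vertical overlap ratio of two boxes (x1, y1, x2, y2).

    Assumed definition: overlap height / smaller of the two heights.
    """
    overlap = min(a[3], b[3]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return overlap / min(a[3] - a[1], b[3] - b[1])


def group_fragments(fragments, max_gap=50, min_overlap=0.7):
    """Chain fragments left to right when the horizontal gap to the last
    fragment of a group is below max_gap and the vertical overlap ratio
    exceeds min_overlap (both thresholds are hypothetical)."""
    groups = []
    for box in sorted(fragments, key=lambda b: b[0]):
        for group in groups:
            prev = group[-1]
            if box[0] - prev[2] < max_gap and vertical_overlap(prev, box) > min_overlap:
                group.append(box)
                break
        else:
            groups.append([box])  # no compatible group: start a new one
    return groups


def bounding_box(group):
    """Connect a group of fragments into one text region box."""
    return (min(b[0] for b in group), min(b[1] for b in group),
            max(b[2] for b in group), max(b[3] for b in group))


fragments = [(0, 10, 16, 30), (16, 11, 32, 31), (32, 10, 48, 30),
             (200, 10, 216, 30), (16, 100, 32, 120)]
regions = [bounding_box(g) for g in group_fragments(fragments)]
```

In this example the three adjacent fragments on the first line are connected into one region, while the distant fragment and the fragment on a lower line each remain separate.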

In some embodiments of the invention, based on the foregoing solution, the text region detection device further includes: a sample acquisition module for obtaining training samples; a model training module for training the residual network model and the bidirectional LSTM model according to the training samples and different learning rates; and a judging unit for ending the training when the loss functions of the residual network model and the bidirectional LSTM model are minimized.

In some embodiments of the invention, based on the foregoing solution, the text region detection device further includes: a feature mapping module for mapping the text region fragment sequence to a fully connected layer to predict the text confidence and text position of the text region fragments.

In some embodiments of the invention, based on the foregoing solution, the text region detection device further includes: a correction module for correcting the boundary error of the text region according to the relative displacement between the predicted boundary of the text region and the ground-truth boundary of the text region.

In some embodiments of the invention, based on the foregoing solution, the fragment acquisition module includes: a screening unit for performing non-maximum suppression on the text region fragments to obtain a plurality of text region fragments whose text confidence is greater than a set value.

According to a fourth aspect of the invention, a computer-readable medium is provided on which a computer program is stored; when the program is executed by a processor, it implements the text region detection method and the text detection method described in the above embodiments.

According to the text region detection method in this example embodiment, a server receives an original image and extracts its features to obtain a feature map; performs text region detection based on the feature map to obtain a plurality of text region fragments; then obtains the sequence information of the text region fragments and forms them into a text region fragment sequence according to the sequence information; and finally aggregates the text region fragment sequence to obtain a text region. The text region detection method of the invention can reduce the miss rate and false detection rate of text detection, avoids the problem of a word with internal gaps being mistakenly split into two words, and improves the accuracy and reliability of text detection.

It should be understood that the above general description and the following detailed description are only exemplary and explanatory and do not limit the invention.

Brief description of the drawings

The accompanying drawings are incorporated into and form part of this specification. They show embodiments consistent with the invention and, together with the specification, serve to explain its principles. The drawings described below are evidently only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 shows a schematic diagram of an exemplary system architecture to which the network data request verification method or network data request verification device of an embodiment of the present invention can be applied;

Fig. 2 shows a structural schematic diagram of a computer system of an electronic device suitable for implementing an embodiment of the present invention;

Fig. 3 shows a flowchart of a text detection method in the related art;

Fig. 4 shows a flowchart of a text region detection method in an embodiment of the present invention;

Fig. 5 shows a schematic block diagram of the text detection model in an embodiment of the present invention;

Fig. 6 shows a flowchart of a method for forming a text region in an embodiment of the present invention;

Fig. 7 shows a flowchart of text detection on nameplates, identity cards, business cards, or advertising pictures in an embodiment of the present invention;

Fig. 8 shows a flowchart of a method for performing text detection on a business card in an embodiment of the present invention;

Fig. 9 shows a structural schematic diagram of a text region detection device in an embodiment of the present invention.

Detailed description of the embodiments

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the invention will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of the invention. Those skilled in the art will appreciate, however, that the technical solution of the invention may be practiced without one or more of the specific details, or that other methods, components, devices, steps, and so on may be used. In other cases, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the invention.

The block diagrams shown in the drawings are only functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flowcharts shown in the drawings are merely illustrative. They need not include all contents and operations/steps, nor must they be executed in the order described. For example, some operations/steps may be decomposed and others may be merged or partially merged, so the actual order of execution may change according to the actual situation.

Fig. 1 shows a schematic diagram of an exemplary system architecture 100 to which the text region detection method and device and the text detection method of an embodiment of the present invention can be applied.

As shown in Fig. 1, system architecture 100 may include a terminal device 101, a network 102, and a server 103. Network 102 is the medium that provides a communication link between terminal device 101 and server 103. Network 102 may include various connection types, such as wired or wireless communication links or fiber-optic cables.

It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are only illustrative. There may be any number of terminal devices, networks, and servers according to implementation needs. For example, server 103 may be a server cluster composed of multiple servers.

A user can use terminal device 101 to interact with server 103 through network 102 in order to receive or send image data and the like. Terminal device 101 can be any of various electronic devices with a display screen, including but not limited to smartphones, tablet computers, portable computers, and desktop computers.

Server 103 can be a server that provides various services. For example, server 103 receives an original image sent by terminal device 101 and extracts its features to obtain a feature map; performs text region detection based on the feature map to obtain a plurality of text region fragments; inputs the text region fragments into a long short-term memory (LSTM) model to obtain their sequence information and forms them into a text region fragment sequence according to that information; and finally aggregates the text region fragment sequence to obtain a text region. This can reduce the miss rate and false detection rate of text detection and improve its accuracy and reliability.

Fig. 2 shows a structural schematic diagram of a computer system of an electronic device suitable for implementing an embodiment of the present invention.

It should be noted that the computer system 200 of the electronic device shown in Fig. 2 is only an example and should not impose any restriction on the function or scope of use of the embodiments of the invention.

As shown in Fig. 2, computer system 200 includes a central processing unit (CPU) 201, which can perform various appropriate actions and processing according to a program stored in read-only memory (ROM) 202 or a program loaded from storage section 208 into random access memory (RAM) 203. RAM 203 also stores the various programs and data required for system operation. CPU 201, ROM 202, and RAM 203 are connected to one another through bus 204. An input/output (I/O) interface 205 is also connected to bus 204.

The following components are connected to I/O interface 205: an input section 206 including a keyboard, a mouse, and the like; an output section 207 including, for example, a cathode ray tube (CRT) or liquid crystal display (LCD) and a loudspeaker; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a LAN card or a modem. Communication section 209 performs communication processing via a network such as the Internet. A drive 210 is also connected to I/O interface 205 as needed. A removable medium 211, such as a magnetic disk, optical disc, magneto-optical disk, or semiconductor memory, is mounted on drive 210 as needed so that a computer program read from it can be installed into storage section 208 as needed.

In particular, according to an embodiment of the invention, the processes described below with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the invention includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from a network through communication section 209 and/or installed from removable medium 211. When the computer program is executed by central processing unit (CPU) 201, the various functions defined in the system of the present application are performed.

It should be noted that the computer-readable medium described in the present invention can be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium can be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of a computer-readable storage medium include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, in the present invention, may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal can take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium can be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.

The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the invention. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and any combination of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the invention can be implemented in software or in hardware, and the described units can also be provided in a processor. In some cases, the names of these units do not limit the units themselves.

In another aspect, the present application also provides a computer-readable medium. The computer-readable medium may be included in the electronic device described in the above embodiments, or it may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the following embodiments. For example, the electronic device can implement the steps shown in Figs. 4 to 8.

In the related art in this field, text detection is generally performed with deep convolutional neural networks. Commonly used deep convolutional neural network models include CNN, R-CNN, Fast RCNN, and Faster RCNN. Text detection with the Faster RCNN model is described below as an example.

Fig. 3 shows a flowchart of text detection with the Faster RCNN model. As shown in Fig. 3, in step S301, multi-stage convolution is performed on the input image to extract convolutional-layer features; in step S302, a further convolution operation is applied to the extracted features to extract global convolutional features, forming the global image features; in step S303, the extracted features are used as the input of a region proposal network, which performs coarse text classification and localization on the input features and generates candidate target regions; in step S304, the global image features and the candidate target regions are input together into a region-of-interest pooling layer, which mainly pools input features of varying dimensions to the dimension required by the subsequent fully connected layers; in step S305, target category prediction and target position regression are performed on the pooled features, where target category prediction predicts whether a feature is text and target position regression predicts where the text appears in the image.

However, text detection methods in the related art have the following disadvantages. (1) They do not take into account the differences between text detection and ordinary object detection. An object usually has a complete closed boundary, but text does not: the boundary of a text shape generally changes with the strokes, and there may be gaps within a single word. In ordinary object detection tasks, two regions separated by a gap are treated as two different objects. (2) Common object detection algorithms judge whether a detection is successful by the ratio of the overlap between the predicted box and the pre-annotated box to the union of the two; a ratio greater than 0.5 is generally considered a correct detection, because the object category can be inferred from part of the object. This assumption is hard to satisfy in text detection, because text is generally small and parts of different characters look alike, so without seeing the whole text it is hard to determine which category it belongs to. More precise text localization is therefore required.
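The overlap criterion in point (2) is the intersection-over-union (IoU) ratio. The following minimal sketch shows how it is computed and why a 0.5 threshold can be too loose for text; the function names and the example boxes are assumptions made for illustration only.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct_detection(predicted, annotated, threshold=0.5):
    """Generic object-detection success test: IoU above the threshold."""
    return iou(predicted, annotated) > threshold


# A word annotated as 100 px wide, of which the prediction covers only 60 px.
annotated_word = (0, 0, 100, 20)
predicted_word = (0, 0, 60, 20)
ratio = iou(predicted_word, annotated_word)                     # 0.6
accepted = is_correct_detection(predicted_word, annotated_word)
```

Here the prediction misses 40% of the word but still passes the 0.5 test, which is the weakness for text that the paragraph above describes.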

To address the problems encountered in practical applications, an embodiment of the present invention first provides a text region detection method that is optimized for those problems. Referring to Fig. 4, the text region detection method is applicable to the electronic device described in the foregoing embodiments and includes at least the following steps:

Step S410: extract features of an original image to obtain a feature map;

Step S420: perform text region detection based on the feature map to obtain a plurality of text region fragments;

Step S430: obtain sequence information of the plurality of text region fragments, and form the plurality of text region fragments into a text region fragment sequence according to the sequence information;

Step S440: aggregate the text region fragment sequence to obtain a text region.

According to the text region detection method in this example embodiment, server 103 performs feature extraction on the original image sent by terminal device 101 to obtain a feature map; performs text region detection based on the feature map to obtain a plurality of text region fragments; forms the text region fragments into a text region fragment sequence according to their sequence information; and finally aggregates the text region fragment sequence to obtain a text region. This can reduce the miss rate and false detection rate of text detection and improve its accuracy and reliability.

The text region detection method in this example embodiment is described in further detail below.

In step S410, features of an original image are extracted to obtain a feature map.

In this example embodiment, Fig. 5 shows a block diagram of text region detection. Referring to Fig. 5, terminal device 101 sends an original image to server 103, and after receiving the original image server 103 performs feature extraction on it to obtain a feature map Res4f. Terminal device 101 can obtain the original image by photographing an object containing text information with its built-in camera, or it can obtain an original image containing text information from another image acquisition device (such as a video camera, DV camcorder, or camera). The original image can be an image containing text information, such as a nameplate, an identity document, a business card, or a billboard.

In this example embodiment, after server 103 receives the original image, it performs feature extraction on the original image with an image feature extractor. The image feature extractor can be VGG19, VGG16, ResNet50, ResNet101, Inception V3, Xception, or the like. For ease of understanding, the present invention uses ResNet50 to extract features from the original image. As shown in Fig. 5, feature map Res4f is obtained from the extracted features; Res4f is the extracted global convolutional feature.

In step S420, text region detection is performed based on the feature map to obtain a plurality of text region fragments.

In this example embodiment, a sliding window of appropriate size can be slid over feature map Res4f with a certain stride to detect text regions and extract text region information, thereby obtaining a plurality of text region fragments. The size of the sliding window can be set according to actual needs; for example, a 3 × 3 or 5 × 5 sliding window can be used. In the present invention, a 3 × 3 sliding window is used to extract features from feature map Res4f and obtain a plurality of text region fragments. Extracting text region features by sliding the window over feature map Res4f avoids the repeated computation that would result from sliding over the original image, and because Res4f is substantially smaller than the original image, the time and computing resources consumed by the sliding-window operation are reduced further.

In this example embodiment, a group of anchors can be set at each pixel of feature map Res4f, each group containing K anchors (K is a positive integer). These anchors cover, as far as possible, targets of unknown shape on feature map Res4f. The image feature corresponding to each anchor is extracted by sliding the window, generating a plurality of text region fragments. Text is harder to distinguish than other targets, and text detection is usually performed in units of words or text lines. When words or text lines are detected directly and a character is composed of separated segments, parts belonging to the same word or text line may be missed or split into two words or text lines, which affects the subsequent text recognition result. Therefore, to handle targets of varying shape in real scenes while reducing the search space of model optimization, the present invention uses anchors of fixed width and variable height. For example, the anchor width can be set to 16 and the height values to {7, 11, 18, 25, 35, 56, 67, 88, 100, 168, 278}. Within a small horizontal extent, text varies little and large gaps do not occur, so setting the anchor width to 16 can improve the accuracy and reliability of text detection. The width and/or height of the anchors can of course take other values, which are not enumerated here.
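The fixed-width, variable-height anchor group can be sketched as follows, using the width and heights quoted above. The function name `make_anchors` and the choice to center every anchor on a given point are assumptions for illustration; the text does not specify how anchors are positioned relative to the pixel.

```python
ANCHOR_WIDTH = 16
ANCHOR_HEIGHTS = [7, 11, 18, 25, 35, 56, 67, 88, 100, 168, 278]


def make_anchors(center_x, center_y):
    """Return K fixed-width anchors (x1, y1, x2, y2) centered on one point.

    All anchors share the same horizontal extent; only the height varies.
    """
    half_w = ANCHOR_WIDTH / 2.0
    return [
        (center_x - half_w, center_y - h / 2.0,
         center_x + half_w, center_y + h / 2.0)
        for h in ANCHOR_HEIGHTS
    ]


anchors = make_anchors(8.0, 8.0)
K = len(anchors)  # 11 anchors for this height list
```

Because the width is fixed, the detector only has to choose among vertical extents at each position, which is the reduction in search space mentioned above.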

In this example embodiment, for the feature map Res4f of a given original image, a feature mapping of size W × H × C is obtained, where W × H denotes the width and height of feature map Res4f and C denotes its number of channels. When a 3 × 3 sliding window is slid densely over feature map Res4f, each window uses a 3 × 3 × C convolutional feature to make a prediction. In each prediction, the horizontal position of the sliding window and the positions of the K anchors are fixed, so the predicted values on feature map Res4f can be mapped to coordinates in the original image.
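A minimal sketch of this mapping is given below. It assumes a downsampling stride of 16 between the original image and the feature map and assumes that the vertical center and height have already been decoded into image pixels; neither detail is stated in this section, and the function names are hypothetical.

```python
def window_feature_size(channels, window=3):
    """Length of the feature vector one sliding window sees: 3 x 3 x C."""
    return window * window * channels


def fragment_box(col, center_y, height, stride=16, width=16):
    """Map a prediction made at feature-map column `col` to an image box.

    The horizontal extent is fixed by the column index and the anchor
    width; only the vertical extent comes from the prediction.
    `stride` is an assumed downsampling factor.
    """
    x1 = col * stride
    x2 = x1 + width
    y1 = center_y - height / 2.0
    y2 = center_y + height / 2.0
    return (x1, y1, x2, y2)


box = fragment_box(col=3, center_y=40.0, height=20.0)
```

With these assumptions, a prediction at column 3 always spans image columns 48 to 64, whatever height the detector predicts.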

Further, at each position of the feature map Res4f, the detector outputs the text/non-text confidence scores and the vertical coordinates of the K anchors. Non-maximum suppression may be applied to the text region fragments detected by the anchors, retaining only candidate regions whose text/non-text confidence is greater than a set value. For example, candidate regions whose confidence is greater than 0.7 may be retained; other confidence thresholds may of course be set to obtain the text region fragments, which are not described in detail here.
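The filtering step above can be sketched as a confidence threshold followed by greedy non-maximum suppression. The 0.7 confidence threshold comes from the text; the IoU threshold of 0.3 and the function names are assumptions chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_fragments(boxes: Sequence[Box], scores: Sequence[float],
                     score_thresh: float = 0.7,
                     iou_thresh: float = 0.3) -> List[int]:
    """Return indices of fragments kept after thresholding and greedy NMS.

    iou_thresh is a hypothetical value; the source only gives score_thresh.
    """
    order = sorted((i for i, s in enumerate(scores) if s > score_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a fragment only if it does not heavily overlap a better one.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep
```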

In step S430, sequence information of the plurality of text region fragments is obtained, and the plurality of text region fragments are formed into a text region fragment sequence according to the sequence information.

In this exemplary embodiment, because the text region fragments obtained in step S420 are predicted independently, false detections occur easily; for example, windows, bricks, fences, or leaves may be identified as text. Some low-confidence text fragments may also be lost, for example where the text is occluded by shadow, or where the text itself is faded or unclear. Combining global scene information can reduce such false detections and missed detections. Moreover, a word or sentence is usually composed of multiple characters, so the text region fragments have a strong sequential character. The plurality of text region fragments may therefore be input to a long short-term memory (LSTM) model to obtain their sequence information, and the plurality of text region fragments are then formed into a text region fragment sequence according to the sequence information.

In this exemplary embodiment, the LSTM model may be a bidirectional LSTM model, comprising a forward LSTM model and a backward LSTM model, which perform sequence modeling on the text information on the left and right sides. The forward LSTM model learns the preceding context of each text region fragment; the backward LSTM model learns the following context of each text region fragment; the learning results are then concatenated to obtain the text region fragment sequence.

In this exemplary embodiment, the LSTM model recurrently updates its internal state H_t, as shown in formula (1):

$$H_t = \varphi(H_{t-1}, X_t) \tag{1}$$

where φ is the activation function, generally the Sigmoid function, as shown in formula (2):

$$\varphi(x) = \frac{1}{1 + e^{-x}} \tag{2}$$

where X_t ∈ R^{3×3×C} is the convolution result generated by the t-th (3 × 3) sliding window on the feature map Res4f, and H_{t-1} denotes the internal state of the LSTM model at time t-1.

In this exemplary embodiment, each LSTM model in the bidirectional LSTM model has 128 dimensions, so the internal state of the bidirectional LSTM model has 256 dimensions, that is, H_t ∈ R^256. The dimension of the LSTM model may of course be changed according to actual needs, for example to 512 or 1024 dimensions, and the present invention places no specific limitation on this. In addition, to improve the efficiency and precision of sequence information extraction, the depth of the LSTM model may be increased to 3 or 4 layers; the principle is the same as that of the bidirectional LSTM model and is not described in detail here.
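The way a 128-dimensional forward state and a 128-dimensional backward state are concatenated into a 256-dimensional state can be sketched as below. This is a deliberately simplified recurrence of the form H_t = φ(H_{t-1}, X_t) with a Sigmoid activation; it omits the gates and cell state of a real LSTM cell, and the weight layout, initialization, and function names are assumptions for illustration only.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]
Weights = Tuple[Matrix, Matrix]  # (input weights, recurrent weights)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Gaussian initialization, mean 0 and standard deviation 0.1.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def run_direction(xs: List[List[float]], w_x: Matrix, w_h: Matrix) -> List[List[float]]:
    """Simplified recurrence H_t = sigmoid(W_x X_t + W_h H_{t-1}); not a gated LSTM."""
    hidden = len(w_h)
    prev = [0.0] * hidden
    states: List[List[float]] = []
    for x in xs:
        cur = []
        for j in range(hidden):
            z = sum(w * v for w, v in zip(w_x[j], x))
            z += sum(w * v for w, v in zip(w_h[j], prev))
            cur.append(sigmoid(z))
        states.append(cur)
        prev = cur
    return states


def bidirectional_states(xs: List[List[float]], fwd: Weights, bwd: Weights) -> List[List[float]]:
    """Concatenate forward and backward states at every sequence position."""
    forward = run_direction(xs, *fwd)
    # The backward pass reads the sequence right to left, then is re-aligned.
    backward = run_direction(xs[::-1], *bwd)[::-1]
    return [f + b for f, b in zip(forward, backward)]
```

With 128 hidden units per direction, each position of the output carries 256 values, matching H_t ∈ R^256 in the text.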

In step S440, the text region fragment sequence is aggregated to obtain the text region.

In this exemplary embodiment, after the text region fragment sequence is obtained, it can be aggregated to form the text region. Fig. 6 shows a flowchart of the aggregation of the text region fragment sequence. As shown in Fig. 6, in step S601, a text region fragment is selected from the text region fragment sequence as the target candidate region B_i (i is a positive integer); in step S602, the candidate region B_j (j ≠ i, and j is a positive integer) nearest to the target candidate region B_i is found, where the distance between B_j and B_i is less than 50 and their vertical overlap ratio is greater than 0.7; in step S603, the above rule is applied recursively and the qualifying adjacent candidate regions are linked into one text line.

Those skilled in the art will appreciate that the distance of less than 50 between candidate regions and the vertical overlap ratio of greater than 0.7 are merely illustrative; other suitable distances and vertical overlap ratios may also be chosen, which are not described in detail here.
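Steps S601 to S603 can be sketched as a simple chaining procedure. The thresholds 50 and 0.7 come from the text; measuring distance between left edges, defining the vertical overlap ratio as the overlap height divided by the shorter of the two heights, and allowing each fragment at most one predecessor are assumptions introduced for this example.

```python
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def vertical_overlap(a: Box, b: Box) -> float:
    """Overlap height divided by the shorter box height (assumed definition)."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    if inter <= 0 or shorter <= 0:
        return 0.0
    return inter / shorter


def link_fragments(boxes: List[Box], max_gap: float = 50.0,
                   min_overlap: float = 0.7) -> List[Box]:
    """Chain neighbouring fragments left to right into text-line boxes."""
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    successor: Dict[int, int] = {}
    has_predecessor: Set[int] = set()
    for pos, i in enumerate(order):
        # Candidates are visited by increasing x, so the first match is nearest.
        for j in order[pos + 1:]:
            if boxes[j][0] - boxes[i][0] >= max_gap:
                break
            if j not in has_predecessor and vertical_overlap(boxes[i], boxes[j]) > min_overlap:
                successor[i] = j
                has_predecessor.add(j)
                break
    lines: List[Box] = []
    for i in order:
        if i in has_predecessor:
            continue  # not the start of a chain
        chain = [i]
        while chain[-1] in successor:
            chain.append(successor[chain[-1]])
        lines.append((min(boxes[k][0] for k in chain), min(boxes[k][1] for k in chain),
                      max(boxes[k][2] for k in chain), max(boxes[k][3] for k in chain)))
    return lines
```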

The present invention uses anchors of fixed width and variable height to extract features from the feature map and generate text region fragments; the sequence information of the text region fragments is then extracted by the bidirectional LSTM model to form the text region fragment sequence; finally, the text region fragment sequence is aggregated to obtain the text region. Because the horizontal values of the anchors are fixed, only the vertical direction needs to be regressed. Compared with existing text detection methods, which need to predict four coordinates of a target, this reduces the search space of model optimization and lowers the learning difficulty of the entire text detection model, making it easier to train a better model with the same time and computing resources.

Further, the coordinates of a text block in the text region fragment sequence detected by the anchors can be determined by the height of the text region fragment and its center coordinate. The predicted and true values of the height and of the vertical component of the center point of the text block are calculated as follows:

$$\nu_c = \frac{c_y - c_y^a}{h^a}, \qquad \nu_h = \log\frac{h}{h^a}$$

$$\nu_c^* = \frac{c_y^* - c_y^a}{h^a}, \qquad \nu_h^* = \log\frac{h^*}{h^a}$$

where ν_c denotes the predicted value of the regression target for the vertical component of the text block center point; c_y denotes the vertical component of the center point of the text region fragment; c_y^a denotes the vertical component of the center coordinate of the corresponding anchor; h^a denotes the height of the corresponding anchor; ν_h denotes the predicted value of the height regression target of the text block; h denotes the height of the text region fragment; ν_c^* denotes the true value of the regression target for the vertical component of the text block center point; c_y^* denotes the true value of the vertical component of the text block center point; ν_h^* denotes the true value of the height regression target of the text block; and h^* denotes the true value of the height of the text block.
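A small sketch of how such vertical regression targets can be encoded and decoded, assuming the relative parameterization ν_c = (c_y − c_y^a) / h^a and ν_h = log(h / h^a), which is consistent with the symbol definitions in the text. The function names are hypothetical.

```python
import math
from typing import Tuple


def encode_vertical(c_y: float, h: float, c_y_a: float, h_a: float) -> Tuple[float, float]:
    """Map a box centre and height to regression targets relative to an anchor."""
    return (c_y - c_y_a) / h_a, math.log(h / h_a)


def decode_vertical(v_c: float, v_h: float, c_y_a: float, h_a: float) -> Tuple[float, float]:
    """Invert encode_vertical: recover centre and height from the targets."""
    return v_c * h_a + c_y_a, math.exp(v_h) * h_a
```

For example, an anchor with centre 40 and height 25 matched to a fragment with centre 45 and height 50 gives ν_c = 0.2 and ν_h = log 2; decoding those values with the same anchor recovers the fragment.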

In this exemplary embodiment, the internal state H_t of the LSTM model may be mapped to the subsequent fully connected layer and output layer, which predict the text confidence and text position of the text region fragment sequence. As shown in Fig. 4, the mapping result may be used to predict 2K vertical coordinate offsets, 2K text confidence scores, and K horizontal boundary offsets of the text line, where K is the number of anchors at each pixel of the feature map Res4f.

In this exemplary embodiment, the present invention integrates the LSTM model into the overall text detection model, so the LSTM model can be trained together with the image feature extractor. The model may be trained as follows. First, during image preprocessing, the original image is scaled so that its shortest side is 600 while its aspect ratio is preserved, and the layers of the LSTM model are initialized with random numbers drawn from a Gaussian distribution with mean 0 and standard deviation 0.1 (variance 0.01). The model is then optimized by stochastic gradient descent so that its loss function is minimized. During stochastic gradient descent, the momentum is 0.9 and the weight decay is 0.0005; each training batch contains 128 samples, with a positive-to-negative sample ratio of 1:1. The initial learning rate is set to 0.001; after 90000 training iterations the learning rate is reduced to 0.0001, and training continues for another 10000 iterations. The objective function of model optimization is as follows:

$$L(s_i, \nu_j, o_k) = \frac{1}{N_s}\sum_i L_s^{cl}(s_i, s_i^*) + \frac{\theta_1}{N_\nu}\sum_j L_\nu^{re}(\nu_j, \nu_j^*) + \frac{\theta_2}{N_o}\sum_k L_o^{re}(o_k, o_k^*)$$

where L(s_i, ν_j, o_k) denotes the overall optimization objective; L_s^{cl}, L_ν^{re}, and L_o^{re} denote the loss functions of the text classification, text localization, and boundary refinement tasks, respectively; s_i denotes the predicted probability that the i-th anchor is text; s_i^* denotes the true value of whether the i-th anchor is text; ν_j denotes the predicted vertical coordinate of the j-th anchor; ν_j^* denotes the true vertical coordinate of the j-th anchor; o_k denotes the predicted horizontal offset of the k-th boundary anchor relative to the boundary; o_k^* denotes the true horizontal offset of the k-th boundary anchor relative to the boundary; θ₁ and θ₂ denote the loss weights of the text localization task and the boundary refinement task, respectively; and N_s, N_ν, and N_o denote the numbers of anchors used in each training batch by the text classification, text localization, and boundary refinement tasks, respectively.
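The three-term objective and the two-stage learning rate described above can be sketched as follows. The text does not name the individual loss functions or give values for θ₁ and θ₂, so binary cross-entropy for classification, smooth L1 for both regression terms, and the default weights of 1.0 are assumptions made purely for illustration; the learning-rate values and the 90000-iteration switch point come from the text.

```python
import math
from typing import Iterable, Sequence


def cross_entropy(p: float, y: int) -> float:
    """Binary cross-entropy for one anchor (assumed classification loss)."""
    p = min(max(p, 1e-7), 1.0 - 1e-7)  # avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def smooth_l1(pred: float, target: float) -> float:
    """Smooth L1 loss (assumed regression loss)."""
    d = abs(pred - target)
    return 0.5 * d * d if d < 1.0 else d - 0.5


def mean(values: Iterable[float]) -> float:
    items = list(values)
    return sum(items) / len(items) if items else 0.0


def total_loss(s: Sequence[float], s_true: Sequence[int],
               v: Sequence[float], v_true: Sequence[float],
               o: Sequence[float], o_true: Sequence[float],
               theta1: float = 1.0, theta2: float = 1.0) -> float:
    """Classification + theta1 * localization + theta2 * boundary refinement.

    theta1 and theta2 are placeholders; the source does not state their values.
    """
    cls = mean(cross_entropy(p, y) for p, y in zip(s, s_true))
    loc = mean(smooth_l1(a, b) for a, b in zip(v, v_true))
    side = mean(smooth_l1(a, b) for a, b in zip(o, o_true))
    return cls + theta1 * loc + theta2 * side


def learning_rate(iteration: int) -> float:
    """0.001 for the first 90000 iterations, then 0.0001 for the rest."""
    return 0.001 if iteration < 90000 else 0.0001
```

Each term is averaged over its own anchor count, which corresponds to the 1/N_s, 1/N_ν, and 1/N_o normalization in the objective.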

In this exemplary embodiment, after the text region fragment sequence is aggregated to obtain the text region, the boundary of the text region may be corrected. The anchor width is manually set to 16 or another fixed value, but the width of a text region is not always a multiple of 16 or of that fixed value, so the horizontal boundary of the text region may be located inaccurately. The boundary of the text region therefore needs to be corrected.

Further, the boundary may be corrected by predicting the relative displacement between a boundary anchor and the true boundary of the text region. The horizontal offset is calculated as follows:

$$O = \frac{x_{side} - c_x^a}{w^a}, \qquad O^* = \frac{x_{side}^* - c_x^a}{w^a}$$

where O denotes the predicted horizontal offset regression target; x_side denotes the predicted left-side offset of the currently segmented text block relative to the original unsegmented text block; c_x^a denotes the horizontal component of the center point of the corresponding anchor; w^a denotes the fixed width of the current anchor (for example, 16); O^* denotes the true value of the regression target for the left-side offset of the currently segmented text block relative to the original unsegmented text block; and x_side^* denotes the true value of the left-side offset of the currently segmented text block relative to the original unsegmented text block.

The right-side offset of the text region can be corrected with the same method. After the above correction steps, the predicted text region locates the text area in the image more accurately.
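The boundary correction can be illustrated with a pair of helper functions, assuming the offset is expressed relative to the anchor centre and normalized by the fixed anchor width, O = (x_side − c_x^a) / w^a, which is consistent with the symbol definitions in the text. The function names are hypothetical.

```python
def encode_side(x_side: float, c_x_a: float, w_a: float = 16.0) -> float:
    """Normalized horizontal offset of a boundary relative to an anchor centre."""
    return (x_side - c_x_a) / w_a


def decode_side(offset: float, c_x_a: float, w_a: float = 16.0) -> float:
    """Recover the boundary x-coordinate from a predicted offset."""
    return offset * w_a + c_x_a
```

For a boundary anchor centred at x = 120, a predicted offset of 0.1875 moves the text-line boundary to x = 123 rather than leaving it on the 16-pixel anchor grid.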

An embodiment of the present invention further provides a text detection method, as follows: a text region is obtained according to the text region detection method of the present invention; the text information in the text region is then recognized to obtain text.

The text detection method of the present invention can be used to detect text on objects such as nameplates, identity cards, business cards, and advertising pictures. Fig. 7 shows a flowchart of detecting text on a nameplate, identity card, business card, or advertising picture with the text detection model of the present invention: the top-left and bottom-right corner coordinates and the confidence of each piece of text in the image are detected in order to recognize the text.

To aid understanding, the present invention is further illustrated by detecting the text on a business card. Fig. 8 shows a flowchart of a method of text detection on a business card. As shown in Fig. 8, in step S801, the text line regions on the business card are detected. The detection method used is the text region detection method of the present invention, which determines the text line regions on the business card. In step S802, the text information in the text line regions is recognized to form text. Specifically, after the text line regions are determined, the characters in each text line region are recognized one by one. For example, if a text line region contains N characters (N is a positive integer), the top-left and bottom-right corner coordinates and the confidence of the first character are detected to determine the first character, then the top-left and bottom-right corner coordinates and the confidence of the second character are detected to determine the second character, and these steps are repeated until the N-th character has been detected; finally, the text is obtained from the recognized characters.

Device embodiments of the present invention are described below; they can be used to carry out the above text region detection method of the present invention. For details not disclosed in the device embodiments, refer to the embodiments of the above text region detection method of the present invention.

Fig. 9 shows a schematic structural diagram of a text region detection device. Referring to Fig. 9, the text region detection device 900 may include a feature extraction module 901, a fragment acquisition module 902, a sequence information acquisition module 903, and an aggregation module 904.

Specifically, the feature extraction module 901 is configured to extract features of an original image to obtain a feature map; the fragment acquisition module 902 is configured to perform text region detection based on the feature map to obtain a plurality of text region fragments; the sequence information acquisition module 903 is configured to obtain sequence information of the plurality of text region fragments and to form the plurality of text region fragments into a text region fragment sequence according to the sequence information; and the aggregation module 904 is configured to aggregate the text region fragment sequence to obtain a text region.

In this exemplary embodiment, the feature extraction module 901 includes a convolution unit 9011, configured to perform multi-stage convolution on the original image through a residual network model to obtain the feature map.

In this exemplary embodiment, the fragment acquisition module 902 includes an anchor setting unit 9021 and a feature extraction unit 9022.

Specifically, the anchor setting unit 9021 is configured to set a group of anchors at each pixel of the feature map; the feature extraction unit 9022 is configured to extract the image features corresponding to the anchors with a sliding window and to generate the plurality of text region fragments.

Further, the anchor setting unit 9021 includes a width setting unit 90211, configured to set a group of fixed-width anchors at each pixel of the feature map.

In this exemplary embodiment, the sequence information acquisition module 903 includes a sequence information extraction unit 9023, configured to input the text region fragments to an LSTM model to obtain the sequence information of the text region fragments, and to form the plurality of text region fragments into a text region fragment sequence according to the sequence information.

In this exemplary embodiment, the LSTM model is a bidirectional LSTM model, comprising a forward LSTM model and a backward LSTM model.

Further, the sequence information extraction unit 9023 includes a forward sequence extraction unit 90231, a backward sequence extraction unit 90232, and a merging unit 90233.

Specifically, the forward sequence extraction unit 90231 is configured to learn the preceding context of each text region fragment through the forward LSTM model; the backward sequence extraction unit 90232 is configured to learn the following context of each text region fragment through the backward LSTM model; and the merging unit 90233 is configured to concatenate the learning results to form the text region fragment sequence.

In this exemplary embodiment, aggregation module 904 includes selecting unit 9041 and connection unit 9042.

Specifically, the selecting unit 9041 is configured to select a plurality of the text region fragments, where the spacing between adjacent text region fragments is less than a set distance and their vertical overlap ratio is greater than a set value; the connection unit 9042 is configured to connect the plurality of text region fragments to obtain the text region.

In this exemplary embodiment, the text region detection device 900 further includes a sample acquisition module 905, a model training module 906, and a judgment module 907.

Specifically, the sample acquisition module 905 is configured to obtain training samples; the model training module 906 is configured to perform machine training on the residual network model and the bidirectional LSTM model according to the training samples and different learning rates; and the judgment module 907 is configured to end the machine training when the loss function of the residual network model and the bidirectional LSTM model is at its minimum.

In this exemplary embodiment, the text region detection device 900 further includes a mapping module 908, configured to map the text region fragment sequence to a fully connected layer to predict the text confidence and text position of the text region fragments.

In this exemplary embodiment, the text region detection device 900 further includes a correction module 909, configured to correct the boundary error of the text region according to the relative displacement between the predicted boundary of the text region and the true boundary of the text region.

In this exemplary embodiment, the fragment acquisition module 902 includes a screening unit 9024, configured to apply non-maximum suppression to the text region fragments to obtain a plurality of text region fragments whose text confidence is greater than a set value.

It should be noted that although several modules or units of the text region detection device are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by multiple modules or units.

Those skilled in the art, after considering the specification and practicing the invention disclosed herein, will readily conceive of other embodiments of the present invention. This application is intended to cover any variations, uses, or adaptations of the present invention that follow its general principles and include common knowledge or conventional technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present invention indicated by the appended claims.

It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.

Claims

1. A text region detection method, characterized by comprising:

extracting features of an original image to obtain a feature map;

performing text region detection based on the feature map to obtain a plurality of text region fragments;

obtaining sequence information of the plurality of text region fragments, and forming the plurality of text region fragments into a text region fragment sequence according to the sequence information;

aggregating the text region fragment sequence to obtain a text region.

2. The text region detection method according to claim 1, characterized in that extracting features of an original image to obtain a feature map comprises:

performing multi-stage convolution on the original image through a residual network model to obtain the feature map.

3. The text region detection method according to claim 1, characterized in that performing text region detection based on the feature map to obtain a plurality of text region fragments comprises:

setting a group of anchors at each pixel of the feature map;

extracting the image features corresponding to the anchors with a sliding window to generate the plurality of text region fragments.

4. The text region detection method according to claim 3, characterized in that setting a group of anchors at each pixel of the feature map comprises:

setting a group of fixed-width anchors at each pixel of the feature map.

5. The text region detection method according to claim 1, characterized in that obtaining sequence information of the plurality of text region fragments, and forming the plurality of text region fragments into a text region fragment sequence according to the sequence information, comprises:

inputting the text region fragments to a long short-term memory (LSTM) model to obtain the sequence information of the text region fragments, and forming the plurality of text region fragments into a text region fragment sequence according to the sequence information.

6. The text region detection method according to claim 5, characterized in that the LSTM model is a bidirectional LSTM model comprising a forward LSTM model and a backward LSTM model; and inputting the text region fragments to an LSTM model to obtain the sequence information of the text region fragments, and forming the plurality of text region fragments into a text region fragment sequence according to the sequence information, comprises:

learning the preceding context of each text region fragment through the forward LSTM model;

learning the following context of each text region fragment through the backward LSTM model;

concatenating the learning results to obtain the text region fragment sequence.

7. The text region detection method according to claim 1, characterized in that aggregating the text region fragment sequence to obtain the text region comprises:

selecting a plurality of the text region fragments, wherein the spacing between two adjacent text region fragments is less than a set distance and their vertical overlap ratio is greater than a set value;

connecting the plurality of text region fragments to obtain the text region.

8. The text region detection method according to claim 6, characterized in that the text region detection method further comprises:

obtaining training samples;

performing machine training on the residual network model and the bidirectional LSTM model according to the training samples and different learning rates;

ending the machine training when the loss function of the residual network model and the bidirectional LSTM model is at its minimum.

9. The text region detection method according to claim 1, characterized in that the text region detection method further comprises:

mapping the text region fragment sequence to a fully connected layer to predict the text confidence and text position of the text region fragments.

10. The text region detection method according to claim 1, characterized in that the text region detection method further comprises:

correcting the boundary error of the text region according to the relative displacement between the predicted boundary of the text region and the true boundary of the text region.

11. The text region detection method according to claim 1, characterized in that performing text region detection based on the feature map to obtain a plurality of text region fragments comprises:

applying non-maximum suppression to the text region fragments to obtain a plurality of text region fragments whose text confidence is greater than a set value.

12. A text detection method, characterized by comprising:

obtaining the text region according to the text region detection method of any one of claims 1-11;

recognizing the text information in the text region to obtain text.

13. A text region detection device, characterized by comprising:

a feature extraction module, configured to extract features of an original image to obtain a feature map;

a fragment acquisition module, configured to perform text region detection based on the feature map to obtain a plurality of text region fragments;

a sequence information acquisition module, configured to obtain sequence information of the plurality of text region fragments and to form the plurality of text region fragments into a text region fragment sequence according to the sequence information;

an aggregation module, configured to aggregate the text region fragment sequence to obtain a text region.

14. A computer-readable medium storing a computer program, characterized in that the program, when executed by a processor, implements the text region detection method according to any one of claims 1-11 and the text detection method according to claim 12.