CN110135423A - The training method and optical character recognition method of text identification network - Google Patents
The training method and optical character recognition method of text identification network
- Publication number
- CN110135423A (application CN201910431918.4A)
- Authority
- CN
- China
- Prior art keywords
- training
- candidate frame
- text
- image
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The present invention provides a training method for a text recognition network and an optical character recognition method, relating to the technical field of text recognition. The method comprises: performing feature extraction on a training sample set to obtain multiple feature maps, where the training sample set includes training images annotated with boxes marking text content and text position; generating a candidate-box set centered on each pixel of a feature map, the candidate-box set including at least one candidate box; rejecting first-class candidate boxes that exceed the training image size and second-class candidate boxes that exceed a preset overlap rate; and training the text recognition network according to the candidate-box set remaining after rejection. The present invention can improve training efficiency while guaranteeing accuracy.
Description
Technical field
The present invention relates to the technical field of text recognition, and in particular to a training method for a text recognition network and an optical character recognition method.
Background art
Optical character recognition (OCR) refers to identifying the text content in an image captured by an optical device such as a camera or scanner and entering it into a computer. It avoids repetitive, inefficient manual labor, greatly improves working efficiency, and is widely used in industry and daily life. At present, optical character recognition relies on conventional algorithms: in the conventional training process, image feature maps are first extracted and then used to train a convolutional neural network. However, the feature maps extracted from an image contain the features of multiple possibly duplicated target texts and multiple possibly duplicated backgrounds, as well as features in which target and background are mixed together. Training on duplicated or mixed features gives poor results and makes training relatively inefficient.
Summary of the invention
In view of this, the purpose of the present invention is to provide a training method for a text recognition network and an optical character recognition method that can improve the accuracy of recognition results.
In a first aspect, an embodiment of the present invention provides a training method for a text recognition network, comprising:
performing feature extraction on a training sample set to obtain multiple feature maps, the training sample set including training images annotated with boxes marking text content and text position;
generating a candidate-box set centered on each pixel of a feature map, the candidate-box set including at least one candidate box;
rejecting first-class candidate boxes that exceed the training image size, and second-class candidate boxes that exceed a preset overlap rate;
training the text recognition network according to the candidate-box set remaining after rejection.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, wherein the step of rejecting first-class candidate boxes that exceed the training image size comprises:
computing the overlap degree between each candidate box and the annotation box of the training image corresponding to that candidate box;
rejecting, as first-class candidate boxes, the candidate boxes whose computed overlap degree lies between a first threshold and a second threshold.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, wherein the step of rejecting second-class candidate boxes that exceed the preset overlap rate comprises:
computing the position information of the candidate boxes;
determining a confidence for each candidate box according to its position information;
sorting the confidences to obtain a confidence queue;
extracting, as the reference candidate box, the candidate box with the highest confidence value;
computing the overlapping area between the reference candidate box and the remaining candidate boxes;
rejecting, as second-class candidate boxes, the candidate boxes whose overlap exceeds the preset overlap rate.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, wherein the step of computing the position information of the candidate boxes comprises:
taking the candidate boxes whose computed overlap degree is greater than the first threshold as foreground boxes;
performing coordinate regression between each foreground box and its corresponding annotation box to obtain offsets;
computing the position information of the candidate boxes according to the offsets.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, wherein the step of performing feature extraction on the training sample set to obtain multiple feature maps comprises:
performing feature extraction on the training sample set through a neural network model to obtain multiple feature maps;
wherein the neural network model comprises multiple residual blocks, each residual block comprising at least two convolutional layers, a batch regularization layer, a batch normalization layer, and two activation function layers; at least one convolutional layer, the batch regularization layer, an activation function layer, at least one convolutional layer, the batch normalization layer, and an activation function layer are connected in sequence.
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, wherein before the step of performing feature extraction on the training sample set through the neural network model to obtain multiple feature maps, the method further comprises:
setting training parameters of the neural network model, the training parameters including one or more of the following: number of iterations, batch size, image size, learning rate, and learning-rate decay value.
With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect, wherein before the step of selecting training images from the training sample set, inputting the training images into the neural network model, and obtaining multiple feature maps, the method further comprises:
applying one or more of the following operations to the training images in the training sample set: horizontal flipping, vertical flipping, lighting adjustment, and noise removal;
adding the processed training images to the training sample set.
In a second aspect, an embodiment of the present invention further provides a training device for a text recognition network, comprising:
an extraction module for performing feature extraction on a training sample set to obtain multiple feature maps, the training sample set including training images annotated with boxes marking text content and text position;
a generation module for generating a candidate-box set centered on each pixel of a feature map, the candidate-box set including at least one candidate box;
a rejection module for rejecting first-class candidate boxes that exceed the training image size and second-class candidate boxes that exceed the preset overlap rate;
a training module for training the text recognition network according to the candidate-box set remaining after rejection.
In a third aspect, an embodiment of the present invention further provides an optical character recognition method, comprising:
acquiring an image to be tested;
inputting the image to be tested into the text recognition network and identifying the text content and text position in it, where the text recognition network is trained by the training method for a text recognition network of any of the above embodiments.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to execute any of the methods of the above embodiments.
The embodiments of the present invention bring the following beneficial effects: feature extraction is performed on a training sample set to obtain multiple feature maps; the training sample set includes training images annotated with boxes marking text content and text position; a candidate-box set, including at least one candidate box, is generated centered on each pixel of a feature map; first-class candidate boxes that exceed the training image size and second-class candidate boxes that exceed a preset overlap rate are rejected; and the text recognition network is trained according to the candidate-box set remaining after rejection. By training only on the candidate boxes that remain after rejection, the present invention improves training efficiency while guaranteeing accuracy.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description or be understood by implementing the present invention. The objectives and other advantages of the invention are realized and obtained by the structures particularly pointed out in the description, the claims, and the accompanying drawings.
To make the above objectives, features, and advantages of the present invention clearer and more comprehensible, preferred embodiments are described in detail below in conjunction with the accompanying drawings.
Brief description of the drawings
To explain the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of the training method for a text recognition network provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of the training device for a text recognition network provided by an embodiment of the present invention;
Fig. 3 is a flowchart of the optical character recognition method provided by an embodiment of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described below clearly and completely in conjunction with the drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Embodiments of the present invention provide a training method for a text recognition network and an optical character recognition method. The present invention trains on the candidate boxes remaining after rejection, which improves training efficiency while guaranteeing accuracy. To facilitate understanding of the present embodiment, the training method for a text recognition network disclosed in the embodiment of the present invention is first described in detail.
As shown in Fig. 1, the method comprises:
S110: performing feature extraction on the training sample set to obtain multiple feature maps; the training sample set includes training images annotated with boxes marking the text content.
Optionally, in step S110, feature extraction may be performed on the training sample set through a neural network model to obtain multiple feature maps.
The neural network model comprises multiple residual blocks; each residual block includes at least two convolutional layers, a batch regularization layer, a batch normalization layer, and two activation function layers, with at least one convolutional layer, the batch regularization layer, an activation function layer, at least one convolutional layer, the batch normalization layer, and an activation function layer connected in sequence. The batch regularization layer effectively prevents overfitting of the training parameters: the model parameters are updated after every iteration, so the distribution of the data output by one layer changes from one computation to the next, which makes learning harder for the following layer. The activation function layers add non-linearity to the model and improve its expressive power. The batch normalization layer is applied after the element-wise addition of the skip connection, followed by the activation function operation. By learning the residual ((F(x) + x) − x = F(x)), this structure alleviates the gradient degradation caused by a large number of layers, makes it easier for the network to learn the features of the data, and has played an important role in a wide range of image processing tasks.
Example: the neural network model may use a residual network 50 layers deep, composed of a series of residual blocks. Each residual block contains three convolutional layers; a batch regularization layer and a first activation function layer are added after the first and second convolutional layers, the first activation function layer is connected to the third convolutional layer, and this is followed by a batch normalization layer and a second activation function layer.
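For illustration, the residual block just described can be sketched in PyTorch. This is a minimal sketch, not the claimed implementation: it assumes BatchNorm2d for both the "batch regularization" and "batch normalization" layers, ReLU activations, and 3x3 convolutions with a fixed channel count, none of which are pinned down by this description.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block sketch: three conv layers; the skip connection is
    added element-wise before the final normalization and activation,
    so the block learns the residual of F(x) + x."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)   # "batch regularization" layer
        self.relu1 = nn.ReLU(inplace=True)    # first activation function layer
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)   # "batch normalization" layer
        self.relu2 = nn.ReLU(inplace=True)    # second activation function layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu1(self.bn1(self.conv2(self.conv1(x))))
        out = self.conv3(out)
        out = self.bn2(out + x)               # element-wise skip addition first,
        return self.relu2(out)                # then normalization and activation
```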
In the training process of the above neural network model, the training image passes through a series of convolutions that extract features and downsample it by a factor of 16 to obtain the feature map of the original image. Downsampling means that when the image passes through a convolutional layer with stride 2, the content of each 2x2 window of the original image becomes one pixel, halving the extracted feature map in both length and width; the feature map is obtained after four such downsamplings, so its length and width are each 1/16 of the original image. The feature information in the training image is obtained through the residual blocks: the more residual blocks the image passes through, the more detailed and higher-level the extracted features are, and the better the later test results of the network model will be.
Before the step of performing feature extraction on the training sample set through the neural network model to obtain multiple feature maps, the method further comprises:
setting the training parameters of the neural network model; the training parameters include one or more of the following: number of iterations, batch size, image size, learning rate, and learning-rate decay value.
The number of iterations is the number of times all training images in the training sample set are trained on, generally determined by the number of training images and the size of the network model. The batch size is the number of images the network loads in one forward pass. Image size means that before training, each image needs to be resized appropriately; the image is rescaled according to the defined length and width. Two situations must be considered when changing the image size: first, if the image is too large, the memory required during image processing exceeds the maximum memory the hardware can provide, causing the program to crash; second, if the target to be recognized occupies a very small proportion of the image, training directly on the original image may give poor detection results or even miss the target entirely. The learning rate is a very important parameter in deep-learning training: it controls how fast the model learns. An inappropriate learning rate causes two kinds of problems: first, if the learning rate is too large, training cannot reach the optimal model, so the resulting model's accuracy is low; second, if the learning rate is too small, the trained network converges very slowly and needs a very large number of iterations to reach the expected result, wasting time or even never reaching the target precision. A suitable value therefore needs to be chosen. Deep-learning training generally adjusts the learning rate dynamically, which is what the learning-rate decay value is for: the learning rate is first set to a relatively large value, for example 0.001, and after a certain number of iterations it is reduced according to the decay value. The new learning rate after the reduction is: new learning rate = old learning rate x learning-rate decay value; the decay value is typically set to 0.1.
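As a worked illustration of the rule "new learning rate = old learning rate x decay value", a minimal sketch follows; the decay interval of 10000 iterations is an assumed value, not one given above.

```python
def decayed_lr(base_lr: float, decay: float, iteration: int,
               decay_every: int = 10000) -> float:
    """Stepwise schedule: multiply the learning rate by `decay`
    each time `decay_every` iterations have elapsed."""
    return base_lr * (decay ** (iteration // decay_every))

# With base_lr=0.001 and decay=0.1: 0.001 for the first 10000 iterations,
# then 0.0001, then 0.00001, ...
print(decayed_lr(0.001, 0.1, 25000))  # 1e-05
```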
Of course, before the step of performing feature extraction on the training sample set, a training sample set containing text content needs to be acquired; the training sample set includes training images annotated with boxes marking the text content.
For example, license plate images can be used as the training sample set. Specifically, a large number of license plates can be photographed with an industrial camera, from different angles and against different backgrounds, to guarantee the diversity of the training sample set. The acquired images containing license plate characters are then annotated, recording the position information and the character content of each image. The concrete practice is to surround each character in the image with a rectangular box using an image annotation tool, and to record the enclosed character content and position information for training.
The training images in the training sample set are also subjected to one or more of the following operations: horizontal flipping, vertical flipping, lighting adjustment, and noise removal; the processed training images are added to the training sample set.
Specifically, noise removal targets black-and-white (salt-and-pepper) noise, which is often caused by image cropping; the most common way to remove salt-and-pepper noise is median filtering. Whether each preprocessing operation is applied to a given image is optional. Horizontal and vertical flipping mirror the image about its central axis in the corresponding direction. If salt-and-pepper filtering is used, the filter size must be set, and it must be an odd number greater than 1. If lighting adjustment is used, a lighting parameter must be set; its value is a number between 0 and 255, and the original image is multiplied by the lighting parameter times a number between 0 and 1, then adjusted back to the original range. This data preprocessing augments the training data set and enriches the features of the training sample set, making training more thorough and more efficient.
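A minimal sketch of these preprocessing options using OpenCV and NumPy; the multiplicative brightness formula and the kernel-size handling are illustrative assumptions rather than values fixed by this description.

```python
from typing import Optional

import cv2
import numpy as np

def augment(img: np.ndarray,
            hflip: bool = False,
            vflip: bool = False,
            light_scale: Optional[float] = None,
            median_ksize: Optional[int] = None) -> np.ndarray:
    """Apply the optional preprocessing operations described above."""
    if hflip:
        img = cv2.flip(img, 1)   # mirror about the vertical central axis
    if vflip:
        img = cv2.flip(img, 0)   # mirror about the horizontal central axis
    if light_scale is not None:  # lighting adjustment, assumed 0..1 factor
        img = np.clip(img.astype(np.float32) * light_scale, 0, 255).astype(np.uint8)
    if median_ksize is not None: # salt-and-pepper removal; size odd and > 1
        assert median_ksize > 1 and median_ksize % 2 == 1
        img = cv2.medianBlur(img, median_ksize)
    return img
```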
S120: generating a candidate-box set centered on each pixel of a feature map; the candidate-box set includes at least one candidate box.
Each of the multiple feature maps is processed as follows: a convolution with a 3x3 kernel slides over the feature map, i.e., an anchor point is generated centered on each pixel of the feature map, and candidate boxes of different aspect ratios and sizes are generated so that the whole feature map is covered. The candidate boxes are generally given three aspect ratios, 1:2, 1:1 and 2:1, and three area scales, 4, 8 and 16, so that 9 anchor scales are obtained per anchor point; the anchors are then mapped back to the original image. In this way the original image is fully covered by candidate boxes, and these candidate boxes contain all the targets to be recognized and all the backgrounds.
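A minimal sketch of this anchor enumeration; mapping each area scale to a pixel size of scale x stride (with stride 16 matching the 16x downsampling above) is an assumption, since the units of the area scales are not fixed here.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     ratios=(0.5, 1.0, 2.0), scales=(4, 8, 16)):
    """Return (feat_h * feat_w * 9, 4) anchors as (x1, y1, x2, y2) in
    original-image coordinates, 9 anchors centered on each feature-map pixel."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # map the feature-map pixel center back to the original image
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for r in ratios:
                for s in scales:
                    side = s * stride                  # assumed base size per scale
                    w, h = side * np.sqrt(r), side / np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors, dtype=np.float32)
```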
S130: rejecting first-class candidate boxes that exceed the training image size and second-class candidate boxes that exceed the preset overlap rate.
The step of rejecting first-class candidate boxes that exceed the training image size comprises:
rejecting, as first-class candidate boxes, the candidate boxes whose computed overlap degree lies between the first threshold and the second threshold.
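The overlap degree used throughout can be read as the standard intersection-over-union (IoU); a minimal sketch, with the thresholds of 0.7 and 0.3 taken from the worked example later in this description.

```python
import numpy as np

def iou(box, gt):
    """Overlap degree (IoU) of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box[0], gt[0]), max(box[1], gt[1])
    ix2, iy2 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box) + area(gt) - inter)

def classify(box, gt, hi=0.7, lo=0.3):
    """1 = foreground, 0 = background, -1 = rejected first-class box."""
    v = iou(box, gt)
    return 1 if v > hi else (0 if v < lo else -1)
```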
The step of rejecting second-class candidate boxes that exceed the preset overlap rate comprises:
determining a confidence for each candidate box according to its position information;
sorting the confidences to obtain a confidence queue;
extracting, as the reference candidate box, the candidate box with the highest confidence value;
computing the overlapping area between the reference candidate box and the remaining candidate boxes;
rejecting, as second-class candidate boxes, the candidate boxes whose overlap exceeds the preset overlap rate.
S140: training the text recognition network according to the candidate-box set remaining after rejection.
First, the class information and position information of the candidate-box set remaining after rejection are computed, and the text recognition network is trained with the computed class information and position information. This can be understood as follows: the overlap degree between each candidate box and the annotation box of its corresponding training image is computed to determine the class information of the candidate boxes and extract the foreground boxes; coordinate regression between each foreground box and its corresponding annotation box yields offsets; and, according to the offsets, the position information of the initial candidate boxes is computed.
Example: the overlap degree between each candidate box and the annotation box of its corresponding training image is computed, and the initial candidate boxes whose overlap with the annotation box is greater than 0.7 have their class set to 1 and are extracted as foreground boxes. Coordinate regression is then performed between these foreground boxes and the annotation boxes to compute the offsets, and each initial candidate box obtains new position information through the coordinate transform. These are compared with the class and position information of the original image to obtain the loss function. The loss function is a function that measures the loss and the degree of error; in neural network training, its backpropagation adjusts the network parameters and optimizes the network model. The processed candidate boxes and the values obtained from the coordinate transform are passed to the next network layer.
The regions of interest are first mapped back to the feature map, the mapped region is divided into parts of equal size, and a max-pooling operation is applied to each part to obtain a more targeted region for the target to be recognized; the next step is to train on the class of the target to be recognized. The target region is passed through a series of residual blocks to obtain its feature information, and the position and class of this feature information are trained; the results are continuously optimized by backpropagating the loss function, yielding more accurate position information and class information.
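This region-of-interest max pooling is available off the shelf; a usage sketch with torchvision follows, where the 7x7 output grid and the tensor shapes are illustrative assumptions.

```python
import torch
from torchvision.ops import roi_pool

# feature map from the backbone: (batch, channels, H/16, W/16)
feats = torch.randn(1, 256, 40, 60)
# one region of interest in original-image coords: (batch_idx, x1, y1, x2, y2)
rois = torch.tensor([[0., 64., 32., 256., 96.]])
# divide each mapped region into 7x7 equal parts and max-pool each part;
# spatial_scale=1/16 maps image coordinates onto the downsampled feature map
pooled = roi_pool(feats, rois, output_size=(7, 7), spatial_scale=1.0 / 16)
print(pooled.shape)  # torch.Size([1, 256, 7, 7])
```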
Of course, step S130 and step S140 may be interleaved during execution. As an example: the overlap degree between each initial candidate box and its corresponding annotation box is computed. When the overlap between an initial candidate box and the annotation box is greater than 0.7, the box's class is set to 1 and it belongs to the foreground boxes. When the overlap is less than 0.3, the class is set to 0 and it belongs to the background boxes. When the overlap lies between 0.3 and 0.7, the class is set to -1; the boxes set to -1 are the first-class candidate boxes, i.e., those treated as exceeding the training image size, and they are not considered in later training. Coordinate regression is then applied to the foreground boxes and annotation boxes to compute the offsets, which are compared with the class and position information of the original image to obtain the loss function. The loss function is a function that measures the loss and the degree of error; in neural network training, its backpropagation adjusts the network parameters and optimizes the network model. The processed candidate boxes and the values obtained from the coordinate transform are passed to the next network layer.
Candidate-box regression is carried out according to the obtained coordinate transform and offsets, and all initial candidate boxes belonging to the foreground and background obtain new position information through the coordinate transform. Non-maximum suppression is then applied to the candidate boxes to remove redundant and overlapping boxes. The concrete operations are as follows: a confidence is assigned to each box according to its position information. First, all boxes are sorted by score and the highest score and its corresponding box are chosen. Second, the remaining boxes are traversed; if the overlapping area with the current highest-scoring box is greater than a certain threshold, that box is removed. Third, the highest-scoring box among the unprocessed boxes is chosen, and the first and second steps are repeated.
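A direct sketch of the three steps just listed, using IoU as the overlap measure and leaving the overlap threshold as a parameter.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, overlap_thresh: float = 0.7):
    """Greedy non-maximum suppression; boxes are (N, 4) as (x1, y1, x2, y2).
    Returns the indices of the boxes that are kept."""
    order = scores.argsort()[::-1]          # step 1: sort by score, best first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        # step 2: overlap of the best box with every remaining box
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        iou = inter / (areas[best] + areas[rest] - inter)
        # step 3: keep only boxes below the threshold and repeat
        order = rest[iou <= overlap_thresh]
    return keep
```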
The above is the complete process of training on one image. During training, when the loss function is observed to no longer change, training can be considered finished, and the trained model is saved at that point. The trained network model is then tested on a test data set; the testing process is essentially the same as training, and the test parameter that needs to be set is the confidence threshold for the detected boxes: when the confidence of a target box in the test result is below the threshold, the target is regarded as not recognized. The threshold is generally set to 0.7. After testing, the recognition results can be rendered on the test data set, including the target boxes containing the target characters and the content of those characters, and quantitative metrics can then be computed from the test statistics. When the overlap rate between a detected target box and the real target box is above 0.5 and the target content is recognized correctly, the recognition result is counted as a positive example; if the content is recognized incorrectly, the result is counted as a negative example. Dividing the number of positive examples by the sum of the positive and negative example counts gives the precision of the model, and dividing the number of positive examples by the number of targets across all tests gives the recall; these two evaluation metrics determine whether the performance of the network model meets expectations.
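The two metrics reduce to the divisions just described; a minimal sketch, with the counts in the usage line chosen purely for illustration.

```python
def precision_recall(true_pos: int, false_pos: int, total_targets: int):
    """Precision = TP / (TP + FP); recall = TP / number of ground-truth targets."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / total_targets
    return precision, recall

# e.g. 180 correctly recognized boxes, 20 wrongly recognized, 200 targets total:
print(precision_recall(180, 20, 200))  # (0.9, 0.9)
```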
An embodiment of the present invention also provides a training device for a text recognition network. As shown in Fig. 2, the device comprises:
an extraction module 210 for performing feature extraction on a training sample set to obtain multiple feature maps, the training sample set including training images annotated with boxes marking text content and text position;
a generation module 220 for generating a candidate-box set centered on each pixel of a feature map, the candidate-box set including at least one candidate box;
a rejection module 230 for rejecting first-class candidate boxes that exceed the training image size and second-class candidate boxes that exceed the preset overlap rate;
a training module 240 for training the text recognition network according to the candidate-box set remaining after rejection.
The technical effects and implementation principles of the device provided by this embodiment of the present invention are the same as those of the preceding method embodiments; for brevity, where this device embodiment is silent, reference may be made to the corresponding content of the preceding method embodiments.
An embodiment of the present invention also provides an optical character recognition method. As shown in Fig. 3, the method comprises:
S310: acquiring an image to be tested;
S320: inputting the image to be tested into the text recognition network and identifying the text content in it, where the text recognition network is trained by the training method for a text recognition network of any of the above embodiments.
An embodiment of the present invention also provides a computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to execute any of the methods of the above embodiments.
An embodiment of the present invention also provides an electronic device comprising a processor and a memory. The memory may include high-speed random access memory (RAM) and may also include non-volatile memory, for example at least one disk memory. The memory stores a program; after receiving an execution instruction, the processor executes the program, and the method defined by the flow disclosed in any of the preceding embodiments of the present invention may be applied in the processor or implemented by the processor.
The technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the embodiments described above are only specific embodiments of the present invention, used to illustrate the technical solutions of the present invention and not to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field can still, within the technical scope disclosed by the present invention, modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features; such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A training method for a text recognition network, characterized by comprising:
performing feature extraction on a training sample set to obtain multiple feature maps, the training sample set including training images annotated with boxes marking text content and text position;
generating a candidate-box set centered on each pixel of a feature map, the candidate-box set including at least one candidate box;
rejecting first-class candidate boxes that exceed the training image size, and second-class candidate boxes that exceed a preset overlap rate;
training the text recognition network according to the candidate-box set remaining after rejection.
2. The training method for a text recognition network according to claim 1, characterized in that the step of rejecting first-class candidate boxes that exceed the training image size comprises:
computing the overlap degree between each candidate box and the annotation box of the training image corresponding to that candidate box;
rejecting, as first-class candidate boxes, the candidate boxes whose computed overlap degree lies between a first threshold and a second threshold.
3. The training method for a text recognition network according to claim 2, characterized in that the step of rejecting second-class candidate boxes that exceed the preset overlap rate comprises:
computing the position information of the candidate boxes;
determining a confidence for each candidate box according to its position information;
sorting the confidences to obtain a confidence queue;
extracting, as the reference candidate box, the candidate box with the highest confidence value;
computing the overlapping area between the reference candidate box and the remaining candidate boxes;
rejecting, as second-class candidate boxes, the candidate boxes whose overlap exceeds the preset overlap rate.
4. The training method for a text recognition network according to claim 3, characterized in that the step of computing the position information of the candidate boxes comprises:
taking the candidate boxes whose computed overlap degree is greater than the first threshold as foreground boxes;
performing coordinate regression between each foreground box and its corresponding annotation box to obtain offsets;
computing the position information of the candidate boxes according to the offsets.
5. The training method for a text recognition network according to claim 1, characterized in that the step of performing feature extraction on the training sample set to obtain multiple feature maps comprises:
performing feature extraction on the training sample set through a neural network model to obtain multiple feature maps;
wherein the neural network model comprises multiple residual blocks, each residual block comprising at least two convolutional layers, a batch regularization layer, a batch normalization layer, and two activation function layers; at least one convolutional layer, the batch regularization layer, an activation function layer, at least one convolutional layer, the batch normalization layer, and an activation function layer are connected in sequence.
6. The training method for a text recognition network according to claim 5, characterized in that before the step of performing feature extraction on the training sample set through the neural network model to obtain multiple feature maps, the method further comprises:
setting training parameters of the neural network model, the training parameters including one or more of the following: number of iterations, batch size, image size, learning rate, and learning-rate decay value.
7. The training method for a text recognition network according to claim 1, characterized in that before the step of selecting training images from the training sample set, inputting the training images into the neural network model, and obtaining multiple feature maps, the method further comprises:
applying one or more of the following operations to the training images in the training sample set: horizontal flipping, vertical flipping, lighting adjustment, and noise removal;
adding the processed training images to the training sample set.
8. A training device for a text recognition network, characterized by comprising:
an extraction module for performing feature extraction on a training sample set to obtain multiple feature maps, the training sample set including training images annotated with boxes marking text content and text position;
a generation module for generating a candidate-box set centered on each pixel of a feature map, the candidate-box set including at least one candidate box;
a rejection module for rejecting first-class candidate boxes that exceed the training image size and second-class candidate boxes that exceed the preset overlap rate;
a training module for training the text recognition network according to the candidate-box set remaining after rejection.
9. An optical character recognition method, characterized by comprising:
acquiring an image to be tested;
inputting the image to be tested into the text recognition network, and identifying the text content and text position in the image, the text recognition network being trained by the training method for a text recognition network according to any one of claims 1-7.
10. A computer-readable medium having non-volatile program code executable by a processor, characterized in that the program code causes the processor to execute the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910431918.4A CN110135423A (en) | 2019-05-23 | 2019-05-23 | The training method and optical character recognition method of text identification network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110135423A true CN110135423A (en) | 2019-08-16 |
Family
ID=67572644
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910431918.4A Pending CN110135423A (en) | 2019-05-23 | 2019-05-23 | The training method and optical character recognition method of text identification network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110135423A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107480730A (en) * | 2017-09-05 | 2017-12-15 | 广州供电局有限公司 | Power equipment identification model construction method and system, the recognition methods of power equipment |
CN108629267A (en) * | 2018-03-01 | 2018-10-09 | 南京航空航天大学 | A kind of model recognizing method based on depth residual error network |
CN108764228A (en) * | 2018-05-28 | 2018-11-06 | 嘉兴善索智能科技有限公司 | Word object detection method in a kind of image |
CN109712118A (en) * | 2018-12-11 | 2019-05-03 | 武汉三江中电科技有限责任公司 | A kind of substation isolating-switch detection recognition method based on Mask RCNN |
Non-Patent Citations (3)
Title |
---|
SHUZFAN: "NMS-非极大值抑制" ("NMS: Non-Maximum Suppression"), CSDN * |
WATERSINK: "非极大值抑制 (nonMaximumSuppression)" ("Non-Maximum Suppression"), CSDN * |
康行天下: "非极大值抑制 (Non-Maximum Suppression, NMS)", 博客园 (cnblogs) * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111242930A (en) * | 2020-01-15 | 2020-06-05 | 河北省胸科医院 | Method and device for acquiring pulmonary tuberculosis recognition model, storage medium and processor |
CN111242930B (en) * | 2020-01-15 | 2024-05-14 | 河北省胸科医院 | Method and device for acquiring pulmonary tuberculosis recognition model, storage medium and processor |
CN111626383A (en) * | 2020-05-29 | 2020-09-04 | Oppo广东移动通信有限公司 | Font identification method and device, electronic equipment and storage medium |
CN111626383B (en) * | 2020-05-29 | 2023-11-07 | Oppo广东移动通信有限公司 | Font identification method and device, electronic equipment and storage medium |
CN112613348A (en) * | 2020-12-01 | 2021-04-06 | 浙江华睿科技有限公司 | Character recognition method and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111914944B | Object detection method and system based on dynamic sample selection and loss consistency | |
WO2020177432A1 | Multi-tag object detection method and system based on target detection network, and apparatuses | |
CN109919934B | Liquid crystal panel defect detection method based on multi-source-domain deep transfer learning | |
CN110310259A | A wood knot flaw detection method based on an improved YOLOv3 algorithm | |
CN109711474A | An aluminum surface defect detection algorithm based on deep learning | |
CN108898610A | An object contour extraction method based on Mask-RCNN | |
CN108345911A | Steel plate surface defect detection method based on multi-stage features of convolutional neural networks | |
CN113160062B | Infrared image target detection method, device, equipment and storage medium | |
CN109271374A | A database health scoring method and scoring system based on machine learning | |
CN109272509A | Object detection method for consecutive images, device, equipment, and storage medium | |
CN110245697B | Surface contamination detection method, terminal device and storage medium | |
CN109658412A | A fast recognition and segmentation method for packing cases in de-stacking and sorting | |
CN112528845B | Physical circuit diagram identification method based on deep learning and application thereof | |
CN110135423A | The training method and optical character recognition method of text identification network | |
CN110120065A | A target tracking method and system based on hierarchical convolutional features and scale-adaptive kernel correlation filtering | |
CN110390673A | Automatic cigarette detection method based on deep learning in a surveillance scene | |
CN108961358A | Method, apparatus, and electronic device for obtaining sample pictures | |
CN109145971A | Single-sample learning method based on an improved matching network model | |
CN110827312A | Learning method based on cooperative visual attention neural network | |
CN114781514A | Floating-object target detection method and system integrating an attention mechanism | |
CN114359199A | Fish counting method, device, equipment and medium based on deep learning | |
CN109086781A | A cabinet indicator lamp state recognition method based on deep learning | |
CN112257810A | Seabed organism target detection method based on an improved Faster R-CNN | |
CN110544267A | Correlation filtering tracking method with adaptive feature selection | |
Li et al. | Improvement of YOLOv3 algorithm in workpiece detection | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190816 |