CN110135423A - The training method and optical character recognition method of text identification network - Google Patents
The training method and optical character recognition method of text identification network
- Publication number
- CN110135423A (application CN201910431918.4A)
- Authority
- CN
- China
- Prior art keywords
- training
- candidate frame
- text
- image
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The present invention provides a training method for a text recognition network and an optical character recognition method, relating to the technical field of text recognition. The method comprises: performing feature extraction on a training sample set to obtain multiple feature maps, where the training sample set includes training images annotated with boxes marking text content and text position; generating a candidate-box set centered on each pixel of a feature map, the candidate-box set including at least one candidate box; rejecting first-class candidate boxes that exceed the training image size and second-class candidate boxes that exceed a preset overlap rate; and training the text recognition network according to the candidate-box set remaining after rejection. The present invention can improve training efficiency while guaranteeing accuracy.
Description
Technical field
The present invention relates to the technical field of text recognition, and in particular to a training method for a text recognition network and an optical character recognition method.
Background art
Optical character recognition (OCR) refers to identifying the text content in an image captured by an optical device such as a camera or scanner and entering it into a computer. It avoids repetitive, inefficient manual labor, greatly improves working efficiency, and is widely used in industry and daily life. At present, optical character recognition relies on conventional algorithms: in the conventional training process, image feature maps are first extracted and then used to train a convolutional neural network. However, the feature maps extracted from an image contain the features of multiple possibly duplicated target texts and multiple possibly duplicated backgrounds, as well as features in which target and background are mixed together. Training on duplicated or mixed features gives poor results and makes training relatively inefficient.
Summary of the invention
In view of this, the purpose of the present invention is to provide a training method for a text recognition network and an optical character recognition method that can improve the accuracy of recognition results.
In a first aspect, an embodiment of the present invention provides a training method for a text recognition network, comprising:
performing feature extraction on a training sample set to obtain multiple feature maps, the training sample set including training images annotated with boxes marking text content and text position;
generating a candidate-box set centered on each pixel of a feature map, the candidate-box set including at least one candidate box;
rejecting first-class candidate boxes that exceed the training image size, and second-class candidate boxes that exceed a preset overlap rate;
training the text recognition network according to the candidate-box set remaining after rejection.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, wherein the step of rejecting first-class candidate boxes that exceed the training image size comprises:
computing the overlap degree between each candidate box and the annotation box of the training image corresponding to that candidate box;
rejecting, as first-class candidate boxes, the candidate boxes whose computed overlap degree lies between a first threshold and a second threshold.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, wherein the step of rejecting second-class candidate boxes that exceed the preset overlap rate comprises:
computing the position information of the candidate boxes;
determining a confidence for each candidate box according to its position information;
sorting the confidences to obtain a confidence queue;
extracting, as the reference candidate box, the candidate box with the highest confidence value;
computing the overlapping area between the reference candidate box and the remaining candidate boxes;
rejecting, as second-class candidate boxes, the candidate boxes whose overlap exceeds the preset overlap rate.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, wherein the step of computing the position information of the candidate boxes comprises:
taking the candidate boxes whose computed overlap degree is greater than the first threshold as foreground boxes;
performing coordinate regression between each foreground box and its corresponding annotation box to obtain offsets;
computing the position information of the candidate boxes according to the offsets.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, wherein the step of performing feature extraction on the training sample set to obtain multiple feature maps comprises:
performing feature extraction on the training sample set through a neural network model to obtain multiple feature maps;
wherein the neural network model comprises multiple residual blocks, each residual block comprising at least two convolutional layers, a batch regularization layer, a batch normalization layer, and two activation function layers; at least one convolutional layer, the batch regularization layer, an activation function layer, at least one convolutional layer, the batch normalization layer, and an activation function layer are connected in sequence.
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, wherein before the step of performing feature extraction on the training sample set through the neural network model to obtain multiple feature maps, the method further comprises:
setting training parameters of the neural network model, the training parameters including one or more of the following: number of iterations, batch size, image size, learning rate, and learning-rate decay value.
With reference to the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect, wherein before the step of selecting training images from the training sample set, inputting the training images into the neural network model, and obtaining multiple feature maps, the method further comprises:
applying one or more of the following operations to the training images in the training sample set: horizontal flipping, vertical flipping, lighting adjustment, and noise removal;
adding the processed training images to the training sample set.
In a second aspect, an embodiment of the present invention further provides a training device for a text recognition network, comprising:
an extraction module for performing feature extraction on a training sample set to obtain multiple feature maps, the training sample set including training images annotated with boxes marking text content and text position;
a generation module for generating a candidate-box set centered on each pixel of a feature map, the candidate-box set including at least one candidate box;
a rejection module for rejecting first-class candidate boxes that exceed the training image size and second-class candidate boxes that exceed the preset overlap rate;
a training module for training the text recognition network according to the candidate-box set remaining after rejection.
In a third aspect, an embodiment of the present invention further provides an optical character recognition method, comprising:
acquiring an image to be tested;
inputting the image to be tested into the text recognition network and identifying the text content and text position in it, where the text recognition network is trained by the training method for a text recognition network of any of the above embodiments.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to execute any of the methods of the above embodiments.
The embodiments of the present invention bring the following beneficial effects: feature extraction is performed on a training sample set to obtain multiple feature maps; the training sample set includes training images annotated with boxes marking text content and text position; a candidate-box set, including at least one candidate box, is generated centered on each pixel of a feature map; first-class candidate boxes that exceed the training image size and second-class candidate boxes that exceed a preset overlap rate are rejected; and the text recognition network is trained according to the candidate-box set remaining after rejection. By training only on the candidate boxes that remain after rejection, the present invention improves training efficiency while guaranteeing accuracy.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description or be understood by implementing the present invention. The objectives and other advantages of the invention are realized and obtained by the structures particularly pointed out in the description, the claims, and the accompanying drawings.
To make the above objectives, features, and advantages of the present invention clearer and more comprehensible, preferred embodiments are described in detail below in conjunction with the accompanying drawings.
Brief description of the drawings
To explain the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention; for a person of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of the training method for a text recognition network provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of the training device for a text recognition network provided by an embodiment of the present invention;
Fig. 3 is a flowchart of the optical character recognition method provided by an embodiment of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described below clearly and completely in conjunction with the drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Embodiments of the present invention provide a training method for a text recognition network and an optical character recognition method. The present invention trains on the candidate boxes remaining after rejection, which improves training efficiency while guaranteeing accuracy. To facilitate understanding of the present embodiment, the training method for a text recognition network disclosed in the embodiment of the present invention is first described in detail.
As shown in Fig. 1, the method comprises:
S110: performing feature extraction on the training sample set to obtain multiple feature maps; the training sample set includes training images annotated with boxes marking the text content.
Optionally, in step S110, feature extraction may be performed on the training sample set through a neural network model to obtain multiple feature maps.
The neural network model comprises multiple residual blocks; each residual block includes at least two convolutional layers, a batch regularization layer, a batch normalization layer, and two activation function layers, with at least one convolutional layer, the batch regularization layer, an activation function layer, at least one convolutional layer, the batch normalization layer, and an activation function layer connected in sequence. The batch regularization layer effectively prevents overfitting of the training parameters: the model parameters are updated after every iteration, so the distribution of the data output by one layer changes from one computation to the next, which makes learning harder for the following layer. The activation function layers add non-linearity to the model and improve its expressive power. The batch normalization layer is applied after the element-wise addition of the skip connection, followed by the activation function operation. By learning the residual ((F(x) + x) − x = F(x)), this structure alleviates the gradient degradation caused by a large number of layers, makes it easier for the network to learn the features of the data, and has played an important role in a wide range of image processing tasks.
Example: the neural network model may use a residual network 50 layers deep, composed of a series of residual blocks. Each residual block contains three convolutional layers; a batch regularization layer and a first activation function layer are added after the first and second convolutional layers, the first activation function layer is connected to the third convolutional layer, and this is followed by a batch normalization layer and a second activation function layer.
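For illustration, the residual block just described can be sketched in PyTorch. This is a minimal sketch, not the claimed implementation: it assumes BatchNorm2d for both the "batch regularization" and "batch normalization" layers, ReLU activations, and 3x3 convolutions with a fixed channel count, none of which are pinned down by this description.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block sketch: three conv layers; the skip connection is
    added element-wise before the final normalization and activation,
    so the block learns the residual of F(x) + x."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)   # "batch regularization" layer
        self.relu1 = nn.ReLU(inplace=True)    # first activation function layer
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)   # "batch normalization" layer
        self.relu2 = nn.ReLU(inplace=True)    # second activation function layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu1(self.bn1(self.conv2(self.conv1(x))))
        out = self.conv3(out)
        out = self.bn2(out + x)               # element-wise skip addition first,
        return self.relu2(out)                # then normalization and activation
```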
In the training process of the above neural network model, the training image passes through a series of convolutions that extract features and downsample it by a factor of 16 to obtain the feature map of the original image. Downsampling means that when the image passes through a convolutional layer with stride 2, the content of each 2x2 window of the original image becomes one pixel, halving the extracted feature map in both length and width; the feature map is obtained after four such downsamplings, so its length and width are each 1/16 of the original image. The feature information in the training image is obtained through the residual blocks: the more residual blocks the image passes through, the more detailed and higher-level the extracted features are, and the better the later test results of the network model will be.
Before the step of performing feature extraction on the training sample set through the neural network model to obtain multiple feature maps, the method further comprises:
setting the training parameters of the neural network model; the training parameters include one or more of the following: number of iterations, batch size, image size, learning rate, and learning-rate decay value.
The number of iterations is the number of times all training images in the training sample set are trained on, generally determined by the number of training images and the size of the network model. The batch size is the number of images the network loads in one forward pass. Image size means that before training, each image needs to be resized appropriately; the image is rescaled according to the defined length and width. Two situations must be considered when changing the image size: first, if the image is too large, the memory required during image processing exceeds the maximum memory the hardware can provide, causing the program to crash; second, if the target to be recognized occupies a very small proportion of the image, training directly on the original image may give poor detection results or even miss the target entirely. The learning rate is a very important parameter in deep-learning training: it controls how fast the model learns. An inappropriate learning rate causes two kinds of problems: first, if the learning rate is too large, training cannot reach the optimal model, so the resulting model's accuracy is low; second, if the learning rate is too small, the trained network converges very slowly and needs a very large number of iterations to reach the expected result, wasting time or even never reaching the target precision. A suitable value therefore needs to be chosen. Deep-learning training generally adjusts the learning rate dynamically, which is what the learning-rate decay value is for: the learning rate is first set to a relatively large value, for example 0.001, and after a certain number of iterations it is reduced according to the decay value. The new learning rate after the reduction is: new learning rate = old learning rate x learning-rate decay value; the decay value is typically set to 0.1.
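As a worked illustration of the rule "new learning rate = old learning rate x decay value", a minimal sketch follows; the decay interval of 10000 iterations is an assumed value, not one given above.

```python
def decayed_lr(base_lr: float, decay: float, iteration: int,
               decay_every: int = 10000) -> float:
    """Stepwise schedule: multiply the learning rate by `decay`
    each time `decay_every` iterations have elapsed."""
    return base_lr * (decay ** (iteration // decay_every))

# With base_lr=0.001 and decay=0.1: 0.001 for the first 10000 iterations,
# then 0.0001, then 0.00001, ...
print(decayed_lr(0.001, 0.1, 25000))  # 1e-05
```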
Of course, before the step of performing feature extraction on the training sample set, a training sample set containing text content needs to be acquired; the training sample set includes training images annotated with boxes marking the text content.
For example, license plate images can be used as the training sample set. Specifically, a large number of license plates can be photographed with an industrial camera, from different angles and against different backgrounds, to guarantee the diversity of the training sample set. The acquired images containing license plate characters are then annotated, recording the position information and the character content of each image. The concrete practice is to surround each character in the image with a rectangular box using an image annotation tool, and to record the enclosed character content and position information for training.
The training images in the training sample set are also subjected to one or more of the following operations: horizontal flipping, vertical flipping, lighting adjustment, and noise removal; the processed training images are added to the training sample set.
Specifically, noise removal targets black-and-white (salt-and-pepper) noise, which is often caused by image cropping; the most common way to remove salt-and-pepper noise is median filtering. Whether each preprocessing operation is applied to a given image is optional. Horizontal and vertical flipping mirror the image about its central axis in the corresponding direction. If salt-and-pepper filtering is used, the filter size must be set, and it must be an odd number greater than 1. If lighting adjustment is used, a lighting parameter must be set; its value is a number between 0 and 255, and the original image is multiplied by the lighting parameter times a number between 0 and 1, then adjusted back to the original range. This data preprocessing augments the training data set and enriches the features of the training sample set, making training more thorough and more efficient.
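A minimal sketch of these preprocessing options using OpenCV and NumPy; the multiplicative brightness formula and the kernel-size handling are illustrative assumptions rather than values fixed by this description.

```python
from typing import Optional

import cv2
import numpy as np

def augment(img: np.ndarray,
            hflip: bool = False,
            vflip: bool = False,
            light_scale: Optional[float] = None,
            median_ksize: Optional[int] = None) -> np.ndarray:
    """Apply the optional preprocessing operations described above."""
    if hflip:
        img = cv2.flip(img, 1)   # mirror about the vertical central axis
    if vflip:
        img = cv2.flip(img, 0)   # mirror about the horizontal central axis
    if light_scale is not None:  # lighting adjustment, assumed 0..1 factor
        img = np.clip(img.astype(np.float32) * light_scale, 0, 255).astype(np.uint8)
    if median_ksize is not None: # salt-and-pepper removal; size odd and > 1
        assert median_ksize > 1 and median_ksize % 2 == 1
        img = cv2.medianBlur(img, median_ksize)
    return img
```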
S120: generating a candidate-box set centered on each pixel of a feature map; the candidate-box set includes at least one candidate box.
Each of the multiple feature maps is processed as follows: a convolution with a 3x3 kernel slides over the feature map, i.e., an anchor point is generated centered on each pixel of the feature map, and candidate boxes of different aspect ratios and sizes are generated so that the whole feature map is covered. The candidate boxes are generally given three aspect ratios, 1:2, 1:1 and 2:1, and three area scales, 4, 8 and 16, so that 9 anchor scales are obtained per anchor point; the anchors are then mapped back to the original image. In this way the original image is fully covered by candidate boxes, and these candidate boxes contain all the targets to be recognized and all the backgrounds.
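A minimal sketch of this anchor enumeration; mapping each area scale to a pixel size of scale x stride (with stride 16 matching the 16x downsampling above) is an assumption, since the units of the area scales are not fixed here.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     ratios=(0.5, 1.0, 2.0), scales=(4, 8, 16)):
    """Return (feat_h * feat_w * 9, 4) anchors as (x1, y1, x2, y2) in
    original-image coordinates, 9 anchors centered on each feature-map pixel."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # map the feature-map pixel center back to the original image
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for r in ratios:
                for s in scales:
                    side = s * stride                  # assumed base size per scale
                    w, h = side * np.sqrt(r), side / np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors, dtype=np.float32)
```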
S130: rejecting first-class candidate boxes that exceed the training image size and second-class candidate boxes that exceed the preset overlap rate.
The step of rejecting first-class candidate boxes that exceed the training image size comprises:
rejecting, as first-class candidate boxes, the candidate boxes whose computed overlap degree lies between the first threshold and the second threshold.
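The overlap degree used throughout can be read as the standard intersection-over-union (IoU); a minimal sketch, with the thresholds of 0.7 and 0.3 taken from the worked example later in this description.

```python
import numpy as np

def iou(box, gt):
    """Overlap degree (IoU) of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box[0], gt[0]), max(box[1], gt[1])
    ix2, iy2 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box) + area(gt) - inter)

def classify(box, gt, hi=0.7, lo=0.3):
    """1 = foreground, 0 = background, -1 = rejected first-class box."""
    v = iou(box, gt)
    return 1 if v > hi else (0 if v < lo else -1)
```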
The step of rejecting second-class candidate boxes that exceed the preset overlap rate comprises:
determining a confidence for each candidate box according to its position information;
sorting the confidences to obtain a confidence queue;
extracting, as the reference candidate box, the candidate box with the highest confidence value;
computing the overlapping area between the reference candidate box and the remaining candidate boxes;
rejecting, as second-class candidate boxes, the candidate boxes whose overlap exceeds the preset overlap rate.
S140: training the text recognition network according to the candidate-box set remaining after rejection.
First, the class information and position information of the candidate-box set remaining after rejection are computed, and the text recognition network is trained with the computed class information and position information. This can be understood as follows: the overlap degree between each candidate box and the annotation box of its corresponding training image is computed to determine the class information of the candidate boxes and extract the foreground boxes; coordinate regression between each foreground box and its corresponding annotation box yields offsets; and, according to the offsets, the position information of the initial candidate boxes is computed.
Example: the overlap degree between each candidate box and the annotation box of its corresponding training image is computed, and the initial candidate boxes whose overlap with the annotation box is greater than 0.7 have their class set to 1 and are extracted as foreground boxes. Coordinate regression is then performed between these foreground boxes and the annotation boxes to compute the offsets, and each initial candidate box obtains new position information through the coordinate transform. These are compared with the class and position information of the original image to obtain the loss function. The loss function is a function that measures the loss and the degree of error; in neural network training, its backpropagation adjusts the network parameters and optimizes the network model. The processed candidate boxes and the values obtained from the coordinate transform are passed to the next network layer.
The regions of interest are first mapped back to the feature map, the mapped region is divided into parts of equal size, and a max-pooling operation is applied to each part to obtain a more targeted region for the target to be recognized; the next step is to train on the class of the target to be recognized. The target region is passed through a series of residual blocks to obtain its feature information, and the position and class of this feature information are trained; the results are continuously optimized by backpropagating the loss function, yielding more accurate position information and class information.
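This region-of-interest max pooling is available off the shelf; a usage sketch with torchvision follows, where the 7x7 output grid and the tensor shapes are illustrative assumptions.

```python
import torch
from torchvision.ops import roi_pool

# feature map from the backbone: (batch, channels, H/16, W/16)
feats = torch.randn(1, 256, 40, 60)
# one region of interest in original-image coords: (batch_idx, x1, y1, x2, y2)
rois = torch.tensor([[0., 64., 32., 256., 96.]])
# divide each mapped region into 7x7 equal parts and max-pool each part;
# spatial_scale=1/16 maps image coordinates onto the downsampled feature map
pooled = roi_pool(feats, rois, output_size=(7, 7), spatial_scale=1.0 / 16)
print(pooled.shape)  # torch.Size([1, 256, 7, 7])
```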
Of course, step S130 and step S140 may be interleaved during execution. As an example: the overlap degree between each initial candidate box and its corresponding annotation box is computed. When the overlap between an initial candidate box and the annotation box is greater than 0.7, the box's class is set to 1 and it belongs to the foreground boxes. When the overlap is less than 0.3, the class is set to 0 and it belongs to the background boxes. When the overlap lies between 0.3 and 0.7, the class is set to -1; the boxes set to -1 are the first-class candidate boxes, i.e., those treated as exceeding the training image size, and they are not considered in later training. Coordinate regression is then applied to the foreground boxes and annotation boxes to compute the offsets, which are compared with the class and position information of the original image to obtain the loss function. The loss function is a function that measures the loss and the degree of error; in neural network training, its backpropagation adjusts the network parameters and optimizes the network model. The processed candidate boxes and the values obtained from the coordinate transform are passed to the next network layer.
Candidate-box regression is carried out according to the obtained coordinate transform and offsets, and all initial candidate boxes belonging to the foreground and background obtain new position information through the coordinate transform. Non-maximum suppression is then applied to the candidate boxes to remove redundant and overlapping boxes. The concrete operations are as follows: a confidence is assigned to each box according to its position information. First, all boxes are sorted by score and the highest score and its corresponding box are chosen. Second, the remaining boxes are traversed; if the overlapping area with the current highest-scoring box is greater than a certain threshold, that box is removed. Third, the highest-scoring box among the unprocessed boxes is chosen, and the first and second steps are repeated.
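A direct sketch of the three steps just listed, using IoU as the overlap measure and leaving the overlap threshold as a parameter.

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, overlap_thresh: float = 0.7):
    """Greedy non-maximum suppression; boxes are (N, 4) as (x1, y1, x2, y2).
    Returns the indices of the boxes that are kept."""
    order = scores.argsort()[::-1]          # step 1: sort by score, best first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        # step 2: overlap of the best box with every remaining box
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        iou = inter / (areas[best] + areas[rest] - inter)
        # step 3: keep only boxes below the threshold and repeat
        order = rest[iou <= overlap_thresh]
    return keep
```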
The above is the complete process of training on one image. During training, when the loss function is observed to no longer change, training can be considered finished, and the trained model is saved at that point. The trained network model is then tested on a test data set; the testing process is essentially the same as training, and the test parameter that needs to be set is the confidence threshold for the detected boxes: when the confidence of a target box in the test result is below the threshold, the target is regarded as not recognized. The threshold is generally set to 0.7. After testing, the recognition results can be rendered on the test data set, including the target boxes containing the target characters and the content of those characters, and quantitative metrics can then be computed from the test statistics. When the overlap rate between a detected target box and the real target box is above 0.5 and the target content is recognized correctly, the recognition result is counted as a positive example; if the content is recognized incorrectly, the result is counted as a negative example. Dividing the number of positive examples by the sum of the positive and negative example counts gives the precision of the model, and dividing the number of positive examples by the number of targets across all tests gives the recall; these two evaluation metrics determine whether the performance of the network model meets expectations.
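The two metrics reduce to the divisions just described; a minimal sketch, with the counts in the usage line chosen purely for illustration.

```python
def precision_recall(true_pos: int, false_pos: int, total_targets: int):
    """Precision = TP / (TP + FP); recall = TP / number of ground-truth targets."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / total_targets
    return precision, recall

# e.g. 180 correctly recognized boxes, 20 wrongly recognized, 200 targets total:
print(precision_recall(180, 20, 200))  # (0.9, 0.9)
```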
An embodiment of the present invention also provides a training device for a text recognition network. As shown in Fig. 2, the device comprises:
an extraction module 210 for performing feature extraction on a training sample set to obtain multiple feature maps, the training sample set including training images annotated with boxes marking text content and text position;
a generation module 220 for generating a candidate-box set centered on each pixel of a feature map, the candidate-box set including at least one candidate box;
a rejection module 230 for rejecting first-class candidate boxes that exceed the training image size and second-class candidate boxes that exceed the preset overlap rate;
a training module 240 for training the text recognition network according to the candidate-box set remaining after rejection.
The technical effects and implementation principles of the device provided by this embodiment of the present invention are the same as those of the preceding method embodiments; for brevity, where this device embodiment is silent, reference may be made to the corresponding content of the preceding method embodiments.
An embodiment of the present invention also provides an optical character recognition method. As shown in Fig. 3, the method comprises:
S310: acquiring an image to be tested;
S320: inputting the image to be tested into the text recognition network and identifying the text content in it, where the text recognition network is trained by the training method for a text recognition network of any of the above embodiments.
An embodiment of the present invention also provides a computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to execute any of the methods of the above embodiments.
An embodiment of the present invention also provides an electronic device comprising a processor and a memory. The memory may include high-speed random access memory (RAM) and may also include non-volatile memory, for example at least one disk memory. The memory stores a program; after receiving an execution instruction, the processor executes the program, and the method defined by the flow disclosed in any of the preceding embodiments of the present invention may be applied in the processor or implemented by the processor.
The technical solution of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the embodiments described above are only specific embodiments of the present invention, used to illustrate the technical solutions of the present invention and not to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field can still, within the technical scope disclosed by the present invention, modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features; such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A training method for a text recognition network, characterized by comprising:
performing feature extraction on a training sample set to obtain multiple feature maps, the training sample set including training images annotated with boxes marking text content and text position;
generating a candidate-box set centered on each pixel of a feature map, the candidate-box set including at least one candidate box;
rejecting first-class candidate boxes that exceed the training image size, and second-class candidate boxes that exceed a preset overlap rate;
training the text recognition network according to the candidate-box set remaining after rejection.
2. The training method for a text recognition network according to claim 1, characterized in that the step of rejecting first-class candidate boxes that exceed the training image size comprises:
computing the overlap degree between each candidate box and the annotation box of the training image corresponding to that candidate box;
rejecting, as first-class candidate boxes, the candidate boxes whose computed overlap degree lies between a first threshold and a second threshold.
3. The training method for a text recognition network according to claim 2, characterized in that the step of rejecting second-class candidate boxes that exceed the preset overlap rate comprises:
computing the position information of the candidate boxes;
determining a confidence for each candidate box according to its position information;
sorting the confidences to obtain a confidence queue;
extracting, as the reference candidate box, the candidate box with the highest confidence value;
computing the overlapping area between the reference candidate box and the remaining candidate boxes;
rejecting, as second-class candidate boxes, the candidate boxes whose overlap exceeds the preset overlap rate.
4. The training method for a text recognition network according to claim 3, characterized in that the step of computing the position information of the candidate boxes comprises:
taking the candidate boxes whose computed overlap degree is greater than the first threshold as foreground boxes;
performing coordinate regression between each foreground box and its corresponding annotation box to obtain offsets;
computing the position information of the candidate boxes according to the offsets.
5. The training method for a text recognition network according to claim 1, characterized in that the step of performing feature extraction on the training sample set to obtain multiple feature maps comprises:
performing feature extraction on the training sample set through a neural network model to obtain multiple feature maps;
wherein the neural network model comprises multiple residual blocks, each residual block comprising at least two convolutional layers, a batch regularization layer, a batch normalization layer, and two activation function layers; at least one convolutional layer, the batch regularization layer, an activation function layer, at least one convolutional layer, the batch normalization layer, and an activation function layer are connected in sequence.
6. The training method for a text recognition network according to claim 5, characterized in that before the step of performing feature extraction on the training sample set through the neural network model to obtain multiple feature maps, the method further comprises:
setting training parameters of the neural network model, the training parameters including one or more of the following: number of iterations, batch size, image size, learning rate, and learning-rate decay value.
7. The training method for a text recognition network according to claim 1, characterized in that before the step of selecting training images from the training sample set, inputting the training images into the neural network model, and obtaining multiple feature maps, the method further comprises:
applying one or more of the following operations to the training images in the training sample set: horizontal flipping, vertical flipping, lighting adjustment, and noise removal;
adding the processed training images to the training sample set.
8. A training device for a text recognition network, characterized by comprising:
an extraction module for performing feature extraction on a training sample set to obtain multiple feature maps, the training sample set including training images annotated with boxes marking text content and text position;
a generation module for generating a candidate-box set centered on each pixel of a feature map, the candidate-box set including at least one candidate box;
a rejection module for rejecting first-class candidate boxes that exceed the training image size and second-class candidate boxes that exceed the preset overlap rate;
a training module for training the text recognition network according to the candidate-box set remaining after rejection.
9. An optical character recognition method, characterized by comprising:
acquiring an image to be tested;
inputting the image to be tested into the text recognition network, and identifying the text content and text position in the image, the text recognition network being trained by the training method for a text recognition network according to any one of claims 1-7.
10. A computer-readable medium having non-volatile program code executable by a processor, characterized in that the program code causes the processor to execute the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910431918.4A CN110135423A (en) | 2019-05-23 | 2019-05-23 | The training method and optical character recognition method of text identification network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110135423A true CN110135423A (en) | 2019-08-16 |
Family
ID=67572644
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910431918.4A Pending CN110135423A (en) | 2019-05-23 | 2019-05-23 | The training method and optical character recognition method of text identification network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110135423A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107480730A (en) * | 2017-09-05 | 2017-12-15 | 广州供电局有限公司 | Power equipment identification model construction method and system, the recognition methods of power equipment |
CN108629267A (en) * | 2018-03-01 | 2018-10-09 | 南京航空航天大学 | A kind of model recognizing method based on depth residual error network |
CN108764228A (en) * | 2018-05-28 | 2018-11-06 | 嘉兴善索智能科技有限公司 | Word object detection method in a kind of image |
CN109712118A (en) * | 2018-12-11 | 2019-05-03 | 武汉三江中电科技有限责任公司 | A kind of substation isolating-switch detection recognition method based on Mask RCNN |
Non-Patent Citations (3)
Title |
---|
SHUZFAN: "NMS-非极大值抑制" ("NMS: Non-Maximum Suppression"), CSDN * |
WATERSINK: "非极大值抑制 (nonMaximumSuppression)" ("Non-Maximum Suppression"), CSDN * |
康行天下: "非极大值抑制 (Non-Maximum Suppression, NMS)", 博客园 (cnblogs) * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111242930A (en) * | 2020-01-15 | 2020-06-05 | 河北省胸科医院 | Method and device for acquiring pulmonary tuberculosis recognition model, storage medium and processor |
CN111242930B (en) * | 2020-01-15 | 2024-05-14 | 河北省胸科医院 | Method and device for acquiring pulmonary tuberculosis recognition model, storage medium and processor |
CN111626383A (en) * | 2020-05-29 | 2020-09-04 | Oppo广东移动通信有限公司 | Font identification method and device, electronic equipment and storage medium |
CN111626383B (en) * | 2020-05-29 | 2023-11-07 | Oppo广东移动通信有限公司 | Font identification method and device, electronic equipment and storage medium |
CN112613348A (en) * | 2020-12-01 | 2021-04-06 | 浙江华睿科技有限公司 | Character recognition method and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111914944B | Object detection method and system based on dynamic sample selection and loss consistency | |
WO2020177432A1 | Multi-tag object detection method and system based on target detection network, and apparatuses | |
CN109919934B | Liquid crystal panel defect detection method based on multi-source-domain deep transfer learning | |
CN110310259A | A wood knot flaw detection method based on an improved YOLOv3 algorithm | |
CN109711474A | An aluminum surface defect detection algorithm based on deep learning | |
CN108898610A | An object contour extraction method based on Mask-RCNN | |
CN108345911A | Steel plate surface defect detection method based on multi-stage features of convolutional neural networks | |
CN113160062B | Infrared image target detection method, device, equipment and storage medium | |
CN109271374A | A database health scoring method and scoring system based on machine learning | |
CN109272509A | Object detection method for consecutive images, device, equipment, and storage medium | |
CN110245697B | Surface contamination detection method, terminal device and storage medium | |
CN109658412A | A fast recognition and segmentation method for packing cases in de-stacking and sorting | |
CN112528845B | Physical circuit diagram identification method based on deep learning and application thereof | |
CN110135423A | The training method and optical character recognition method of text identification network | |
CN110120065A | A target tracking method and system based on hierarchical convolutional features and scale-adaptive kernel correlation filtering | |
CN110390673A | Automatic cigarette detection method based on deep learning in a surveillance scene | |
CN108961358A | Method, apparatus, and electronic device for obtaining sample pictures | |
CN109145971A | Single-sample learning method based on an improved matching network model | |
CN110827312A | Learning method based on cooperative visual attention neural network | |
CN114781514A | Floating-object target detection method and system integrating an attention mechanism | |
CN114359199A | Fish counting method, device, equipment and medium based on deep learning | |
CN109086781A | A cabinet indicator lamp state recognition method based on deep learning | |
CN112257810A | Seabed organism target detection method based on an improved Faster R-CNN | |
CN110544267A | Correlation filtering tracking method with adaptive feature selection | |
Li et al. | Improvement of YOLOv3 algorithm in workpiece detection | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190816 |