CN108427950A - Text line detection method and apparatus - Google Patents

Text line detection method and apparatus

Info

Publication number
CN108427950A
CN108427950A (application CN201810102229.4A; granted as CN108427950B)
Authority
CN
China
Prior art keywords
text line
inclination
layer
matrix
angle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810102229.4A
Other languages
Chinese (zh)
Other versions
CN108427950B (en)
Inventor
高大帅
李健
张连毅
武卫东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING INFOQUICK SINOVOICE SPEECH TECHNOLOGY CORP
Beijing Sinovoice Technology Co Ltd
Original Assignee
BEIJING INFOQUICK SINOVOICE SPEECH TECHNOLOGY CORP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING INFOQUICK SINOVOICE SPEECH TECHNOLOGY CORP filed Critical BEIJING INFOQUICK SINOVOICE SPEECH TECHNOLOGY CORP
Priority to CN201810102229.4A priority Critical patent/CN108427950B/en
Publication of CN108427950A publication Critical patent/CN108427950A/en
Application granted granted Critical
Publication of CN108427950B publication Critical patent/CN108427950B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • G06V10/245Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present invention provides a text line detection method and apparatus. In the embodiments of the present invention, a preset YOLO model is used to simultaneously detect the position of a text line in an image under detection, the tilt angle of the text line, the orientation (upright or inverted) of the characters the text line contains, and the script of those characters. The embodiments of the present invention do not use adaptive binarization to extract the characters in the image, thereby avoiding the loss of text line detection accuracy caused by illumination or shadows, and do not use a bag-of-words feature classifier to determine the text direction and script within a text line, thereby avoiding the loss of detection accuracy caused by that classifier's weak generalization ability. Because the generalization ability of the YOLO model of the embodiments of the present invention is better than that of a bag-of-words feature classifier, the embodiments of the present invention can improve text line detection accuracy compared with the prior art.

Description

Text line detection method and apparatus
Technical field
The present invention relates to the field of computer technology, and in particular to a text line detection method and apparatus.
Background art
At present, many scenarios require detecting text in images. For example, an image of an identity card, a vehicle license, a driver's license, or a business card is captured, and text information such as a name, a number, or an address is then detected in the image. Each piece of text information comprises multiple characters arranged in a line: for example, the name "Zhang San" contains two Chinese characters, an identity card number contains 18 digits, and an address contains more than two Chinese characters.
Each such piece of text information forms a text line composed of multiple characters. When the text information in an image needs to be recognized, the text line usually needs to be located in the image first, and OCR (Optical Character Recognition) technology is then used to recognize the text information in the text line.
The prior art provides a text line detection method including: extracting the characters in an image using adaptive binarization; generating text lines from the sizes and positions of the characters using a clustering-based page-layout analysis method; and then determining the text direction and script within each text line using a bag-of-words feature classifier.
However, in the course of implementing the embodiments of the present invention, the inventors found the following defects in the prior art:
First, when adaptive binarization is used to extract the characters in an image, illumination or shadows often cause characters to be lost, or cause non-text noise to be included among the extracted characters. The detected text lines may therefore not fully match the actual text lines in the image, reducing text line detection accuracy.
Second, the generalization ability of a bag-of-words feature classifier is limited by the size of its dictionary and the corresponding feature vectors. It is an order-free classification method that cannot characterize the structural information of the image, so its generalization ability is relatively low, which reduces text line detection accuracy.
Invention content
To solve the above technical problems, the embodiments of the present invention provide a text line detection method and apparatus.
In a first aspect, an embodiment of the present invention provides a text line detection method, the method including:
obtaining a preset YOLO model, where the YOLO model includes 24 convolution stacks; a complete convolution stack includes a convolutional layer, a pooling layer, batch normalization, and an activation layer; the YOLO model has 4 complete convolution stacks and 20 convolution stacks containing only a convolutional layer and an activation layer; the activation function of each convolution stack is a rectified linear unit (ReLU), and residual skip connections are used between convolution stacks; the YOLO model further includes 8 output convolutional layers, the 8 output convolutional layers comprising 1 confidence layer, 4 text line coordinate layers, 1 text line tilt angle layer, 1 text line orientation layer, and 1 text line script layer;
inputting the image under detection into the YOLO model, and obtaining the matrices output by the YOLO model at the 8 output convolutional layers respectively;
determining, from the matrices output by the 8 output convolutional layers respectively, the position of the text line in the image under detection, the tilt angle of the text line, the orientation of the characters the text line contains, and the script of the characters the text line contains.
In an optional implementation, determining, from the matrices output by the 8 output convolutional layers respectively, the position of the text line in the image under detection, the tilt angle of the text line, the orientation of the characters the text line contains, and the script of the characters the text line contains includes:
parsing the matrix output by the confidence layer to obtain a confidence score;
judging whether the confidence score is greater than a first preset threshold;
if the confidence score is greater than the first preset threshold, parsing the matrices output by the 4 coordinate layers respectively and parsing the matrix output by the tilt angle layer, to obtain a predicted rectangular box containing the text line;
determining the position of the text line in the image under detection and the tilt angle of the text line according to the predicted rectangular box;
parsing the matrix output by the orientation layer to obtain the orientation of the characters contained in the text line in the image under detection;
parsing the matrix output by the script layer to obtain the script of the characters contained in the text line in the image under detection.
In an optional implementation, after parsing the matrices output by the 4 coordinate layers respectively and parsing the matrix output by the tilt angle layer, 1 predicted rectangular box containing the text line is obtained;
determining the position of the text line in the image under detection and the tilt angle of the text line according to the predicted rectangular box includes:
determining the position of the predicted rectangular box in the image under detection as the position of the text line in the image under detection;
determining the tilt angle of the predicted rectangular box as the tilt angle of the text line.
In an optional implementation, after parsing the matrices output by the 4 coordinate layers respectively and parsing the matrix output by the tilt angle layer, multiple predicted rectangular boxes containing the text line are obtained;
determining the position of the text line in the image under detection and the tilt angle of the text line according to the predicted rectangular boxes includes:
selecting two predicted rectangular boxes from the multiple predicted rectangular boxes;
calculating the area of the intersection between the two predicted rectangular boxes;
calculating the sum of the areas of the two predicted rectangular boxes;
calculating the ratio between the area of the intersection and the sum of the areas of the two predicted rectangular boxes;
judging whether the ratio is greater than a second preset threshold;
if the ratio is greater than the second preset threshold, determining, among the two predicted rectangular boxes, the position of the predicted rectangular box with the larger confidence score as the position of the text line, and determining the tilt angle of the predicted rectangular box with the larger confidence score as the tilt angle of the text line.
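As an illustrative sketch only, the pairwise selection rule described here — intersection area divided by the sum of the two box areas, keeping the higher-confidence box when the ratio exceeds the second preset threshold — can be expressed as follows. The axis-aligned (x1, y1, x2, y2) box representation and the helper names are assumptions for illustration; the patent's predicted boxes additionally carry a tilt angle.

```python
def intersection_area(a, b):
    # Boxes as axis-aligned (x1, y1, x2, y2) tuples; the patent's boxes
    # may be rotated, so this is a simplification for illustration.
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)

def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

def select_box(box_a, conf_a, box_b, conf_b, second_threshold):
    """If the intersection-over-area-sum ratio exceeds the threshold,
    keep the box with the larger confidence score; otherwise signal
    that the two boxes should be merged (None here)."""
    ratio = intersection_area(box_a, box_b) / (area(box_a) + area(box_b))
    if ratio > second_threshold:
        return box_a if conf_a >= conf_b else box_b
    return None  # caller merges the two boxes instead
```

Note that this ratio divides by the sum of the two areas rather than by their union, so it is not the conventional IoU; a perfectly overlapping pair yields a ratio of 0.5 rather than 1.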
In a second aspect, an embodiment of the present invention provides a text line detection apparatus, the apparatus including:
an acquisition module, configured to obtain a preset YOLO model, where the YOLO model includes 24 convolution stacks; a complete convolution stack includes a convolutional layer, a pooling layer, batch normalization, and an activation layer; the YOLO model has 4 complete convolution stacks and 20 convolution stacks containing only a convolutional layer and an activation layer; the activation function of each convolution stack is a rectified linear unit (ReLU), and residual skip connections are used between convolution stacks; the YOLO model further includes 8 output convolutional layers, the 8 output convolutional layers comprising 1 confidence layer, 4 text line coordinate layers, 1 text line tilt angle layer, 1 text line orientation layer, and 1 text line script layer;
an input module, configured to input the image under detection into the YOLO model and obtain the matrices output by the YOLO model at the 8 output convolutional layers respectively;
a determining module, configured to determine, from the matrices output by the 8 output convolutional layers respectively, the position of the text line in the image under detection, the tilt angle of the text line, the orientation of the characters the text line contains, and the script of the characters the text line contains.
In an optional implementation, the determining module includes:
a first parsing unit, configured to parse the matrix output by the confidence layer to obtain a confidence score;
a judging unit, configured to judge whether the confidence score is greater than a first preset threshold;
a second parsing unit, configured to, if the confidence score is greater than the first preset threshold, parse the matrices output by the 4 coordinate layers respectively and parse the matrix output by the tilt angle layer, to obtain a predicted rectangular box containing the text line;
a determining unit, configured to determine the position of the text line in the image under detection and the tilt angle of the text line according to the predicted rectangular box;
a third parsing unit, configured to parse the matrix output by the orientation layer to obtain the orientation of the characters contained in the text line in the image under detection;
a fourth parsing unit, configured to parse the matrix output by the script layer to obtain the script of the characters contained in the text line in the image under detection.
In an optional implementation, after parsing the matrices output by the 4 coordinate layers respectively and parsing the matrix output by the tilt angle layer, 1 predicted rectangular box containing the text line is obtained;
the determining unit includes:
a first determining subunit, configured to determine the position of the predicted rectangular box in the image under detection as the position of the text line in the image under detection;
a second determining subunit, configured to determine the tilt angle of the predicted rectangular box as the tilt angle of the text line.
In an optional implementation, after parsing the matrices output by the 4 coordinate layers respectively and parsing the matrix output by the tilt angle layer, multiple predicted rectangular boxes containing the text line are obtained;
the determining unit includes:
a selecting subunit, configured to select two predicted rectangular boxes from the multiple predicted rectangular boxes;
a first computing subunit, configured to calculate the area of the intersection between the two predicted rectangular boxes;
a second computing subunit, configured to calculate the sum of the areas of the two predicted rectangular boxes;
a third computing subunit, configured to calculate the ratio between the area of the intersection and the sum of the areas of the two predicted rectangular boxes;
a judging subunit, configured to judge whether the ratio is greater than a second preset threshold;
a third determining subunit, configured to, if the ratio is greater than the second preset threshold, determine, among the two predicted rectangular boxes, the position of the predicted rectangular box with the larger confidence score as the position of the text line, and determine the tilt angle of the predicted rectangular box with the larger confidence score as the tilt angle of the text line.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, implements the steps of the text line detection method according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the text line detection method according to the first aspect.
Compared with the prior art, the embodiments of the present invention have the following advantages:
In the embodiments of the present invention, a preset YOLO model is used to simultaneously detect the position of a text line in the image under detection, the tilt angle of the text line, the orientation of the characters the text line contains, and the script of those characters. The embodiments do not use adaptive binarization to extract the characters in the image, thereby avoiding the loss of text line detection accuracy caused by illumination or shadows, and do not use a bag-of-words feature classifier to determine the text direction and script within a text line, thereby avoiding the loss of detection accuracy caused by that classifier's weak generalization ability. Because the generalization ability of the YOLO model of the embodiments of the present invention is better than that of a bag-of-words feature classifier, the embodiments of the present invention can improve text line detection accuracy compared with the prior art.
Brief description of the drawings
Fig. 1 is a flowchart of the steps of an embodiment of a text line detection method of the present invention;
Fig. 2 is a schematic diagram of a text line of the present invention;
Fig. 3 is a schematic diagram of a text line of the present invention;
Fig. 4 is a structural block diagram of an embodiment of a text line detection apparatus of the present invention.
Detailed description of embodiments
To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific implementations.
Referring to Fig. 1, which shows a flowchart of the steps of an embodiment of a text line detection method of the present invention, the method may specifically include the following steps:
In step S101, a preset YOLO model is obtained. The YOLO model includes 24 convolution stacks; a complete convolution stack includes a convolutional layer, a pooling layer, batch normalization, and an activation layer. The YOLO model has 4 complete convolution stacks and 20 convolution stacks containing only a convolutional layer and an activation layer; the activation function of each convolution stack is a rectified linear unit (ReLU), and residual skip connections are used between convolution stacks. The YOLO model further includes 8 output convolutional layers: 1 confidence layer, 4 text line coordinate layers, 1 text line tilt angle layer, 1 text line orientation layer, and 1 text line script layer.
In the YOLO model, the image under detection may be divided into a 16*16 grid, or alternatively into a 32*32 grid or an 8*8 grid; the embodiments of the present invention do not limit this.
In the embodiments of the present invention, a large number of images of a uniform size need to be synthesized in advance, and the YOLO model is then trained on these images, for example using adaptive stochastic gradient descent, with an initial learning rate of 0.00002 and 800 training epochs. To enhance scale robustness, the image size is doubled during one training round. After convergence, the model is fine-tuned on 2000 annotated images. A deep learning framework such as Theano may be used. The trained YOLO model is then optimized using the designed loss function, and the resulting YOLO model is stored locally as the preset YOLO model.
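The training hyperparameters described above can be collected into a plain configuration fragment. The dictionary keys below are illustrative names only, not an API of any particular framework; the values are the ones stated in the description.

```python
# Training configuration from the description above; key names are
# illustrative, not part of any framework's API.
train_config = {
    "optimizer": "adaptive stochastic gradient descent",
    "initial_learning_rate": 0.00002,
    "epochs": 800,
    "scale_augmentation_factor": 2,   # image size doubled in one round
    "finetune_images": 2000,          # annotated images for fine-tuning
    "framework": "theano",            # example framework named in the text
}
```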
The total loss is a weighted combination of the classification and regression losses: the loss of the position of the text line in the image, the loss of the tilt angle of the text line, the loss of the orientation of the characters the text line contains, and the loss of the script of the characters the text line contains, for example:
Loss = l_obj + 0.1 * l_nonObj + 5 * l_bnd + l_ori + l_script
where l_obj and l_nonObj are the classification losses for the presence or absence of text, corresponding to the 1 confidence layer among the 8 output convolutional layers; l_bnd is the regression loss of the minimum rotated bounding rectangle, corresponding to the 4 text line coordinate layers and the 1 text line tilt angle layer among the 8 output convolutional layers; l_ori is the character orientation loss, corresponding to the 1 text line orientation layer among the 8 output convolutional layers; and l_script is the script loss, corresponding to the 1 text line script layer among the 8 output convolutional layers.
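The weighted combination above can be sketched as a simple function. The per-term loss values here are stand-in numbers; in practice each term would be computed from the corresponding output layer.

```python
def total_loss(l_obj, l_nonObj, l_bnd, l_ori, l_script):
    # Weighted combination from the example formula:
    # Loss = l_obj + 0.1 * l_nonObj + 5 * l_bnd + l_ori + l_script
    return l_obj + 0.1 * l_nonObj + 5 * l_bnd + l_ori + l_script
```

The heavy weight (5) on the bounding-rectangle term prioritizes localization, while the light weight (0.1) on the no-text term keeps the many empty grid cells from dominating the loss.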
In step S102, the image under detection is input into the YOLO model, and the matrices output by the YOLO model at the 8 output convolutional layers respectively are obtained.
In step S103, the position of the text line in the image under detection, the tilt angle of the text line, the orientation of the characters the text line contains, and the script of the characters the text line contains are determined from the matrices output by the 8 output convolutional layers respectively.
In the embodiments of the present invention, the image under detection is rectangular. A text line includes multiple characters arranged in a line, and the angle between the line segment connecting the center points of the characters and the horizontal edge of the image under detection is the tilt angle of the text line. The scripts of the characters contained in a text line include Chinese, English, Japanese, Korean, Latin, Russian, and so on. The orientation of the characters contained in a text line indicates whether the characters are upright or inverted; for example, the characters of the text line shown in Fig. 2 are upright, and the characters of the text line shown in Fig. 3 are inverted.
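The tilt angle defined here — the angle between the segment through the character center points and the image's horizontal edge — can be sketched as follows. Using only the first and last center points is a simplifying assumption; the patent defines the angle geometrically without fixing a fitting method.

```python
import math

def tilt_angle_degrees(centers):
    """Angle, in degrees, between the segment joining the first and last
    character centers and the horizontal axis. `centers` is a list of
    (x, y) tuples in image coordinates; using only the endpoints is a
    simplification of the geometric definition in the text."""
    (x0, y0), (x1, y1) = centers[0], centers[-1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))
```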
Specifically, the matrix output by the confidence layer may be parsed to obtain a confidence score, and whether the confidence score is greater than the first preset threshold is then judged. If the confidence score is greater than the first preset threshold, the matrices output by the 4 coordinate layers respectively are parsed and the matrix output by the tilt angle layer is parsed, to obtain a predicted rectangular box containing the text line; the position of the text line in the image under detection and the tilt angle of the text line are then determined according to the predicted rectangular box. The matrix output by the orientation layer is then parsed to obtain the orientation of the characters contained in the text line, and the matrix output by the script layer is parsed to obtain the script of the characters contained in the text line.
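For illustration only, the parsing order just described can be sketched per grid cell as follows. The flat output dictionary and its key names are assumptions, since the patent does not specify how the 8 output matrices are laid out.

```python
def decode_outputs(outputs, first_threshold):
    """Decode the 8 output-layer values for one grid cell.
    `outputs` maps assumed layer names to scalars for that cell.
    Returns None when the confidence score does not exceed the threshold."""
    confidence = outputs["confidence"]
    if confidence <= first_threshold:
        return None  # no text line predicted at this cell
    box = (outputs["x"], outputs["y"], outputs["w"], outputs["h"])
    return {
        "box": box,                       # from the 4 coordinate layers
        "tilt_angle": outputs["angle"],   # from the tilt angle layer
        "orientation": outputs["orient"], # upright vs inverted
        "script": outputs["script"],      # language/script class
        "confidence": confidence,
    }
```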
In one embodiment of the present invention, if, after parsing the matrices output by the 4 coordinate layers respectively and parsing the matrix output by the tilt angle layer, 1 predicted rectangular box containing the text line is obtained, then determining the position of the text line in the image under detection and the tilt angle of the text line according to the predicted rectangular box may be: determining the position of the predicted rectangular box in the image under detection as the position of the text line in the image under detection, and determining the tilt angle of the predicted rectangular box as the tilt angle of the text line.
In another embodiment of the present invention, if, after parsing the matrices output by the 4 coordinate layers respectively and parsing the matrix output by the tilt angle layer, multiple predicted rectangular boxes containing the text line are obtained, then determining the position of the text line in the image under detection and the tilt angle of the text line according to the predicted rectangular boxes may be: selecting two predicted rectangular boxes from the multiple predicted rectangular boxes; calculating the area of the intersection between the two predicted rectangular boxes; calculating the sum of the areas of the two predicted rectangular boxes; calculating the ratio between the area of the intersection and the sum of the areas of the two predicted rectangular boxes; and then judging whether the ratio is greater than the second preset threshold. If the ratio is greater than the second preset threshold, then, among the two predicted rectangular boxes, the position of the predicted rectangular box with the larger confidence score is determined as the position of the text line, and the tilt angle of the predicted rectangular box with the larger confidence score is determined as the tilt angle of the text line.
If the ratio is less than or equal to the second preset threshold, the two predicted rectangular boxes are merged into a new predicted rectangular box, for example by creating a new predicted rectangular box of minimum area that contains both of the two predicted rectangular boxes at the same time. Another predicted rectangular box is then selected from the remaining predicted rectangular boxes among the multiple predicted rectangular boxes, and the above operations are continued with the new predicted rectangular box and the selected predicted rectangular box; the detailed process is not described here.
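The merge step described here — replacing two boxes whose ratio does not exceed the threshold with the minimum rectangle enclosing both — can be sketched as follows. Axis-aligned (x1, y1, x2, y2) boxes are a simplification for illustration; the patent's boxes additionally carry a tilt angle.

```python
def merge_boxes(a, b):
    # Minimum axis-aligned rectangle containing both input boxes,
    # with boxes given as (x1, y1, x2, y2) tuples.
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))
```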
In the embodiments of the present invention, a preset YOLO model is used to simultaneously detect the position of a text line in the image under detection, the tilt angle of the text line, the orientation of the characters the text line contains, and the script of those characters. The embodiments do not use adaptive binarization to extract the characters in the image, thereby avoiding the loss of text line detection accuracy caused by illumination or shadows, and do not use a bag-of-words feature classifier to determine the text direction and script within a text line, thereby avoiding the loss of detection accuracy caused by that classifier's weak generalization ability. Because the generalization ability of the YOLO model of the embodiments of the present invention is better than that of a bag-of-words feature classifier, the embodiments of the present invention can improve text line detection accuracy compared with the prior art.
It should be noted that the method embodiments are described as a series of action combinations for simplicity of description. However, those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention, certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to Fig. 4, which shows a structural block diagram of an embodiment of a text line detection apparatus of the present invention, the apparatus may specifically include the following modules:
an acquisition module 11, configured to obtain a preset YOLO model, where the YOLO model includes 24 convolution stacks; a complete convolution stack includes a convolutional layer, a pooling layer, batch normalization, and an activation layer; the YOLO model has 4 complete convolution stacks and 20 convolution stacks containing only a convolutional layer and an activation layer; the activation function of each convolution stack is a rectified linear unit (ReLU), and residual skip connections are used between convolution stacks; the YOLO model further includes 8 output convolutional layers, the 8 output convolutional layers comprising 1 confidence layer, 4 text line coordinate layers, 1 text line tilt angle layer, 1 text line orientation layer, and 1 text line script layer;
an input module 12, configured to input the image under detection into the YOLO model and obtain the matrices output by the YOLO model at the 8 output convolutional layers respectively;
a determining module 13, configured to determine, from the matrices output by the 8 output convolutional layers respectively, the position of the text line in the image under detection, the tilt angle of the text line, the orientation of the characters the text line contains, and the script of the characters the text line contains.
In an optional realization method, the determining module 13 includes:
A first parsing unit, configured to parse the matrix output by the confidence layer to obtain a confidence score;
A judging unit, configured to judge whether the confidence score is greater than a first preset threshold;
A second parsing unit, configured to, if the confidence score is greater than the first preset threshold, parse the matrices respectively output by the 4 position layers and the matrix output by the inclination angle layer, to obtain a prediction rectangle frame containing a text line;
A determining unit, configured to determine, according to the prediction rectangle frame, the position of the text line in the image to be detected and the inclination angle of the text line;
A third parsing unit, configured to parse the matrix output by the forward/reverse orientation layer to obtain the forward/reverse orientation of the characters included in the text line in the image to be detected;
A fourth parsing unit, configured to parse the matrix output by the language layer to obtain the language of the characters included in the text line in the image to be detected.
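The sequence of parsing units above can be sketched as one decoding routine per grid cell. The box encoding (cx, cy, w, h), the 0.5 thresholds, and the head names are illustrative assumptions, not the patent's specification:

```python
import numpy as np

def decode_cell(heads, cell, conf_threshold=0.5):
    """Decode one grid cell following the parsing units above:
    confidence first, then (only above the first preset threshold)
    the box, inclination angle, orientation, and language."""
    y, x = cell
    confidence = float(heads["confidence"][0, y, x])
    if confidence <= conf_threshold:            # judging unit
        return None
    cx, cy, w, h = heads["position"][:, y, x]   # 4 position layers
    return {
        "box": (float(cx), float(cy), float(w), float(h)),
        "angle": float(heads["angle"][0, y, x]),
        "forward": bool(heads["orientation"][0, y, x] >= 0.5),
        "language": int(np.argmax(heads["language"][:, y, x])),
        "confidence": confidence,
    }

# Toy example: one confident cell in a 2x2 grid.
heads = {name: np.zeros((c, 2, 2))
         for name, c in [("confidence", 1), ("position", 4),
                         ("angle", 1), ("orientation", 1), ("language", 1)]}
heads["confidence"][0, 0, 1] = 0.9
det = decode_cell(heads, (0, 1))
```

Cells whose confidence score does not exceed the first preset threshold yield no prediction rectangle frame.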
In an optional implementation, after the matrices respectively output by the 4 position layers and the matrix output by the inclination angle layer are parsed, 1 prediction rectangle frame containing a text line is obtained;
The determining unit includes:
A first determining subunit, configured to determine the location of the prediction rectangle frame in the image to be detected as the position of the text line in the image to be detected;
A second determining subunit, configured to determine the inclination angle of the prediction rectangle frame as the inclination angle of the text line.
In an optional implementation, after the matrices respectively output by the 4 position layers and the matrix output by the inclination angle layer are parsed, multiple prediction rectangle frames each containing a text line are obtained;
The determining unit includes:
A selecting subunit, configured to select two prediction rectangle frames from the multiple prediction rectangle frames;
A first computing subunit, configured to calculate the area of the intersection between the two prediction rectangle frames;
A second computing subunit, configured to calculate the sum of the areas of the two prediction rectangle frames;
A third computing subunit, configured to calculate the ratio between the area of the intersection and the sum of the areas of the two prediction rectangle frames;
A judging subunit, configured to judge whether the ratio is greater than a second preset threshold;
A third determining subunit, configured to, if the ratio is greater than the second preset threshold, determine, among the two prediction rectangle frames, the location of the prediction rectangle frame with the highest confidence score as the position of the text line, and determine the inclination angle of the prediction rectangle frame with the highest confidence score as the inclination angle of the text line.
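The overlap test performed by the computing and judging subunits can be sketched as follows. Note that, as described above, the ratio is the intersection area divided by the sum of the two areas, not the standard intersection-over-union; the axis-aligned (x1, y1, x2, y2) box encoding and the 0.3 threshold are assumptions for illustration:

```python
def rect_area(rect):
    # rect is (x1, y1, x2, y2) with x1 < x2 and y1 < y2; an axis-aligned
    # simplification — the patent's frames also carry an inclination angle.
    return (rect[2] - rect[0]) * (rect[3] - rect[1])

def intersection_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0

def suppress_pair(box_a, box_b, threshold=0.3):
    """If intersection / (area_a + area_b) exceeds the second preset
    threshold, keep only the frame with the higher confidence score.
    Each box is (rect, confidence, angle); the encoding is assumed."""
    rect_a, conf_a, _ = box_a
    rect_b, conf_b, _ = box_b
    ratio = intersection_area(rect_a, rect_b) / (
        rect_area(rect_a) + rect_area(rect_b))
    if ratio > threshold:
        return [box_a if conf_a >= conf_b else box_b]
    return [box_a, box_b]

# Two heavily overlapping frames: only the higher-confidence one survives.
kept = suppress_pair(((0, 0, 10, 10), 0.9, 5.0), ((1, 1, 11, 11), 0.6, 7.0))
```

Applying this pairwise test over all selected frame pairs yields the final set of text line positions and inclination angles.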
In the embodiments of the present invention, a preset YOLO model is used to simultaneously detect the position of the text line in the image to be detected, the inclination angle of the text line, the forward/reverse orientation of the characters included in the text line, and the language of those characters. The embodiments of the present invention extract the text in the image without using an adaptive binarization method, thereby avoiding the reduction in text line detection accuracy caused by illumination or shadows, and determine the character orientation and language without using feature-based classifiers, thereby avoiding the reduction in detection accuracy caused by their poor generalization ability. The generalization ability of the YOLO model in the embodiments of the present invention is better than that of feature-based classifiers; therefore, compared with the prior art, the embodiments of the present invention can improve the detection accuracy of text lines.
As for the device embodiments, since they are basically similar to the method embodiments, the description is relatively simple; for relevant details, refer to the corresponding parts of the description of the method embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts between the embodiments may refer to each other.
Those skilled in the art will appreciate that the embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, terminal device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing terminal equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal equipment produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal equipment to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal equipment, such that a series of operational steps are performed on the computer or other programmable terminal equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Although preferred embodiments of the embodiments of the present invention have been described, those skilled in the art, once apprised of the basic inventive concept, may make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.
Finally, it should be noted that, herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between such entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Absent further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device that includes the element.
The text line detection method and device provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core concept. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementations and the scope of application according to the concept of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (10)

1. A text line detection method, characterized in that the method comprises:
obtaining a preset YOLO model, wherein the YOLO model includes 24 convolution stacks; a complete convolution stack includes a convolutional layer, a pooling layer, batch normalization, and an activation layer; the YOLO model contains 4 complete convolution stacks and 20 convolution stacks containing only a convolutional layer and an activation layer; the activation function of each convolution stack is the rectified linear unit (ReLU), and residual skip connections are used between convolution stacks; the YOLO model further includes 8 output convolutional layers, the 8 output convolutional layers including 1 confidence layer, 4 position layers of the text line, 1 inclination angle layer of the text line, 1 forward/reverse orientation layer of the text line, and 1 language layer of the text line;
inputting the image to be detected into the YOLO model, and obtaining the matrices respectively output by the 8 output convolutional layers of the YOLO model;
determining, according to the matrices respectively output by the 8 output convolutional layers, the position of the text line in the image to be detected, the inclination angle of the text line, the forward/reverse orientation of the characters included in the text line, and the language of the characters included in the text line.
2. The method according to claim 1, characterized in that the determining, according to the matrices respectively output by the 8 output convolutional layers, the position of the text line in the image to be detected, the inclination angle of the text line, the forward/reverse orientation of the characters included in the text line, and the language of the characters included in the text line comprises:
parsing the matrix output by the confidence layer to obtain a confidence score;
judging whether the confidence score is greater than a first preset threshold;
if the confidence score is greater than the first preset threshold, parsing the matrices respectively output by the 4 position layers and the matrix output by the inclination angle layer, to obtain a prediction rectangle frame containing a text line;
determining, according to the prediction rectangle frame, the position of the text line in the image to be detected and the inclination angle of the text line;
parsing the matrix output by the forward/reverse orientation layer to obtain the forward/reverse orientation of the characters included in the text line in the image to be detected;
parsing the matrix output by the language layer to obtain the language of the characters included in the text line in the image to be detected.
3. The method according to claim 2, characterized in that after the matrices respectively output by the 4 position layers and the matrix output by the inclination angle layer are parsed, 1 prediction rectangle frame containing a text line is obtained;
the determining, according to the prediction rectangle frame, the position of the text line in the image to be detected and the inclination angle of the text line comprises:
determining the location of the prediction rectangle frame in the image to be detected as the position of the text line in the image to be detected;
determining the inclination angle of the prediction rectangle frame as the inclination angle of the text line.
4. The method according to claim 2, characterized in that after the matrices respectively output by the 4 position layers and the matrix output by the inclination angle layer are parsed, multiple prediction rectangle frames each containing a text line are obtained;
the determining, according to the prediction rectangle frame, the position of the text line in the image to be detected and the inclination angle of the text line comprises:
selecting two prediction rectangle frames from the multiple prediction rectangle frames;
calculating the area of the intersection between the two prediction rectangle frames;
calculating the sum of the areas of the two prediction rectangle frames;
calculating the ratio between the area of the intersection and the sum of the areas of the two prediction rectangle frames;
judging whether the ratio is greater than a second preset threshold;
if the ratio is greater than the second preset threshold, determining, among the two prediction rectangle frames, the location of the prediction rectangle frame with the highest confidence score as the position of the text line, and determining the inclination angle of the prediction rectangle frame with the highest confidence score as the inclination angle of the text line.
5. A text line detection device, characterized in that the device comprises:
an acquisition module, configured to obtain a preset YOLO model, wherein the YOLO model includes 24 convolution stacks; a complete convolution stack includes a convolutional layer, a pooling layer, batch normalization, and an activation layer; the YOLO model contains 4 complete convolution stacks and 20 convolution stacks containing only a convolutional layer and an activation layer; the activation function of each convolution stack is the rectified linear unit (ReLU), and residual skip connections are used between convolution stacks; the YOLO model further includes 8 output convolutional layers, the 8 output convolutional layers including 1 confidence layer, 4 position layers of the text line, 1 inclination angle layer of the text line, 1 forward/reverse orientation layer of the text line, and 1 language layer of the text line;
an input module, configured to input the image to be detected into the YOLO model, and obtain the matrices respectively output by the 8 output convolutional layers of the YOLO model;
a determining module, configured to determine, according to the matrices respectively output by the 8 output convolutional layers, the position of the text line in the image to be detected, the inclination angle of the text line, the forward/reverse orientation of the characters included in the text line, and the language of the characters included in the text line.
6. The device according to claim 5, characterized in that the determining module includes:
a first parsing unit, configured to parse the matrix output by the confidence layer to obtain a confidence score;
a judging unit, configured to judge whether the confidence score is greater than a first preset threshold;
a second parsing unit, configured to, if the confidence score is greater than the first preset threshold, parse the matrices respectively output by the 4 position layers and the matrix output by the inclination angle layer, to obtain a prediction rectangle frame containing a text line;
a determining unit, configured to determine, according to the prediction rectangle frame, the position of the text line in the image to be detected and the inclination angle of the text line;
a third parsing unit, configured to parse the matrix output by the forward/reverse orientation layer to obtain the forward/reverse orientation of the characters included in the text line in the image to be detected;
a fourth parsing unit, configured to parse the matrix output by the language layer to obtain the language of the characters included in the text line in the image to be detected.
7. The device according to claim 6, characterized in that after the matrices respectively output by the 4 position layers and the matrix output by the inclination angle layer are parsed, 1 prediction rectangle frame containing a text line is obtained;
the determining unit includes:
a first determining subunit, configured to determine the location of the prediction rectangle frame in the image to be detected as the position of the text line in the image to be detected;
a second determining subunit, configured to determine the inclination angle of the prediction rectangle frame as the inclination angle of the text line.
8. The device according to claim 6, characterized in that after the matrices respectively output by the 4 position layers and the matrix output by the inclination angle layer are parsed, multiple prediction rectangle frames each containing a text line are obtained;
the determining unit includes:
a selecting subunit, configured to select two prediction rectangle frames from the multiple prediction rectangle frames;
a first computing subunit, configured to calculate the area of the intersection between the two prediction rectangle frames;
a second computing subunit, configured to calculate the sum of the areas of the two prediction rectangle frames;
a third computing subunit, configured to calculate the ratio between the area of the intersection and the sum of the areas of the two prediction rectangle frames;
a judging subunit, configured to judge whether the ratio is greater than a second preset threshold;
a third determining subunit, configured to, if the ratio is greater than the second preset threshold, determine, among the two prediction rectangle frames, the location of the prediction rectangle frame with the highest confidence score as the position of the text line, and determine the inclination angle of the prediction rectangle frame with the highest confidence score as the inclination angle of the text line.
9. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the text line detection method according to any one of claims 1 to 4.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the text line detection method according to any one of claims 1 to 4.
CN201810102229.4A 2018-02-01 2018-02-01 Character line detection method and device Active CN108427950B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810102229.4A CN108427950B (en) 2018-02-01 2018-02-01 Character line detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810102229.4A CN108427950B (en) 2018-02-01 2018-02-01 Character line detection method and device

Publications (2)

Publication Number Publication Date
CN108427950A true CN108427950A (en) 2018-08-21
CN108427950B CN108427950B (en) 2021-02-19

Family

ID=63156322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810102229.4A Active CN108427950B (en) 2018-02-01 2018-02-01 Character line detection method and device

Country Status (1)

Country Link
CN (1) CN108427950B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409363A (en) * 2018-10-13 2019-03-01 长沙芯希电子科技有限公司 The reverse judgement of text image based on content and bearing calibration
CN109508710A (en) * 2018-10-23 2019-03-22 东华大学 Based on the unmanned vehicle night-environment cognitive method for improving YOLOv3 network
CN110135411A (en) * 2019-04-30 2019-08-16 北京邮电大学 Business card identification method and device
CN110163205A (en) * 2019-05-06 2019-08-23 网易有道信息技术(北京)有限公司 Image processing method, device, medium and calculating equipment
CN110211048A (en) * 2019-05-28 2019-09-06 湖北华中电力科技开发有限责任公司 A kind of complicated archival image Slant Rectify method based on convolutional neural networks
CN110674811A (en) * 2019-09-04 2020-01-10 广东浪潮大数据研究有限公司 Image recognition method and device
CN110751232A (en) * 2019-11-04 2020-02-04 哈尔滨理工大学 Chinese complex scene text detection and identification method
CN111062374A (en) * 2019-12-10 2020-04-24 爱信诺征信有限公司 Identification method, device, system, equipment and readable medium of identity card information
CN111353491A (en) * 2020-03-12 2020-06-30 中国建设银行股份有限公司 Character direction determining method, device, equipment and storage medium
CN111797838A (en) * 2019-04-08 2020-10-20 上海怀若智能科技有限公司 Blind denoising system, method and device for picture documents
CN112418238A (en) * 2020-12-09 2021-02-26 安徽吉秒科技有限公司 Image character recognition method and device
CN112651399A (en) * 2020-12-30 2021-04-13 中国平安人寿保险股份有限公司 Method for detecting same-line characters in oblique image and related equipment thereof
CN112766266A (en) * 2021-01-29 2021-05-07 云从科技集团股份有限公司 Text direction correction method, system and device based on staged probability statistics
CN113313117A (en) * 2021-06-25 2021-08-27 北京奇艺世纪科技有限公司 Method and device for recognizing text content
CN113785305A (en) * 2019-05-05 2021-12-10 华为技术有限公司 Method, device and equipment for detecting inclined characters

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
CN105809164A (en) * 2016-03-11 2016-07-27 北京旷视科技有限公司 Character identification method and device
CN107609560A (en) * 2017-09-27 2018-01-19 北京小米移动软件有限公司 Character recognition method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
CN105809164A (en) * 2016-03-11 2016-07-27 北京旷视科技有限公司 Character identification method and device
CN107609560A (en) * 2017-09-27 2018-01-19 北京小米移动软件有限公司 Character recognition method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JOSEPH REDMON et al.: "YOLO9000: Better, Faster, Stronger", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
JOSEPH REDMON et al.: "You Only Look Once: Unified, Real-Time Object Detection", arXiv preprint *
DING Mingyu et al.: "Recognition of product parameters in images based on deep learning", Journal of Software *
YE Hu: "Principle and implementation of the YOLO algorithm", http://www.dataguru.cn/article-12966-1.html *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409363B (en) * 2018-10-13 2021-11-12 长沙芯希电子科技有限公司 Content-based text image inversion judgment and correction method
CN109409363A (en) * 2018-10-13 2019-03-01 长沙芯希电子科技有限公司 The reverse judgement of text image based on content and bearing calibration
CN109508710A (en) * 2018-10-23 2019-03-22 东华大学 Based on the unmanned vehicle night-environment cognitive method for improving YOLOv3 network
CN111797838A (en) * 2019-04-08 2020-10-20 上海怀若智能科技有限公司 Blind denoising system, method and device for picture documents
CN110135411B (en) * 2019-04-30 2021-09-10 北京邮电大学 Business card recognition method and device
CN110135411A (en) * 2019-04-30 2019-08-16 北京邮电大学 Business card identification method and device
CN113785305B (en) * 2019-05-05 2024-04-16 华为云计算技术有限公司 Method, device and equipment for detecting inclined characters
CN113785305A (en) * 2019-05-05 2021-12-10 华为技术有限公司 Method, device and equipment for detecting inclined characters
CN110163205A (en) * 2019-05-06 2019-08-23 网易有道信息技术(北京)有限公司 Image processing method, device, medium and calculating equipment
CN110211048B (en) * 2019-05-28 2020-06-16 国家电网有限公司 Complex archive image tilt correction method based on convolutional neural network
CN110211048A (en) * 2019-05-28 2019-09-06 湖北华中电力科技开发有限责任公司 A kind of complicated archival image Slant Rectify method based on convolutional neural networks
CN110674811A (en) * 2019-09-04 2020-01-10 广东浪潮大数据研究有限公司 Image recognition method and device
CN110751232A (en) * 2019-11-04 2020-02-04 哈尔滨理工大学 Chinese complex scene text detection and identification method
CN111062374A (en) * 2019-12-10 2020-04-24 爱信诺征信有限公司 Identification method, device, system, equipment and readable medium of identity card information
CN111353491A (en) * 2020-03-12 2020-06-30 中国建设银行股份有限公司 Character direction determining method, device, equipment and storage medium
CN111353491B (en) * 2020-03-12 2024-04-26 中国建设银行股份有限公司 Text direction determining method, device, equipment and storage medium
CN112418238A (en) * 2020-12-09 2021-02-26 安徽吉秒科技有限公司 Image character recognition method and device
CN112651399A (en) * 2020-12-30 2021-04-13 中国平安人寿保险股份有限公司 Method for detecting same-line characters in oblique image and related equipment thereof
CN112651399B (en) * 2020-12-30 2024-05-14 中国平安人寿保险股份有限公司 Method for detecting same-line characters in inclined image and related equipment thereof
CN112766266A (en) * 2021-01-29 2021-05-07 云从科技集团股份有限公司 Text direction correction method, system and device based on staged probability statistics
CN113313117A (en) * 2021-06-25 2021-08-27 北京奇艺世纪科技有限公司 Method and device for recognizing text content
CN113313117B (en) * 2021-06-25 2023-07-25 北京奇艺世纪科技有限公司 Method and device for identifying text content

Also Published As

Publication number Publication date
CN108427950B (en) 2021-02-19

Similar Documents

Publication Publication Date Title
CN108427950A (en) A kind of literal line detection method and device
CN112685565B (en) Text classification method based on multi-mode information fusion and related equipment thereof
CN107291822B (en) Problem classification model training method, classification method and device based on deep learning
CN115035538B (en) Training method of text recognition model, and text recognition method and device
KR20190126347A (en) Efficient Image Analysis Using Sensor Data
CN111488826A (en) Text recognition method and device, electronic equipment and storage medium
CN111709406A (en) Text line identification method and device, readable storage medium and electronic equipment
CN111242291A (en) Neural network backdoor attack detection method and device and electronic equipment
US20230137337A1 (en) Enhanced machine learning model for joint detection and multi person pose estimation
CN115422389B (en) Method and device for processing text image and training method of neural network
CN116311214B (en) License plate recognition method and device
CN114863429A (en) Text error correction method and training method based on RPA and AI and related equipment thereof
KR20210065076A (en) Method, apparatus, device, and storage medium for obtaining document layout
WO2021237227A1 (en) Method and system for multi-language text recognition model with autonomous language classification
CN113033660A (en) Universal language detection method, device and equipment
CN114639087A (en) Traffic sign detection method and device
CN110825874A (en) Chinese text classification method and device and computer readable storage medium
CN115578739A (en) Training method and device for realizing IA classification model by combining RPA and AI
US20230036812A1 (en) Text Line Detection
CN113204665A (en) Image retrieval method, image retrieval device, electronic equipment and computer-readable storage medium
CN110688511A (en) Fine-grained image retrieval method and device, computer equipment and storage medium
CN113033518B (en) Image detection method, image detection device, electronic equipment and storage medium
CN114140802B (en) Text recognition method and device, electronic equipment and storage medium
US11769323B2 (en) Generating assistive indications based on detected characters
CN115761752A (en) Natural scene text detection model training method and device, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant