CN104408449B - Intelligent mobile terminal scene text processing method - Google Patents

Intelligent mobile terminal scene text processing method

Info

Publication number
CN104408449B
CN104408449B CN201410581464.6A CN201410581464A
Authority
CN
China
Prior art keywords
text region
candidate
stroke width
pixel
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410581464.6A
Other languages
Chinese (zh)
Other versions
CN104408449A (en)
Inventor
卢朝阳
李静
刘晓佩
姜维
通天意
汪文芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XIDIAN-NINGBO INFORMATION TECHNOLOGY INSTITUTE
Original Assignee
XIDIAN-NINGBO INFORMATION TECHNOLOGY INSTITUTE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by XIDIAN-NINGBO INFORMATION TECHNOLOGY INSTITUTE filed Critical XIDIAN-NINGBO INFORMATION TECHNOLOGY INSTITUTE
Priority to CN201410581464.6A priority Critical patent/CN104408449B/en
Publication of CN104408449A publication Critical patent/CN104408449A/en
Application granted granted Critical
Publication of CN104408449B publication Critical patent/CN104408449B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)

Abstract

The present invention relates to a scene text processing method for intelligent mobile terminals, comprising: step 1, coarse text detection based on edges; step 2, obtaining the stroke width map T of the input scene image I, performing stroke width and geometric feature analysis on each candidate text region in the candidate text region set S, rejecting non-text regions that do not meet the requirements, and finally outputting the localization result map L1; step 3, recognition preprocessing; step 4, normalizing the single characters obtained after segmentation and extracting directional element features; step 5, fine classification based on Gabor features. Compared with the prior art, the advantages of the invention are: the detection accuracy is significantly improved, the recall rate is higher, the time performance is markedly improved, and the accuracy of character recognition is substantially improved.

Description

Intelligent mobile terminal scene text processing method
Technical field
The present invention relates to the field of pattern recognition, and in particular to a scene text processing method for intelligent mobile terminals, used to recognize scene text captured by an intelligent mobile terminal.
Background technology
With the rapid development of information technology, pattern recognition has been widely applied and valued in many scientific and technological fields, such as artificial intelligence, medicine, neurobiology, weapons manufacturing, and navigation. In these fields, common applications include fingerprint recognition, face recognition, optical character recognition, text recognition, precision guidance, fault detection, speech recognition, and translation. The rapid development and wide application of pattern recognition technology have greatly promoted the development of the national economy and the modernization of national defense science and technology.
Text processing is an important branch of pattern recognition. In the real world, people cannot live without text, and the processing of natural scene text has always been one of the hot topics in pattern recognition. Since the 1990s, the International Conference on Document Analysis and Recognition (ICDAR) has been held every two years, which has greatly promoted the development of text processing technology.
With the popularization and development of mobile intelligent terminals, smart phones are increasingly favored for their convenience and intelligence. In daily life, users can photograph text of interest with their phones at any time and then extract the text information, which saves the trouble of handwriting input and makes life more convenient. Meanwhile, text processing on mobile terminals can be applied to many other fields: recognizing street signs and combining them with GPS positioning can provide navigation for the blind; recognizing license plates can facilitate traffic police management and record keeping; extracting the text of shop signs and translating it into a language known to the user can facilitate travel abroad. Therefore, text processing on smart phones has great application prospects.
However, realizing the above applications on a smart phone poses considerable technical challenges, mainly in two respects. On the one hand, the diversity and uncertainty of text in natural scenes make natural scene text processing extremely difficult. On the other hand, the limited CPU and GPU of a smart phone place higher requirements on the accuracy and real-time performance of a text processing method.
In summary, natural scene text processing has always been a difficult problem in the field of image recognition, especially text processing on smart phones. Research on scene text processing on smart phones has practical significance for the development of artificial intelligence and also plays an important role in the informatization of China.
Summary of the invention
The technical problem to be solved by the invention is to provide, in view of the above-mentioned prior art, a scene text processing method for intelligent mobile terminals that balances speed and accuracy and is suitable for use on mobile platforms.
The technical solution adopted by the present invention to solve the above technical problem is a scene text processing method for intelligent mobile terminals, characterized by comprising the following steps:
Step 1: coarse text detection based on edges, specifically including:
(1-1) performing edge detection on the scene image I input to the intelligent mobile terminal using a color image edge detection method to obtain a first intermediate processed image;
(1-2) performing morphological operations on the first intermediate processed image to connect broken characters and adjacent characters in the first intermediate processed image, obtaining a second intermediate processed image;
(1-3) processing the second intermediate processed image by finding connected components, thereby obtaining the candidate text region set S of the input scene image I;
Step 2: obtaining the stroke width map T of the input scene image I, performing stroke width and geometric feature analysis on each candidate text region in the candidate text region set S, rejecting non-text regions that do not meet the requirements, and finally outputting the localization result map L1;
Step 3: recognition preprocessing, specifically including:
(3-1) performing contrast enhancement on the text regions of the localization result map L1;
(3-2) performing median filtering on the enhanced text regions;
(3-3) performing binarization on the text regions after median filtering;
(3-4) performing character segmentation on the text regions after binarization;
Step 4: normalizing the single characters obtained after segmentation and extracting directional element features, specifically including:
(4-1) cropping each segmented character to remove the white background around the character, and uniformly resizing each cropped character image to N × N using bilinear interpolation;
(4-2) extracting the contour of each size-normalized character and computing its directional element features;
(4-3) recognizing each character with a distance classifier to obtain the X closest candidate characters for each character;
Step 5: fine classification based on Gabor features, specifically including:
(5-1) uniformly resizing each character to M × M using bilinear interpolation;
(5-2) applying a Gabor transform to the size-normalized character and extracting Gabor features;
(5-3) on the basis of the X closest candidate characters of each character obtained in (4-3), recognizing again with a distance classifier to obtain the recognition result of each character.
As an improvement, step 2 specifically includes:
(2-1) performing edge detection on the input scene image I using the Canny edge detection method to obtain the edge map of the input scene image I, while recording the gradient direction of each edge pixel;
(2-2) performing the stroke width transform on the edge pixels:
(2-2-1) assume p is an edge pixel and let dp be the gradient direction of edge pixel p; search along the ray r = p + n·dp (n ≥ 0) among the edge pixels of the edge map for a matching edge pixel q; let dq be the gradient direction of edge pixel q, where dq is roughly opposite to dp, i.e., dq = −dp ± π/2;
if p does not find a matching pixel q, or dq is not opposite to dp, the ray r = p + n·dp is discarded, and a new edge pixel p must be selected to search for its matching edge pixel q;
if a matching pixel q is found, the stroke width value of every pixel on the segment [p, q] is set to ||p − q||, the Euclidean distance between pixel p and pixel q; if a pixel on the segment [p, q] already has a stroke width value S, the smaller of S and ||p − q|| is taken as the actual stroke width value of that pixel;
(2-2-2) repeating (2-2-1) until the stroke width values of the pixels on all rays that were not discarded have been computed;
(2-2-3) traversing again all rays that were not discarded; for each ray, computing the mean stroke width M of all pixels on the ray, finding all pixels on the ray whose stroke width value exceeds M and setting their stroke width value to M; after all rays have been traversed, the stroke width map T of the input scene image I is finally obtained;
(2-3) on the basis of the stroke width map T of the input scene image I obtained in step (2-2), finding the corresponding candidate text region set S obtained in step 1, and then screening the candidate text region set S according to the following rules:
(2-3-a) rejecting candidate text regions whose aspect ratio is not between 0.1 and 10;
(2-3-b) rejecting candidate text regions whose character width is not between W/20 and W pixels or whose height is not between H/20 and H pixels, where W and H are the width and height of the image, respectively;
(2-3-c) deleting candidate text regions whose area is smaller than 20 pixels;
(2-3-d) binarizing the candidate text region set S, computing the ratio Rb of black pixels, and rejecting candidate text regions whose black pixel ratio Rb is not between 0.2 and 0.8, where Rb is defined as
where f(i, j) is the pixel value at position (i, j) in the candidate text region image, w and h are the width and height of the candidate text region, respectively, and ⊕ denotes exclusive OR;
(2-3-e) binarizing the candidate text region set S, computing the crossing rate Rcc of the region, and rejecting candidate text regions whose crossing rate Rcc is not between 0.05 and 0.6, where Rcc is defined as:
where f(i, j) is the pixel value at position (i, j) in the candidate text region image, f(i, j+1) is the pixel value at position (i, j+1) in the candidate text region image, w and h are the width and height of the candidate text region, respectively, and ⊕ denotes exclusive OR;
(2-3-f) performing the stroke width transform on the candidate text region set S to obtain a first stroke width map of every candidate text region, then inverting the candidate text region set S and performing the stroke width transform again to obtain a second stroke width map of every candidate text region; if, in either the first stroke width map or the second stroke width map of a candidate text region, the stroke width variance exceeds half of the mean stroke width and the stroke width ratio of adjacent pixels exceeds 3.0, rejecting that candidate text region;
(2-4) text detection output: after the screening of (2-3), the final text regions are obtained; they are then sorted and numbered according to their positions, from top to bottom and from left to right, and after sorting the text regions are output.
Preferably, in (3-1) the contrast of the text regions of the localization result map L1 is enhanced using a histogram equalization algorithm; in (3-2) median filtering is performed on the enhanced regions using a 3 × 3 rectangular sliding template, i.e., using a 3 × 3 rectangular sliding template, the pixels within the template are sorted by pixel value to generate a monotonically increasing or decreasing data sequence, and the value of the central pixel of the template is replaced by the median of this group before output; in step (3-3) the regions after median filtering are binarized using the maximum between-class variance (Otsu) method.
Compared with the prior art, the advantages of the invention are:
(1) Compared with text detection methods based purely on edges, the accuracy of the present invention is significantly improved, because the candidate regions are screened with the stroke width transform, which effectively eliminates many non-text regions with uneven stroke widths and thereby reduces the false detection rate of text regions; compared with pure stroke-width text detection methods, the recall rate of the present invention is higher, because an edge-based text detection algorithm is used for the coarse detection;
(2) Compared with recognition based on Gabor features alone, the recognition performance of the present invention decreases slightly, but the time performance is significantly improved and the average recognition time of a single character is shortened by about 41%, because directional element features are used as coarse features for a first pass over the candidate characters; compared with using directional element features alone, the character recognition accuracy is significantly improved, because Gabor features are used as fine features to enhance the discriminative ability between characters. The present invention therefore fully combines the speed of directional element feature extraction with the accuracy of Gabor feature recognition, balances speed and accuracy well, and is thus more suitable for use on mobile platforms.
Brief description of the drawings
Fig. 1 is a flowchart of the intelligent mobile terminal scene text processing method in an embodiment of the present invention.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawing and the embodiment.
The invention provides a scene text processing method for intelligent mobile terminals, which comprises the following steps, as shown in Fig. 1:
Step 1: coarse text detection based on edges:
Coarse text detection is the first step; its main task is to detect as much of the text in the input scene image I as possible. Only when the recall of the coarse detection is high does the subsequent screening of candidate text regions make sense, and only then can the overall detection accuracy be high. Because edge detection is relatively fast and has a high recall, it is suitable for use on a smart phone, so the coarse text detection method adopted by the present invention is an edge-based text detection algorithm, which specifically includes:
(1-1) performing edge detection on the scene image I input to the intelligent mobile terminal using a color image edge detection method to obtain a first intermediate processed image. The present invention uses a color image edge detection method because it works well on color images and the detected edge lines are thicker, which facilitates the subsequent coarse text detection. Color image edge detection is a conventional prior-art method: within a 3 × 3 neighborhood, edges are computed separately for the R, G and B components of the image, the maximum over four directions is taken as the edge value of the current component, and after the edge values of all pixels have been obtained, the edges are binarized with the Niblack algorithm, finally giving the first intermediate processed image;
(1-2) performing morphological operations on the first intermediate processed image to connect broken characters and adjacent characters in the first intermediate processed image, obtaining a second intermediate processed image. Morphological operations are also conventional prior-art algorithms; the present invention applies them to better support the subsequent edge-based text detection. The morphological operations used here are a dilation of 3 pixels in the vertical and horizontal directions of the image, followed by a closing of 3 pixels in the vertical and horizontal directions, respectively;
(1-3) processing the second intermediate processed image by finding connected components, thereby obtaining the candidate text region set S of the input scene image I; finding connected components is also a conventional prior-art method;
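As a concrete illustration of steps (1-1) to (1-3), the following Python/OpenCV sketch approximates the described processing. The patent only names the techniques, so the per-channel Sobel magnitude (standing in for the four-direction edge operator), the Niblack window size and coefficient, and the exact 3-pixel structuring elements are assumptions, not the patented parameters.

```python
import cv2
import numpy as np

def coarse_detect_text(bgr):
    """Edge-based coarse text detection (steps 1-1 to 1-3), sketched with assumed parameters."""
    # (1-1) per-channel edge strength: max gradient magnitude over the B, G, R components
    edges = np.zeros(bgr.shape[:2], dtype=np.float64)
    for c in cv2.split(bgr):
        gx = cv2.Sobel(c, cv2.CV_64F, 1, 0, ksize=3)
        gy = cv2.Sobel(c, cv2.CV_64F, 0, 1, ksize=3)
        edges = np.maximum(edges, np.hypot(gx, gy))
    # Niblack binarization, T = mean + k*std (assumed 25x25 window, k = -0.2)
    win, k = 25, -0.2
    mean = cv2.boxFilter(edges, cv2.CV_64F, (win, win))
    sq_mean = cv2.boxFilter(edges * edges, cv2.CV_64F, (win, win))
    std = np.sqrt(np.maximum(sq_mean - mean * mean, 0))
    first = (edges > mean + k * std).astype(np.uint8) * 255
    # (1-2) morphology: 3-pixel dilation then closing, horizontally and vertically
    h_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 1))
    v_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 3))
    second = cv2.dilate(cv2.dilate(first, h_kernel), v_kernel)
    second = cv2.morphologyEx(cv2.morphologyEx(second, cv2.MORPH_CLOSE, h_kernel),
                              cv2.MORPH_CLOSE, v_kernel)
    # (1-3) connected components give the candidate text region set S (as bounding boxes)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(second, connectivity=8)
    return [tuple(stats[i, :4]) for i in range(1, n)]  # (x, y, w, h) per candidate
```

Each returned bounding box corresponds to one element of the candidate set S that is screened in step 2.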
Step 2: obtaining the stroke width map T of the input scene image I, performing stroke width and geometric feature analysis on each candidate text region in the candidate text region set S, rejecting non-text regions that do not meet the requirements, and finally outputting the localization result map L1.
Screening the candidate text regions is the second step of the present invention; its purpose is to analyze the results of the coarse detection and to screen out and reject non-text regions. Research shows that text elements in natural scenes have a nearly constant stroke width, and the stroke widths of characters within an adjacent text region are roughly equal, so this property can be used to distinguish text regions from non-text regions. Based on the fact that character strokes in natural scenes tend to have a fixed width, the present invention proposes a candidate text region screening method based on the stroke width transform, implemented as follows:
(2-1) performing edge detection on the input scene image I using the Canny edge detection method to obtain the edge map of the input scene image I, while recording the gradient direction of each edge pixel;
(2-2) performing the stroke width transform on the edge pixels:
(2-2-1) assume p is an edge pixel and let dp be the gradient direction of edge pixel p; search along the ray r = p + n·dp (n ≥ 0) among the edge pixels of the edge map for a matching edge pixel q; let dq be the gradient direction of edge pixel q, where dq is roughly opposite to dp, i.e., dq = −dp ± π/2;
if p does not find a matching pixel q, or dq is not opposite to dp, the ray r = p + n·dp is discarded, and a new edge pixel p must be selected to search for its matching edge pixel q;
if a matching pixel q is found, the stroke width value of every pixel on the segment [p, q] is set to ||p − q||, the Euclidean distance between pixel p and pixel q; if a pixel on the segment [p, q] already has a stroke width value S, the smaller of S and ||p − q|| is taken as the actual stroke width value of that pixel;
(2-2-2) repeating (2-2-1) until the stroke width values of the pixels on all rays that were not discarded have been computed;
(2-2-3) traversing again all rays that were not discarded; for each ray, computing the mean stroke width M of all pixels on the ray, finding all pixels on the ray whose stroke width value exceeds M and setting their stroke width value to M; after all rays have been traversed, the stroke width map T of the input scene image I is finally obtained;
It should be pointed out that the above procedure mainly targets "positive" text, i.e., dark characters on a bright background; in practice there may also be "reverse" text, i.e., bright characters on a dark background. Therefore, in (2-2), steps (2-2-1), (2-2-2) and (2-2-3) are repeated once; when repeating, in (2-2-1), the matching edge pixel q is searched for among the edge pixels of the edge map along the ray r = p + n·dp with n ≤ 0. In addition, it can be seen that the number of pixels to be examined during the stroke width transform is greatly reduced, because the gradient feature of a pixel is only effective when another pixel with a matching, opposite gradient direction is found.
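The stroke width transform of (2-2) can be sketched as follows in Python/OpenCV. The Canny thresholds and the exact opposite-gradient tolerance are assumptions, and only a single pass is shown; the reverse pass for bright-on-dark text simply walks the ray in the opposite direction, as noted above.

```python
import cv2
import numpy as np

def stroke_width_transform(gray, dark_on_light=True):
    """Single-pass SWT sketch for steps (2-2-1)-(2-2-3); call again with dark_on_light=False for reverse text."""
    edges = cv2.Canny(gray, 100, 200)                      # assumed thresholds
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.hypot(gx, gy) + 1e-9
    dx, dy = gx / mag, gy / mag
    if not dark_on_light:                                  # reverse text: walk along -dp
        dx, dy = -dx, -dy
    h, w = gray.shape
    swt = np.full((h, w), np.inf)
    rays = []
    ys, xs = np.nonzero(edges)
    for y, x in zip(ys, xs):
        ray = [(x, y)]
        cx, cy = float(x), float(y)
        while True:
            cx, cy = cx + dx[y, x], cy + dy[y, x]          # r = p + n*dp, n >= 0
            ix, iy = int(round(cx)), int(round(cy))
            if not (0 <= ix < w and 0 <= iy < h):
                break                                      # ray leaves the image: discard it
            ray.append((ix, iy))
            if edges[iy, ix]:                              # candidate matching edge pixel q
                # keep the ray only if dq is roughly opposite to dp (within pi/2 of -dp)
                if dx[y, x] * dx[iy, ix] + dy[y, x] * dy[iy, ix] < 0:
                    width = np.hypot(ix - x, iy - y)       # ||p - q||
                    for px, py in ray:
                        swt[py, px] = min(swt[py, px], width)
                    rays.append(ray)
                break
    # (2-2-3): clip pixels on each kept ray to the ray's mean stroke width M
    for ray in rays:
        m = np.mean([swt[py, px] for px, py in ray])
        for px, py in ray:
            swt[py, px] = min(swt[py, px], m)
    swt[np.isinf(swt)] = 0
    return swt
```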
(2-3) on the basis of the stroke width map T of the input scene image I obtained in step (2-2), finding the corresponding candidate text region set S obtained in step 1, and then screening the candidate text region set S according to the following rules:
(2-3-a) rejecting candidate text regions whose aspect ratio is not between 0.1 and 10. The aspect ratio of a candidate text region lies within a certain range, generally between 0.1 and 10; regions whose aspect ratio is too large or too small and does not satisfy this condition should be removed;
(2-3-b) rejecting candidate text regions whose character width is not between W/20 and W pixels or whose height is not between H/20 and H pixels, where W and H are the width and height of the image, respectively. A character should be neither too large nor too small: its width should be between W/20 and W pixels and its height between H/20 and H pixels; character regions that do not satisfy this condition should be removed;
(2-3-c) deleting candidate text regions whose area is smaller than 20 pixels. If the area of a candidate region is too small, it is judged to be a non-text region, so candidate text regions with an area of fewer than 20 pixels should be deleted;
(2-3-d) the ratio of black pixels in a region should be neither too large nor too small; in a text region it generally lies between 0.2 and 0.8. The candidate text region set S is binarized, the ratio Rb of black pixels is computed, and candidate text regions whose black pixel ratio Rb is not between 0.2 and 0.8 are rejected, where Rb is defined as
where f(i, j) is the pixel value at position (i, j) in the candidate text region image, w and h are the width and height of the candidate text region, respectively, and ⊕ denotes exclusive OR;
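The formula for Rb is not reproduced in this text; a plausible reconstruction of the black-pixel ratio, assuming the binarized region takes the value 1 at black pixels, is:

R_b = \frac{1}{w \times h} \sum_{i=1}^{h} \sum_{j=1}^{w} f(i,j)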
(2-3-e) the crossing rate of a character region differs from that of a non-character region: in general, the crossings of a non-text region are irregular, whereas characters are regularly arranged, so the crossing rate of a character region lies within a certain range. The candidate text region set S is therefore binarized, the crossing rate Rcc of the region is computed, and candidate text regions whose crossing rate Rcc is not between 0.05 and 0.6 are rejected, where Rcc is defined as:
where f(i, j) is the pixel value at position (i, j) in the candidate text region image, f(i, j+1) is the pixel value at position (i, j+1) in the candidate text region image, w and h are the width and height of the candidate text region, respectively, and ⊕ denotes exclusive OR;
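The Rcc formula itself is likewise not reproduced here; a plausible reconstruction consistent with the symbols explained above, counting black/white transitions between horizontally adjacent pixels of the binarized region, is:

R_{cc} = \frac{1}{w \times h} \sum_{i=1}^{h} \sum_{j=1}^{w-1} \left( f(i,j) \oplus f(i,j+1) \right)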
(2-3-f) some scene elements in natural scenes, such as leaves, resemble text elements and are difficult to distinguish from characters; moreover, the stroke widths of characters in natural scenes are not always equal, but when they differ, the variation is small. In general, the stroke width variance of a text region does not exceed half of the mean stroke width, and the stroke width ratio of adjacent pixels does not exceed 3.0, so candidate regions whose stroke width varies too much should be rejected. The present invention performs the stroke width transform on the candidate text region set S to obtain a first stroke width map of every candidate text region, then inverts the candidate text region set S and performs the stroke width transform again to obtain a second stroke width map of every candidate text region; if, in either the first stroke width map or the second stroke width map of a candidate text region, the stroke width variance exceeds half of the mean stroke width and the stroke width ratio of adjacent pixels exceeds 3.0, that candidate text region is rejected;
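Rules (2-3-a) through (2-3-f) amount to a filter over the candidate regions, as in the sketch below. It assumes the stroke_width_transform helper sketched earlier, Otsu binarization with black (text) pixels mapped to 1, application of the width/height bounds to the candidate region box, and the thresholds exactly as listed; none of these implementation details beyond the thresholds are given in the patent.

```python
import cv2
import numpy as np

def keep_candidate(region_gray, img_w, img_h):
    """Apply screening rules (2-3-a)-(2-3-f) to one candidate text region (True = keep)."""
    h, w = region_gray.shape
    # (2-3-a) aspect ratio in [0.1, 10]; (2-3-b) size bounds; (2-3-c) minimum area
    if not (0.1 <= w / h <= 10):
        return False
    if not (img_w / 20 <= w <= img_w and img_h / 20 <= h <= img_h):
        return False
    if w * h < 20:
        return False
    # binarize with Otsu; take black (text) pixels as 1
    _, binary = cv2.threshold(region_gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    f = (binary > 0).astype(np.uint8)
    # (2-3-d) black pixel ratio Rb in [0.2, 0.8]
    rb = f.mean()
    if not (0.2 <= rb <= 0.8):
        return False
    # (2-3-e) crossing rate Rcc in [0.05, 0.6]
    rcc = np.logical_xor(f[:, :-1], f[:, 1:]).sum() / (w * h)
    if not (0.05 <= rcc <= 0.6):
        return False
    # (2-3-f) stroke width statistics on the region and on its inverse
    for g in (region_gray, 255 - region_gray):
        swt = stroke_width_transform(g)            # sketch defined earlier
        widths = swt[swt > 0]
        if widths.size == 0:
            continue
        ratio = np.divide(swt[:, 1:], swt[:, :-1],
                          out=np.ones_like(swt[:, 1:]), where=swt[:, :-1] > 0)
        if widths.var() > widths.mean() / 2 and ratio.max() > 3.0:
            return False
    return True
```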
(2-4) text detection output: after the screening of (2-3), the final text regions are obtained; they are then sorted and numbered according to their positions, from top to bottom and from left to right, and after sorting the text regions are output; the output is the localization result map L1;
Step 3: recognition preprocessing, specifically including:
(3-1) performing contrast enhancement on the text regions of the localization result map L1. To save computation time, the present invention uses a histogram equalization algorithm, which is computationally simple and effective, to enhance the contrast of the text regions of the localization result map L1. Histogram equalization is a conventional prior-art algorithm; the enhanced image has a larger dynamic range of pixel gray values, thereby achieving the effect of enhanced image contrast;
(3-2) performing median filtering on the enhanced text regions. The present invention uses a 3 × 3 rectangular sliding template, a conventional prior-art method: the pixels within the template are sorted by pixel value to generate a monotonically increasing or decreasing data sequence, and the value of the central pixel of the template is replaced by the median of this group before output. The image after median filtering not only preserves the edge information of the original image well, but also makes the gray levels of the image smoother;
(3-3) performing binarization on the text regions after median filtering. Considering the execution efficiency of the algorithm and the fact that the text regions may suffer from uneven illumination, the present invention uses the maximum between-class variance (Otsu) method, which is a conventional prior-art algorithm;
(3-4) performing character segmentation on the text regions after binarization. The present invention uses a projection-based segmentation method to split the text; this is also a conventional prior-art method. It requires the edge image of the text before segmentation and then performs projection segmentation on it. A text region may contain multiple rows as well as multiple columns, so both row segmentation and column segmentation are needed. The algorithmic complexity of this method is low and it runs fast;
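Steps (3-1) to (3-4) map directly onto standard OpenCV calls plus a simple projection split, as sketched below. The fixed 3 × 3 median window and Otsu follow the text; the _runs helper, the inverse-binary convention (text pixels become non-zero), and the row-then-column split are my assumptions, since the patent does not specify the projection details.

```python
import cv2
import numpy as np

def preprocess_region(region_gray):
    """Recognition preprocessing (3-1)-(3-4): enhance, median filter, Otsu binarize, project-and-cut."""
    eq = cv2.equalizeHist(region_gray)                 # (3-1) histogram equalization
    smooth = cv2.medianBlur(eq, 3)                     # (3-2) 3x3 median filter
    _, binary = cv2.threshold(smooth, 0, 255,          # (3-3) Otsu (maximum between-class variance)
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # (3-4) projection segmentation: first cut rows, then cut columns inside each row
    chars = []
    for top, bottom in _runs(binary.sum(axis=1) > 0):
        row = binary[top:bottom]
        for left, right in _runs(row.sum(axis=0) > 0):
            chars.append(row[:, left:right])
    return chars

def _runs(mask):
    """Return (start, end) index pairs of consecutive True runs in a 1-D boolean mask."""
    padded = np.concatenate(([False], mask, [False]))
    starts = np.flatnonzero(~padded[:-1] & padded[1:])
    ends = np.flatnonzero(padded[:-1] & ~padded[1:])
    return list(zip(starts, ends))
```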
Step 4: normalizing the single characters obtained after segmentation and extracting directional element features, specifically including:
(4-1) cropping each segmented character to remove the white background around the character, and uniformly resizing each cropped character image to N × N using bilinear interpolation. Each character region must be cropped to remove the surrounding white background; because characters differ in size, the features of the same character would otherwise differ, so every character image must be normalized before feature extraction, transforming character regions of different sizes into regions of a uniform size. The present invention normalizes all single character regions to a 64 × 64 rectangular region;
(4-2) extracting the contour of each size-normalized character and computing its directional element features;
(4-3) recognizing each character with a distance classifier based on the directional element features, obtaining the X closest candidate characters for each character;
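The following sketch illustrates (4-1) to (4-3): bilinear normalization to 64 × 64, a simple four-direction (horizontal, vertical, two diagonals) contour-element histogram over an 8 × 8 grid as a stand-in for the directional element features, and a Euclidean nearest-template search returning the X closest classes. The 8 × 8 grid, the exact direction coding, and the template dictionary are assumptions; the patent only names the feature type and the classifier.

```python
import cv2
import numpy as np

def directional_element_features(char_img, grid=8):
    """Rough sketch of directional element features on a 64x64 normalized character."""
    norm = cv2.resize(char_img, (64, 64), interpolation=cv2.INTER_LINEAR)   # (4-1) bilinear to N x N
    _, b = cv2.threshold(norm, 0, 1, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contour = b - cv2.erode(b, np.ones((3, 3), np.uint8))                   # (4-2) character contour
    feat = np.zeros((grid, grid, 4), dtype=np.float32)
    cell = 64 // grid
    ys, xs = np.nonzero(contour)
    for y, x in zip(ys, xs):
        # assign the contour pixel to a direction by looking at its contour neighbours
        for d, (dy, dx) in enumerate([(0, 1), (1, 0), (1, 1), (1, -1)]):    # -, |, \, /
            ny, nx = y + dy, x + dx
            if 0 <= ny < 64 and 0 <= nx < 64 and contour[ny, nx]:
                feat[y // cell, x // cell, d] += 1
    return feat.ravel() / max(len(ys), 1)

def coarse_classify(char_img, templates, x=100):
    """(4-3) Euclidean distance classifier: return the x closest template classes."""
    f = directional_element_features(char_img)
    dists = {label: np.linalg.norm(f - t) for label, t in templates.items()}
    return sorted(dists, key=dists.get)[:x]
```

Here `templates` is a hypothetical dictionary mapping each character class to its mean feature vector, learned offline from training samples.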
Step 5: fine classification based on Gabor features, specifically including:
(5-1) uniformly resizing each character to M × M using bilinear interpolation, where M is set to 40 here;
(5-2) applying a Gabor transform to the size-normalized character and extracting Gabor features;
(5-3) on the basis of the X closest candidate characters of each character obtained in (4-3), recognizing again with a distance classifier to obtain the recognition result of each character.
When the present invention classifies the directional element features with a Euclidean distance classifier, the top-1 recognition rate is only slightly above 45%, while the recognition rate within the top 100 candidate characters exceeds 89%, so setting X to 100 is appropriate, and the Gabor features are then used for further fine classification within these 100 candidate characters. Gabor features are the fine features of the present invention and have good discriminative power for Chinese characters; when the modified cosine angle is used as the classifier for the Gabor features, the top-1 recognition rate reaches more than 78%. Therefore, the present invention can use the modified cosine angle as the classifier for the Gabor features, select the best-matching result from the 100 candidates produced by the preceding classifier, and output this result as the final recognition result. The present invention thus adopts a cascaded scheme: directional element features are first extracted to coarsely classify the Chinese character to be recognized, narrowing the candidates to 100; then, within this small range, the fine Gabor features are extracted for accurate recognition and the final recognition result is output. Compared with recognition based on Gabor features alone, the recognition performance decreases slightly, but the time performance is significantly improved and the average recognition time of a single character is shortened by about 41%, because directional element features are used as coarse features to reduce the candidate characters from 3755 to 100. Compared with using directional element features alone, the character recognition accuracy is significantly improved, because Gabor features are used as fine features to enhance the discriminative ability between characters. The present invention fully combines the speed of directional element feature extraction with the accuracy of Gabor feature recognition, balances speed and accuracy, and is therefore more suitable for use on mobile platforms.
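Step 5 and the cascade with step 4 can be sketched as follows. The Gabor bank (four orientations, one scale), the block-averaged feature layout, the kernel parameters, and the use of plain cosine similarity as a stand-in for the modified cosine angle are all assumptions; the patent states only that Gabor features and a cosine-angle classifier are applied to the 100 coarse candidates.

```python
import cv2
import numpy as np

def gabor_features(char_img, m=40, orientations=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """(5-1)/(5-2): resize to M x M and extract block-averaged Gabor responses (assumed filter bank)."""
    norm = cv2.resize(char_img, (m, m), interpolation=cv2.INTER_LINEAR).astype(np.float32)
    feat = []
    for theta in orientations:
        kernel = cv2.getGaborKernel((9, 9), sigma=3.0, theta=theta,
                                    lambd=8.0, gamma=0.5, psi=0, ktype=cv2.CV_32F)
        resp = np.abs(cv2.filter2D(norm, cv2.CV_32F, kernel))
        # average the response over a 5x5 grid of blocks -> 4 * 25 = 100-dim feature
        blocks = resp.reshape(5, m // 5, 5, m // 5).mean(axis=(1, 3))
        feat.append(blocks.ravel())
    return np.concatenate(feat)

def fine_classify(char_img, candidate_labels, gabor_templates):
    """(5-3): cosine-angle classification restricted to the 100 coarse candidates from step 4."""
    f = gabor_features(char_img)
    best, best_score = None, -np.inf
    for label in candidate_labels:
        t = gabor_templates[label]
        score = float(np.dot(f, t) / (np.linalg.norm(f) * np.linalg.norm(t) + 1e-9))
        if score > best_score:
            best, best_score = label, score
    return best
```

As with the coarse stage, `gabor_templates` is a hypothetical per-class template dictionary; restricting the search to `candidate_labels` is what realizes the described cascade and its reported speed-up.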

Claims (4)

  1. An intelligent mobile terminal scene text processing method, characterized by comprising the following steps:
    Step 1: coarse text detection based on edges, specifically comprising:
    (1-1) performing edge detection on the scene image I input to the intelligent mobile terminal using a color image edge detection method to obtain a first intermediate processed image;
    (1-2) performing morphological operations on the first intermediate processed image to connect broken characters and adjacent characters in the first intermediate processed image, obtaining a second intermediate processed image;
    (1-3) processing the second intermediate processed image by finding connected components, thereby obtaining a candidate text region set S of the input scene image I;
    Step 2: obtaining a stroke width map T of the input scene image I, performing stroke width and geometric feature analysis on each candidate text region in the candidate text region set S, rejecting non-text regions that do not meet the requirements, and finally outputting a localization result map L1;
    Step 3: recognition preprocessing, specifically comprising:
    (3-1) performing contrast enhancement on the text regions of the localization result map L1;
    (3-2) performing median filtering on the enhanced text regions;
    (3-3) performing binarization on the text regions after median filtering;
    (3-4) performing character segmentation on the text regions after binarization;
    Step 4: normalizing the single characters obtained after segmentation and extracting directional element features, specifically comprising:
    (4-1) cropping each segmented character to remove the white background around the character, and uniformly resizing each cropped character image to N × N using bilinear interpolation;
    (4-2) extracting the contour of each size-normalized character and computing its directional element features;
    (4-3) recognizing each character with a distance classifier to obtain the X closest candidate characters for each character;
    Step 5: fine classification based on Gabor features, specifically comprising:
    (5-1) uniformly resizing each character to M × M using bilinear interpolation;
    (5-2) applying a Gabor transform to the size-normalized character and extracting Gabor features;
    (5-3) on the basis of the X closest candidate characters of each character obtained in (4-3), recognizing again with a distance classifier to obtain the recognition result of each character.
  2. The intelligent mobile terminal scene text processing method according to claim 1, characterized in that step 2 specifically comprises:
    (2-1) performing edge detection on the input scene image I using the Canny edge detection method to obtain an edge map of the input scene image I, while recording the gradient direction of each edge pixel;
    (2-2) performing a stroke width transform on the edge pixels:
    (2-2-1) assuming p is an edge pixel and dp is the gradient direction of edge pixel p, searching along the ray r = p + n·dp (n ≥ 0) among the edge pixels of the edge map for a matching edge pixel q, where dq is the gradient direction of edge pixel q and dq is roughly opposite to dp, i.e., dq = −dp ± π/2;
    if p does not find a matching pixel q, or dq is not opposite to dp, the ray r = p + n·dp is discarded, and a new edge pixel p must be selected to search for its matching edge pixel q;
    if a matching pixel q is found, the stroke width value of every pixel on the segment [p, q] is set to ||p − q||, the Euclidean distance between pixel p and pixel q; if a pixel on the segment [p, q] already has a stroke width value S, the smaller of S and ||p − q|| is taken as the actual stroke width value of that pixel;
    (2-2-2) repeating (2-2-1) until the stroke width values of the pixels on all rays that were not discarded have been computed;
    (2-2-3) traversing again all rays that were not discarded; for each ray, computing the mean stroke width M of all pixels on the ray, finding all pixels on the ray whose stroke width value exceeds M and setting their stroke width value to M; after all rays have been traversed, finally obtaining the stroke width map T of the input scene image I;
    (2-3) on the basis of the stroke width map T of the input scene image I obtained in step (2-2), finding the corresponding candidate text region set S obtained in step 1, and then screening the candidate text region set S according to the following rules:
    (2-3-a) rejecting candidate text regions whose aspect ratio is not between 0.1 and 10;
    (2-3-b) rejecting candidate text regions whose character width is not between W/20 and W pixels or whose height is not between H/20 and H pixels, where W and H are the width and height of the image, respectively;
    (2-3-c) deleting candidate text regions whose area is smaller than 20 pixels;
    (2-3-d) binarizing the candidate text region set S, computing the ratio Rb of black pixels, and rejecting candidate text regions whose black pixel ratio Rb is not between 0.2 and 0.8, where Rb is defined as
    where f(i, j) is the pixel value at position (i, j) in the candidate text region image, w and h are the width and height of the candidate text region, respectively, and ⊕ denotes exclusive OR;
    (2-3-e) binarizing the candidate text region set S, computing the crossing rate Rcc of the region, and rejecting candidate text regions whose crossing rate Rcc is not between 0.05 and 0.6, where Rcc is defined as:
    where f(i, j) is the pixel value at position (i, j) in the candidate text region image, f(i, j+1) is the pixel value at position (i, j+1) in the candidate text region image, w and h are the width and height of the candidate text region, respectively, and ⊕ denotes exclusive OR;
    (2-3-f) performing the stroke width transform on the candidate text region set S to obtain a first stroke width map of every candidate text region, inverting the candidate text region set S and performing the stroke width transform again to obtain a second stroke width map of every candidate text region; if, in either the first stroke width map or the second stroke width map of a candidate text region, the stroke width variance exceeds half of the mean stroke width and the stroke width ratio of adjacent pixels exceeds 3.0, rejecting that candidate text region;
    (2-4) text detection output: after the screening of (2-3), the final text regions are obtained; they are then sorted and numbered according to their positions, from top to bottom and from left to right, and after sorting the text regions are output.
  3. The intelligent mobile terminal scene text processing method according to claim 1, characterized in that: in (3-1), the contrast of the text regions of the localization result map L1 is enhanced using a histogram equalization algorithm; in (3-2), median filtering is performed on the enhanced regions using a 3 × 3 rectangular sliding template, i.e., using a 3 × 3 rectangular sliding template, the pixels within the template are sorted by pixel value to generate a monotonically increasing or decreasing two-dimensional data sequence, and the value of the central pixel of the template is replaced by the median of this group before output; in step (3-3), the regions after median filtering are binarized using the maximum between-class variance (Otsu) method.
  4. The intelligent mobile terminal scene text processing method according to claim 2, characterized in that: in (2-2), steps (2-2-1), (2-2-2) and (2-2-3) are repeated once; when repeating, in (2-2-1), the matching edge pixel q is searched for among the edge pixels of the edge map along the ray r = p + n·dp (n ≤ 0).
CN201410581464.6A 2014-10-27 2014-10-27 Intelligent mobile terminal scene text processing method Active CN104408449B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410581464.6A CN104408449B (en) 2014-10-27 2014-10-27 Intelligent mobile terminal scene text processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410581464.6A CN104408449B (en) 2014-10-27 2014-10-27 Intelligent mobile terminal scene text processing method

Publications (2)

Publication Number Publication Date
CN104408449A CN104408449A (en) 2015-03-11
CN104408449B true CN104408449B (en) 2018-01-30

Family

ID=52646080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410581464.6A Active CN104408449B (en) 2014-10-27 2014-10-27 Intelligent mobile terminal scene text processing method

Country Status (1)

Country Link
CN (1) CN104408449B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104980765B (en) * 2015-06-15 2018-07-27 北京维鲸视界科技有限公司 A kind of plain text frame monitoring method
CN105046254A (en) * 2015-07-17 2015-11-11 腾讯科技(深圳)有限公司 Character recognition method and apparatus
CN106156767A (en) 2016-03-02 2016-11-23 平安科技(深圳)有限公司 Driving license effect duration extraction method, server and terminal
CN106127118A (en) * 2016-06-15 2016-11-16 珠海迈科智能科技股份有限公司 A kind of English word recognition methods and device
CN107545261A (en) * 2016-06-23 2018-01-05 佳能株式会社 The method and device of text detection
CN106845475A (en) * 2016-12-15 2017-06-13 西安电子科技大学 Natural scene character detecting method based on connected domain
CN107516004A (en) * 2017-07-06 2017-12-26 贵阳朗玛信息技术股份有限公司 The identifying processing method and device of medical image picture
CN108509860A (en) * 2018-03-09 2018-09-07 西安电子科技大学 HOh Xil Tibetan antelope detection method based on convolutional neural networks
CN108596250B (en) * 2018-04-24 2019-05-14 深圳大学 Characteristics of image coding method, terminal device and computer readable storage medium
CN109409356B (en) * 2018-08-23 2021-01-08 浙江理工大学 Multi-direction Chinese print font character detection method based on SWT
CN111783781B (en) * 2020-05-22 2024-04-05 深圳赛安特技术服务有限公司 Malicious term recognition method, device and equipment based on product agreement character recognition
CN112101324B (en) * 2020-11-18 2021-03-16 鹏城实验室 Multi-view image coexisting character detection method, equipment and computer storage medium
CN113642556A (en) * 2021-08-04 2021-11-12 五八有限公司 Image processing method and device, electronic equipment and storage medium
CN117894030A (en) * 2024-01-18 2024-04-16 广州宏途数字科技有限公司 Text recognition method and system for campus smart pen

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1459761A (en) * 2002-05-24 2003-12-03 清华大学 Character identification technique based on Gabor filter set
CN1581159A (en) * 2003-08-04 2005-02-16 中国科学院自动化研究所 Trade-mark searching method
CN101615252A (en) * 2008-06-25 2009-12-30 中国科学院自动化研究所 A kind of method for extracting text information from adaptive images

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Handwritten Arabic character recognition based on adaptive fusion of multiple components; Xu Yamei; Journal of Xidian University; 2012-12-31; Vol. 39, No. 6; full text *
Research on text segmentation and recognition in natural scenes; Ge Qiaorui; China Excellent Master's Theses Full-text Database; 2012-07-31; full text *

Also Published As

Publication number Publication date
CN104408449A (en) 2015-03-11

Similar Documents

Publication Publication Date Title
CN104408449B (en) Intelligent mobile terminal scene text processing method
CN109154978B (en) System and method for detecting plant diseases
CN104751142B (en) A kind of natural scene Method for text detection based on stroke feature
CN103577475B (en) A kind of picture mechanized classification method, image processing method and its device
CN107491730A (en) A kind of laboratory test report recognition methods based on image procossing
CN113128442B (en) Chinese character handwriting style identification method and scoring method based on convolutional neural network
CN111401372A (en) Method for extracting and identifying image-text information of scanned document
CN109740572A (en) A kind of human face in-vivo detection method based on partial color textural characteristics
Tian et al. Natural scene text detection with MC–MR candidate extraction and coarse-to-fine filtering
CN104504383B (en) A kind of method for detecting human face based on the colour of skin and Adaboost algorithm
CN109242400A (en) A kind of logistics express delivery odd numbers recognition methods based on convolution gating cycle neural network
CN108197644A (en) A kind of image-recognizing method and device
CN110298376A (en) A kind of bank money image classification method based on improvement B-CNN
CN103295013A (en) Pared area based single-image shadow detection method
CN112163511A (en) Method for identifying authenticity of image
CN110956167B (en) Classification, discrimination, strengthening and separation method based on positioning characters
CN112883926B (en) Identification method and device for form medical images
CN109086772A (en) A kind of recognition methods and system distorting adhesion character picture validation code
Huang et al. Text detection and recognition in natural scene images
CN112446259A (en) Image processing method, device, terminal and computer readable storage medium
CN107169996A (en) Dynamic human face recognition methods in a kind of video
CN108288061A (en) A method of based on the quick positioning tilt texts in natural scene of MSER
CN114581928A (en) Form identification method and system
CN109741351A (en) A kind of classification responsive type edge detection method based on deep learning
CN110766001B (en) Bank card number positioning and end-to-end identification method based on CNN and RNN

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant