CN110458158A - Text detection and recognition method for blind-assisted reading - Google Patents

Text detection and recognition method for blind-assisted reading

Info

Publication number
CN110458158A
Authority
CN
China
Prior art keywords
word
image
frame
text
line
Prior art date
Legal status
Granted
Application number
CN201910501311.9A
Other languages
Chinese (zh)
Other versions
CN110458158B (en)
Inventor
毋超
郭璠
刘丽珏
马润洲
何汉东
刘嘉熙
康天硕
Current Assignee
Central South University
Original Assignee
Central South University
Priority date
Filing date
Publication date
Application filed by Central South University
Priority: CN201910501311.9A
Publication of CN110458158A
Application granted
Publication of CN110458158B
Status: Active


Classifications

    • G06F18/23213: Pattern recognition; clustering; non-hierarchical techniques using statistics or function optimisation with a fixed number of clusters, e.g. K-means clustering
    • G06V10/235: Image preprocessing by selection of a specific region containing or referencing a pattern, based on user input or interaction
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/28: Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06V10/30: Noise filtering
    • G06V10/751: Image or video pattern matching by comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V30/153: Character recognition; segmentation of character regions using recognition of characters or words
    • G09B21/006: Teaching or communicating with blind persons using audible presentation of the information

Abstract

The invention discloses a text detection and recognition method for blind-assisted reading, comprising the following steps. Step 1: scene detection, which mainly detects whether the image captured by the camera shows a finger placed on reading text. Step 2: finger positioning, which locates the fingertip and uses it as the cursor for subsequent text detection. Step 3: text extraction, which mainly comprises extracting text lines and extracting each word within a text line. Step 4: word tracking, which tracks the word frame of each correctly recognized word by template matching. The method runs fast and works well: it recognizes the word at the user's fingertip very accurately, costs little, and is highly versatile, so it can be widely applied to intelligent products such as wearable blind-assisted reading rings.

Description

Text detection and recognition method for blind-assisted reading
Technical field
The invention belongs to the field of computer vision applications, and in particular relates to a text detection and recognition method for blind-assisted reading.
Background art
There are about 314 million people with eye diseases in the world today, of whom 269 million suffer from low vision and 45 million are blind. China currently has about 8.77 million people with visual disabilities, accounting for about 19.5% of the world's blind population and about 0.7% of China's total population. According to analyses by relevant authoritative institutions, the number of blind people in China will exceed 70.05 million six years from now. Therefore, how to help blind people overcome the difficulties of daily learning and life, especially the most basic problem of reading, has great research value and social benefit, as well as broad application prospects.
Several blind-assisted reading products have appeared on the market, such as a finger-worn reading aid (Touch Reader). Its built-in scanner automatically scans and recognizes the text it passes over, then converts the text into raised and recessed braille through a dot matrix. Since the dot matrix is distributed on the inner layer of the finger sleeve, the finger can sense its change of shape, allowing blind users to read the braille. Similarly, another touch-type reader for the blind reads ordinary text through a cylindrical array at its bottom and outputs braille: raised columns appear on the top panel, forming braille that can be recognized by touch. The EyeRing ring collects the words in a book with an embedded miniature image scanner and converts them in real time into braille dot patterns on a braille dot display set near the finger, so that blind users can read text through the finger. However, people who have not learned braille find these devices difficult to use. In addition, other products such as the OrCam wearable device consist of a small camera attached to glasses and a processing system. By running computer vision algorithms, the product parses what is seen and then tells blind and low-vision users the content through bone-conduction audio. But the product is expensive, and a blind user cannot reliably aim the scanning camera at the reading material, so it is inconvenient to use. It can be seen that the existing products either require the user to learn braille or are expensive and inconvenient.
Among patents on reading methods for the blind, Qiu Hong et al. (patent publication No. CN108492682A) provide a reading aid that sends the images obtained by a camera to an image recognition processor for recognition, and feeds the recognition result as level signals to a driving circuit that drives a braille dot-matrix component to output the corresponding braille characters. The ring-type reader for the blind invented by Wang Lu (patent publication No. CN106601081A) recognizes the printed characters on ordinary books through a camera mounted on a ring and converts them into braille. The main problem of these patented methods is that they still require the user to know braille. In addition, Li Chongzhou et al. (patent publication No. CN103077625A) propose an electronic reader for the blind and a blind-assisted reading method, which first converts paper text into electronic image data by scanning or photographing, then recognizes it as an electronic text document with OCR, and finally converts the document into voice data with TTS speech synthesis for playback. However, this method reads out all the text in the whole image at once and fails to provide the convenient, personalized function of reading whatever the user points at.
In this context, it is of particular importance to study a robust, accurate, and low-cost method that automatically detects and recognizes the text indicated by the finger of a blind or low-vision user.
Summary of the invention
The technical problem to be solved by the invention is to provide a text detection and recognition method for blind-assisted reading, so that blind or low-vision people can also read ordinary books, solving the reading difficulties of blind and low-vision people.
The technical solution adopted in the present invention is as follows:
A text detection and recognition method for blind-assisted reading, comprising the following steps:
Step 1: for the image sequence captured by the camera, judge whether the scene in the current image is a finger placed on reading text; if so, go to step 2; otherwise skip this frame, take the next frame as the current image, and repeat the above judgment and processing;
Step 2: locate the user's fingertip in the current image;
Step 3: determine the text line indicated by the user according to the position of the user's fingertip;
Step 4: extract the words in the text line indicated by the user and convert them into voice output.
Further, the method of judging in step 1 whether the scene in the current image is a finger placed on reading text is as follows:
Step 11: capture in advance, with the camera, some typical images containing the user's finger and the text region where it rests, and store them in a database;
Step 12: take the current image and several images captured immediately before it as sample images;
Step 13: normalize the RGB color spaces of the database images and the sample images respectively;
Step 14: for each sample, compute the Euclidean distance between its normalized red-channel image and the normalized red-channel image of each database image, and take the minimum of the results as the matching score of the sample; compute the mean μ_Im and variance σ_Im of all sample matching scores; if μ_Im + σ_Im < Th, the scene in the current image is deemed to be a finger placed on reading text, where Th is an empirically set threshold.
Further, in step 14, all images are first scaled down to the same set size before the Euclidean distances are computed; a sketch of the whole scene check follows.
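As an illustration of steps 11 to 14, the following minimal sketch (assuming OpenCV and NumPy, and assuming chromaticity normalization for step 13; the 50 × 50 size and the threshold of 150 are taken from the embodiment described later) computes the matching scores and the scene decision:

```python
import cv2
import numpy as np

def normalized_red(img_bgr, size=(50, 50)):
    """Normalize the RGB color space and return the resized red channel."""
    img = img_bgr.astype(np.float64)
    total = img.sum(axis=2) + 1e-9            # avoid division by zero
    red = img[:, :, 2] / total                # per-pixel R / (R + G + B)
    return cv2.resize(red, size)

def is_reading_scene(sample_imgs, database_imgs, th=150.0):
    """Scene check: mean + variance of per-sample minimum distances below Th."""
    db = [normalized_red(d) for d in database_imgs]
    scores = []
    for s in sample_imgs:
        sr = normalized_red(s)
        scores.append(min(np.linalg.norm(sr - dr) for dr in db))
    scores = np.asarray(scores)
    return scores.mean() + scores.var() < th  # the patent's mu_Im + sigma_Im < Th
```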
Further, step 2 comprises the following steps:
Step 21: find the candidate region of the user's fingertip using K-means;
First, filter the current image with a Gaussian filter;
Then generate three two-dimensional matrices from the images of the three channels of the filtered image; each element of a two-dimensional matrix is the pixel value at the corresponding point of the corresponding channel's image;
For each two-dimensional matrix, sum all its columns and average to obtain a row × 1 column vector m_c_ave, and sum all its rows and average to obtain a 1 × col row vector m_r_ave; the current image is thereby converted into three column vectors and three row vectors, where col is the total number of columns and row the total number of rows of the two-dimensional matrix;
Treat each dimension of the column vectors as a vertical data point, whose three features are the components of the three column vectors at that dimension; the number of vertical data points equals the dimension of the column vectors, i.e. row. Likewise, treat each dimension of the row vectors as a horizontal data point, whose three features are the components of the three row vectors at that dimension; the number of horizontal data points equals the dimension of the row vectors, i.e. col;
Next, cluster the vertical data points and the horizontal data points separately with K-means, with the number of clusters set to 2;
Then express the clustering result of the vertical data points as a vertical label vector, a row × 1 column vector whose component in each dimension is the label (0 or 1) of the corresponding vertical data point, and express the clustering result of the horizontal data points as a horizontal label vector, a 1 × col row vector whose component in each dimension is the label (0 or 1) of the corresponding horizontal data point;
Apply a mean filter to the vertical and horizontal label vectors and then threshold them: if a component is greater than or equal to the set threshold it is set to 1, otherwise to 0, giving the final vertical and horizontal label vectors;
Take as top-left vertex the intersection of the horizontal line corresponding to the 0/1 boundary of the vertical label vector and the vertical line corresponding to the left 0/1 boundary of the horizontal label vector, and delimit a rectangular region as the candidate region of the user's fingertip, as in the sketch below;
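A minimal sketch of step 21, assuming scikit-learn's KMeans (whose n_init=2 reproduces the two-random-initializations, best-of-two selection described further below) and assuming the finger class ends up labelled 1 (in practice the two K-means labels are arbitrary and may need flipping); the mean-filter window of 5 and the 0.4 threshold follow the embodiment described later:

```python
import numpy as np
from sklearn.cluster import KMeans

def label_vector(points, window=5, thresh=0.4):
    """2-class K-means on 3-feature points, then 1-D mean filter and threshold."""
    labels = KMeans(n_clusters=2, n_init=2).fit_predict(points).astype(float)
    smooth = np.convolve(labels, np.ones(window) / window, mode="same")
    return (smooth >= thresh).astype(int)

def fingertip_candidate(img_bgr, rect_len=320, rect_wid=160):
    """Candidate rectangle whose top-left vertex is the two 0/1 boundaries' intersection."""
    img = img_bgr.astype(np.float64)
    vert = label_vector(img.mean(axis=1))   # one 3-feature point per image row
    horiz = label_vector(img.mean(axis=0))  # one 3-feature point per image column
    top = int(np.argmax(vert == 1))         # 0/1 boundary of the vertical labels
    left = int(np.argmax(horiz == 1))       # left 0/1 boundary of the horizontal labels
    return left, top, rect_len, rect_wid
```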
Step 22: locate the fingertip by computing curvature;
First, find the edges within the candidate region of the user's fingertip with the Canny operator and connect the edges to obtain contours; if several contours are obtained, keep only those containing no fewer pixels than the set threshold;
Then, smooth the retained contours;
Finally, compute the curvature of each pixel on the smoothed contour; the point of zero curvature is the user's fingertip position.
Further, when clustering the vertical/horizontal data points with K-means, two sets of cluster centers are randomly initialized, clustering is performed twice, the compactness of the two clustering results is assessed, and the more compact result is chosen as the final clustering result.
Further, step 3 comprises the following steps:
Step 31: text region extraction;
First, convert the current image into a grayscale image;
Then binarize the grayscale image to obtain the foreground region and background region of the image;
Then exclude the non-text regions in the background region, as follows:
First extract all connected regions in the background region to form a set C_R; then for each connected region in C_R find its rotated rectangle, denoted ζ(o, θ, w, h), where o is the center of the rotated rectangle, θ is the deflection angle of the rotated rectangle, i.e. the angle swept when the horizontal axis is rotated counterclockwise until it meets a side of the rotated rectangle, and w and h are the lengths of two adjacent sides of the rotated rectangle; after filtering out of C_R the connected regions whose rotated-rectangle area or deflection angle does not meet the constraints, further filter the remaining connected regions based on the relationships between text regions; the specific implementation comprises the following steps:
3.1) Take the top-left vertex of the current image as the origin O, the length direction of the image as the y-axis (positive to the right) and the width direction of the image as the x-axis (positive downward); treat the rotated-rectangle center of each connected region R as a point of interest, and express every line through a point of interest in the current image in the form:
x cos θ_SL + y sin θ_SL = ρ_SL
where θ_SL is the angle between the line and the x-axis, ρ_SL is the distance from the origin O to the line, θ_SL ranges over (−π/2, π/2), and ρ_SL ranges over (−D, D), where D is the diagonal length of the original image captured by the camera;
3.2) Subdivide the (ρ_SL, θ_SL) parameter space into multiple accumulator cells, and denote the value of the accumulator cell at coordinates (ρ_k, θ_k) by A(ρ_k, θ_k); first set all accumulator cells to zero, then compute the distance d from each point of interest (x_i, y_i) to the line x cos θ_k + y sin θ_k = ρ_k:
d = |x_i cos θ_k + y_i sin θ_k − ρ_k|;
Judge the distances of all points of interest to the line x cos θ_k + y sin θ_k = ρ_k in turn; each time a point's distance is below the threshold, A(ρ_k, θ_k) is incremented by 1; after all judgments, the final value of A(ρ_k, θ_k) is obtained. If a cell's A(ρ_k, θ_k) value exceeds the threshold, the corresponding line is taken as a reference line; the number of reference lines so recorded is N. A sketch of this voting follows;
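A minimal sketch of the accumulator voting in steps 3.1 and 3.2, assuming NumPy; the 1-degree angle bins and the vote threshold of 7 follow the embodiment described later, while the distance threshold and the ρ step are assumptions:

```python
import numpy as np

def vote_reference_lines(centers, diag, dist_th=3.0, vote_th=7, rho_step=2.0):
    """For each cell (rho_k, theta_k), count the rotated-rectangle centres whose
    distance to the line x*cos(theta_k) + y*sin(theta_k) = rho_k is below
    dist_th; cells with more than vote_th votes yield reference lines."""
    ref_lines = []
    for theta in np.deg2rad(np.arange(-89, 90)):   # integer degrees in (-90, 90)
        proj = centers[:, 0] * np.cos(theta) + centers[:, 1] * np.sin(theta)
        for rho in np.arange(-diag, diag, rho_step):
            votes = int(np.sum(np.abs(proj - rho) < dist_th))
            if votes > vote_th:
                ref_lines.append((rho, theta))
    return ref_lines
```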
3.3) Find the text regions by the method of unsupervised line clustering; the specific process is as follows:
3.31) Input the set of points of interest and initialize the reference-line set C_L, containing the N reference lines obtained in step 3.2);
3.32) Compute the distance of every point of interest in the set to each reference line in C_L; for each point of interest take the minimum distance, and screen out the points whose minimum distance is less than the set threshold; label the screened points of interest by class, attributing to the same class the points whose minimum distance corresponds to the same reference line;
3.33) Fit a line to the points of interest of each class, and judge whether the slope and intercept of the newly fitted line differ from those of the class's reference line by less than the set thresholds; if so, the reference line corresponding to the class in C_L remains unchanged; otherwise the reference line corresponding to the class in C_L is updated to the newly fitted line. If in this step all reference lines in C_L remain unchanged, output the class-labelled set of points of interest C_S; the regions covered by the rotated rectangles of these points are the extracted text regions; otherwise return to step 3.32). A sketch of this clustering loop follows;
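A minimal sketch of the line clustering in step 3.3, assuming NumPy; the fit is a total-least-squares fit kept in (ρ, θ) normal form, so the convergence test compares (ρ, θ) rather than slope and intercept, and the tolerances are assumptions:

```python
import numpy as np

def dist_to_line(pts, rho, theta):
    """Distances of points to the line x*cos(theta) + y*sin(theta) = rho."""
    return np.abs(pts[:, 0] * np.cos(theta) + pts[:, 1] * np.sin(theta) - rho)

def fit_normal_form(pts):
    """Total-least-squares line through pts, returned as (rho, theta)."""
    c = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - c)
    nx, ny = vt[-1]                              # unit normal of the fitted line
    return c[0] * nx + c[1] * ny, np.arctan2(ny, nx)

def cluster_text_lines(points, ref_lines, dist_th=5.0, tol=(1.0, 0.02), max_iter=50):
    """Assign centres to the nearest reference line, refit, repeat to convergence."""
    for _ in range(max_iter):
        d = np.stack([dist_to_line(points, r, t) for r, t in ref_lines])
        nearest, best = d.argmin(axis=0), d.min(axis=0)
        keep = best < dist_th                    # screen out far-away centres
        changed = False
        for j, (r, t) in enumerate(ref_lines):
            cls = points[keep & (nearest == j)]
            if len(cls) < 2:
                continue
            r_new, t_new = fit_normal_form(cls)
            if abs(r_new - r) > tol[0] or abs(t_new - t) > tol[1]:
                ref_lines[j], changed = (r_new, t_new), True
        if not changed:                          # all reference lines unchanged
            break
    return nearest, keep, ref_lines
```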
Step 32: determine the text line;
According to the fingertip position obtained in step 2, determine a rectangular region of interest with the fingertip located on its bottom edge;
For the text regions extracted in step 31, find the outermost contour of each character; take the lowest point of each contour as a datum point, and screen out the datum points located inside the region of interest;
From the screened datum points, choose three adjacent datum points at a time and fit a line, obtaining several lines;
Score each fitted line with the following formula:
μ_score = (1/n) Σ_{i=1}^{n} d(i)
where d(i) is the distance from the i-th screened datum point to the line, n is the total number of screened datum points, and μ_score is the score;
Choose the line with the lowest score as the judge line; discard the screened datum points whose distance to the judge line is not less than the set threshold (the abnormal datum points), then fit a line to all remaining datum points; the fitted line is the text line indicated by the user, as in the sketch below.
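A minimal sketch of this judge-line selection and final fit, assuming NumPy; bases holds the (x, y) datum points inside the region of interest, and the distance threshold is an assumption:

```python
import numpy as np

def line_distances(bases, k, b):
    """Point-to-line distances for the line y = kx + b."""
    return np.abs(k * bases[:, 0] - bases[:, 1] + b) / np.hypot(k, 1.0)

def indicated_text_line(bases, dist_th=4.0):
    """Fit lines through adjacent datum-point triples, keep the lowest-scoring
    one as the judge line, drop anomalies, refit on the remaining points."""
    bases = bases[np.argsort(bases[:, 0])]           # order datum points along x
    best = None
    for i in range(len(bases) - 2):
        k, b = np.polyfit(bases[i:i + 3, 0], bases[i:i + 3, 1], 1)
        score = line_distances(bases, k, b).mean()   # mu_score = (1/n) * sum d(i)
        if best is None or score < best[0]:
            best = (score, k, b)
    _, k, b = best                                   # lowest score: the judge line
    inliers = bases[line_distances(bases, k, b) < dist_th]  # drop descender anomalies
    return np.polyfit(inliers[:, 0], inliers[:, 1], 1)      # the user's text line
```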
Further, the specific processing of step 4 is as follows:
Step 41: recognize the first word;
For the text regions extracted in step 31, extract the part located inside the region of interest as the target text region; find the minimum bounding rectangle of each character in the target text region as a letter frame; cluster the letter frames into words according to the difference between the center distance of two adjacent letter frames within a word and the center distance of two adjacent letter frames between words; find the minimum bounding rectangle of all letter frames belonging to one word as the word frame;
According to the angle between the text line indicated by the user in the image and the horizontal direction, apply angle compensation to the image so that the text line in it is rotated to the horizontal direction;
Choose the first word frame along the text line and recognize it with OCR; the recognition returns the word, the word confidence, and the word frame; when the returned word confidence is greater than the threshold, the word is considered correctly recognized, and voice output is performed for the correctly recognized word, as in the sketch below;
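A minimal sketch of the letter-frame grouping and first-word recognition in step 41, assuming axis-aligned (x, y, w, h) boxes and the pytesseract wrapper around the tesseract engine named in the embodiment later; the 20-pixel gap and the confidence threshold of 80 follow that embodiment:

```python
import pytesseract
from pytesseract import Output

def group_letter_frames(letter_frames, gap_th=20):
    """Merge letter frames (x, y, w, h) into words along the text line."""
    frames = sorted(letter_frames, key=lambda f: f[0])
    words, current = [], [frames[0]]
    for f in frames[1:]:
        px, _, pw, _ = current[-1]
        if f[0] - (px + pw) < gap_th:      # intra-word letter spacing
            current.append(f)
        else:                              # word gap: start a new word
            words.append(current)
            current = [f]
    words.append(current)
    return words

def recognize_first_word(line_img, word_frame, conf_th=80):
    """OCR one word frame; return the word only when the confidence is high."""
    x, y, w, h = word_frame
    data = pytesseract.image_to_data(line_img[y:y + h, x:x + w],
                                     output_type=Output.DICT)
    for word, conf in zip(data["text"], data["conf"]):
        if word.strip() and float(conf) > conf_th:
            return word
    return None                            # not confidently recognized
```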
Step 42: track the position of the word frame of the correctly recognized word in subsequent image frames by template matching, so as to determine the new word region in the subsequent frames and recognize new words in the new word region;
i. Binarize each frame image;
Initialize s = 1 and initialize the fingertip speed V_fingertip; take the word frame corresponding to the j-th word recognized in frame l, together with all the letter frames it contains, as the word frame and letter frames to be tracked; take the region of interest in frame m as the search region; initialize m = l + 1;
ii. For the word frame/letter frames to be tracked, track their positions in the search region by the method of template matching;
If the word frame currently being tracked is tracked successfully, go to step iii; otherwise judge whether some word frame was tracked successfully before. If so, first determine a cut-off line on frame m coinciding with the right edge of the most recent successful match position of a word frame; then judge whether the width from the cut-off line to the left edge of the image is less than the set threshold. If it is, take the target text region to the right of the cut-off line as the new word region and continue recognizing new words there; otherwise end the recognition of new words. If no word frame was tracked successfully before, take frame m as the current image and apply steps 1 through 41 to it to recognize the first word;
iii. Update the fingertip speed V_fingertip:
V_fingertip = (Σ_{j=1}^{N_1} V_word,j + Σ_{k=1}^{N_2} V_letter,k) / (N_1 + N_2)
where V_word,j is the horizontal distance between the positions of the j-th successfully tracked word frame in frame l and in frame m, V_letter,k is the horizontal distance between the positions of the k-th successfully tracked letter frame in frame l and in frame m, N_1 is the number of successfully matched word frames, and N_2 is the number of successfully matched letter frames;
iv. Let s = s + 1;
For the word frame corresponding to the s-th word recognized in frame l and each letter frame it contains, predict from the fingertip speed whether the word frame and each letter frame will move out of frame m: if the abscissa of the top-left vertex of a word frame/letter frame in frame l minus V_fingertip × (m − l) is less than zero, that word frame/letter frame is judged to move out of frame m;
Take the word frames/letter frames judged not to move out of the current frame image as the frames to be tracked, and delimit a rectangular area as the new search region: the top-left abscissa of the rectangular area = the top-left abscissa of the last tracked word frame − the fingertip speed − a set offset; the length of the rectangular area = the length of the last tracked word frame + the fingertip speed; the width of the rectangular area = the width of the last tracked word frame + a set offset;
Compute the ratio of black pixels to white pixels in the new search region; if this ratio is less than the set threshold, discard the frame image and let m = m + 1, repeating the foregoing judgment and processing until the ratio of black pixels to white pixels in the new search region is not less than the set threshold, then go to step v; otherwise go directly to step v;
v. Return to step ii.
Beneficial effects:
The invention discloses a text detection and recognition method for blind-assisted reading, comprising the following steps. Step 1: scene detection, which mainly detects whether the image captured by the camera shows a finger placed on reading text. Step 2: finger positioning, which locates the fingertip and uses it as the cursor for subsequent text detection. Step 3: text extraction, which mainly comprises extracting text lines and extracting each word within a text line. Step 4: word tracking, which tracks the word frame of each correctly recognized word by template matching. Combined with voice output, the method can be applied to related blind-assisted reading products, so that even users who do not know braille can conveniently and quickly learn the text content they are pointing at.
Compared with other blind-assisted reading methods, which mostly convert the captured text into a braille form perceivable by blind users and therefore require users to learn braille, this scheme directly outputs as voice the word indicated by the user's finger. Even users who do not know braille can therefore enjoy reading conveniently anytime and anywhere. Meanwhile, the method runs fast and works well: it recognizes the word at the user's fingertip very accurately, costs little, and is highly versatile, so it can be widely applied to intelligent products such as wearable blind-assisted reading rings.
Brief description of the drawings
Fig. 1 is the overall implementation flow diagram of the method of the present invention;
Fig. 2 shows the scene detection process of the method of the present invention, where Fig. 2(a) is the pre-built finger database image, Fig. 2(b) is the normalization of the finger image, and Fig. 2(c) is a schematic diagram of the template matching between the image captured by the camera and the finger database images;
Fig. 3 shows the fingertip detection process of the method of the present invention, where Fig. 3(a) is the original camera input, Fig. 3(b) shows the RGB channels of the input, Fig. 3(c) shows the column vectors converted from the RGB channels, Fig. 3(d) is the possible fingertip region, Fig. 3(e) is the enlarged possible fingertip region, and Fig. 3(f) is the located fingertip position;
Fig. 4 is a schematic diagram of the rotated-rectangle definition of the method of the present invention;
Fig. 5 shows the text extraction process of the method of the present invention, where Fig. 5(a) is the grayscale image obtained from the color image at the fingertip position, Fig. 5(b) is the binary image obtained by binarizing the grayscale image of Fig. 5(a), Fig. 5(c) is the result of filtering Fig. 5(b) by the area condition, and Fig. 5(d) is the result of further filtering Fig. 5(c) by the angle condition;
Fig. 6 shows the text region extraction process of the method of the present invention, where Fig. 6(a) is the angle-filtered image, Fig. 6(b) is the coordinate representation of all ζ(o), Fig. 6(c) is the three-dimensional representation of the (ρ_SL, θ_SL) matrix with the z-axis being the accumulator value, Fig. 6(d) shows the initial reference lines found, and Fig. 6(e) shows the text regions obtained by line clustering;
Fig. 7 shows the text line determination process of the method of the present invention; Fig. 7(a) shows the text regions obtained by line clustering and the fingertip position, Fig. 7(b) is a schematic diagram of the key area determined by the fingertip position, and Fig. 7(c) is the extracted area of interest with the text line and the abnormal line marked on it;
Fig. 8 shows the word recognition process of the method of the present invention, where Fig. 8(a) is a schematic diagram of word-frame determination, Fig. 8(b) shows the rotation angle determined from the text line, and Fig. 8(c) is the result of angle compensation applied to the image;
Fig. 9 is a schematic diagram of the word tracking of the method of the present invention; Fig. 9(a) shows tracking while reading and Fig. 9(b) shows blurred-frame detection;
Fig. 10 shows the actual effect of the method of the present invention at runtime.
Specific embodiment
The present invention is further described below with reference to the accompanying drawings.
In this embodiment, the detection and recognition of text in the captured image for the specific application of blind-assisted reading are carried out in the following steps; the whole implementation process is shown in Fig. 1 and mainly includes the following main steps:
Step 1: judge by scene detection whether the scene in the current image is a finger placed on reading text; if so, go to step 2, otherwise skip the subsequent steps. The specific processing of this step is as follows:
Step 11: capture in advance, with the camera, some typical images containing the user's finger and the text region where it rests, and store them in a database, see Fig. 2(a);
Step 12: take the current image and the 19 most recently captured images before it (i.e. 20 consecutively captured images) as samples;
Step 13: color normalization. As shown in Fig. 2(b), the RGB color spaces of the database images and the sample images are normalized respectively to reduce the influence of illumination and shadow:
R = r / (r + g + b), G = g / (r + g + b), B = b / (r + g + b)
where (r, g, b) is the pixel value of a point in the original image and (R, G, B) is the pixel value of that point in the normalized image;
Since the finger color is a skin color, the red-channel image of each normalized image is extracted for the subsequent image matching;
Step 14: image matching. To reduce the matching time, all images are first scaled down to 50 × 50 pixels; for each sample, the Euclidean distance between its red-channel image and the red-channel image of each database image is computed, and the minimum of the results is taken as the matching score of the sample. The mean μ_Im and variance σ_Im of the 20 sample matching scores are computed; in this embodiment they are 35.08 and 10.01 respectively. If μ_Im + σ_Im < Th, the scene in the current image is deemed to be a finger placed on reading text and the subsequent fingertip positioning can proceed, where Th is a threshold, set to 150 in this embodiment.
Step 2: locate the user's fingertip in the current image (see Fig. 3) as the cursor for subsequent text detection. The specific processing is as follows:
Step 21: find the candidate region of the user's fingertip using K-means;
First, the current image (Fig. 3(a)) is filtered with a Gaussian filter to reduce the interference of outliers and improve the accuracy of the clustering. The Gaussian kernel matrix H used by the Gaussian filtering has size (2k_G + 1) × (2k_G + 1) and is computed as:
H(i, j) = (1 / (2π σ_G²)) · exp(−((i − k_G − 1)² + (j − k_G − 1)²) / (2 σ_G²))
where H(i, j) is the element in row i, column j of the Gaussian kernel matrix H, i, j = 1, 2, …, 2k_G + 1; σ_G is the width parameter of the Gaussian function, controlling its radial extent, set to 3 in this embodiment; k_G is the parameter controlling the kernel size, set to 15 in this embodiment. A sketch of the kernel construction follows;
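A minimal sketch of the kernel construction above, assuming NumPy; σ_G = 3 and k_G = 15 follow this embodiment, and the kernel is additionally normalized to unit sum, as is usual for filtering:

```python
import numpy as np

def gaussian_kernel(k_g=15, sigma_g=3.0):
    """(2k_G+1) x (2k_G+1) Gaussian kernel H centred at (k_G+1, k_G+1)."""
    idx = np.arange(1, 2 * k_g + 2)                  # i, j = 1 .. 2k_G+1
    i, j = np.meshgrid(idx, idx, indexing="ij")
    h = np.exp(-((i - k_g - 1) ** 2 + (j - k_g - 1) ** 2) / (2 * sigma_g ** 2))
    return h / h.sum()                               # unit-sum normalization
```

In OpenCV, cv2.GaussianBlur(img, (31, 31), 3) performs the equivalent filtering.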
Then, three two-dimensional matrices are generated from the images of the three channels of the filtered (RGB) image; each element of a two-dimensional matrix is the pixel value at the corresponding point of the corresponding channel's image;
Each two-dimensional matrix is processed according to the following formulas, giving one column vector and one row vector:
m_c_ave = (1/col) Σ_{j=1}^{col} M(:, j),  m_r_ave = (1/row) Σ_{i=1}^{row} M(i, :)
where m_c_ave is the column vector obtained by summing all columns of the two-dimensional matrix and averaging, M(:, j) is the j-th column of the two-dimensional matrix, and col is its total number of columns; m_r_ave is the row vector obtained by summing all rows of the two-dimensional matrix and averaging, M(i, :) is the i-th row of the two-dimensional matrix, and row is its total number of rows;
The current image is thereby converted into three row × 1 column vectors (as shown in Fig. 3(b), (c)) and three 1 × col row vectors. Each dimension of the column vectors is treated as a vertical data point whose three features are the components of the three column vectors at that dimension, so the number of vertical data points equals the dimension of the column vectors, i.e. row; likewise each dimension of the row vectors is treated as a horizontal data point whose three features are the components of the three row vectors at that dimension, so the number of horizontal data points equals the dimension of the row vectors, i.e. col;
Next, the vertical data points and the horizontal data points are clustered separately with K-means. In this process, to avoid falling into a local optimum, clustering is run twice with randomly initialized cluster centers; after the two clusterings, the compactness of the two results is assessed, and the more compact result is chosen as the final clustering result.
The compactness of each of the two clustering results is assessed by the following formula:
μ_score = Σ_{j=1}^{K} Σ_{i=1}^{N_j} ||x_i − m_j||²
where K is the number of clusters, here 2, N_j is the number of samples of class j, x_i is the feature vector of the i-th data point of class j, and m_j is the cluster center of class j. The μ_score obtained by this formula is the sum of the squared distances of every point in a clustering result to its corresponding cluster center, which reflects the compactness of that clustering result: the larger μ_score, the worse the compactness. The clustering result with the lower μ_score is chosen as the final clustering result, as in the sketch below.
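A minimal sketch of this compactness measure, assuming NumPy; scikit-learn's KMeans exposes the same quantity as inertia_, so running it with n_init=2 reproduces the best-of-two-initializations selection:

```python
import numpy as np

def compactness(points, labels, centers):
    """mu_score: sum of squared distances of each point to its cluster centre."""
    return sum(float(np.sum((points[labels == j] - centers[j]) ** 2))
               for j in range(len(centers)))
```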
The clustering result of the vertical data points is expressed as a vertical label vector, a row × 1 column vector whose component in each dimension is the label (0 or 1) of the corresponding vertical data point. Since the data points related to the finger tend to be grouped into one class, the 0s and the 1s are each gathered together in the clustering result, and since the finger generally appears in the lower middle of the picture, the upper part of the label vector is almost all 0 and the lower part almost all 1. To remove the labels of outlier data points, the label vector is filtered with a one-dimensional mean filter: a template is placed over each target datum in the label vector; the template size is usually odd, and the template covers the neighbouring data (for a template of size 5, the two data adjacent on the right of the target datum and the two adjacent on the left, not the target itself); the target datum is then replaced by the average of all data in the template. The filtered label vector is thresholded: if a component is greater than or equal to the set threshold it is set to 1, otherwise to 0 (the threshold is set empirically, 0.4 in this embodiment), giving the final label vector. The horizontal line corresponding to the 0/1 boundary of the label vector in the current image is the vertical cut-off position of the user's finger in the image;
The clustering result of the horizontal data points is expressed as a horizontal label vector, a 1 × col row vector whose component in each dimension is the label (0 or 1) of the corresponding horizontal data point. Since the finger generally appears in the middle of the image, the element values of this label vector are 1 in the middle section and 0 on the two sides; therefore the vertical line corresponding to the left 0/1 boundary of this label vector in the current image is chosen as the horizontal cut-off position of the user's finger in the image;
Extensive experimental results show that the intersection of the vertical cut-off position and the horizontal cut-off position lies at the upper left of the finger. Because the label vectors were mean-filtered, the left 0/1 boundary is shifted toward the 0 side, which keeps the intersection at the upper left of the finger; therefore, with the intersection as the top-left vertex, a sufficiently large rectangular area can be delimited as the candidate region of the user's fingertip (i.e. the region where the user's fingertip may be present), as shown in Fig. 3(d), (e). The specific size of the rectangular area is related to the size of the input image; the input image size in this embodiment is 480 × 640, and the rectangular area is set to length 320 and width 160.
Step 22: locate the fingertip by computing curvature;
First, the edges within the candidate region of the user's fingertip are found with the Canny operator, and the edges are connected to obtain contours. If several contours are obtained, the contours containing fewer pixels than the set contour size threshold (experiments show the effect is best when this threshold is set to 100) are excluded, to remove the interference of isolated points; the contours containing no fewer pixels than the threshold are retained;
According to the parametric equation of a curve:
Γ(t) = (x(t), y(t))
where t is the parameter, x(t) is the equation of the curve's abscissa in t, y(t) is the equation of the curve's ordinate in t, and Γ(t) is the equation of the curve in t, the curvature of the curve is computed as:
κ(t) = (x′(t) y″(t) − y′(t) x″(t)) / (x′(t)² + y′(t)²)^(3/2)
where x′(t) and y′(t) are the first derivatives of x(t) and y(t), and x″(t) and y″(t) are their second derivatives.
In the present invention, since a contour is a set of pixels, the curvature of each pixel in a contour is computed in the following way.
First, the curve must be smoothed to reduce the influence of noise on the curvature measurement. A one-dimensional Gaussian kernel is generated from the one-dimensional Gaussian function; the size of the one-dimensional Gaussian kernel is defined as M, where M is the odd number closest to 10σ, and σ is the width parameter of the one-dimensional Gaussian function, controlling its radial extent; in the experiments σ is 3 and M is 31. Concretely, the coordinates of each pixel on the contour are convolved with the one-dimensional Gaussian kernel, which can be defined as follows:
X(n, σ) = Σ_{k=−L}^{L} X(n_point) · g(k, σ),  Y(n, σ) = Σ_{k=−L}^{L} Y(n_point) · g(k, σ)
where L = (M − 1)/2 = 15, X(n_point) and Y(n_point) are the horizontal and vertical coordinates of the n_point-th pixel on the contour (coordinate system: the origin O is the top-left vertex of the image, the horizontal axis is the y-axis, positive to the right, and the vertical axis is the x-axis, positive downward), g(k, σ) is the k-th weight of the one-dimensional Gaussian kernel, and X(n, σ) and Y(n, σ) are the coordinates of the n-th pixel on the smoothed contour; n_point is determined by the following formula:
n_point = ((n − k − 1) mod n_size) + 1
where n_size is the number of pixels contained in the contour and n = 1, 2, …, n_size. By the properties of convolution, the first and second derivatives of X(n, σ) and Y(n, σ) can be computed as:
X′(n, σ) = Σ_{k=−L}^{L} X(n_point) · g′(k, σ),  X″(n, σ) = Σ_{k=−L}^{L} X(n_point) · g″(k, σ)
and likewise for Y(n, σ), where g′(k, σ) and g″(k, σ) denote the k-th weights of the one-dimensional kernels generated from the first and second derivatives of the one-dimensional Gaussian function.
The curvature of the contour at the n-th pixel is then:
κ(n, σ) = (X′(n, σ) Y″(n, σ) − Y′(n, σ) X″(n, σ)) / (X′(n, σ)² + Y′(n, σ)²)^(3/2)
Therefore, for each retained contour, the curvature of each of its pixels can be calculated; the result is shown in Fig. 3(f), where the fingertip corresponds to the point of zero curvature, which yields the user's fingertip position. A sketch of this computation follows.
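A minimal sketch of the smoothed-contour curvature computation above, assuming NumPy; the circular padding implements the n_point wrap-around for a closed contour, and the derivative kernels are generated analytically from the Gaussian:

```python
import numpy as np

def contour_curvature(contour_xy, sigma=3.0, M=31):
    """Curvature at each contour pixel after Gaussian smoothing of x(t), y(t)."""
    L = (M - 1) // 2
    k = np.arange(-L, L + 1, dtype=float)
    g = np.exp(-k ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    dg = -k / sigma ** 2 * g                           # first-derivative kernel
    ddg = (k ** 2 / sigma ** 4 - 1 / sigma ** 2) * g   # second-derivative kernel

    def circ_conv(v, w):                # convolution with wrap-around indexing
        pad = np.concatenate([v[-L:], v, v[:L]])
        return np.convolve(pad, w, mode="valid")

    x = contour_xy[:, 0].astype(float)
    y = contour_xy[:, 1].astype(float)
    xd, yd = circ_conv(x, dg), circ_conv(y, dg)
    xdd, ydd = circ_conv(x, ddg), circ_conv(y, ddg)
    return (xd * ydd - yd * xdd) / (xd ** 2 + yd ** 2) ** 1.5

# The fingertip is taken at the contour point whose curvature is closest to zero.
```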
Step 3: determine the text line indicated by the user according to the position of the user's fingertip. The specific processing is as follows:
Step 31: text region extraction;
First, the color image captured by the camera is converted into a grayscale image, see Fig. 5(a); this grayscale image is then binarized with Otsu adaptive thresholding, see Fig. 5(b), giving the foreground region (the finger) and the background region (the text document) of the image. Since the background region may contain non-text regions, the constraints must be further strengthened to exclude the non-text regions in the background region. To this end, first, all connected regions in the background region of the image are extracted to form a set C_R; then, for each connected region in C_R, its rotated rectangle (minimum bounding rectangle) is found, denoted ζ(o, θ, w, h), where o is the center of the rotated rectangle, θ is the deflection (rotation) angle of the rotated rectangle, i.e. the angle swept when the horizontal axis (x-axis) is rotated counterclockwise until it meets a side of the rectangle, in the range (−π/2, 0), and w and h are the lengths of two adjacent sides of the rotated rectangle, as shown in Fig. 4. Then, the non-text regions in C_R are filtered out according to the following constraints.
1) Area filtering. Many experiments show that the size of a text region lies within a certain range, i.e. it satisfies:
t_min < ζ_Area < t_max
where ζ_Area = wh, and t_min and t_max are the lower and upper area thresholds of a text region, whose values are determined from repeated measurements of text-region areas and are set to 100 and 1500 in this embodiment;
The area of a non-text region, by contrast, is of arbitrary size, so a first filtering of non-text regions can be performed by region area: for each connected region in the set C_R, if its rotated rectangle does not satisfy the constraint t_min < ζ_Area < t_max, the rotated rectangle is regarded as a non-text region and deleted from C_R; the remaining connected regions in C_R form the set C_R1. The result is shown in Fig. 5(c);
2) Angle filtering. Since the scene of interest of the present invention is plain text or text with pictures (such as books) rather than text embedded in complex images (such as posters), in the set C_R1 the genuine text regions have an absolute majority over the non-text regions. Note that among the 26 English letters, apart from the exception o (whose rotated rectangle always has a deflection angle of zero), the deflection angles of the rotated rectangles of the other letters' text regions differ little, i.e. the deflection angle θ of most text-region rotated rectangles satisfies the following constraint:
|θ − μ_θ| < σ_θ
where μ_θ and σ_θ are the mean and variance of the deflection angles of the rotated rectangles of all connected regions in the set C_R1 obtained after the processing of step 1).
Therefore, for each connected region in C_R1, if its rotated rectangle does not satisfy the constraint |θ − μ_θ| < σ_θ, the rotated rectangle is regarded as a non-text region and deleted from C_R1; the remaining connected regions in C_R1 form the set C_R2. Although the constraint |θ − μ_θ| < σ_θ may remove text regions at the image border, text detection and recognition are mainly directed at the central region of the image, so this processing does not greatly affect the final recognition accuracy. The result is shown in Fig. 5(d).
3) Filtering based on the relationships between text regions. In the scenes of interest of the present invention, text regions do not occur in isolation but form text lines, so the rotated-rectangle centers of all text regions belonging to the same text line are linearly related. A reference line can therefore be fitted from the rotated-rectangle centers of all text regions belonging to the same text line; finally, text regions are determined according to the difference between the distance of a non-text region's rotated-rectangle center to the reference line and that of a text region's rotated-rectangle center. The key of the problem thus becomes determining which text regions in the image belong to the same text line; once this is determined, the reference line can be fitted from the rotated-rectangle centers of these text regions. The specific implementation comprises the following steps:
3.1) Take the rotated-rectangle center of each connected region R in the set C_R2 as a point of interest, and rewrite the equation y = kx + b of every line through a point of interest in the image in the following form:
x cos θ_SL + y sin θ_SL = ρ_SL
where θ_SL is the angle between the line and the x-axis and ρ_SL is the distance from the origin O to the line (the origin O is the top-left vertex of the image, the horizontal axis is the y-axis, positive to the right, and the vertical axis is the x-axis, positive downward). If the line parameters (ρ_SL, θ_SL) are treated as unknowns, the line corresponds to a sinusoid in the (ρ_SL, θ_SL) parameter space. Here θ_SL ranges over (−π/2, π/2) (counterclockwise rotation positive, clockwise rotation negative) and ρ_SL ranges over (−D, D), where D is the diagonal length of the original image captured by the camera;
3.2) Subdivide the (ρ_SL, θ_SL) parameter space into multiple accumulator cells, denoting the value of the accumulator cell at coordinates (ρ_k, θ_k) by A(ρ_k, θ_k); in this embodiment θ_k is taken as the integer degrees in (−90, 90). First set all accumulator cells to zero, then compute the distance d from each point of interest (x_i, y_i) to the line x cos θ_k + y sin θ_k = ρ_k:
d = |x_i cos θ_k + y_i sin θ_k − ρ_k|;
The distances of all points of interest to the line x cos θ_k + y sin θ_k = ρ_k are judged in turn; each time a point's distance is below the threshold, A(ρ_k, θ_k) is incremented by 1; after all judgments, the final value of A(ρ_k, θ_k) is obtained, and the final result of this experiment is shown in Fig. 6(c). The final value of A(ρ_k, θ_k) indicates the number of points of interest contained in the strip region with the line x cos θ_k + y sin θ_k = ρ_k as its axis, so the higher the value of A(ρ_k, θ_k), the greater the probability that this line is a reference line. A threshold can thus be set, 7 in this embodiment: once the value of A(ρ_k, θ_k) exceeds the threshold, the corresponding line is taken as a reference line; the number of reference lines so recorded is N;
3.3) Find the most probable text regions by the method of unsupervised line clustering; the specific process is as follows:
3.31) Input the set of points of interest and initialize the reference-line set C_L, containing the N reference lines obtained in step 3.2);
3.32) Compute the distance of every point of interest in the set to each reference line in C_L; for each point of interest take the minimum distance, and screen out the points whose minimum distance is less than the set threshold; label the screened points of interest by class, attributing to the same class the points whose minimum distance corresponds to the same reference line;
3.33) Fit a line to the points of interest of each class, and judge whether the slope and intercept of the newly fitted line differ from those of the class's reference line by less than the set thresholds; if so, the reference line corresponding to the class in C_L remains unchanged; otherwise the reference line corresponding to the class in C_L is updated to the newly fitted line. If in this step all reference lines in C_L remain unchanged, output the class-labelled set of points of interest C_S; the regions covered by the rotated rectangles of these points of interest are the most probable text regions; otherwise return to step 3.32). The final reference lines obtained by the clustering are shown in Fig. 6(d) and the final point sets in Fig. 6(e).
Step 32: determine the text line;
According to the fingertip position obtained in step 2, a rectangular region of interest is determined; the length of the region of interest equals the length of the original image, its width is set to a fixed value (one sixth of the width of the original image), and the fingertip is located on the bottom edge of the region of interest;
For the text regions obtained in step 31, as shown in Fig. 8(a), the outermost contour of each character is found; adjacent points on a contour are 8-connected. For most English letters the lowest points of the letters lie almost on one straight line, even when the text is rotated. Therefore the lowest point of each contour can be chosen as a datum point; each datum point is then checked for being inside the region of interest, and those that are not are filtered out, see Fig. 7(b), so that the subsequent operations use only the datum points inside the region of interest. It is worth noting that some special letters, such as g, q, y, j and p, have their datum points below the ideal text line; the datum points of these letters are treated as abnormal datum points. Three adjacent datum points are chosen at a time, and a line is fitted with the objective of minimizing the sum of the distances of these three adjacent datum points to the line, yielding several lines. To remove the lines fitted from abnormal datum points, each fitted line is scored as follows:
μ_score = (1/n) Σ_{i=1}^{n} d(i)
where d(i) is the distance from the i-th screened datum point to the line, n is the total number of screened datum points, and μ_score is the score; the lower this score, the better the result. The line with the lowest score can thus be chosen as the judge line; the abnormal datum points are then filtered out according to whether each screened datum point's distance to the judge line is less than the set threshold. Based on all remaining datum points, a line is fitted with the objective of minimizing the sum of their distances to it; the fitted line is the text line indicated by the user, as shown in Fig. 7(c).
Step 4: extract the words in the text line indicated by the user and convert them into voice output. The specific processing is as follows:
Step 41: recognize the first word;
For the text regions extracted in step 31, the part located inside the region of interest is extracted as the target text region; the minimum bounding rectangle (rotated rectangle) of each character in the target text region is found as a letter frame. The letter frames are clustered into words according to the difference between the center distance of two adjacent letter frames within a word and the center distance of two adjacent letter frames between words: along the text line, if the distance between the current letter frame and the next letter frame is less than a threshold (set to 20 pixels in this embodiment), the two letter frames are deemed to belong to one word; the process is repeated until all letter frames in the text line have passed this judgment. The minimum bounding rectangle of all letter frames belonging to one word is then found as the word frame;
Considering that the input image is often not horizontal, and that letter rotation strongly affects the final recognition accuracy, angle compensation is applied to the image according to the angle between the text line in the image and the horizontal direction (determined from the text line slope), so that the text line is rotated to the horizontal direction before subsequent processing, as shown in Fig. 8(b) and (c).
The first word frame along the text line is chosen and recognized with the tesseract OCR recognition engine; the recognition returns the word, the word confidence, and the word frame. When the returned word confidence is greater than the threshold (set to 80 in the experiments), the word is considered correctly recognized, and voice output is performed for the correctly recognized word (the word is read aloud).
It is worth noting that the field of interest of the above recognition operation is the central region of the image. The homography between the captured image and the paper caused by the shooting angle has essentially no influence on recognition (the shot is not vertical but along the direction of the fingertip, so the text in the captured image is deformed relative to the real text; but the region of interest is the middle of the image, where the deformation is small, so the accuracy is essentially unaffected), and this disturbance factor can therefore be ignored.
Step 42: track the position of the word frame of the correctly recognized word in subsequent image frames using template matching, so as to determine the new-word region in the subsequent frames and recognize new words in it;
i. In practice, motion blur and similar effects reduce image sharpness, which would impair the correct tracking of words; therefore each frame image is first binarized.
Initialize s = 1 and initialize the fingertip speed V_fingertip. Take the word frame corresponding to the j-th word recognized in frame l, together with all letter frames it contains, as the word frame and letter frames to be tracked. Take the region of interest in frame m as the search region. Initialize m = l + 1.
ii. For the word frame/letter frames to be tracked, track their positions in the search region using template matching.
The present invention uses the squared-difference matching algorithm, i.e., it minimizes the following function:
R_sq_diff(x, y) = Σ_{x', y'} [T(x', y') - I(x + x', y + y')]²
In the above formula, T(x', y') is the pixel value at coordinate (x', y') of the word frame/letter frame to be tracked in frame l, I(x + x', y + y') is the pixel value at coordinate (x + x', y + y') in the search region, and (x, y) is the top-left vertex coordinate of the candidate match window within the search region. The smaller R_sq_diff is, the better the match. In practice, when this index falls below a given threshold, the match is considered successful, i.e., the word frame/letter frame is tracked successfully.
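A minimal sketch of this matching step using OpenCV's normalized squared-difference mode (the concrete success threshold is an illustrative assumption, since the patent leaves its value to practice):

```python
import cv2

def track_frame(template, search_region, sqdiff_threshold=0.1):
    """template: binarized word/letter frame cropped from frame l.
    search_region: binarized region of interest in frame m."""
    # TM_SQDIFF_NORMED computes the normalized sum of squared differences
    result = cv2.matchTemplate(search_region, template, cv2.TM_SQDIFF_NORMED)
    min_val, _, min_loc, _ = cv2.minMaxLoc(result)
    if min_val < sqdiff_threshold:       # small value means a good match
        return min_loc                   # top-left corner of the matched position
    return None                          # tracking failed in this frame
```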
If the word frame currently being tracked is tracked successfully, go to step iii. Otherwise, check whether any word frame has been tracked successfully before. If so, first draw a cut-off line on frame m at the newest matching position of the successfully tracked word frame, the cut-off line coinciding with the right edge of the matching position; then check whether the width from the cut-off line to the left edge of the image is less than a set threshold (0.6 times the horizontal image width). If so, take the target text region to the right of the cut-off line as the new-word region and continue to recognize new words in it, so that word reading stays synchronized with the finger; otherwise end new-word recognition, which completes the tracking and recognition of words. If no word frame has been tracked successfully before, take frame m as the current image and execute step 1 through step 41 on it to recognize the first word.
If several consecutive frames fail to track any word frame, tracking is considered to have failed; this is usually caused by the finger moving too fast.
iii. Update the fingertip speed V_fingertip:
V_fingertip = (Σ_{j=1}^{N1} V_word,j + Σ_{k=1}^{N2} V_letter,k) / (N1 + N2)
where V_word,j is the horizontal distance difference between the positions of the j-th successfully tracked word frame in frame l and frame m, V_letter,k is the horizontal distance difference between the positions of the k-th successfully tracked letter frame in frame l and frame m, N1 is the number of successfully matched word frames, and N2 is the number of successfully matched letter frames.
Computing the fingertip speed serves two purposes. First, it allows predicting after which frame a word frame will move out of the image: if the abscissa of the top-left vertex of a word frame/letter frame in frame l minus V_fingertip × (m - l) is less than zero, the word frame/letter frame is judged to move out of frame m. After a word frame is judged to move out of frame m, the whole word is not discarded immediately; the letters of the word that are still inside frame m are kept, and their letter frames are retained, because by the definition of the fingertip speed they still contribute to its computation. Once a letter frame is judged to move out of frame m, the corresponding letter is discarded and its letter frame no longer participates in the fingertip-speed computation. Second, it improves tracking efficiency: the next word tracking does not search the whole image, but a rectangular area delimited according to the fingertip speed, and tracking is performed only in that new search region.
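A minimal sketch of the speed update and the exit prediction (the optional per-frame normalization by m - l is an assumption made so that the later product V_fingertip × (m - l) stays consistent when intermediate frames are skipped):

```python
def update_fingertip_speed(word_disps, letter_disps, m_minus_l=1):
    """word_disps / letter_disps: horizontal displacements (frame l -> frame m)
    of the successfully matched word frames and letter frames."""
    n1, n2 = len(word_disps), len(letter_disps)
    mean_disp = (sum(word_disps) + sum(letter_disps)) / (n1 + n2)
    # when m > l + 1, dividing by (m - l) turns the displacement into a
    # per-frame speed; this normalization is an assumption, not in the patent
    return mean_disp / m_minus_l

def will_exit_image(frame_left_x, v_fingertip, m_minus_l):
    """True if the frame's top-left abscissa in frame l, shifted by the
    predicted fingertip motion, leaves the image by frame m."""
    return frame_left_x - v_fingertip * m_minus_l < 0
```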
iv. Let s = s + 1. For the word frame corresponding to the s-th word recognized in frame l and each letter frame it contains, first use the fingertip speed to judge whether the word frame and each letter frame will move out of frame m. The word frames/letter frames judged not to move out of the current frame image are taken as the frames to be tracked. A rectangular area is then delimited as the new search region: its top-left abscissa = the top-left abscissa of the previously tracked word frame - the fingertip speed - a set offset (15 pixels in this embodiment of the invention); its length = the length of the previously tracked word frame + the fingertip speed; its width = the width of the previously tracked word frame + a set offset (30 pixels in this embodiment). The offsets are added because in practice the finger does not move purely in parallel but may also drift up or down; the offset values were obtained experimentally. The ratio of black pixels to white pixels in the new search region is then computed; if this ratio is less than a set threshold (20% in this embodiment), the frame image is discarded, no word-frame tracking or recognition is performed on it, and m = m + 1, as shown in Fig. 9(b). The above judgment and processing are repeated until the black-to-white pixel ratio in the new search region is not less than the set threshold, and then step v is entered; a code sketch follows.
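A minimal sketch of the search-region update and the ink-ratio check, assuming axis-aligned frames, 0/255 binary images, and this embodiment's offsets and 20% threshold (helper names are illustrative):

```python
import numpy as np

def new_search_region(prev_box, v_fingertip, dx_offset=15, dy_offset=30):
    """prev_box: (x, y, length, width) of the previously tracked word frame."""
    x, y, length, width = prev_box
    # the update rule from step iv: shift the left edge back by the fingertip
    # speed plus an offset, and enlarge the box to absorb finger drift
    return (x - v_fingertip - dx_offset,
            y,
            length + v_fingertip,
            width + dy_offset)

def enough_ink(binary_region, ratio_threshold=0.20):
    """binary_region: 0/255 binarized crop of the new search region."""
    black = np.count_nonzero(binary_region == 0)
    white = np.count_nonzero(binary_region == 255)
    # frames whose search region carries too little ink are discarded
    return white > 0 and black / white >= ratio_threshold
```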
v. Return to step ii.
It should be noted that what is disclosed above is only a specific example of the present invention. Based on the idea provided by the present invention, those skilled in the art can conceive of various variations, which shall all fall within the scope of protection of the present invention.

Claims (7)

1. A text detection and recognition method for aided reading for the blind, characterized by comprising the following steps:
Step 1: for the image sequence captured by the camera, judge whether the scene in the current image is a finger placed on reading text; if so, proceed to step 2; otherwise skip this frame image, take the next frame image as the current image, and repeat the above judgment and processing;
Step 2: locate the user's fingertip in the current image;
Step 3: determine the text line indicated by the user according to the position of the user's fingertip;
Step 4: extract the words in the text line indicated by the user and convert them into voice output.
2. The text detection and recognition method for aided reading for the blind according to claim 1, characterized in that the method in step 1 for judging whether the scene in the current image is a finger placed on reading text is as follows:
Step 11: capture in advance, with the camera, several typical images of the user's finger and the text region it is placed on, and save them in a database;
Step 12: take the current image and several images captured immediately before it as sample images;
Step 13: normalize the RGB color space of each image in the database and of each sample image;
Step 14: for each sample, compute the Euclidean distance between its normalized red-channel image and the normalized red-channel image of each image in the database, and take the minimum of the results as the matching score of that sample; compute the mean μ_Im and variance σ_Im of all sample matching scores; if μ_Im + σ_Im < Th, the scene in the current image is judged to be a finger placed on reading text, where Th is a threshold, an empirical parameter.
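A minimal sketch of this scene judgment (images are assumed already normalized and resized to a common size as in claim 3; the helper names are illustrative):

```python
import numpy as np

def scene_is_finger_on_text(samples_red, database_red, th):
    """samples_red / database_red: lists of normalized red-channel images,
    all resized to the same fixed size (see claim 3)."""
    scores = []
    for s in samples_red:
        # matching score = Euclidean distance to the closest database image
        dists = [np.linalg.norm(s - d) for d in database_red]
        scores.append(min(dists))
    mu, sigma = np.mean(scores), np.var(scores)   # mean and variance, per the claim
    return mu + sigma < th
```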
3. The text detection and recognition method for aided reading for the blind according to claim 2, characterized in that in step 14 all images are first shrunk to a set size before the Euclidean distances are computed.
4. The text detection and recognition method for aided reading for the blind according to claim 1, characterized in that step 2 comprises the following steps:
Step 21: find the candidate region of the user's fingertip using K-means;
First, filter the current image with a Gaussian filter;
Then generate three two-dimensional matrices from the three channels of the filtered image, where each matrix element is the pixel value of the corresponding point in the corresponding channel's image;
For each two-dimensional matrix, sum over all its columns and average to obtain a column vector m_c_ave of size row × 1, and sum over all its rows and average to obtain a row vector m_r_ave of size 1 × col; the current image is thus converted into three column vectors and three row vectors, where col is the total number of columns of the matrix and row is its total number of rows;
Take each dimension of the column vectors as a vertical data point, with the components of the three column vectors at that dimension as its three features, forming its feature vector; the number of vertical data points equals the dimension of the column vectors, i.e., row. Take each dimension of the row vectors as a horizontal data point, with the components of the three row vectors at that dimension as its three features, forming its feature vector; the number of horizontal data points equals the dimension of the row vectors, i.e., col;
Second, cluster the vertical data points and the horizontal data points separately using K-means, with the number of clusters set to 2;
Third, express the clustering result of the vertical data points as a vertical label vector, a column vector of size row × 1 whose component in each dimension is the label (0 or 1) of the vertical data point of that dimension; express the clustering result of the horizontal data points as a horizontal label vector, a row vector of size 1 × col whose component in each dimension is the label (0 or 1) of the horizontal data point of that dimension;
Apply mean filtering to the vertical and horizontal label vectors respectively, then threshold them: if an element is greater than or equal to a set threshold it is set to 1, otherwise to 0, yielding the final vertical and horizontal label vectors;
Take as top-left vertex the intersection of the horizontal line corresponding, in the current image, to the 0/1 boundary of the vertical label vector and the vertical line corresponding to the left 0/1 boundary of the horizontal label vector, and delimit a rectangular area as the candidate region of the user's fingertip;
Step 22: locate the fingertip by computing curvature;
First, find the edges in the candidate region of the user's fingertip using the Canny operator and connect them to obtain contours; if multiple contours are obtained, keep only those containing no fewer pixels than a set threshold;
Then smooth the retained contours;
Finally, compute the curvature of each pixel on the smoothed contours; the point where the curvature is zero is the position of the user's fingertip.
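A minimal sketch of the contour and curvature computation of step 22 (the Canny thresholds, smoothing window, and contour-length threshold are illustrative assumptions; the zero-curvature criterion follows the claim):

```python
import cv2
import numpy as np

def fingertip_from_candidate(candidate_gray, min_contour_len=50):
    """candidate_gray: grayscale crop of the fingertip candidate region."""
    edges = cv2.Canny(candidate_gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
    contours = [c for c in contours if len(c) >= min_contour_len]

    best = None
    for c in contours:
        pts = c[:, 0, :].astype(np.float64)
        # smooth the contour with a small moving average
        kernel = np.ones(5) / 5.0
        xs = np.convolve(pts[:, 0], kernel, mode="same")
        ys = np.convolve(pts[:, 1], kernel, mode="same")
        # discrete curvature from first and second derivatives
        dx, dy = np.gradient(xs), np.gradient(ys)
        ddx, ddy = np.gradient(dx), np.gradient(dy)
        k = (dx * ddy - dy * ddx) / (np.power(dx * dx + dy * dy, 1.5) + 1e-12)
        i = int(np.argmin(np.abs(k)))      # point where curvature is closest to zero
        if best is None or abs(k[i]) < best[0]:
            best = (abs(k[i]), (int(xs[i]), int(ys[i])))
    return best[1] if best else None
```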
5. The text detection and recognition method for aided reading for the blind according to claim 4, characterized in that during the K-means clustering of the vertical/horizontal data points, two groups of cluster centers are randomly initialized and clustering is performed twice; the compactness of the two clustering results is evaluated and the more compact result is chosen as the final clustering result.
6. The text detection and recognition method for aided reading for the blind according to claim 1, characterized in that step 3 comprises the following steps:
Step 31: text region extraction;
First, convert the current image into a grayscale image;
Then binarize the grayscale image to obtain the foreground region and background region of the image;
Then exclude the non-text regions from the background region, as follows:
First extract all connected regions in the background region to form a set C_R; then compute the rotated rectangle of each connected region in C_R, denoted ζ(o, θ, w, h), where o is the center of the rotated rectangle, θ is the deflection angle of the rotated rectangle, i.e., the angle between the horizontal axis, rotated counterclockwise, and the first side of the rotated rectangle it meets, and w and h are the two adjacent sides of the rotated rectangle; filter out the connected regions in C_R whose rotated-rectangle area and deflection angle do not satisfy the constraint conditions; the remaining connected regions are further filtered based on the relationships between text regions, implemented in the following steps:
3.1) Take the top-left vertex of the current image as the origin O, the length direction of the current image as the y-axis with the rightward direction positive, and the width direction of the current image as the x-axis with the downward direction positive; take the rotated-rectangle center of each connected region R as a focus point, and express every line through a focus point in the current image in the form:
x cos θ_SL + y sin θ_SL = ρ_SL
where θ_SL is the angle between the line and the x-axis and ρ_SL is the distance from the origin O to the line; θ_SL ranges over (-π/2, π/2) and ρ_SL ranges over (-D, D), where D is the diagonal length of the original image captured by the camera;
3.2) Subdivide the (ρ_SL, θ_SL) parameter space into multiple accumulator cells, and denote the accumulator value at coordinate (ρ_k, θ_k) as A(ρ_k, θ_k); first set all accumulator cells to zero, then compute the distance d from each focus point (x_i, y_i) to the line x cos θ_k + y sin θ_k = ρ_k:
d = |x_i cos θ_k + y_i sin θ_k - ρ_k|;
Judge the distances of all focus points to the line x cos θ_k + y sin θ_k = ρ_k in turn; whenever a focus point's distance is less than the threshold, add 1 to A(ρ_k, θ_k); after all judgments, the final A(ρ_k, θ_k) values are obtained; if A(ρ_k, θ_k) is higher than the threshold, the corresponding line is regarded as a reference line, and the number of reference lines thus recorded is N (a code sketch of this voting is given after this claim);
3.3) Find the text regions by unsupervised line clustering, as follows:
3.31) Input the focus point set and initialize the reference-line set C_L to contain N reference lines, namely the N reference lines obtained in step 3.2);
3.32) Compute the distances from all focus points in the focus point set to every reference line in C_L; for each focus point take the minimum distance; select the focus points whose minimum distance is less than a set threshold; label the selected focus points by class, attributing to the same class the focus points whose minimum distance corresponds to the same reference line;
3.33) Fit a straight line to the focus points of each class, and judge whether the differences in slope and intercept between the newly fitted line and the reference line corresponding to that class are less than set thresholds; if so, the reference line corresponding to that class in C_L remains unchanged; otherwise update the reference line corresponding to that class in C_L to the newly fitted line; if in this step all reference lines in C_L remain unchanged, output the focus point set C_S with class labels, and the regions covered by the rotated rectangles corresponding to these focus points are the extracted text regions; otherwise return to step 3.31);
Step 32: determine the text line;
Determine a rectangular region of interest according to the fingertip position obtained in step 2, with the fingertip located on the bottom edge of the region of interest;
For the text region extracted in step 31, compute the outermost contour of each character; take the lowermost point of each contour as a datum point, and select the datum points located inside the region of interest;
From the selected datum points, choose three adjacent datum points at a time and fit a straight line, obtaining multiple lines;
Score each fitted line with the following formula:
μ_score = (1/n) Σ_{i=1}^{n} d(i)
where d(i) is the distance from the i-th selected datum point to the line, n is the total number of selected datum points, and μ_score is the score;
Choose the line with the minimum score as the judgment line; select, from the selected datum points, those whose distance to the judgment line is less than a set threshold, then fit a straight line to all these remaining datum points; the fitted line is the text line indicated by the user.
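A minimal sketch of the accumulator voting of step 3.2) (the grid resolution, distance threshold, and vote threshold are illustrative assumptions):

```python
import numpy as np

def reference_lines(focus_points, diag_d, dist_thresh=2.0, vote_thresh=5,
                    n_rho=200, n_theta=180):
    """Voting over the (rho, theta) parameter space.
    focus_points: (n, 2) array of rotated-rectangle centers.
    diag_d: diagonal length D of the original image."""
    thetas = np.linspace(-np.pi / 2, np.pi / 2, n_theta, endpoint=False)
    rhos = np.linspace(-diag_d, diag_d, n_rho)
    acc = np.zeros((n_rho, n_theta), dtype=np.int32)   # A(rho_k, theta_k)

    for ri, rho in enumerate(rhos):
        for ti, theta in enumerate(thetas):
            # d = |x_i cos(theta_k) + y_i sin(theta_k) - rho_k| per focus point
            d = np.abs(focus_points[:, 0] * np.cos(theta)
                       + focus_points[:, 1] * np.sin(theta) - rho)
            acc[ri, ti] = np.count_nonzero(d < dist_thresh)

    # cells whose vote count exceeds the threshold define the N reference lines
    ris, tis = np.nonzero(acc > vote_thresh)
    return [(rhos[r], thetas[t]) for r, t in zip(ris, tis)]
```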
7. The text detection and recognition method for aided reading for the blind according to claim 6, characterized in that the specific processing procedure of step 4 is as follows:
Step 41: identify the first word;
For the text region extracted in step 31, extract its part located inside the region of interest as the target text region; compute the minimum bounding rectangle of each character in the target text region as a letter frame; cluster the letter frames into words according to the difference between the center distance of two adjacent letter frames within a word and the center distance of two adjacent letter frames across words; compute the minimum bounding rectangle of all letter frames belonging to one word as the word frame;
According to the angle between the user-indicated text line in the image and the horizontal direction, angle-compensate the image so that the text line is rotated to the horizontal;
Select the first word frame along the text line and recognize it using OCR; the recognition result returned includes the word, the word confidence and the word frame; when the returned word confidence is greater than a threshold, the word is considered correctly recognized and the correctly recognized word is output as speech;
Step 42: track the position of the word frame of the correctly recognized word in subsequent image frames using template matching, so as to determine the new-word region in the subsequent frames and recognize new words in it;
i. Binarize each frame image;
Initialize s = 1 and the fingertip speed V_fingertip; take the word frame corresponding to the j-th word recognized in frame l, together with all letter frames it contains, as the word frame and letter frames to be tracked; take the region of interest in frame m as the search region; initialize m = l + 1;
ii. For the word frame/letter frames to be tracked, track their positions in the search region using template matching;
If the word frame currently being tracked is tracked successfully, go to step iii; otherwise judge whether any word frame has been tracked successfully before; if so, first draw a cut-off line on frame m at the newest matching position of the successfully tracked word frame, the cut-off line coinciding with the right edge of the matching position; then judge whether the width from the cut-off line to the left edge of the image is less than a set threshold; if so, take the target text region to the right of the cut-off line as the new-word region and continue to recognize new words in it; otherwise end new-word recognition; if no word frame has been tracked successfully before, take frame m as the current image and execute step 1 through step 41 on it to recognize the first word;
iii. Update the fingertip speed V_fingertip:
V_fingertip = (Σ_{j=1}^{N1} V_word,j + Σ_{k=1}^{N2} V_letter,k) / (N1 + N2)
where V_word,j is the horizontal distance difference between the positions of the j-th successfully tracked word frame in frame l and frame m, V_letter,k is the horizontal distance difference between the positions of the k-th successfully tracked letter frame in frame l and frame m, N1 is the number of successfully matched word frames, and N2 is the number of successfully matched letter frames;
iv. Let s = s + 1;
For the word frame corresponding to the s-th word recognized in frame l and each letter frame it contains, judge from the fingertip speed whether the word frame and each letter frame will move out of frame m: if the abscissa of the top-left vertex of a word frame/letter frame in frame l minus V_fingertip × (m - l) is less than zero, the word frame/letter frame is judged to move out of frame m;
Take the word frames/letter frames judged not to move out of the current frame image as the frames to be tracked, and delimit a rectangular area as the new search region, where the top-left abscissa of the rectangular area = the top-left abscissa of the previously tracked word frame - the fingertip speed - a set offset, the length of the rectangular area = the length of the previously tracked word frame + the fingertip speed, and the width of the rectangular area = the width of the previously tracked word frame + a set offset;
Compute the ratio of black pixels to white pixels in the new search region; if this ratio is less than a set threshold, discard this frame image and let m = m + 1, and repeat the above judgment and processing until the black-to-white pixel ratio in the new search region is not less than the set threshold, then enter step v; otherwise enter step v directly;
v. Return to step ii.
CN201910501311.9A 2019-06-11 2019-06-11 Text detection and identification method for assisting reading of blind people Active CN110458158B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910501311.9A CN110458158B (en) 2019-06-11 2019-06-11 Text detection and identification method for assisting reading of blind people

Publications (2)

Publication Number Publication Date
CN110458158A true CN110458158A (en) 2019-11-15
CN110458158B CN110458158B (en) 2022-02-11

Family

ID=68480723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910501311.9A Active CN110458158B (en) 2019-06-11 2019-06-11 Text detection and identification method for assisting reading of blind people

Country Status (1)

Country Link
CN (1) CN110458158B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090231278A1 (en) * 2006-02-08 2009-09-17 Oblong Industries, Inc. Gesture Based Control Using Three-Dimensional Information Extracted Over an Extended Depth of Field
CN102646194A (en) * 2012-02-22 2012-08-22 大连理工大学 Method for performing printer type evidence obtainment by utilizing character edge features
US20140176689A1 (en) * 2012-12-21 2014-06-26 Samsung Electronics Co. Ltd. Apparatus and method for assisting the visually impaired in object recognition
US20160328604A1 (en) * 2014-01-07 2016-11-10 Arb Labs Inc. Systems and methods of monitoring activities at a gaming venue
CN107209563A (en) * 2014-12-02 2017-09-26 西门子公司 User interface and the method for operating system
CN107949851A (en) * 2015-09-03 2018-04-20 戈斯蒂冈有限责任公司 The quick and robust control policy of the endpoint of object in scene
CN106650628A (en) * 2016-11-21 2017-05-10 南京邮电大学 Fingertip detection method based on three-dimensional K curvature
CN109377834A (en) * 2018-09-27 2019-02-22 成都快眼科技有限公司 A kind of text conversion method and system of helping blind people read

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FAN GUO et al.: "Parameter Selection of Image Fog Removal Using Artificial Fish Swarm Algorithm", International Conference on Intelligent Computing (ICIC 2018): Intelligent Computing Theories and Application *
ROY SHILKROT et al.: "FingerReader: A Wearable Device to Explore Printed Text on the Go", CHI '15: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems *
HUANG XIAOLIN et al.: "Real-time gesture recognition and virtual writing system based on depth information", Computer Engineering and Applications *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597956A (en) * 2020-05-12 2020-08-28 四川久远银海软件股份有限公司 Picture character recognition method based on deep learning model and relative orientation calibration
CN112200738A (en) * 2020-09-29 2021-01-08 平安科技(深圳)有限公司 Method and device for identifying protrusion of shape and computer equipment
CN114419144A (en) * 2022-01-20 2022-04-29 珠海市一杯米科技有限公司 Card positioning method based on external contour shape analysis
CN115019181A (en) * 2022-07-28 2022-09-06 北京卫星信息工程研究所 Remote sensing image rotating target detection method, electronic equipment and storage medium
CN115909342A (en) * 2023-01-03 2023-04-04 湖北瑞云智联科技有限公司 Image mark recognition system and method based on contact point motion track
CN116740721A (en) * 2023-08-15 2023-09-12 深圳市玩瞳科技有限公司 Finger sentence searching method, device, electronic equipment and computer storage medium
CN116740721B (en) * 2023-08-15 2023-11-17 深圳市玩瞳科技有限公司 Finger sentence searching method, device, electronic equipment and computer storage medium

Also Published As

Publication number Publication date
CN110458158B (en) 2022-02-11

Similar Documents

Publication Publication Date Title
CN110458158A (en) A kind of text detection and recognition methods for blind person&#39;s aid reading
Adeyanju et al. Machine learning methods for sign language recognition: A critical review and analysis
Yi et al. Assistive text reading from complex background for blind persons
Plamondon et al. Online and off-line handwriting recognition: a comprehensive survey
CN104850825B (en) A kind of facial image face value calculating method based on convolutional neural networks
CN107491730A (en) A kind of laboratory test report recognition methods based on image procossing
CN104063059B (en) A kind of real-time gesture recognition method based on finger segmentation
CN112766159A (en) Cross-database micro-expression identification method based on multi-feature fusion
CN105426890B (en) A kind of graphical verification code recognition methods of character distortion adhesion
EP2629242A1 (en) A user wearable visual assistance device and method
TWI423146B (en) Method and system for actively detecting and recognizing placards
Mohane et al. Object recognition for blind people using portable camera
Pattanaworapan et al. Signer-independence finger alphabet recognition using discrete wavelet transform and area level run lengths
CN106485253A (en) A kind of pedestrian of maximum particle size structured descriptor discrimination method again
Kurita et al. Scale and rotation invariant recognition method using higher-order local autocorrelation features of log-polar image
CN107992483A (en) The method, apparatus and electronic equipment of translation are given directions for gesture
Vo et al. Deep learning for vietnamese sign language recognition in video sequence
Mancas-Thillou et al. A multifunctional reading assistant for the visually impaired
Hashim et al. Kurdish sign language recognition system
Agrawal et al. A Tutor for the hearing impaired (developed using Automatic Gesture Recognition)
CN112651323A (en) Chinese handwriting recognition method and system based on text line detection
Huang et al. A vision-based Taiwanese sign language Recognition
Bains et al. Dynamic features based stroke recognition system for signboard images of Gurmukhi text
Sonoda et al. A letter input system based on handwriting gestures
CN110298236A (en) A kind of braille automatic distinguishing method for image and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant