CN116343229A - Method for recognizing braille characters in natural scenes by mining edge features - Google Patents

Method for recognizing braille characters in natural scenes by mining edge features

Info

Publication number
CN116343229A
CN116343229A (application number CN202310027966.3A)
Authority
CN
China
Prior art keywords
braille
edge
character
characters
natural scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310027966.3A
Other languages
Chinese (zh)
Inventor
卢利琼
吴东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lingnan Normal University
Original Assignee
Lingnan Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lingnan Normal University
Priority to CN202310027966.3A
Publication of CN116343229A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/19 - Recognition using electronic means
    • G06V30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/19 - Recognition using electronic means
    • G06V30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 - Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/19 - Recognition using electronic means
    • G06V30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/1918 - Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a method for recognizing braille characters in natural scenes by mining edge features. First, the method can be applied directly to braille images of natural scenes, which broadens the application scenarios of braille character recognition. Second, the invention analyzes how braille is written and finds that the braille dots lie at the edges of each braille character; it therefore first detects the edge pixels of braille characters and then uses those edge pixels to locate the characters precisely, improving braille detection performance. Finally, the method is trained once and can then be used for detection repeatedly: the convolutional neural network model is obtained by training on a large number of natural-scene images and is simply reloaded at detection time. In performance tests on the natural-scene braille images of the test set, the braille character detection Hmean is 85.9% and the recognition accuracy is 95.3%.

Description

Method for recognizing braille characters in natural scenes by mining edge features
Technical Field
The invention belongs to the field of braille character recognition in natural scene images, and in particular relates to a method that uses a convolutional neural network to mine the edge features of braille characters in natural scene images and then recognizes the braille characters from those edge features.
Background
Braille is an effective medium for visually impaired people to learn, acquire information, and communicate with others. Braille character recognition aims to use artificial-intelligence techniques to automatically detect the positions of braille characters in an image and then recognize them. Braille character recognition is a key prerequisite for many special-education applications, such as electronic braille books, automatic braille test-paper grading, and barrier-free communication between blind students and sighted people.
How to recognize braille characters in an image efficiently has long been an important topic in the informatization of special education. Current research on braille recognition focuses mainly on scanned braille document images. In such images the braille text has cells of fixed size and a regular dot arrangement, so most existing methods first detect the braille dots and then combine several dots into braille characters for recognition. Current dot-detection methods for scanned images fall into two categories: methods based on image segmentation, and methods that combine mined braille-dot features with machine-learning classifiers. The segmentation-based methods first divide the pixels of the braille image into shadow, highlight, and background using locally adaptive thresholds, and then detect the dots with rules that combine the three parts [14-19]. These methods are sensitive to the threshold value, and because the detection target is obtained through multiple steps, errors accumulate easily. To avoid these problems, the second category detects the braille dots directly through dot features and a classification algorithm; common feature-classifier combinations include Haar+SVM, HOG+SVM, and Haar/LBP/HOG+AdaBoost. Although the second category detects the dots directly, from the recognition perspective several dots must still be combined into a braille character, so the problem of error accumulation across multiple steps remains.
Analysis of existing braille recognition methods reveals two problems. First, the application scenario is narrow: recognition works well on neatly and regularly arranged braille document images, but performance on braille in natural scene images, where arrangement and size vary widely, is far from satisfactory. Second, means of mining braille character features are lacking and the techniques used are not novel enough, so recognition is mediocre, and braille characters obtained through multi-step pipelines accumulate errors easily. Existing braille recognition methods therefore struggle to support the development and deployment of applications such as electronic braille books, automatic braille test grading, and communication between blind students and sighted people. Analyzing the characteristics of braille characters, introducing convolutional neural networks into braille recognition research, and studying braille recognition in natural-scene braille images is thus a problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to broaden the application scenarios of braille recognition, improve the accuracy of braille character recognition, and enable braille character recognition to be used for automatic braille test-paper grading and for barrier-free communication between blind students and sighted people.
To achieve this, the invention provides a method for recognizing braille characters in natural scenes by mining braille edge features. The technical scheme is implemented in the following steps:
step 1, collecting images containing braille in natural scenes;
step 2, marking the braille character positions and semantic classification information in the natural scene images;
step 3, designing a convolutional neural network with ResNet-50 as the backbone and fusing five feature layers of different sizes from the ResNet-50 structure;
step 4, constructing a prediction layer that, on a feature layer whose size is 1/4 of the input image size, predicts the edge pixels of braille characters, the rectangular-box position of the braille character corresponding to each edge pixel, and the semantic class of each braille character;
step 5, designing a loss function according to the representations of the braille edge pixels, braille character rectangular-box positions, and braille character classes in step 4;
step 6, randomly dividing the dataset into a training set and a test set in a certain proportion;
step 7, training on the training set with the designed convolutional neural network and loss function to obtain the trained Model CNN_Model;
step 8, acquiring an image from the test set as the input of CNN_Model, the Model outputting an edge-pixel score matrix, the braille character rectangular-box coordinate vectors corresponding to edge pixels, and the classification value of each braille character;
step 9, judging the scores of the edge-pixel matrix, a score greater than a threshold being considered a valid edge pixel, and then obtaining the valid braille character positions and corresponding classifications from the valid edge pixels;
step 10, using an NMS algorithm to obtain the optimal prediction results from the multiple braille character rectangular boxes, then extracting and displaying the braille character classification semantic information;
step 11, repeating steps 8 to 10 to obtain the braille character recognition results of the natural-scene braille images in the whole test set, and calculating the braille character detection and recognition performance on the test set;
step 12, for a natural-scene braille image outside the dataset, taking it as the input of CNN_Model, outputting the edge-pixel score matrix, the braille character rectangular-box coordinate vectors corresponding to edge pixels, and the classification value of each braille character, and then executing steps 9 and 10 to obtain the prediction result.
Further, step 1 also includes performing enhancement operations on the braille images, the enhancement operations including brightness enhancement, brightness reduction, sharpening, softening, contrast enhancement, and contrast reduction.
Further, in step 2 a labelme tool is used to mark the braille characters in each original image of the natural-scene image dataset, and the semantic classes of the braille characters are labeled during marking according to the order given in the National General Braille Scheme (Implementation); the label of each braille character corresponds to one line of records in the format $(x_1, y_1, x_2, y_2, x_3, y_3, x_4, y_4, Class)$, where $(x_1, y_1)$ denotes the upper-left corner of the braille rectangular box, $(x_2, y_2)$ the upper-right corner, $(x_3, y_3)$ the lower-right corner, $(x_4, y_4)$ the lower-left corner, and Class the semantic classification label corresponding to the braille character.
Further, the feature layers in step 3 are fused as follows:
feature layers $f_1$ of size $H/32 \times W/32$, $f_2$ of size $H/16 \times W/16$, $f_3$ of size $H/8 \times W/8$, $f_4$ of size $H/4 \times W/4$, and $f_5$ of size $H/2 \times W/2$ are extracted from ResNet-50, where H and W denote the height and width of the input image, and the fused feature layers $h_1, h_2, h_3, h_4, h_5$ are constructed as

$$h_1 = f_1, \qquad h_i = conv_{3\times3}(concat(unpooling_{2\times2}(h_{i-1}), f_i)), \quad i = 2, \dots, 5$$

where $conv_{3\times3}$ denotes a convolution with a 3×3 kernel, concat denotes the concatenation of two matrices, and $unpooling_{2\times2}$ denotes upsampling the feature layer by a factor of 2.
Further, in step 4 the edge-pixel prediction output is a matrix Edge of size $H/2 \times W/2$; the braille character rectangular-box position prediction outputs a number of five-element vectors $(d_1, d_2, h_1, h_2, score)$, where $d_1, d_2, h_1, h_2$ denote the distances from an edge pixel to the four sides of the braille character's rectangular box and score denotes the confidence of this prediction; each braille character's semantic classification corresponds to a Softmax output vector of size 64 (indices 0 to 63), where 1 to 63 correspond to braille character classes and 0 corresponds to non-braille.
Further, step 5 is implemented as follows:
step 5.1, constructing the GroundTruth of the prediction layer from the input image and its label file: the image is reduced to $H/2 \times W/2$, where H and W denote the height and width of the input image, and the braille character rectangular-box coordinates in the label file are scaled down synchronously;
step 5.2, calculating the loss value from the prediction-layer outputs and the corresponding GroundTruth; the total loss is

$$Loss = Loss_{geometry} + \alpha \times Loss_{edge} + \beta \times Loss_{class}$$

where $\alpha = \beta$ is a constant, $Loss_{geometry}$ denotes the loss of the braille character rectangular-box prediction, $Loss_{edge}$ the loss of the edge-pixel prediction, and $Loss_{class}$ the loss of the braille character semantic classification prediction;
step 5.2.1, the braille character rectangular-box prediction loss is calculated as

$$Loss_{geometry} = \frac{1}{N} \sum_{i=1}^{N} -\log\frac{A_{inter}^{i}}{A_{union}^{i}}$$

where N denotes the number of braille characters and $A_{inter}^{i}$ and $A_{union}^{i}$ denote the areas of the intersection and union of the i-th real braille character rectangular box and the predicted rectangular box, calculated respectively as

$$A_{inter} = (\min(d_{1g}, d_{1p}) + \min(d_{2g}, d_{2p})) \times (\min(h_{1g}, h_{1p}) + \min(h_{2g}, h_{2p}))$$

$$A_{union} = (d_{1p} + d_{2p}) \times (h_{1p} + h_{2p}) + (d_{1g} + d_{2g}) \times (h_{1g} + h_{2g}) - A_{inter}$$

where $d_{1g}, d_{2g}, h_{1g}, h_{2g}$ denote the distances from an edge pixel to the four sides of the real braille character rectangular box ($d_{1g}$ and $d_{2g}$ to the left and right sides, $h_{1g}$ and $h_{2g}$ to the top and bottom sides), $d_{1p}, d_{2p}, h_{1p}, h_{2p}$ denote the distances from the edge pixel to the four sides of the predicted braille character rectangular box, and min takes the minimum of its arguments;
step 5.2.2, $Loss_{edge}$ is a pixel-wise loss between $T_{edge}$ and $P_{edge}$, the true and predicted values of the edge pixel matrix;
step 5.2.3, $Loss_{class}$ is calculated as

$$Loss_{class} = -\frac{1}{N}\sum_{i=1}^{N} \log p_i$$

where $p_i$ is the value at the position of the semantic classification vector corresponding to the true class of the i-th braille character, i.e. the position where $y_i = 1$.
Further, step 5.1 is implemented as follows:
step 5.1.1, for a pixel (x, y), if its coordinates satisfy either of two conditions relating x to the left side $x_1$ and the right side $x_3$ of a braille character rectangular box, the pixel belongs to the edge region of that braille character and the value at the corresponding position of the matrix Edge is set to 1, otherwise to 0; here $x_1$ and $x_3$ denote the abscissas of the pixels at the upper-left and lower-right corners of the braille character rectangular box;
step 5.1.2, the GroundTruth $(d_1, d_2, h_1, h_2)$ of the braille rectangular box corresponding to an edge pixel (x, y) is calculated as

$$d_1 = x - x_1, \quad d_2 = x_3 - x, \quad h_1 = y - y_1, \quad h_2 = y_3 - y$$

where $x_1, y_1$ and $x_3, y_3$ denote the abscissa and ordinate of the pixels at the upper-left and lower-right corners of the braille character rectangular box;
step 5.1.3, each braille character's semantic classification corresponds to a vector of size 64; if the semantic class is 60, the 60th position of the vector is 1 and the others are 0, and GroundTruth is set in this way for the semantic class values of all braille characters.
Further, in step 7 the convolutional neural network training process is optimized by stochastic gradient descent, with the related parameters set to BatchSize = 8, a maximum of 100,000 training steps, and an initial learning rate of $10^{-4}$; the learning rate is then set dynamically as a function of Initial_learning_rate (the initial learning rate) and Current_step (the current training step), and the convolutional neural network Model CNN_Model is obtained after training is completed.
Further, the NMS algorithm in step 10 improves on conventional NMS: the predictions of edge pixels in the same row are compared pairwise to obtain preliminarily screened results, and the screened results of the multiple rows are then input together into the conventional NMS algorithm to obtain the final prediction result.
Further, in step 11 the precision P, recall R, and combined index Hmean used in the text detection field evaluate the braille character detection performance. Precision denotes the percentage of correctly predicted braille character rectangular boxes among all predicted braille character rectangular boxes; a braille character rectangular box is considered correctly detected if the IoU of its area with the real-box area is greater than 0.5. Recall denotes the percentage of real braille character boxes that are correctly predicted, its value being the number of correctly predicted braille characters divided by the number of all real braille character boxes. Hmean is a combined index calculated from P and R as

$$Hmean = \frac{2 \times P \times R}{P + R}$$

Braille character recognition performance is calculated directly as accuracy, the number of correctly recognized braille characters divided by the total number of detected braille characters:

$$Accuracy = \frac{RP}{RP + FP}$$

where RP denotes the number of correctly recognized braille characters and FP the number of incorrectly recognized braille characters.
Compared with the prior art, the invention has the following advantages. First, the method can be applied directly to braille images of natural scenes, which broadens the application scenarios of braille character recognition. Second, the invention analyzes how braille is written and finds that the braille dots lie at the edges of each braille character; it therefore first detects the edge pixels of braille characters and then uses those edge pixels to locate the characters precisely, improving braille detection performance. Finally, the method is trained once and can then be used for detection repeatedly: the convolutional neural network model is obtained by training on a large number of natural-scene images and is simply reloaded at detection time. In performance tests on the natural-scene braille images of the test set, the braille character detection Hmean is 85.9% and the recognition accuracy is 95.3%.
Drawings
FIG. 1 is a diagram showing the correspondence between Braille characters and Pinyin.
Fig. 2 is a flowchart of an implementation of an embodiment of the present invention.
Detailed Description
The present invention is further described in detail below with reference to the drawings and an embodiment. The embodiment is described to illustrate and explain the invention, not to limit its scope, so that those of ordinary skill in the art can understand and practice the invention.
Referring to fig. 2, the implementation steps of the present invention are as follows:
Step 1, constructing the natural-scene braille image dataset
Step 1.1, the braille images of the constructed natural-scene dataset come mainly from two sources: first, downloads from the Internet; second, photographs taken with smart devices. The dataset contains 554 images, of which 80% are used as the training set and 20% as the test set;
Step 1.2, image enhancement is implemented by calling the Brightness, Sharpness, and Contrast functions of the ImageEnhance class in the Python PIL module, covering brightness enhancement, brightness reduction, sharpening, softening, contrast enhancement, and contrast reduction, as sketched below; after enhancement, the 554 originally collected braille images plus the 6 images generated from each original give 554 + 554 × 6 = 3878 braille images in the dataset.
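A minimal sketch of the six enhancement operations follows; the enhancement factors are illustrative assumptions, not values given in the patent.

```python
# Minimal sketch of the six enhancement operations in step 1.2 using
# PIL's ImageEnhance; the factor values below are illustrative assumptions.
from PIL import Image, ImageEnhance

def augment_six_ways(path):
    img = Image.open(path).convert("RGB")
    return [
        ImageEnhance.Brightness(img).enhance(1.5),  # brightness enhancement
        ImageEnhance.Brightness(img).enhance(0.6),  # brightness reduction
        ImageEnhance.Sharpness(img).enhance(2.0),   # sharpening
        ImageEnhance.Sharpness(img).enhance(0.3),   # softening
        ImageEnhance.Contrast(img).enhance(1.5),    # contrast enhancement
        ImageEnhance.Contrast(img).enhance(0.6),    # contrast reduction
    ]
```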
Step 2, making data labels: the braille characters in the natural scene images are marked in two respects, the position of the rectangular box containing each braille character and the semantic class of each braille character, and the two marks are finally combined into one .txt file;
Step 2.1, the labelme tool is used to mark the braille characters in each original image of the natural-scene image dataset, and during marking the semantic classes of the braille characters are labeled according to the order given in fig. 1 (numbers 1 to 63). An enhanced image can share the label file of its original image without separate marking, so the dataset needs 554 label files (.xml files), each named after its original image with the suffix .xml.
Step 2.2, the 554 .xml files are converted into .txt files of the same names using Python code (a conversion sketch is given below). In each .txt file, the label of each braille character corresponds to one line of records in the format $(x_1, y_1, x_2, y_2, x_3, y_3, x_4, y_4, Class)$, where $(x_1, y_1)$ denotes the upper-left corner of the braille rectangular box, $(x_2, y_2)$ the upper-right corner, $(x_3, y_3)$ the lower-right corner, $(x_4, y_4)$ the lower-left corner, and Class the semantic classification label corresponding to the braille character.
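A sketch of such a conversion follows; the XML tag names (object, x1 through y4, class) are hypothetical assumptions about the label-file layout, not taken from the patent.

```python
# Hypothetical sketch of the .xml-to-.txt conversion in step 2.2; the tag
# names below are assumptions and must be adapted to the actual label files.
import xml.etree.ElementTree as ET

def xml_to_txt(xml_path, txt_path):
    root = ET.parse(xml_path).getroot()
    with open(txt_path, "w", encoding="utf-8") as f:
        for obj in root.iter("object"):  # assumed: one <object> per character
            fields = [obj.findtext(k) for k in
                      ("x1", "y1", "x2", "y2", "x3", "y3", "x4", "y4", "class")]
            f.write(",".join(fields) + "\n")  # (x1,y1,...,x4,y4,Class)
```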
Step 2.3, according to the correspondence between braille characters and pinyin shown in fig. 1, the correspondence is written into a file braille_seg.txt with 63 lines in total, one per braille character. Each line starts with the classification value of the braille character (1 to 63) followed by a colon (":"), after which the pinyin corresponding to the braille character is written; if one braille character corresponds to several pinyins, they are separated by commas.
Step 3, constructing a convolutional neural network
The invention is developed on the TensorFlow 1.4 platform. The backbone network is built from the convolution layers, pooling layers, and activation functions of the ResNet-50 structure; then feature layers $f_1$ of size $H/32 \times W/32$, $f_2$ of size $H/16 \times W/16$, $f_3$ of size $H/8 \times W/8$, $f_4$ of size $H/4 \times W/4$, and $f_5$ of size $H/2 \times W/2$ are extracted from ResNet-50, and the fused feature layers $h_1, h_2, h_3, h_4, h_5$ are constructed as

$$h_1 = f_1, \qquad h_i = conv_{3\times3}(concat(unpooling_{2\times2}(h_{i-1}), f_i)), \quad i = 2, \dots, 5$$

Here H and W denote the height and width of the input image, $conv_{3\times3}$ denotes a convolution with a 3×3 kernel, concat denotes the concatenation of two matrices, and $unpooling_{2\times2}$ denotes upsampling the feature layer by a factor of 2; these operations can be implemented by direct function calls, as sketched below.
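A minimal TensorFlow 1.x sketch of this fusion follows, assuming the top-down recurrence given above; the channel widths are illustrative assumptions.

```python
# Sketch of the feature fusion in step 3 (TensorFlow 1.x style).
# features = [f1, f2, f3, f4, f5], ordered from H/32 x W/32 up to H/2 x W/2;
# the channel widths are illustrative assumptions.
import tensorflow as tf

def unpool_2x2(x):
    # unpooling_2x2: double the spatial resolution by bilinear resizing
    s = tf.shape(x)
    return tf.image.resize_images(x, [s[1] * 2, s[2] * 2])

def fuse(features, channels=(128, 128, 64, 32, 32)):
    h = features[0]  # h1 = f1
    for f, c in zip(features[1:], channels[1:]):
        x = tf.concat([unpool_2x2(h), f], axis=-1)   # concat
        h = tf.layers.conv2d(x, c, 3, padding="same",
                             activation=tf.nn.relu)   # conv_3x3
    return h  # h5, of size H/2 x W/2
```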
Step 4, a prediction layer is constructed on feature layer $h_5$ for edge-pixel prediction, braille character rectangular-box position prediction, and braille character semantic classification prediction. The edge-pixel prediction output is a matrix Edge of size $H/2 \times W/2$; the braille character rectangular-box position prediction outputs a number of five-element vectors $(d_1, d_2, h_1, h_2, score)$, where $d_1, d_2, h_1, h_2$ denote the distances from an edge pixel to the four sides of the braille character's rectangular box and score denotes the confidence of this prediction; each braille character's semantic classification corresponds to a Softmax output vector of size 64 (indices 0 to 63), where 1 to 63 correspond to the braille character classes shown in fig. 1 and 0 corresponds to non-braille. A sketch of the three prediction heads follows.
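The sketch below continues the fusion sketch above; the 1×1 convolution heads are an assumption, since the patent does not specify the head layers.

```python
# Sketch of the prediction layer in step 4 on top of h5 (H/2 x W/2).
# The 1x1 convolution heads are an assumption, not specified in the patent.
import tensorflow as tf

def prediction_heads(h5):
    edge = tf.layers.conv2d(h5, 1, 1, activation=tf.nn.sigmoid)   # Edge matrix
    geometry = tf.layers.conv2d(h5, 4, 1, activation=tf.nn.relu)  # d1, d2, h1, h2
    logits = tf.layers.conv2d(h5, 64, 1)                          # classes 0..63
    classes = tf.nn.softmax(logits)                               # Softmax vectors
    return edge, geometry, classes
```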
Step 5, constructing the loss calculation method
Step 5.1, the GroundTruth of the prediction layer is constructed from the input image (.jpg file) and the corresponding label file (.txt file): the image is reduced to $H/2 \times W/2$, and the braille character rectangular-box coordinates in the label file are scaled down synchronously. This process can be subdivided into the following sub-steps (a ground-truth sketch is given after step 5.1.3):
Step 5.1.1, for a pixel (x, y), if its coordinates satisfy either of two conditions relating x to the left side $x_1$ and the right side $x_3$ of a braille character rectangular box, the pixel belongs to the edge region of that braille character and the value at the corresponding position of the matrix Edge is set to 1, otherwise to 0. Here $x_1$ and $x_3$ denote the abscissas of the pixels at the upper-left and lower-right corners of the braille character rectangular box.
Step 5.1.2, the GroundTruth $(d_1, d_2, h_1, h_2)$ of the braille rectangular box corresponding to an edge pixel (x, y) is calculated as

$$d_1 = x - x_1, \quad d_2 = x_3 - x, \quad h_1 = y - y_1, \quad h_2 = y_3 - y$$

where $x_1, y_1$ and $x_3, y_3$ denote the abscissa and ordinate of the pixels at the upper-left and lower-right corners of the braille character rectangular box.
Step 5.1.3, each braille character's semantic classification corresponds to a vector of size 64; if the semantic class is 60, the 60th position of the vector (the lowest position being 0) is set to 1 and the others to 0. GroundTruth is set in this way for the semantic class values of all braille characters.
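A numpy sketch of this ground-truth construction follows. It assumes axis-aligned boxes given by their upper-left and lower-right corners, already scaled to the H/2 × W/2 grid, and abstracts the edge-region test into a caller-supplied predicate, since the patent gives that condition only as images.

```python
# Sketch of the GroundTruth construction in step 5.1. Boxes are assumed
# axis-aligned, given as (x1, y1, x3, y3) on the H/2 x W/2 grid; the exact
# edge-region inequalities are supplied by the caller via in_edge_region.
import numpy as np

def make_ground_truth(H2, W2, boxes, class_ids, in_edge_region):
    edge = np.zeros((H2, W2), np.float32)      # step 5.1.1: Edge matrix
    geo = np.zeros((H2, W2, 4), np.float32)    # step 5.1.2: d1, d2, h1, h2
    cls = np.zeros((H2, W2, 64), np.float32)   # step 5.1.3: one-hot vectors
    for (x1, y1, x3, y3), c in zip(boxes, class_ids):
        for y in range(int(y1), min(int(y3) + 1, H2)):
            for x in range(int(x1), min(int(x3) + 1, W2)):
                if in_edge_region(x, y, x1, x3):
                    edge[y, x] = 1.0
                    geo[y, x] = (x - x1, x3 - x, y - y1, y3 - y)
                    cls[y, x, c] = 1.0  # classes 1..63; index 0 = non-braille
    return edge, geo, cls
```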
Step 5.2, the loss value is calculated from the prediction-layer outputs and the corresponding GroundTruth. The total loss is

$$Loss = Loss_{geometry} + \alpha \times Loss_{edge} + \beta \times Loss_{class}$$

with $\alpha = \beta = 0.1$, where $Loss_{geometry}$ denotes the loss of the braille character rectangular-box prediction, $Loss_{edge}$ the loss of the edge-pixel prediction, and $Loss_{class}$ the loss of the braille character semantic classification prediction.
Step 5.2.1, the braille character rectangular-box prediction loss is calculated as

$$Loss_{geometry} = \frac{1}{N} \sum_{i=1}^{N} -\log\frac{A_{inter}^{i}}{A_{union}^{i}}$$

where N denotes the number of braille characters and $A_{inter}^{i}$ and $A_{union}^{i}$ denote the areas of the intersection and union of the i-th real braille character rectangular box and the predicted rectangular box, calculated respectively as

$$A_{inter} = (\min(d_{1g}, d_{1p}) + \min(d_{2g}, d_{2p})) \times (\min(h_{1g}, h_{1p}) + \min(h_{2g}, h_{2p}))$$

$$A_{union} = (d_{1p} + d_{2p}) \times (h_{1p} + h_{2p}) + (d_{1g} + d_{2g}) \times (h_{1g} + h_{2g}) - A_{inter}$$

where $d_{1g}, d_{2g}, h_{1g}, h_{2g}$ denote the distances from an edge pixel to the four sides of the real braille character rectangular box ($d_{1g}$ and $d_{2g}$ to the left and right sides, $h_{1g}$ and $h_{2g}$ to the top and bottom sides), $d_{1p}, d_{2p}, h_{1p}, h_{2p}$ denote the distances from the edge pixel to the four sides of the predicted braille character rectangular box, and min takes the minimum of its arguments.
Step 5.2.2, $Loss_{edge}$ is a pixel-wise loss between $T_{edge}$ and $P_{edge}$, the true and predicted values of the edge pixel matrix.
Step 5.2.3, $Loss_{class}$ is calculated as

$$Loss_{class} = -\frac{1}{N}\sum_{i=1}^{N} \log p_i$$

where $p_i$ is the value at the position of the semantic classification vector corresponding to the true class (0 to 63) of the i-th braille character, i.e. the position where $y_i = 1$. A sketch of the full loss calculation follows.
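The sketch below assembles the total loss with α = β = 0.1. The geometry term uses the negative-log-IoU form reconstructed above; the edge term is written as pixel-wise cross-entropy, which is an assumption, since the source gives that formula only as an image.

```python
# Sketch of the loss in step 5.2 (TensorFlow 1.x style), alpha = beta = 0.1.
# Shapes assumed: edge_t/edge_p (B,H,W); geo_t/geo_p (B,H,W,4);
# cls_t/cls_p (B,H,W,64). The edge loss is an assumed cross-entropy.
import tensorflow as tf

def total_loss(edge_t, edge_p, geo_t, geo_p, cls_t, cls_p, eps=1e-6):
    d1g, d2g, h1g, h2g = tf.unstack(geo_t, axis=-1)
    d1p, d2p, h1p, h2p = tf.unstack(geo_p, axis=-1)
    a_inter = (tf.minimum(d1g, d1p) + tf.minimum(d2g, d2p)) * \
              (tf.minimum(h1g, h1p) + tf.minimum(h2g, h2p))
    a_union = (d1p + d2p) * (h1p + h2p) + (d1g + d2g) * (h1g + h2g) - a_inter
    mask = edge_t                      # geometry/class terms count on edge pixels
    n = tf.reduce_sum(mask) + eps
    loss_geo = tf.reduce_sum(
        -tf.log((a_inter + eps) / (a_union + eps)) * mask) / n   # -log(IoU)
    loss_edge = tf.reduce_mean(
        -edge_t * tf.log(edge_p + eps)
        - (1.0 - edge_t) * tf.log(1.0 - edge_p + eps))
    loss_cls = tf.reduce_sum(
        -tf.reduce_sum(cls_t * tf.log(cls_p + eps), axis=-1) * mask) / n
    return loss_geo + 0.1 * loss_edge + 0.1 * loss_cls
```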
Step 6, a Python function is called to randomly divide the braille images of the natural-scene image dataset into a training set (80%) and a test set (20%).
Step 7, training starts from the pre-built convolutional neural network, the prediction-layer representation and loss calculation method, and the images of the natural-scene braille image training set. The platform involved in training is TensorFlow 1.4, the programming language is Python, the main hardware is an HP image server, and the graphics card is a GTX 2080Ti. During training, all input images are resized to 512 × 512; the training process is optimized by stochastic gradient descent (SGD) with BatchSize = 8, a maximum of 100,000 training steps, and an initial learning rate of $10^{-4}$, after which the learning rate is set dynamically as a function of Initial_learning_rate (the initial learning rate) and Current_step (the current training step), as sketched below. After training, the convolutional neural network Model CNN_Model is obtained.
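A sketch of the optimization setup follows; the exponential-decay constants (0.94 every 10,000 steps) are assumptions standing in for the schedule, which the source shows only as an image.

```python
# Sketch of the SGD setup in step 7; `loss` is the total_loss defined above.
# The decay constants are assumptions, as the dynamic schedule appears in
# the source only as an image.
import tensorflow as tf

def make_train_op(loss):
    global_step = tf.Variable(0, trainable=False, name="global_step")
    lr = tf.train.exponential_decay(
        learning_rate=1e-4,        # Initial_learning_rate
        global_step=global_step,   # Current_step
        decay_steps=10000, decay_rate=0.94, staircase=True)
    return tf.train.GradientDescentOptimizer(lr).minimize(
        loss, global_step=global_step)
```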
Step 8, an image is acquired from the test set and resized to a width of 1024, with the height adjusted so that the aspect ratio is kept unchanged. The image is then used as the input of CNN_Model, which outputs an edge-pixel prediction matrix, braille character rectangular-box prediction vectors, and braille character semantic classification prediction vectors.
Step 9, the values of the edge pixel matrix are judged: a value greater than or equal to 0.8 is considered a valid edge pixel, and the corresponding braille character rectangular box and semantic classification information are then acquired from the valid edge pixels.
Step 10, the prediction results are screened: an improved NMS algorithm obtains the optimal prediction results from the multiple braille character rectangular boxes; this improvement reduces the time complexity from $O(n^2)$ to $O(n)$ (a sketch is given below). For the screened braille characters, the braille character classification semantic information is extracted from braille_seg.txt and displayed;
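A self-contained sketch of this screening follows, assuming the row-wise scheme described above: each row of predictions is reduced by a single linear pass comparing neighbouring boxes, and the survivors of all rows then pass once through standard NMS.

```python
# Sketch of the improved NMS in step 10; boxes are (x1, y1, x3, y3),
# predictions are (box, score), and rows are ordered left to right.
def iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def standard_nms(preds, thr=0.5):
    preds = sorted(preds, key=lambda p: p[1], reverse=True)  # by score
    kept = []
    for box, score in preds:
        if all(iou(box, k[0]) <= thr for k in kept):
            kept.append((box, score))
    return kept

def improved_nms(rows, thr=0.5):
    # rows: per-row lists of (box, score) predictions
    survivors = []
    for row in rows:
        kept = []
        for p in row:  # linear pass: compare only with the last kept box
            if kept and iou(kept[-1][0], p[0]) > thr:
                if p[1] > kept[-1][1]:
                    kept[-1] = p       # keep the higher-scoring of the pair
            else:
                kept.append(p)
        survivors.extend(kept)
    return standard_nms(survivors, thr)  # final pass over all rows
```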
and 11, repeating the steps 8 to 10 to obtain braille character recognition results of the braille images of the natural scene in the test set, and calculating the detection and recognition performances of the braille characters in the test set. The invention uses the accuracy (P), regression rate (R) and comprehensive index (Hmean) used in the text detection field to evaluate the braille character detection performance. The accuracy represents the percentage of the number of correctly predicted rectangular frames of the Braille characters to the number of all the predicted rectangular frames of the Braille characters. If the IOU of the area of a certain braille character detection frame and the real frame area is larger than 0.5, the braille character rectangular frame is considered to be correctly detected. The regression rate represents the percentage of the real frames of the Braille characters that are correctly predicted, and its value is the number of correctly predicted Braille characters divided by the number of the real frames of all Braille characters. Hmean is a comprehensive index, the value of which is calculated by P and R, and the specific calculation method is as follows;
Figure BDA0004045974600000091
Braille character recognition performance is calculated directly as accuracy, i.e. the number of correctly recognized braille characters divided by the total number of detected braille characters:

$$Accuracy = \frac{RP}{RP + FP}$$

where RP denotes the number of correctly recognized braille characters and FP the number of incorrectly recognized braille characters. A sketch of these metrics follows.
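A small sketch of the evaluation in step 11; the variable names are illustrative.

```python
# Sketch of the metrics in step 11. `matches` is the number of predicted
# boxes whose IoU with a real box exceeds 0.5.
def detection_metrics(matches, num_pred, num_real):
    P = matches / num_pred                         # precision
    R = matches / num_real                         # recall
    hmean = 2 * P * R / (P + R) if (P + R) else 0.0
    return P, R, hmean

def recognition_accuracy(rp, fp):
    # rp: correctly recognized characters; fp: incorrectly recognized ones
    return rp / (rp + fp)
```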
Step 12, for a natural-scene braille image outside the dataset, the image is resized to a width of 1024 with the height adjusted to keep the aspect ratio unchanged, then used as the input of CNN_Model, which outputs the edge-pixel score matrix, the braille character rectangular-box coordinate vectors corresponding to edge pixels, and the semantic classification vector of each braille character; steps 9 and 10 are then executed to obtain the prediction result.
The specific embodiments described herein are offered by way of example only to illustrate the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments or substitute them in similar ways without departing from the spirit of the invention or exceeding the scope defined in the appended claims.

Claims (10)

1. A method for recognizing braille characters in natural scenes by mining braille edge features, characterized by comprising the following steps:
step 1, collecting images containing braille in natural scenes;
step 2, marking the braille character positions and semantic classification information in the natural scene images;
step 3, designing a convolutional neural network with ResNet-50 as the backbone and fusing five feature layers of different sizes from the ResNet-50 structure to obtain fused feature layers;
step 4, constructing a prediction layer that, on a fused feature layer whose size is 1/4 of the input image size, predicts the edge pixels of braille characters, the rectangular-box position of the braille character corresponding to each edge pixel, and the semantic class of each braille character;
step 5, designing a loss function according to the representations of the braille edge pixels, braille character rectangular-box positions, and braille character classes in step 4;
step 6, randomly dividing the dataset into a training set and a test set in a certain proportion;
step 7, training on the training set with the designed convolutional neural network and loss function to obtain the trained Model CNN_Model;
step 8, acquiring an image from the test set as the input of CNN_Model, the Model outputting an edge-pixel score matrix, the braille character rectangular-box coordinate vectors corresponding to edge pixels, and the classification value of each braille character;
step 9, judging the scores of the edge-pixel matrix, a score greater than a threshold being considered a valid edge pixel, and then obtaining the valid braille character positions and corresponding classifications from the valid edge pixels;
step 10, using an NMS algorithm to obtain the optimal prediction results from the multiple braille character rectangular boxes, then extracting and displaying the braille character classification semantic information;
step 11, repeating steps 8 to 10 to obtain the braille character recognition results of the natural-scene braille images in the whole test set, and calculating the braille character detection and recognition performance on the test set;
step 12, for a natural-scene braille image outside the dataset, taking it as the input of CNN_Model, outputting the edge-pixel score matrix, the braille character rectangular-box coordinate vectors corresponding to edge pixels, and the classification value of each braille character, and then executing steps 9 and 10 to obtain the prediction result.
2. The method for recognizing braille characters in natural scenes by mining edge features according to claim 1, characterized in that: step 1 further includes performing enhancement operations on the braille images, the enhancement operations including brightness enhancement, brightness reduction, sharpening, softening, contrast enhancement, and contrast reduction.
3. The method for recognizing braille characters in natural scenes by mining edge features according to claim 1, characterized in that: a labelme tool is used to mark the braille characters in each original image of the natural-scene image dataset, and the semantic classes of the braille characters are labeled during marking according to the order given in the National General Braille Scheme (Implementation); the label of each braille character corresponds to one line of records in the format $(x_1, y_1, x_2, y_2, x_3, y_3, x_4, y_4, Class)$, where $(x_1, y_1)$ denotes the upper-left corner of the braille rectangular box, $(x_2, y_2)$ the upper-right corner, $(x_3, y_3)$ the lower-right corner, $(x_4, y_4)$ the lower-left corner, and Class the semantic classification label corresponding to the braille character.
4. The method for recognizing braille characters in natural scenes by mining edge features according to claim 1, characterized in that the feature layers of different scales in step 3 are fused as follows:
feature layers $f_1$ of size $H/32 \times W/32$, $f_2$ of size $H/16 \times W/16$, $f_3$ of size $H/8 \times W/8$, $f_4$ of size $H/4 \times W/4$, and $f_5$ of size $H/2 \times W/2$ are extracted from ResNet-50, where H and W denote the height and width of the input image, and the fused feature layers $h_1, h_2, h_3, h_4, h_5$ are constructed as

$$h_1 = f_1, \qquad h_i = conv_{3\times3}(concat(unpooling_{2\times2}(h_{i-1}), f_i)), \quad i = 2, \dots, 5$$

where $conv_{3\times3}$ denotes a convolution with a 3×3 kernel, concat denotes the concatenation of two matrices, and $unpooling_{2\times2}$ denotes upsampling the feature layer by a factor of 2.
5. The method for recognizing braille characters in natural scenes by mining edge features according to claim 1, characterized in that: in step 4, the edge-pixel prediction output is a matrix Edge of size $H/2 \times W/2$; the braille character rectangular-box position prediction outputs a number of five-element vectors $(d_1, d_2, h_1, h_2, score)$, where $d_1, d_2, h_1, h_2$ denote the distances from an edge pixel to the four sides of the braille character's rectangular box and score denotes the confidence of this prediction; each braille character's semantic classification corresponds to a Softmax output vector of size 64 (indices 0 to 63), where 1 to 63 correspond to the braille character classes, labeled according to the order given in the National General Braille Scheme (Implementation), and 0 corresponds to non-braille.
6. The method for recognizing braille characters in natural scenes by mining edge features according to claim 1, characterized in that step 5 is implemented as follows:
step 5.1, constructing the GroundTruth of the prediction layer from the input image and its label file: the image is reduced to $H/2 \times W/2$, where H and W denote the height and width of the input image, and the braille character rectangular-box coordinates in the label file are scaled down synchronously;
step 5.2, calculating the loss value from the prediction-layer outputs and the corresponding GroundTruth; the total loss is

$$Loss = Loss_{geometry} + \alpha \times Loss_{edge} + \beta \times Loss_{class}$$

where $\alpha = \beta$ is a constant, $Loss_{geometry}$ denotes the loss of the braille character rectangular-box prediction, $Loss_{edge}$ the loss of the edge-pixel prediction, and $Loss_{class}$ the loss of the braille character semantic classification prediction;
step 5.2.1, the braille character rectangular-box prediction loss is calculated as

$$Loss_{geometry} = \frac{1}{N} \sum_{i=1}^{N} -\log\frac{A_{inter}^{i}}{A_{union}^{i}}$$

where N denotes the number of braille characters and $A_{inter}^{i}$ and $A_{union}^{i}$ denote the areas of the intersection and union of the i-th real braille character rectangular box and the predicted rectangular box, calculated respectively as

$$A_{inter} = (\min(d_{1g}, d_{1p}) + \min(d_{2g}, d_{2p})) \times (\min(h_{1g}, h_{1p}) + \min(h_{2g}, h_{2p}))$$

$$A_{union} = (d_{1p} + d_{2p}) \times (h_{1p} + h_{2p}) + (d_{1g} + d_{2g}) \times (h_{1g} + h_{2g}) - A_{inter}$$

where $d_{1g}, d_{2g}, h_{1g}, h_{2g}$ denote the distances from an edge pixel to the four sides of the real braille character rectangular box ($d_{1g}$ and $d_{2g}$ to the left and right sides, $h_{1g}$ and $h_{2g}$ to the top and bottom sides), $d_{1p}, d_{2p}, h_{1p}, h_{2p}$ denote the distances from the edge pixel to the four sides of the predicted braille character rectangular box, and min takes the minimum of its arguments;
step 5.2.2, $Loss_{edge}$ is a pixel-wise loss between $T_{edge}$ and $P_{edge}$, the true and predicted values of the edge pixel matrix;
step 5.2.3, $Loss_{class}$ is calculated as

$$Loss_{class} = -\frac{1}{N}\sum_{i=1}^{N} \log p_i$$

where $p_i$ is the value at the position of the semantic classification vector corresponding to the true class of the i-th braille character, i.e. the position where $y_i = 1$.
7. The method for recognizing braille characters in natural scenes by mining edge features according to claim 6, characterized in that step 5.1 is implemented as follows:
step 5.1.1, for a pixel (x, y), if its coordinates satisfy either of two conditions relating x to the left side $x_1$ and the right side $x_3$ of a braille character rectangular box, the pixel belongs to the edge region of that braille character and the value at the corresponding position of the matrix Edge is set to 1, otherwise to 0; here $x_1$ and $x_3$ denote the abscissas of the pixels at the upper-left and lower-right corners of the braille character rectangular box;
step 5.1.2, the GroundTruth $(d_1, d_2, h_1, h_2)$ of the braille rectangular box corresponding to an edge pixel (x, y) is calculated as

$$d_1 = x - x_1, \quad d_2 = x_3 - x, \quad h_1 = y - y_1, \quad h_2 = y_3 - y$$

where $x_1, y_1$ and $x_3, y_3$ denote the abscissa and ordinate of the pixels at the upper-left and lower-right corners of the braille character rectangular box;
step 5.1.3, each braille character's semantic classification corresponds to a vector of size 64, labeled according to the order given in the National General Braille Scheme (Implementation); if the semantic class is 60, the 60th position of the vector is set to 1 and the others to 0, and GroundTruth is set in this way for the semantic class values of all braille characters.
8. The method for recognizing braille characters in natural scenes by mining edge features according to claim 1, characterized in that: in step 7 the convolutional neural network training process is optimized by stochastic gradient descent, with the related parameters set to BatchSize = 8, a maximum of 100,000 training steps, and an initial learning rate of $10^{-4}$; the learning rate is then set dynamically as a function of Initial_learning_rate (the initial learning rate) and Current_step (the current training step), and the convolutional neural network Model CNN_Model is obtained after training is completed.
9. The method for recognizing braille characters in natural scenes by mining edge features according to claim 1, characterized in that: the NMS algorithm in step 10 improves on conventional NMS: the predictions of edge pixels in the same row are compared pairwise to obtain preliminarily screened results, and the screened results of the multiple rows are then input together into the conventional NMS algorithm to obtain the final prediction result.
10. The method for recognizing braille characters in natural scenes by mining edge features according to claim 1, characterized in that: in step 11, the precision P, recall R, and combined index Hmean used in the text detection field evaluate the braille character detection performance; precision denotes the percentage of correctly predicted braille character rectangular boxes among all predicted braille character rectangular boxes, a braille character rectangular box being considered correctly detected if the IoU of its detection-box area with the real-box area is greater than 0.5; recall denotes the percentage of real braille character boxes that are correctly predicted, its value being the number of correctly predicted braille characters divided by the number of all real braille character boxes; Hmean is a combined index calculated from P and R as

$$Hmean = \frac{2 \times P \times R}{P + R}$$

Braille character recognition performance is calculated directly as accuracy, the number of correctly recognized braille characters divided by the total number of detected braille characters:

$$Accuracy = \frac{RP}{RP + FP}$$

where RP denotes the number of correctly recognized braille characters and FP the number of incorrectly recognized braille characters.
CN202310027966.3A 2023-01-09 2023-01-09 Method for recognizing braille characters in natural scenes by mining edge features Pending CN116343229A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310027966.3A CN116343229A (en) Method for recognizing braille characters in natural scenes by mining edge features


Publications (1)

Publication Number Publication Date
CN116343229A true CN116343229A (en) 2023-06-27

Family

ID=86891935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310027966.3A Pending CN116343229A (en) Method for recognizing braille characters in natural scenes by mining edge features

Country Status (1)

Country Link
CN (1) CN116343229A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination