CN116343229A - Method for recognizing braille characters in natural scenes by mining edge features - Google Patents

Method for recognizing braille characters in natural scenes by mining edge features

Info

Publication number
CN116343229A
CN116343229A (application number CN202310027966.3A)
Authority
CN
China
Prior art keywords
braille
edge
character
characters
natural scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310027966.3A
Other languages
Chinese (zh)
Inventor
卢利琼
吴东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lingnan Normal University
Original Assignee
Lingnan Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lingnan Normal University
Priority to CN202310027966.3A
Publication of CN116343229A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/19 - Recognition using electronic means
    • G06V30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/19 - Recognition using electronic means
    • G06V30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 - Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/19 - Recognition using electronic means
    • G06V30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/1918 - Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a method for recognizing braille characters in natural scenes by mining edge features. First, the method can be applied directly to braille images of natural scenes, which broadens the application scenarios of braille character recognition. Second, the invention analyzes how braille is written and finds that the braille dots lie at the edges of each braille character; it therefore first detects the edge pixels of braille characters and then uses those edge pixels to locate the characters precisely, improving braille detection performance. Finally, the method is trained once and can then be used for detection repeatedly: the convolutional neural network model is obtained by training on a large number of natural-scene images and is simply reloaded at detection time. In performance tests on the natural-scene braille images of the test set, the braille character detection Hmean is 85.9% and the recognition accuracy is 95.3%.

Description

Method for recognizing braille characters in natural scenes by mining edge features
Technical Field
The invention belongs to the field of braille character recognition in natural scene images, and in particular relates to a method that uses a convolutional neural network to mine the edge features of braille characters in natural scene images and then recognizes the braille characters from those edge features.
Background
Braille is an effective medium for visually impaired people to learn, acquire information, and communicate with others. Braille character recognition aims to use artificial-intelligence techniques to automatically detect the positions of braille characters in an image and then recognize them. Braille character recognition is a key prerequisite for many special-education applications, such as electronic braille books, automatic braille test-paper grading, and barrier-free communication between blind students and sighted people.
How to recognize braille characters in an image efficiently has long been an important topic in the informatization of special education. Current research on braille recognition focuses mainly on scanned braille document images. In such images the braille text has cells of fixed size and a regular dot arrangement, so most existing methods first detect the braille dots and then combine several dots into braille characters for recognition. Current dot-detection methods for scanned images fall into two categories: methods based on image segmentation, and methods that combine mined braille-dot features with machine-learning classifiers. The segmentation-based methods first divide the pixels of the braille image into shadow, highlight, and background using locally adaptive thresholds, and then detect the dots with rules that combine the three parts [14-19]. These methods are sensitive to the threshold value, and because the detection target is obtained through multiple steps, errors accumulate easily. To avoid these problems, the second category detects the braille dots directly through dot features and a classification algorithm; common feature-classifier combinations include Haar+SVM, HOG+SVM, and Haar/LBP/HOG+AdaBoost. Although the second category detects the dots directly, from the recognition perspective several dots must still be combined into a braille character, so the problem of error accumulation across multiple steps remains.
Analysis of existing braille recognition methods reveals two problems. First, the application scenario is narrow: recognition works well on neatly and regularly arranged braille document images, but performance on braille in natural scene images, where arrangement and size vary widely, is far from satisfactory. Second, means of mining braille character features are lacking and the techniques used are not novel enough, so recognition is mediocre, and braille characters obtained through multi-step pipelines accumulate errors easily. Existing braille recognition methods therefore struggle to support the development and deployment of applications such as electronic braille books, automatic braille test grading, and communication between blind students and sighted people. Analyzing the characteristics of braille characters, introducing convolutional neural networks into braille recognition research, and studying braille recognition in natural-scene braille images is thus a problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to broaden the application scenarios of braille recognition, improve the accuracy of braille character recognition, and enable braille character recognition to be used for automatic braille test-paper grading and for barrier-free communication between blind students and sighted people.
To achieve this, the invention provides a method for recognizing braille characters in natural scenes by mining braille edge features. The technical scheme is implemented in the following steps:
step 1, collecting images containing braille in natural scenes;
step 2, marking the braille character positions and semantic classification information in the natural scene images;
step 3, designing a convolutional neural network with ResNet-50 as the backbone and fusing five feature layers of different sizes from the ResNet-50 structure;
step 4, constructing a prediction layer that, on a feature layer whose size is 1/4 of the input image size, predicts the edge pixels of braille characters, the rectangular-box position of the braille character corresponding to each edge pixel, and the semantic class of each braille character;
step 5, designing a loss function according to the representations of the braille edge pixels, braille character rectangular-box positions, and braille character classes in step 4;
step 6, randomly dividing the dataset into a training set and a test set in a certain proportion;
step 7, training on the training set with the designed convolutional neural network and loss function to obtain the trained Model CNN_Model;
step 8, acquiring an image from the test set as the input of CNN_Model, the Model outputting an edge-pixel score matrix, the braille character rectangular-box coordinate vectors corresponding to edge pixels, and the classification value of each braille character;
step 9, judging the scores of the edge-pixel matrix, a score greater than a threshold being considered a valid edge pixel, and then obtaining the valid braille character positions and corresponding classifications from the valid edge pixels;
step 10, using an NMS algorithm to obtain the optimal prediction results from the multiple braille character rectangular boxes, then extracting and displaying the braille character classification semantic information;
step 11, repeating steps 8 to 10 to obtain the braille character recognition results of the natural-scene braille images in the whole test set, and calculating the braille character detection and recognition performance on the test set;
step 12, for a natural-scene braille image outside the dataset, taking it as the input of CNN_Model, outputting the edge-pixel score matrix, the braille character rectangular-box coordinate vectors corresponding to edge pixels, and the classification value of each braille character, and then executing steps 9 and 10 to obtain the prediction result.
Further, step 1 also includes performing enhancement operations on the braille images, the enhancement operations including brightness enhancement, brightness reduction, sharpening, softening, contrast enhancement, and contrast reduction.
Further, in step 2 a labelme tool is used to mark the braille characters in each original image of the natural-scene image dataset, and the semantic classes of the braille characters are labeled during marking according to the order given in the National General Braille Scheme (Implementation); the label of each braille character corresponds to one line of records in the format $(x_1, y_1, x_2, y_2, x_3, y_3, x_4, y_4, Class)$, where $(x_1, y_1)$ denotes the upper-left corner of the braille rectangular box, $(x_2, y_2)$ the upper-right corner, $(x_3, y_3)$ the lower-right corner, $(x_4, y_4)$ the lower-left corner, and Class the semantic classification label corresponding to the braille character.
Further, the feature layers in step 3 are fused as follows:
feature layers $f_1$ of size $H/32 \times W/32$, $f_2$ of size $H/16 \times W/16$, $f_3$ of size $H/8 \times W/8$, $f_4$ of size $H/4 \times W/4$, and $f_5$ of size $H/2 \times W/2$ are extracted from ResNet-50, where H and W denote the height and width of the input image, and the fused feature layers $h_1, h_2, h_3, h_4, h_5$ are constructed as

$$h_1 = f_1, \qquad h_i = conv_{3\times3}(concat(unpooling_{2\times2}(h_{i-1}), f_i)), \quad i = 2, \dots, 5$$

where $conv_{3\times3}$ denotes a convolution with a 3×3 kernel, concat denotes the concatenation of two matrices, and $unpooling_{2\times2}$ denotes upsampling the feature layer by a factor of 2.
Further, in step 4 the edge-pixel prediction output is a matrix Edge of size $H/2 \times W/2$; the braille character rectangular-box position prediction outputs a number of five-element vectors $(d_1, d_2, h_1, h_2, score)$, where $d_1, d_2, h_1, h_2$ denote the distances from an edge pixel to the four sides of the braille character's rectangular box and score denotes the confidence of this prediction; each braille character's semantic classification corresponds to a Softmax output vector of size 64 (indices 0 to 63), where 1 to 63 correspond to braille character classes and 0 corresponds to non-braille.
Further, step 5 is implemented as follows:
step 5.1, constructing the GroundTruth of the prediction layer from the input image and its label file: the image is reduced to $H/2 \times W/2$, where H and W denote the height and width of the input image, and the braille character rectangular-box coordinates in the label file are scaled down synchronously;
step 5.2, calculating the loss value from the prediction-layer outputs and the corresponding GroundTruth; the total loss is

$$Loss = Loss_{geometry} + \alpha \times Loss_{edge} + \beta \times Loss_{class}$$

where $\alpha = \beta$ is a constant, $Loss_{geometry}$ denotes the loss of the braille character rectangular-box prediction, $Loss_{edge}$ the loss of the edge-pixel prediction, and $Loss_{class}$ the loss of the braille character semantic classification prediction;
step 5.2.1, the braille character rectangular-box prediction loss is calculated as

$$Loss_{geometry} = \frac{1}{N} \sum_{i=1}^{N} -\log\frac{A_{inter}^{i}}{A_{union}^{i}}$$

where N denotes the number of braille characters and $A_{inter}^{i}$ and $A_{union}^{i}$ denote the areas of the intersection and union of the i-th real braille character rectangular box and the predicted rectangular box, calculated respectively as

$$A_{inter} = (\min(d_{1g}, d_{1p}) + \min(d_{2g}, d_{2p})) \times (\min(h_{1g}, h_{1p}) + \min(h_{2g}, h_{2p}))$$

$$A_{union} = (d_{1p} + d_{2p}) \times (h_{1p} + h_{2p}) + (d_{1g} + d_{2g}) \times (h_{1g} + h_{2g}) - A_{inter}$$

where $d_{1g}, d_{2g}, h_{1g}, h_{2g}$ denote the distances from an edge pixel to the four sides of the real braille character rectangular box ($d_{1g}$ and $d_{2g}$ to the left and right sides, $h_{1g}$ and $h_{2g}$ to the top and bottom sides), $d_{1p}, d_{2p}, h_{1p}, h_{2p}$ denote the distances from the edge pixel to the four sides of the predicted braille character rectangular box, and min takes the minimum of its arguments;
step 5.2.2, $Loss_{edge}$ is a pixel-wise loss between $T_{edge}$ and $P_{edge}$, the true and predicted values of the edge pixel matrix;
step 5.2.3, $Loss_{class}$ is calculated as

$$Loss_{class} = -\frac{1}{N}\sum_{i=1}^{N} \log p_i$$

where $p_i$ is the value at the position of the semantic classification vector corresponding to the true class of the i-th braille character, i.e. the position where $y_i = 1$.
Further, step 5.1 is implemented as follows:
step 5.1.1, for a pixel (x, y), if its coordinates satisfy either of two conditions relating x to the left side $x_1$ and the right side $x_3$ of a braille character rectangular box, the pixel belongs to the edge region of that braille character and the value at the corresponding position of the matrix Edge is set to 1, otherwise to 0; here $x_1$ and $x_3$ denote the abscissas of the pixels at the upper-left and lower-right corners of the braille character rectangular box;
step 5.1.2, the GroundTruth $(d_1, d_2, h_1, h_2)$ of the braille rectangular box corresponding to an edge pixel (x, y) is calculated as

$$d_1 = x - x_1, \quad d_2 = x_3 - x, \quad h_1 = y - y_1, \quad h_2 = y_3 - y$$

where $x_1, y_1$ and $x_3, y_3$ denote the abscissa and ordinate of the pixels at the upper-left and lower-right corners of the braille character rectangular box;
step 5.1.3, each braille character's semantic classification corresponds to a vector of size 64; if the semantic class is 60, the 60th position of the vector is 1 and the others are 0, and GroundTruth is set in this way for the semantic class values of all braille characters.
Further, in step 7 the convolutional neural network training process is optimized by stochastic gradient descent, with the related parameters set to BatchSize = 8, a maximum of 100,000 training steps, and an initial learning rate of $10^{-4}$; the learning rate is then set dynamically as a function of Initial_learning_rate (the initial learning rate) and Current_step (the current training step), and the convolutional neural network Model CNN_Model is obtained after training is completed.
Further, the NMS algorithm in step 10 improves on conventional NMS: the predictions of edge pixels in the same row are compared pairwise to obtain preliminarily screened results, and the screened results of the multiple rows are then input together into the conventional NMS algorithm to obtain the final prediction result.
Further, in step 11 the precision P, recall R, and combined index Hmean used in the text detection field evaluate the braille character detection performance. Precision denotes the percentage of correctly predicted braille character rectangular boxes among all predicted braille character rectangular boxes; a braille character rectangular box is considered correctly detected if the IoU of its area with the real-box area is greater than 0.5. Recall denotes the percentage of real braille character boxes that are correctly predicted, its value being the number of correctly predicted braille characters divided by the number of all real braille character boxes. Hmean is a combined index calculated from P and R as

$$Hmean = \frac{2 \times P \times R}{P + R}$$

Braille character recognition performance is calculated directly as accuracy, the number of correctly recognized braille characters divided by the total number of detected braille characters:

$$Accuracy = \frac{RP}{RP + FP}$$

where RP denotes the number of correctly recognized braille characters and FP the number of incorrectly recognized braille characters.
Compared with the prior art, the invention has the following advantages. First, the method can be applied directly to braille images of natural scenes, which broadens the application scenarios of braille character recognition. Second, the invention analyzes how braille is written and finds that the braille dots lie at the edges of each braille character; it therefore first detects the edge pixels of braille characters and then uses those edge pixels to locate the characters precisely, improving braille detection performance. Finally, the method is trained once and can then be used for detection repeatedly: the convolutional neural network model is obtained by training on a large number of natural-scene images and is simply reloaded at detection time. In performance tests on the natural-scene braille images of the test set, the braille character detection Hmean is 85.9% and the recognition accuracy is 95.3%.
Drawings
FIG. 1 is a diagram showing the correspondence between Braille characters and Pinyin.
Fig. 2 is a flowchart of an implementation of an embodiment of the present invention.
Detailed Description
The present invention is further described in detail below with reference to the drawings and an embodiment. The embodiment is described to illustrate and explain the invention, not to limit its scope, so that those of ordinary skill in the art can understand and practice the invention.
Referring to fig. 2, the implementation steps of the present invention are as follows:
Step 1, constructing the natural-scene braille image dataset
Step 1.1, the braille images of the constructed natural-scene dataset come mainly from two sources: first, downloads from the Internet; second, photographs taken with smart devices. The dataset contains 554 images, of which 80% are used as the training set and 20% as the test set;
Step 1.2, image enhancement is implemented by calling the Brightness, Sharpness, and Contrast functions of the ImageEnhance class in the Python PIL module, covering brightness enhancement, brightness reduction, sharpening, softening, contrast enhancement, and contrast reduction, as sketched below; after enhancement, the 554 originally collected braille images plus the 6 images generated from each original give 554 + 554 × 6 = 3878 braille images in the dataset.
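A minimal sketch of the six enhancement operations follows; the enhancement factors are illustrative assumptions, not values given in the patent.

```python
# Minimal sketch of the six enhancement operations in step 1.2 using
# PIL's ImageEnhance; the factor values below are illustrative assumptions.
from PIL import Image, ImageEnhance

def augment_six_ways(path):
    img = Image.open(path).convert("RGB")
    return [
        ImageEnhance.Brightness(img).enhance(1.5),  # brightness enhancement
        ImageEnhance.Brightness(img).enhance(0.6),  # brightness reduction
        ImageEnhance.Sharpness(img).enhance(2.0),   # sharpening
        ImageEnhance.Sharpness(img).enhance(0.3),   # softening
        ImageEnhance.Contrast(img).enhance(1.5),    # contrast enhancement
        ImageEnhance.Contrast(img).enhance(0.6),    # contrast reduction
    ]
```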
Step 2, making data labels: the braille characters in the natural scene images are marked in two respects, the position of the rectangular box containing each braille character and the semantic class of each braille character, and the two marks are finally combined into one .txt file;
Step 2.1, the labelme tool is used to mark the braille characters in each original image of the natural-scene image dataset, and during marking the semantic classes of the braille characters are labeled according to the order given in fig. 1 (numbers 1 to 63). An enhanced image can share the label file of its original image without separate marking, so the dataset needs 554 label files (.xml files), each named after its original image with the suffix .xml.
Step 2.2, the 554 .xml files are converted into .txt files of the same names using Python code (a conversion sketch is given below). In each .txt file, the label of each braille character corresponds to one line of records in the format $(x_1, y_1, x_2, y_2, x_3, y_3, x_4, y_4, Class)$, where $(x_1, y_1)$ denotes the upper-left corner of the braille rectangular box, $(x_2, y_2)$ the upper-right corner, $(x_3, y_3)$ the lower-right corner, $(x_4, y_4)$ the lower-left corner, and Class the semantic classification label corresponding to the braille character.
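A sketch of such a conversion follows; the XML tag names (object, x1 through y4, class) are hypothetical assumptions about the label-file layout, not taken from the patent.

```python
# Hypothetical sketch of the .xml-to-.txt conversion in step 2.2; the tag
# names below are assumptions and must be adapted to the actual label files.
import xml.etree.ElementTree as ET

def xml_to_txt(xml_path, txt_path):
    root = ET.parse(xml_path).getroot()
    with open(txt_path, "w", encoding="utf-8") as f:
        for obj in root.iter("object"):  # assumed: one <object> per character
            fields = [obj.findtext(k) for k in
                      ("x1", "y1", "x2", "y2", "x3", "y3", "x4", "y4", "class")]
            f.write(",".join(fields) + "\n")  # (x1,y1,...,x4,y4,Class)
```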
Step 2.3, according to the correspondence between braille characters and pinyin shown in fig. 1, the correspondence is written into a file braille_seg.txt with 63 lines in total, one per braille character. Each line starts with the classification value of the braille character (1 to 63) followed by a colon (":"), after which the pinyin corresponding to the braille character is written; if one braille character corresponds to several pinyins, they are separated by commas.
Step 3, constructing a convolutional neural network
The invention is developed on the TensorFlow 1.4 platform. The backbone network is built from the convolution layers, pooling layers, and activation functions of the ResNet-50 structure; then feature layers $f_1$ of size $H/32 \times W/32$, $f_2$ of size $H/16 \times W/16$, $f_3$ of size $H/8 \times W/8$, $f_4$ of size $H/4 \times W/4$, and $f_5$ of size $H/2 \times W/2$ are extracted from ResNet-50, and the fused feature layers $h_1, h_2, h_3, h_4, h_5$ are constructed as

$$h_1 = f_1, \qquad h_i = conv_{3\times3}(concat(unpooling_{2\times2}(h_{i-1}), f_i)), \quad i = 2, \dots, 5$$

Here H and W denote the height and width of the input image, $conv_{3\times3}$ denotes a convolution with a 3×3 kernel, concat denotes the concatenation of two matrices, and $unpooling_{2\times2}$ denotes upsampling the feature layer by a factor of 2; these operations can be implemented by direct function calls, as sketched below.
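A minimal TensorFlow 1.x sketch of this fusion follows, assuming the top-down recurrence given above; the channel widths are illustrative assumptions.

```python
# Sketch of the feature fusion in step 3 (TensorFlow 1.x style).
# features = [f1, f2, f3, f4, f5], ordered from H/32 x W/32 up to H/2 x W/2;
# the channel widths are illustrative assumptions.
import tensorflow as tf

def unpool_2x2(x):
    # unpooling_2x2: double the spatial resolution by bilinear resizing
    s = tf.shape(x)
    return tf.image.resize_images(x, [s[1] * 2, s[2] * 2])

def fuse(features, channels=(128, 128, 64, 32, 32)):
    h = features[0]  # h1 = f1
    for f, c in zip(features[1:], channels[1:]):
        x = tf.concat([unpool_2x2(h), f], axis=-1)   # concat
        h = tf.layers.conv2d(x, c, 3, padding="same",
                             activation=tf.nn.relu)   # conv_3x3
    return h  # h5, of size H/2 x W/2
```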
Step 4, a prediction layer is constructed on feature layer $h_5$ for edge-pixel prediction, braille character rectangular-box position prediction, and braille character semantic classification prediction. The edge-pixel prediction output is a matrix Edge of size $H/2 \times W/2$; the braille character rectangular-box position prediction outputs a number of five-element vectors $(d_1, d_2, h_1, h_2, score)$, where $d_1, d_2, h_1, h_2$ denote the distances from an edge pixel to the four sides of the braille character's rectangular box and score denotes the confidence of this prediction; each braille character's semantic classification corresponds to a Softmax output vector of size 64 (indices 0 to 63), where 1 to 63 correspond to the braille character classes shown in fig. 1 and 0 corresponds to non-braille. A sketch of the three prediction heads follows.
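The sketch below continues the fusion sketch above; the 1×1 convolution heads are an assumption, since the patent does not specify the head layers.

```python
# Sketch of the prediction layer in step 4 on top of h5 (H/2 x W/2).
# The 1x1 convolution heads are an assumption, not specified in the patent.
import tensorflow as tf

def prediction_heads(h5):
    edge = tf.layers.conv2d(h5, 1, 1, activation=tf.nn.sigmoid)   # Edge matrix
    geometry = tf.layers.conv2d(h5, 4, 1, activation=tf.nn.relu)  # d1, d2, h1, h2
    logits = tf.layers.conv2d(h5, 64, 1)                          # classes 0..63
    classes = tf.nn.softmax(logits)                               # Softmax vectors
    return edge, geometry, classes
```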
Step 5, constructing the loss calculation method
Step 5.1, the GroundTruth of the prediction layer is constructed from the input image (.jpg file) and the corresponding label file (.txt file): the image is reduced to $H/2 \times W/2$, and the braille character rectangular-box coordinates in the label file are scaled down synchronously. This process can be subdivided into the following sub-steps (a ground-truth sketch is given after step 5.1.3):
Step 5.1.1, for a pixel (x, y), if its coordinates satisfy either of two conditions relating x to the left side $x_1$ and the right side $x_3$ of a braille character rectangular box, the pixel belongs to the edge region of that braille character and the value at the corresponding position of the matrix Edge is set to 1, otherwise to 0. Here $x_1$ and $x_3$ denote the abscissas of the pixels at the upper-left and lower-right corners of the braille character rectangular box.
Step 5.1.2, the GroundTruth $(d_1, d_2, h_1, h_2)$ of the braille rectangular box corresponding to an edge pixel (x, y) is calculated as

$$d_1 = x - x_1, \quad d_2 = x_3 - x, \quad h_1 = y - y_1, \quad h_2 = y_3 - y$$

where $x_1, y_1$ and $x_3, y_3$ denote the abscissa and ordinate of the pixels at the upper-left and lower-right corners of the braille character rectangular box.
Step 5.1.3, each braille character's semantic classification corresponds to a vector of size 64; if the semantic class is 60, the 60th position of the vector (the lowest position being 0) is set to 1 and the others to 0. GroundTruth is set in this way for the semantic class values of all braille characters.
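A numpy sketch of this ground-truth construction follows. It assumes axis-aligned boxes given by their upper-left and lower-right corners, already scaled to the H/2 × W/2 grid, and abstracts the edge-region test into a caller-supplied predicate, since the patent gives that condition only as images.

```python
# Sketch of the GroundTruth construction in step 5.1. Boxes are assumed
# axis-aligned, given as (x1, y1, x3, y3) on the H/2 x W/2 grid; the exact
# edge-region inequalities are supplied by the caller via in_edge_region.
import numpy as np

def make_ground_truth(H2, W2, boxes, class_ids, in_edge_region):
    edge = np.zeros((H2, W2), np.float32)      # step 5.1.1: Edge matrix
    geo = np.zeros((H2, W2, 4), np.float32)    # step 5.1.2: d1, d2, h1, h2
    cls = np.zeros((H2, W2, 64), np.float32)   # step 5.1.3: one-hot vectors
    for (x1, y1, x3, y3), c in zip(boxes, class_ids):
        for y in range(int(y1), min(int(y3) + 1, H2)):
            for x in range(int(x1), min(int(x3) + 1, W2)):
                if in_edge_region(x, y, x1, x3):
                    edge[y, x] = 1.0
                    geo[y, x] = (x - x1, x3 - x, y - y1, y3 - y)
                    cls[y, x, c] = 1.0  # classes 1..63; index 0 = non-braille
    return edge, geo, cls
```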
Step 5.2, the loss value is calculated from the prediction-layer outputs and the corresponding GroundTruth. The total loss is

$$Loss = Loss_{geometry} + \alpha \times Loss_{edge} + \beta \times Loss_{class}$$

with $\alpha = \beta = 0.1$, where $Loss_{geometry}$ denotes the loss of the braille character rectangular-box prediction, $Loss_{edge}$ the loss of the edge-pixel prediction, and $Loss_{class}$ the loss of the braille character semantic classification prediction.
Step 5.2.1, the braille character rectangular-box prediction loss is calculated as

$$Loss_{geometry} = \frac{1}{N} \sum_{i=1}^{N} -\log\frac{A_{inter}^{i}}{A_{union}^{i}}$$

where N denotes the number of braille characters and $A_{inter}^{i}$ and $A_{union}^{i}$ denote the areas of the intersection and union of the i-th real braille character rectangular box and the predicted rectangular box, calculated respectively as

$$A_{inter} = (\min(d_{1g}, d_{1p}) + \min(d_{2g}, d_{2p})) \times (\min(h_{1g}, h_{1p}) + \min(h_{2g}, h_{2p}))$$

$$A_{union} = (d_{1p} + d_{2p}) \times (h_{1p} + h_{2p}) + (d_{1g} + d_{2g}) \times (h_{1g} + h_{2g}) - A_{inter}$$

where $d_{1g}, d_{2g}, h_{1g}, h_{2g}$ denote the distances from an edge pixel to the four sides of the real braille character rectangular box ($d_{1g}$ and $d_{2g}$ to the left and right sides, $h_{1g}$ and $h_{2g}$ to the top and bottom sides), $d_{1p}, d_{2p}, h_{1p}, h_{2p}$ denote the distances from the edge pixel to the four sides of the predicted braille character rectangular box, and min takes the minimum of its arguments.
Step 5.2.2, $Loss_{edge}$ is a pixel-wise loss between $T_{edge}$ and $P_{edge}$, the true and predicted values of the edge pixel matrix.
Step 5.2.3, $Loss_{class}$ is calculated as

$$Loss_{class} = -\frac{1}{N}\sum_{i=1}^{N} \log p_i$$

where $p_i$ is the value at the position of the semantic classification vector corresponding to the true class (0 to 63) of the i-th braille character, i.e. the position where $y_i = 1$. A sketch of the full loss calculation follows.
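The sketch below assembles the total loss with α = β = 0.1. The geometry term uses the negative-log-IoU form reconstructed above; the edge term is written as pixel-wise cross-entropy, which is an assumption, since the source gives that formula only as an image.

```python
# Sketch of the loss in step 5.2 (TensorFlow 1.x style), alpha = beta = 0.1.
# Shapes assumed: edge_t/edge_p (B,H,W); geo_t/geo_p (B,H,W,4);
# cls_t/cls_p (B,H,W,64). The edge loss is an assumed cross-entropy.
import tensorflow as tf

def total_loss(edge_t, edge_p, geo_t, geo_p, cls_t, cls_p, eps=1e-6):
    d1g, d2g, h1g, h2g = tf.unstack(geo_t, axis=-1)
    d1p, d2p, h1p, h2p = tf.unstack(geo_p, axis=-1)
    a_inter = (tf.minimum(d1g, d1p) + tf.minimum(d2g, d2p)) * \
              (tf.minimum(h1g, h1p) + tf.minimum(h2g, h2p))
    a_union = (d1p + d2p) * (h1p + h2p) + (d1g + d2g) * (h1g + h2g) - a_inter
    mask = edge_t                      # geometry/class terms count on edge pixels
    n = tf.reduce_sum(mask) + eps
    loss_geo = tf.reduce_sum(
        -tf.log((a_inter + eps) / (a_union + eps)) * mask) / n   # -log(IoU)
    loss_edge = tf.reduce_mean(
        -edge_t * tf.log(edge_p + eps)
        - (1.0 - edge_t) * tf.log(1.0 - edge_p + eps))
    loss_cls = tf.reduce_sum(
        -tf.reduce_sum(cls_t * tf.log(cls_p + eps), axis=-1) * mask) / n
    return loss_geo + 0.1 * loss_edge + 0.1 * loss_cls
```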
Step 6, a Python function is called to randomly divide the braille images of the natural-scene image dataset into a training set (80%) and a test set (20%).
Step 7, training starts from the pre-built convolutional neural network, the prediction-layer representation and loss calculation method, and the images of the natural-scene braille image training set. The platform involved in training is TensorFlow 1.4, the programming language is Python, the main hardware is an HP image server, and the graphics card is a GTX 2080Ti. During training, all input images are resized to 512 × 512; the training process is optimized by stochastic gradient descent (SGD) with BatchSize = 8, a maximum of 100,000 training steps, and an initial learning rate of $10^{-4}$, after which the learning rate is set dynamically as a function of Initial_learning_rate (the initial learning rate) and Current_step (the current training step), as sketched below. After training, the convolutional neural network Model CNN_Model is obtained.
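A sketch of the optimization setup follows; the exponential-decay constants (0.94 every 10,000 steps) are assumptions standing in for the schedule, which the source shows only as an image.

```python
# Sketch of the SGD setup in step 7; `loss` is the total_loss defined above.
# The decay constants are assumptions, as the dynamic schedule appears in
# the source only as an image.
import tensorflow as tf

def make_train_op(loss):
    global_step = tf.Variable(0, trainable=False, name="global_step")
    lr = tf.train.exponential_decay(
        learning_rate=1e-4,        # Initial_learning_rate
        global_step=global_step,   # Current_step
        decay_steps=10000, decay_rate=0.94, staircase=True)
    return tf.train.GradientDescentOptimizer(lr).minimize(
        loss, global_step=global_step)
```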
Step 8, an image is acquired from the test set and resized to a width of 1024, with the height adjusted so that the aspect ratio is kept unchanged. The image is then used as the input of CNN_Model, which outputs an edge-pixel prediction matrix, braille character rectangular-box prediction vectors, and braille character semantic classification prediction vectors.
Step 9, the values of the edge pixel matrix are judged: a value greater than or equal to 0.8 is considered a valid edge pixel, and the corresponding braille character rectangular box and semantic classification information are then acquired from the valid edge pixels.
Step 10, the prediction results are screened: an improved NMS algorithm obtains the optimal prediction results from the multiple braille character rectangular boxes; this improvement reduces the time complexity from $O(n^2)$ to $O(n)$ (a sketch is given below). For the screened braille characters, the braille character classification semantic information is extracted from braille_seg.txt and displayed;
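A self-contained sketch of this screening follows, assuming the row-wise scheme described above: each row of predictions is reduced by a single linear pass comparing neighbouring boxes, and the survivors of all rows then pass once through standard NMS.

```python
# Sketch of the improved NMS in step 10; boxes are (x1, y1, x3, y3),
# predictions are (box, score), and rows are ordered left to right.
def iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def standard_nms(preds, thr=0.5):
    preds = sorted(preds, key=lambda p: p[1], reverse=True)  # by score
    kept = []
    for box, score in preds:
        if all(iou(box, k[0]) <= thr for k in kept):
            kept.append((box, score))
    return kept

def improved_nms(rows, thr=0.5):
    # rows: per-row lists of (box, score) predictions
    survivors = []
    for row in rows:
        kept = []
        for p in row:  # linear pass: compare only with the last kept box
            if kept and iou(kept[-1][0], p[0]) > thr:
                if p[1] > kept[-1][1]:
                    kept[-1] = p       # keep the higher-scoring of the pair
            else:
                kept.append(p)
        survivors.extend(kept)
    return standard_nms(survivors, thr)  # final pass over all rows
```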
and 11, repeating the steps 8 to 10 to obtain braille character recognition results of the braille images of the natural scene in the test set, and calculating the detection and recognition performances of the braille characters in the test set. The invention uses the accuracy (P), regression rate (R) and comprehensive index (Hmean) used in the text detection field to evaluate the braille character detection performance. The accuracy represents the percentage of the number of correctly predicted rectangular frames of the Braille characters to the number of all the predicted rectangular frames of the Braille characters. If the IOU of the area of a certain braille character detection frame and the real frame area is larger than 0.5, the braille character rectangular frame is considered to be correctly detected. The regression rate represents the percentage of the real frames of the Braille characters that are correctly predicted, and its value is the number of correctly predicted Braille characters divided by the number of the real frames of all Braille characters. Hmean is a comprehensive index, the value of which is calculated by P and R, and the specific calculation method is as follows;
Figure BDA0004045974600000091
Braille character recognition performance is calculated directly as accuracy, i.e. the number of correctly recognized braille characters divided by the total number of detected braille characters:

$$Accuracy = \frac{RP}{RP + FP}$$

where RP denotes the number of correctly recognized braille characters and FP the number of incorrectly recognized braille characters. A sketch of these metrics follows.
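A small sketch of the evaluation in step 11; the variable names are illustrative.

```python
# Sketch of the metrics in step 11. `matches` is the number of predicted
# boxes whose IoU with a real box exceeds 0.5.
def detection_metrics(matches, num_pred, num_real):
    P = matches / num_pred                         # precision
    R = matches / num_real                         # recall
    hmean = 2 * P * R / (P + R) if (P + R) else 0.0
    return P, R, hmean

def recognition_accuracy(rp, fp):
    # rp: correctly recognized characters; fp: incorrectly recognized ones
    return rp / (rp + fp)
```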
Step 12, for a natural-scene braille image outside the dataset, the image is resized to a width of 1024 with the height adjusted to keep the aspect ratio unchanged, then used as the input of CNN_Model, which outputs the edge-pixel score matrix, the braille character rectangular-box coordinate vectors corresponding to edge pixels, and the semantic classification vector of each braille character; steps 9 and 10 are then executed to obtain the prediction result.
The specific embodiments described herein are offered by way of example only to illustrate the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments or substitute them in similar ways without departing from the spirit of the invention or exceeding the scope defined in the appended claims.

Claims (10)

1. A method for recognizing braille characters in natural scenes by mining braille edge features, characterized by comprising the following steps:
step 1, collecting images containing braille in natural scenes;
step 2, marking the braille character positions and semantic classification information in the natural scene images;
step 3, designing a convolutional neural network with ResNet-50 as the backbone and fusing five feature layers of different sizes from the ResNet-50 structure to obtain fused feature layers;
step 4, constructing a prediction layer that, on a fused feature layer whose size is 1/4 of the input image size, predicts the edge pixels of braille characters, the rectangular-box position of the braille character corresponding to each edge pixel, and the semantic class of each braille character;
step 5, designing a loss function according to the representations of the braille edge pixels, braille character rectangular-box positions, and braille character classes in step 4;
step 6, randomly dividing the dataset into a training set and a test set in a certain proportion;
step 7, training on the training set with the designed convolutional neural network and loss function to obtain the trained Model CNN_Model;
step 8, acquiring an image from the test set as the input of CNN_Model, the Model outputting an edge-pixel score matrix, the braille character rectangular-box coordinate vectors corresponding to edge pixels, and the classification value of each braille character;
step 9, judging the scores of the edge-pixel matrix, a score greater than a threshold being considered a valid edge pixel, and then obtaining the valid braille character positions and corresponding classifications from the valid edge pixels;
step 10, using an NMS algorithm to obtain the optimal prediction results from the multiple braille character rectangular boxes, then extracting and displaying the braille character classification semantic information;
step 11, repeating steps 8 to 10 to obtain the braille character recognition results of the natural-scene braille images in the whole test set, and calculating the braille character detection and recognition performance on the test set;
step 12, for a natural-scene braille image outside the dataset, taking it as the input of CNN_Model, outputting the edge-pixel score matrix, the braille character rectangular-box coordinate vectors corresponding to edge pixels, and the classification value of each braille character, and then executing steps 9 and 10 to obtain the prediction result.
2. The method for recognizing braille characters in natural scenes by mining edge features according to claim 1, characterized in that: step 1 further includes performing enhancement operations on the braille images, the enhancement operations including brightness enhancement, brightness reduction, sharpening, softening, contrast enhancement, and contrast reduction.
3. The method for recognizing braille characters in natural scenes by mining edge features according to claim 1, characterized in that: a labelme tool is used to mark the braille characters in each original image of the natural-scene image dataset, and the semantic classes of the braille characters are labeled during marking according to the order given in the National General Braille Scheme (Implementation); the label of each braille character corresponds to one line of records in the format $(x_1, y_1, x_2, y_2, x_3, y_3, x_4, y_4, Class)$, where $(x_1, y_1)$ denotes the upper-left corner of the braille rectangular box, $(x_2, y_2)$ the upper-right corner, $(x_3, y_3)$ the lower-right corner, $(x_4, y_4)$ the lower-left corner, and Class the semantic classification label corresponding to the braille character.
4. The method for recognizing braille characters in natural scenes by mining edge features according to claim 1, characterized in that the feature layers of different scales in step 3 are fused as follows:
feature layers $f_1$ of size $H/32 \times W/32$, $f_2$ of size $H/16 \times W/16$, $f_3$ of size $H/8 \times W/8$, $f_4$ of size $H/4 \times W/4$, and $f_5$ of size $H/2 \times W/2$ are extracted from ResNet-50, where H and W denote the height and width of the input image, and the fused feature layers $h_1, h_2, h_3, h_4, h_5$ are constructed as

$$h_1 = f_1, \qquad h_i = conv_{3\times3}(concat(unpooling_{2\times2}(h_{i-1}), f_i)), \quad i = 2, \dots, 5$$

where $conv_{3\times3}$ denotes a convolution with a 3×3 kernel, concat denotes the concatenation of two matrices, and $unpooling_{2\times2}$ denotes upsampling the feature layer by a factor of 2.
5. The method for recognizing braille characters in natural scenes by mining edge features according to claim 1, characterized in that: in step 4, the edge-pixel prediction output is a matrix Edge of size $H/2 \times W/2$; the braille character rectangular-box position prediction outputs a number of five-element vectors $(d_1, d_2, h_1, h_2, score)$, where $d_1, d_2, h_1, h_2$ denote the distances from an edge pixel to the four sides of the braille character's rectangular box and score denotes the confidence of this prediction; each braille character's semantic classification corresponds to a Softmax output vector of size 64 (indices 0 to 63), where 1 to 63 correspond to the braille character classes, labeled according to the order given in the National General Braille Scheme (Implementation), and 0 corresponds to non-braille.
6. The method for recognizing braille characters in natural scenes by mining edge features according to claim 1, characterized in that step 5 is implemented as follows:
step 5.1, constructing the GroundTruth of the prediction layer from the input image and its label file: the image is reduced to $H/2 \times W/2$, where H and W denote the height and width of the input image, and the braille character rectangular-box coordinates in the label file are scaled down synchronously;
step 5.2, calculating the loss value from the prediction-layer outputs and the corresponding GroundTruth; the total loss is

$$Loss = Loss_{geometry} + \alpha \times Loss_{edge} + \beta \times Loss_{class}$$

where $\alpha = \beta$ is a constant, $Loss_{geometry}$ denotes the loss of the braille character rectangular-box prediction, $Loss_{edge}$ the loss of the edge-pixel prediction, and $Loss_{class}$ the loss of the braille character semantic classification prediction;
step 5.2.1, the braille character rectangular-box prediction loss is calculated as

$$Loss_{geometry} = \frac{1}{N} \sum_{i=1}^{N} -\log\frac{A_{inter}^{i}}{A_{union}^{i}}$$

where N denotes the number of braille characters and $A_{inter}^{i}$ and $A_{union}^{i}$ denote the areas of the intersection and union of the i-th real braille character rectangular box and the predicted rectangular box, calculated respectively as

$$A_{inter} = (\min(d_{1g}, d_{1p}) + \min(d_{2g}, d_{2p})) \times (\min(h_{1g}, h_{1p}) + \min(h_{2g}, h_{2p}))$$

$$A_{union} = (d_{1p} + d_{2p}) \times (h_{1p} + h_{2p}) + (d_{1g} + d_{2g}) \times (h_{1g} + h_{2g}) - A_{inter}$$

where $d_{1g}, d_{2g}, h_{1g}, h_{2g}$ denote the distances from an edge pixel to the four sides of the real braille character rectangular box ($d_{1g}$ and $d_{2g}$ to the left and right sides, $h_{1g}$ and $h_{2g}$ to the top and bottom sides), $d_{1p}, d_{2p}, h_{1p}, h_{2p}$ denote the distances from the edge pixel to the four sides of the predicted braille character rectangular box, and min takes the minimum of its arguments;
step 5.2.2, $Loss_{edge}$ is a pixel-wise loss between $T_{edge}$ and $P_{edge}$, the true and predicted values of the edge pixel matrix;
step 5.2.3, $Loss_{class}$ is calculated as

$$Loss_{class} = -\frac{1}{N}\sum_{i=1}^{N} \log p_i$$

where $p_i$ is the value at the position of the semantic classification vector corresponding to the true class of the i-th braille character, i.e. the position where $y_i = 1$.
7. The method for recognizing braille characters in natural scenes by mining edge features according to claim 6, characterized in that step 5.1 is implemented as follows:
step 5.1.1, for a pixel (x, y), if its coordinates satisfy either of two conditions relating x to the left side $x_1$ and the right side $x_3$ of a braille character rectangular box, the pixel belongs to the edge region of that braille character and the value at the corresponding position of the matrix Edge is set to 1, otherwise to 0; here $x_1$ and $x_3$ denote the abscissas of the pixels at the upper-left and lower-right corners of the braille character rectangular box;
step 5.1.2, the GroundTruth $(d_1, d_2, h_1, h_2)$ of the braille rectangular box corresponding to an edge pixel (x, y) is calculated as

$$d_1 = x - x_1, \quad d_2 = x_3 - x, \quad h_1 = y - y_1, \quad h_2 = y_3 - y$$

where $x_1, y_1$ and $x_3, y_3$ denote the abscissa and ordinate of the pixels at the upper-left and lower-right corners of the braille character rectangular box;
step 5.1.3, each braille character's semantic classification corresponds to a vector of size 64, labeled according to the order given in the National General Braille Scheme (Implementation); if the semantic class is 60, the 60th position of the vector is set to 1 and the others to 0, and GroundTruth is set in this way for the semantic class values of all braille characters.
8. The method for recognizing braille characters in natural scenes by mining edge features according to claim 1, characterized in that: in step 7 the convolutional neural network training process is optimized by stochastic gradient descent, with the related parameters set to BatchSize = 8, a maximum of 100,000 training steps, and an initial learning rate of $10^{-4}$; the learning rate is then set dynamically as a function of Initial_learning_rate (the initial learning rate) and Current_step (the current training step), and the convolutional neural network Model CNN_Model is obtained after training is completed.
9. The method for recognizing braille characters in natural scenes by mining edge features according to claim 1, characterized in that: the NMS algorithm in step 10 improves on conventional NMS: the predictions of edge pixels in the same row are compared pairwise to obtain preliminarily screened results, and the screened results of the multiple rows are then input together into the conventional NMS algorithm to obtain the final prediction result.
10. The method for recognizing braille characters in natural scenes by mining edge features according to claim 1, characterized in that: in step 11, the precision P, recall R, and combined index Hmean used in the text detection field evaluate the braille character detection performance; precision denotes the percentage of correctly predicted braille character rectangular boxes among all predicted braille character rectangular boxes, a braille character rectangular box being considered correctly detected if the IoU of its detection-box area with the real-box area is greater than 0.5; recall denotes the percentage of real braille character boxes that are correctly predicted, its value being the number of correctly predicted braille characters divided by the number of all real braille character boxes; Hmean is a combined index calculated from P and R as

$$Hmean = \frac{2 \times P \times R}{P + R}$$

Braille character recognition performance is calculated directly as accuracy, the number of correctly recognized braille characters divided by the total number of detected braille characters:

$$Accuracy = \frac{RP}{RP + FP}$$

where RP denotes the number of correctly recognized braille characters and FP the number of incorrectly recognized braille characters.
CN202310027966.3A 2023-01-09 2023-01-09 Method for recognizing braille characters in natural scenes by mining edge features Pending CN116343229A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310027966.3A CN116343229A (en) Method for recognizing braille characters in natural scenes by mining edge features


Publications (1)

Publication Number Publication Date
CN116343229A true CN116343229A (en) 2023-06-27

Family

ID=86891935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310027966.3A Pending CN116343229A (en) Method for recognizing braille characters in natural scenes by mining edge features

Country Status (1)

Country Link
CN (1) CN116343229A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination