CN113936181B - Recognition method for adhering handwritten English characters - Google Patents
Recognition method for adhering handwritten English characters
- Publication number
- CN113936181B (application CN202110877654.2A)
- Authority
- CN
- China
- Prior art keywords
- segmentation
- point
- points
- path
- characters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 35
- 230000011218 segmentation Effects 0.000 claims abstract description 175
- 238000013527 convolutional neural network Methods 0.000 claims abstract description 10
- 238000011156 evaluation Methods 0.000 claims abstract description 8
- 238000012549 training Methods 0.000 claims description 11
- 238000011176 pooling Methods 0.000 claims description 7
- 238000012360 testing method Methods 0.000 claims description 6
- 230000008859 change Effects 0.000 claims description 5
- 230000004913 activation Effects 0.000 claims description 4
- 238000012545 processing Methods 0.000 claims description 3
- 230000007797 corrosion Effects 0.000 claims description 2
- 238000005260 corrosion Methods 0.000 claims description 2
- 238000002360 preparation method Methods 0.000 claims description 2
- 238000012216 screening Methods 0.000 claims description 2
- 238000013519 translation Methods 0.000 claims description 2
- 238000012795 verification Methods 0.000 claims description 2
- 238000010606 normalization Methods 0.000 claims 1
- 238000010586 diagram Methods 0.000 description 9
- 230000000903 blocking effect Effects 0.000 description 8
- 230000000877 morphologic effect Effects 0.000 description 4
- 238000013461 design Methods 0.000 description 3
- 230000008569 process Effects 0.000 description 3
- 238000011160 research Methods 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 239000003550 marker Substances 0.000 description 2
- 238000012935 Averaging Methods 0.000 description 1
- 238000013473 artificial intelligence Methods 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 238000013145 classification model Methods 0.000 description 1
- 238000013211 curve analysis Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 230000000149 penetrating effect Effects 0.000 description 1
- 238000013138 pruning Methods 0.000 description 1
- 238000012552 review Methods 0.000 description 1
- 238000010187 selection method Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Character Input (AREA)
- Character Discrimination (AREA)
Abstract
The invention discloses a recognition method for adhered (touching) handwritten English characters, which segments and recognizes the adhered characters. First, candidate segmentation points are found based on the structural features of the image and taken as the initial segmentation points of the segmentation algorithm, and segmentation rules are designed to determine the segmentation paths; different types of adhered characters are precisely segmented in a multi-strategy segmentation mode; finally, the optimal segmentation path is determined by constructing a segmentation path evaluation method. The segmented handwritten English characters are then sent into a recognition model trained with a convolutional neural network, and the recognition result of the adhered handwritten English characters is finally obtained.
Description
Technical Field
The invention relates to the fields of image processing, character recognition and deep learning, in particular to a recognition method for adhering handwritten English characters.
Background
Handwritten English recognition is an important task in computer vision and plays a particularly important role in automatic grading (review) systems. Research shows that English writing styles at home and abroad differ markedly, so carrying out handwriting recognition research based on domestic English writing habits is of great significance for improving the handwritten English recognition rate. In the development of handwriting recognition in recent years, the main approaches have been segmentation followed by recognition, and segmentation-free recognition. Because arbitrariness and a lack of standardization in people's writing lead to unsatisfactory recognition results, handwritten English recognition remains a major difficulty in current artificial intelligence research.
Handwritten English is usually recognized at the word level, and most word recognition is based on single-character recognition, so correct word segmentation and correct character recognition are key to improving the word recognition rate. In practical applications, adhesion, crossed strokes and the like often occur in handwritten English characters, which makes segmentation difficult. Meanwhile, the shapes of English characters written by different people vary greatly, which increases the uncertainty and difficulty of recognition.
Existing vertical-projection and connected-component segmentation algorithms can segment non-adhered characters in an image well. For the segmentation of adhered characters, the existing drop-fall algorithm forms a segmentation path by simulating the trajectory of a water drop falling from top to bottom; however, it selects the extreme points on the contour line as segmentation points and then finds the shortest vertical path in the connecting-stroke region at that position until a white pixel is encountered. For characters with open structures such as "x" and "y", this selection method takes stroke concavities as segmentation points, which produces wrong segmentation positions and causes segmentation errors. At present, the recognition accuracy of single handwritten English characters is already very high, but no generalizable solution has been found for the recognition of adhered characters, so a method that effectively separates adhered characters to improve the recognition rate is needed.
Disclosure of Invention
Aiming at the difficulty of segmentation in current adhered-character recognition, candidate segmentation points are found based on the structural features of the image in combination with the morphological characteristics of adhered characters; the candidate segmentation points are used as the initial segmentation points of the segmentation algorithm, and segmentation rules are designed to determine the segmentation paths. Meanwhile, different types of adhered characters are precisely segmented in a multi-strategy segmentation mode. Finally, the optimal segmentation path is determined by constructing an evaluation method for segmentation paths, the segmented single characters are obtained, and a recognition model trained with a convolutional neural network is used to recognize the segmented handwritten English characters, thereby realizing the recognition method for adhered handwritten English characters.
The method of the invention mainly comprises the following steps: first, candidate segmentation points of the adhered-character image are found by analysing the topological structure of the adhered image in combination with the image characteristics of the adhesion region, and are taken as initial segmentation points; then segmentation rules are designed, and the next segmentation point is determined by evaluating which segmentation rule is satisfied by the pixel values of the five neighbourhood pixels of the current segmentation point; next, for different types of adhered characters, the idea of segmenting first and then merging is adopted, and the characteristics of the adhered characters are combined to segment them accurately; for the resulting segmentation paths, after computing on each path an evaluation score that combines the overall distribution and the recognition probability, the group of paths with the minimum cost among all path subsets is selected as the optimal paths, and single characters are obtained by segmenting along these paths. Finally, a classification model trained with a convolutional neural network is applied to the segmented single characters to obtain the adhered-character recognition result.
A recognition method for adhering handwritten English characters comprises the following steps:
step one, extracting candidate segmentation points based on structural features, specifically: the key to determining the segmentation path of adhered characters is to find the correct start and end points of the path. Combining the morphological characteristics of handwritten English adhered images, namely the larger stroke density and the obvious stroke-change trend at the adhesion position, the pixels of the thinned adhered-character image whose number of foreground 8-neighbourhood pixels is not less than 3, together with the projection extreme points, i.e. the peak points of the upper contour projection and the valley points of the lower contour projection of the adhered characters, are selected as candidate segmentation points.
Step two, determining the segmentation path according to the segmentation rules, specifically: the obtained candidate segmentation points are taken in turn as initial segmentation points; the pixel values of the five neighbourhood pixels of the current segmentation point, i.e. the three pixels below it and the pixels to its left and right, are examined and substituted into the designed segmentation rules, and the pixel that satisfies a rule is selected as the next segmentation point. This continues until the determined segmentation point reaches the marked end point of the character image; segmentation then ends, and all segmentation points are connected to form a segmentation path.
Step three, multi-strategy character segmentation according to the adhesion type, specifically: for single-point adhesion, after the segmentation rules are used to determine the segmentation path for the region above the candidate segmentation point, the region below the candidate segmentation point is cut vertically, and the two path sections are then merged. For common-stroke adhesion, the centre point of each row of the shared stroke segment is extracted to form a centre line as the segmentation path. For multi-point adhesion, since the segmentation path must pass through the closed region formed by the multiple adhesion points, and the two candidate segmentation points of the closed region must lie on the same side, the edge of the closed region contained between the upper and lower segmentation points is located, and the edge points of the closed region within this interval are connected in order to form the segmentation path.
Step four, determining the optimal segmentation path and segmenting the adhered characters accordingly, specifically: after computing, for each path, an evaluation score that combines the overall distribution and the recognition probability, the group with the minimum cost among all path subsets is selected as the optimal path set. The adhered characters are then segmented in turn according to each segmentation path contained in the determined optimal path set, yielding segmented single-character images.
Step five, establishing and expanding a handwritten English character data set, specifically: handwritten English data from writers of different age groups are collected to form a 38-class handwritten English character data set covering uppercase and lowercase letters, in which letters whose uppercase and lowercase written forms are the same are merged into a single class, namely {C/c, F/f, K/k, L/l, M/m, O/o, P/p, S/s, U/u, V/v, W/w, X/x, Y/y, Z/z}.
Step six, character recognition, specifically: the handwritten English character data set constructed in the data preparation stage is screened, normalized and divided into a training set and a test set at a ratio of 4:1. A recognition model is built with a convolutional neural network, the single-character images obtained by segmentation are each sent into the recognition model for recognition, and the recognition result of the adhered handwritten English characters is finally obtained.
Compared with the prior art, the method has the following advantages:
aiming at the difficulty of segmenting adhered characters in handwritten English, a new method for segmenting adhered characters is provided: candidate segmentation points can be determined more accurately, which guarantees the accuracy of segmentation; meanwhile, the segmentation rules are designed specifically for the different types of adhered characters, and the segmentation accuracy of adhered characters is improved on the basis of combining the morphological characteristics of the characters with the path evaluation rules. A convolutional neural network is then used to recognize the single characters, which significantly improves the overall recognition rate of handwritten English.
Drawings
FIG. 1 is a system architecture diagram of a method in accordance with the present invention;
FIG. 2 is a diagram of candidate segmentation points determined from the neighbourhood pixel values of the current pixel; (a) original image; (b) segmentation points; (c) segmentation path 1.
FIG. 3 is a diagram of candidate segmentation points determined from the extreme points of the upper and lower contour projections of the adhered characters; (a) original image; (b) segmentation points; (c) segmentation path 1; (d) segmentation path 2.
FIG. 4 is a diagram of the segmentation rules designed for adhered characters;
FIG. 5 is a diagram of the segmentation steps for single-point adhered characters; (a) original image; (b) segmentation points; (c) segmentation path 1; (d) segmentation path 2; (e) merged segmentation paths.
FIG. 6 is a diagram of the segmentation steps for common-stroke adhered characters; (a) original image; (b) segmentation points; (c) segmentation path 1; (d) segmentation path 2; (e) merged segmentation paths.
FIG. 7 is a diagram of the segmentation steps for multi-point adhered characters; (a) original image; (b) segmentation points; (c) segmentation path 1; (d) segmentation path 2; (e) merged segmentation paths.
Fig. 8 is a diagram of determining an optimal slicing path.
Fig. 9 is a diagram of character segmentation according to an optimal path.
Fig. 10 is a diagram showing partial data examples of the handwritten English character data set.
FIG. 11 is a diagram of a convolutional neural network recognition model network.
Detailed Description
The invention is further described below with reference to the drawings and detailed description.
The flow of the method related by the invention comprises the following steps:
(1) Candidate segmentation point extraction based on structural features
Determining the adhesion points of the adhesion characters and taking the adhesion points as candidate segmentation points, wherein the specific method comprises the following steps of:
a. Feature point location analysis
In the thinned character image, the stroke-connection state of the current pixel is obtained by analysing the change of pixel values within its 8-neighbourhood, from which the feature points of the image are extracted. Let the current pixel be P(x,y); in Equation (1), x_i (i = 1, 2, …, 8) are its neighbourhood points and P(x,y) = x_9, where x_i = 1 denotes a background (white) pixel and x_i = 0 a black pixel, and N(P(x,y)) counts the black pixels among the eight neighbours. When N(P(x,y)) = 1, P(x,y) is an endpoint; when N(P(x,y)) ≥ 3, P(x,y) is a bifurcation point.
Analysing where the segmentation points of adhered characters occur shows that such a point must lie at the junction of the stroke paths of several characters, i.e. it satisfies the definition of a bifurcation point among the feature points; therefore all pixels with N(P(x,y)) ≥ 3 are added to the candidate segmentation points.
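By way of illustration only, the following Python sketch scans a thinned binary image and collects every stroke pixel with at least three stroke pixels in its 8-neighbourhood as a candidate segmentation point. The 0-for-black / 1-for-background convention follows the text; the array and function names are assumptions of this sketch, not part of the described method.

```python
import numpy as np

def candidate_points_from_skeleton(skeleton: np.ndarray) -> list[tuple[int, int]]:
    """Return (x, y) bifurcation points of a thinned image.

    `skeleton` is a 2-D array where 0 marks stroke (black) pixels and
    1 marks background, following the convention used in the text.
    """
    h, w = skeleton.shape
    candidates = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if skeleton[y, x] != 0:              # only stroke pixels can be split points
                continue
            window = skeleton[y - 1:y + 2, x - 1:x + 2]
            n_black = int(np.sum(window == 0)) - 1   # exclude the centre pixel itself
            if n_black >= 3:                     # N(P) >= 3 -> bifurcation point
                candidates.append((x, y))
    return candidates
```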
b. Image contour curve analysis
The contour curve of the image is extracted, and feature points are extracted according to the curvature value and the direction of change along the contour. Define the upper contour projection value of the character as OLU(x, y′): scanning the adhered image I(x, y) column by column from top to bottom, the upper contour is composed of the coordinates P(x,y) of the first black pixel encountered in each column, expressed as:
OLU(x, y′) = min{ y | P(x,y) ∈ I(x, y), x = 0, 1, …, n−1 }   (2)
where n is the width of the image, i.e. the number of columns. Similarly, scanning the image column by column from the bottom gives the lower contour projection value OLD(x, y′):
OLD(x, y′) = max{ y | P(x,y) ∈ I(x, y), x = 0, 1, …, n−1 }   (3)
Combining the morphological characteristics of English characters, the adhesion points of adhered characters appear in the set of extreme points of the contour projections; after analysis, the peak points of the upper contour projection and the valley points of the lower contour projection are selected as projection extreme points and added to the candidate segmentation points.
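As a sketch of Equations (2) and (3) under the same 0-for-black convention, the code below computes the two contour projections and a simple local-extremum test; whether a "peak" corresponds to a numerical maximum or minimum depends on the plotting convention, so the extremum selection here is only illustrative.

```python
import numpy as np

def contour_projections(img: np.ndarray):
    """OLU / OLD per Equations (2) and (3): first black pixel per column,
    scanned from the top and from the bottom (0 = black, 1 = background)."""
    h, w = img.shape
    olu = np.full(w, -1)
    old = np.full(w, -1)
    for x in range(w):
        ys = np.where(img[:, x] == 0)[0]
        if ys.size:
            olu[x] = ys.min()      # upper contour: smallest row index
            old[x] = ys.max()      # lower contour: largest row index
    return olu, old

def local_extrema(values, find_peaks: bool):
    """Columns where the projection has a simple local peak (or valley)."""
    idx = []
    for x in range(1, len(values) - 1):
        a, b, c = values[x - 1], values[x], values[x + 1]
        if a < 0 or b < 0 or c < 0:              # skip empty columns
            continue
        if find_peaks and b >= a and b >= c and (b > a or b > c):
            idx.append(x)
        if not find_peaks and b <= a and b <= c and (b < a or b < c):
            idx.append(x)
    return idx
```

The extreme columns found this way would then be merged with the bifurcation points from the previous sketch to form the candidate segmentation point set.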
(2) Determining a segmentation path according to segmentation rules
According to the pixel values of three pixel points below the current pixel point and the left and right adjacent pixel points, selecting a segmentation rule meeting the condition to determine the next segmentation point, wherein the specific method comprises the following steps:
a. Determine the list Initial of candidate segmentation points;
b. Traverse the list Initial to obtain the current segmentation point and define it as the initial segmentation point; denote the current segmentation point as n(x,y) and the next coordinate point of the segmentation path as n(x′,y′);
c. For the current point n(x,y), calculate the pixel values of the neighbourhood points n1–n5 according to the segmentation rules shown in FIG. 4, where 0 denotes a black pixel value, 1 denotes a white pixel value, and × denotes an arbitrary pixel value. Encode the obtained pixel values in the order n1 → n5 and record them in a list D;
d. Judge the values contained in the list D: if D = {n3 = 1}, then n(x′,y′) = n3 = (x, y+1), i.e. the next segmentation point is the pixel directly below; if D = {1, 0, n5 = 1} or D = {0, n5 = 1}, then n(x′,y′) = n5 = (x+1, y), i.e. the pixel to the right; if D = {n1 = 1, 0, ×, 0}, then n(x′,y′) = n1 = (x−1, y), i.e. the pixel to the left; if D = {1, n2 = 1, 0, 1} or D = {1, n2 = 1, 0, ×}, then n(x′,y′) = n2 = (x−1, y+1), i.e. the pixel at the lower left; if D = {×, 0, n4 = 1, 1}, then n(x′,y′) = n4 = (x+1, y+1), i.e. the pixel at the lower right;
e. Recursively calculate and record n(x′,y′); when y′ in n(x′,y′) equals the marked end point, the path ends, and all recorded segmentation points n(x′,y′) are connected to form a segmentation path.
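The path-tracing loop of steps a–e can be sketched as follows. Because the exact rule table of FIG. 4 is not reproduced in the text, `choose_next` uses an assumed preference order (straight down if the pixel below is background, otherwise lower-right, lower-left, right, left); only the five possible moves and the stopping condition come from the description above, so this is a stand-in, not the patent's rule set.

```python
def choose_next(img, x, y):
    """Pick the next segmentation point among the five neighbours
    n1..n5 = left, lower-left, below, lower-right, right.
    The preference order is an illustrative stand-in for FIG. 4."""
    h, w = img.shape
    moves = [(0, 1), (1, 1), (-1, 1), (1, 0), (-1, 0)]   # below, lower-right, lower-left, right, left
    for dx, dy in moves:
        nx, ny = x + dx, y + dy
        if 0 <= nx < w and 0 <= ny < h and img[ny, nx] == 1:   # prefer background pixels
            return nx, ny
    return x, y + 1        # fall back to cutting straight through the stroke

def trace_path(img, start, end_y, max_steps=10000):
    """Follow the rules from a candidate point down to the marked end row."""
    x, y = start
    path = [(x, y)]
    seen = {(x, y)}
    while y < end_y and len(path) < max_steps:
        x, y = choose_next(img, x, y)
        if (x, y) in seen:   # avoid oscillating along a horizontal stroke
            y += 1           # force the cut one row down
        seen.add((x, y))
        path.append((x, y))
    return path
```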
(3) Segmentation of multiple types of sticky characters
In order to avoid damaging the character strokes and breaking them during segmentation, the idea of segmenting first and then merging is adopted when determining the segmentation paths of adhered characters.
According to how the adhesion positions of two adjacent characters are connected, adhered characters can be divided into the following types:
single point adhesion: only one adhesion part exists between two characters, and the single-point adhesion character segmentation step is designed as follows:
a. Traverse the candidate segmentation point list to obtain the current candidate segmentation point, whose ordinate y_i is set as the marked end point; take this point as the initial segmentation point and let n(x′,y′) denote the next coordinate point of the segmentation path;
b. For the region y ∈ [0, y_i], apply the designed segmentation rules until y′ = y_i in n(x′,y′), obtaining segmentation path 1;
c. After reaching the marked end point y_i, define d as the stroke width and cut the region y ∈ [y_i, y_i + d] vertically, passing through the adhesion point; define h as the maximum height of the adhered character and apply the designed segmentation rules to the region y ∈ [y_i + d, h] until y′ = h in n(x′,y′), obtaining segmentation path 2;
d. Merge all obtained paths.
The entire segmentation flow of single-point adhesion is shown in FIG. 5.
Common-stroke adhesion: adhesion caused by two characters sharing a common stroke segment. The segmentation steps are as follows:
a. Define the stroke segment between the upper and lower segmentation points as the shared part. Among the recorded segmentation points, y_i is the upper marked end point, and the upper segmentation point is the start of the shared segment and the upper initial segmentation point; the lower segmentation point, with ordinate y_j, is the end of the shared segment and the lower initial segmentation point; h is the lower marked end point;
b. For the regions y ∈ [0, y_i] and y ∈ [y_j, h], apply the designed segmentation rules to obtain segmentation path 1;
c. For the shared stroke segment, compute in turn, for each row y ∈ [y_i, y_j], starting from the current pixel, the abscissas of the first background pixels encountered while decrementing and incrementing x (to the left and to the right), denoted x_{i′,1} and x_{i′,2}; then x̄ = (x_{i′,1} + x_{i′,2})/2 is the abscissa of the centre point of that row and y_{i′} is its ordinate. Take the computed centre point as the current pixel and recurse row by row. Connecting all computed centre points in order forms a centre line, which is segmentation path 2 of the shared stroke segment;
d. Merge the two segmentation paths as the segmentation path of the whole common-stroke adhesion part.
The entire segmentation flow of common-stroke adhesion is shown in FIG. 6.
Multi-point adhesion: there are two or more adhesion parts between two characters. The segmentation steps are as follows:
a. Scan the adhered image to obtain the set of point coordinates inside the closed region; traverse the candidate segmentation point list, and if two segmentation points belong to the same closed region, define them as the same segmentation point. Among the recorded segmentation points, y_i is the upper marked end point, the upper segmentation point is the upper initial segmentation point, the lower segmentation point (with ordinate y_j) is the lower initial segmentation point, and h is the lower marked end point;
b. For the regions y ∈ [0, y_i] and y ∈ [y_j, h], apply the designed segmentation rules to obtain segmentation path 1;
c. For the region y ∈ [y_i, y_j], locate the edge of the closed region contained between the upper and lower segmentation points, and connect the edge points of the closed region within this interval in order to form segmentation path 2;
d. Merge the two segmentation paths as the segmentation path of the whole multi-point adhesion part.
The entire segmentation flow of multi-point adhesion is shown in FIG. 7.
(4) Determining the optimal segmentation path
The initial segmentation points are determined from the candidate segmentation points, so several segmentation paths are obtained; however, because character adhesion can be complex, it is extremely difficult to obtain all correct segmentation paths at once. An optimal segmentation path therefore has to be selected from the multiple segmentation paths. After computing, for each path, an evaluation score that combines the overall distribution and the recognition probability, the subset of paths with the minimum cost among all path subsets is selected as the optimal path set. First, the start and end paths of every available segmentation path are defined as the first and last columns of the input image; second, all segmentation paths are denoted C = {C_0, C_1, …, C_{n−1}}, and the position of each path C_i is defined as (x_{ci}, y_{ci}). The specific calculation steps are as follows:
a. Determine the minimum points of the upper contour projection to obtain m valley points, whose corresponding abscissas are x_i; the successive differences between the m valley points are summed and averaged, and the resulting mean distance is recorded as the threshold T;
b. Set segmentation path C_0 as the root node and search for segmentation paths with x_{ci} ∈ [x_{c0} + 0.3T, x_{c0} + 1.3T] as children of C_0. If one or more such paths exist, the corresponding nodes are added to the tree; otherwise the current node is set as a leaf node;
c. Repeat this operation for every non-leaf node until all nodes have been added. Since a complete segmentation path must end at the last column of the image, branches whose leaf node is not C_{n−1} are pruned. The depth of each leaf node is recorded as depth;
d. To obtain the optimal segmentation path, each branch of the complete segmentation tree is given an evaluation score that contains both segmentation and recognition information. For each character formed by combining adjacent segmentation paths, its recognition probability is computed and its average probability value is used as the recognition score W_s. At the same time, assuming that the widths of the characters in an adhered string are almost equal, W_r is taken as the segmentation score: for a true segmentation path, the deviations of the character widths from the threshold T are very small, and W_r is computed by Equation (4);
e. The segmentation path set with the highest score W, computed by Equation (5), is selected as the optimal segmentation path:
W = W_s(i) + W_r(i),  i = 1, 2, …, depth−1.   (5)
FIG. 8 shows all computed segmentation paths and the finally determined optimal segmentation path, i.e. the computed optimal path is {C_0, C_1, C_3, C_6, C_8, C_9}. After the optimal segmentation path set is obtained, each character is segmented according to it as shown in FIG. 9, yielding single characters with complete character morphology.
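The tree search and scoring of steps b–e can be sketched as below. Since Equation (4) is not reproduced in the text, the width term `w_r` is an assumed stand-in that rewards segment widths close to the threshold T, and `recognize(img_slice)` is a caller-supplied callback assumed to return the recognition probability of the character between two cuts.

```python
def best_path_set(path_xs, img, recognize, T):
    """Steps b-e in sketch form: enumerate sequences of candidate cut columns whose
    successive gaps lie in [0.3T, 1.3T] and that end at the last column, score each
    sequence, and keep the best one."""
    n = len(path_xs)
    best_score, best_cuts = float("-inf"), None

    def search(i, chosen):
        nonlocal best_score, best_cuts
        if i == n - 1:                                  # complete path: ends at the last column
            w_s = sum(recognize(img[:, a:b]) for a, b in zip(chosen[:-1], chosen[1:]))
            w_s /= len(chosen) - 1                      # average recognition probability
            w_r = -sum(abs((b - a) - T) / T             # assumed stand-in for Equation (4)
                       for a, b in zip(chosen[:-1], chosen[1:]))
            if w_s + w_r > best_score:
                best_score, best_cuts = w_s + w_r, list(chosen)
            return
        for j in range(i + 1, n):                       # children within [x_i + 0.3T, x_i + 1.3T]
            if 0.3 * T <= path_xs[j] - path_xs[i] <= 1.3 * T:
                search(j, chosen + [path_xs[j]])

    search(0, [path_xs[0]])
    return best_cuts, best_score
```

The recursion enumerates the same root-to-leaf branches as the tree of steps b–c; branches that never reach the last column are simply never scored, which plays the role of the pruning step.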
(5) Establishment and expansion of handwriting English character data set
In order to verify the performance of the constructed recognition network model and the recognition accuracy of handwritten English characters, handwritten English data from students of different ages and some adults were collected on the basis of published data sets such as NIST and Chars74K. From these data, digital image processing algorithms such as translation, rotation, scaling, erosion, dilation and noise addition were used to build a 38-class handwritten English character data set covering uppercase and lowercase letters, in which letters whose uppercase and lowercase written forms are the same are merged into a single class, namely {C/c, F/f, K/k, L/l, M/m, O/o, P/p, S/s, U/u, V/v, W/w, X/x, Y/y, Z/z}. Each class contains about 10,000 handwritten English character images, and the data set contains 373,352 images in total. Part of the collected data is shown in FIG. 10.
(6) Training the handwritten English character recognition model based on a convolutional neural network (CNN)
The constructed recognition network model is shown in FIG. 11, where C1 and C3 are convolution layers, S2 and S4 are pooling layers, and F5 and F6 are fully connected layers. All convolution layers use 5 × 5 convolution kernels, and each convolution layer is followed by a pooling layer with a 2 × 2 pooling window to reduce the feature dimension. To enhance the generalization performance of the model, a Dropout layer is added after the fully connected layer, a nonlinear activation unit ReLU is used as the activation function, and finally Softmax is used for classification. The standard input of the network is a 28 × 28 pixel image.
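The architecture described above (two 5 × 5 convolution layers, each followed by 2 × 2 pooling, two fully connected layers, Dropout, ReLU, and a final Softmax over 38 classes with 28 × 28 input) can be written down directly, for example in PyTorch as below; the channel counts, padding and Dropout rate are not specified in the text, so the values used here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HandwritingCNN(nn.Module):
    """C1-S2-C3-S4-F5-F6 structure described above, 38 output classes."""
    def __init__(self, num_classes: int = 38):
        super().__init__()
        self.c1 = nn.Conv2d(1, 6, kernel_size=5, padding=2)   # channels/padding assumed
        self.s2 = nn.MaxPool2d(2)
        self.c3 = nn.Conv2d(6, 16, kernel_size=5)
        self.s4 = nn.MaxPool2d(2)
        self.f5 = nn.Linear(16 * 5 * 5, 120)
        self.dropout = nn.Dropout(0.5)                        # rate assumed
        self.f6 = nn.Linear(120, num_classes)

    def forward(self, x):                                     # x: (N, 1, 28, 28)
        x = self.s2(F.relu(self.c1(x)))                       # -> (N, 6, 14, 14)
        x = self.s4(F.relu(self.c3(x)))                       # -> (N, 16, 5, 5)
        x = torch.flatten(x, 1)
        x = self.dropout(F.relu(self.f5(x)))
        return F.log_softmax(self.f6(x), dim=1)               # Softmax classification via log-probabilities
```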
The data set is divided into a training set and a test set at a ratio of 4:1 and sent into the recognition model for training and verification; the single-character images obtained by segmentation are then each sent into the recognition model for recognition, and the recognition result of the adhered handwritten English characters is finally obtained.
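A minimal training sketch matching the 4:1 split above, assuming an `ImageFolder`-style directory of 28 × 28 grayscale character images and the `HandwritingCNN` module from the previous sketch; the data path, epoch count and learning rate are placeholders, not values given in the text.

```python
import torch
from torch import nn, optim
from torch.utils.data import random_split, DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((28, 28)),          # normalize every sample to the network's input size
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("data/handwritten_chars", transform=transform)  # placeholder path
n_train = int(len(dataset) * 0.8)                         # 4:1 train/test split
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])

model = HandwritingCNN(num_classes=38)                    # module from the previous sketch
optimizer = optim.Adam(model.parameters(), lr=1e-3)       # hyper-parameters assumed
criterion = nn.NLLLoss()                                  # pairs with the log-softmax output

for epoch in range(10):
    model.train()
    for images, labels in DataLoader(train_set, batch_size=64, shuffle=True):
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```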
Claims (7)
1. The recognition method for adhering the handwritten English characters is characterized by comprising the following steps:
step one, extracting candidate segmentation points based on structural features: in the thinned adhered-character image, selecting as candidate segmentation points the pixels whose number of foreground 8-neighbourhood pixels is not less than 3, together with the peak points of the upper contour projection and the valley points of the lower contour projection of the adhered characters;
step two, determining a segmentation path according to segmentation rules, sequentially taking the obtained candidate segmentation points as initial segmentation points, judging the pixel values of five pixels which are three pixel points below the current pixel point and two pixel points on the left and right of the current pixel point, substituting the values of the pixel points into the designed segmentation rules, and selecting the pixel points conforming to the rules as next segmentation points; until the determined segmentation points reach the marking end point of the character image, the segmentation is finished, and all segmentation points are connected to form a segmentation path;
thirdly, multi-strategy character segmentation is carried out according to different adhesion types, specifically: for single-point adhesion, determining a cutting path by utilizing a cutting rule on the area above the candidate cutting point, vertically cutting the area below the candidate cutting point, and merging two sections of cutting paths; for common stroke adhesion, extracting the center point of each row of the common stroke section to form a central line as a segmentation path; for multi-point adhesion, positioning the edge parts of the closed region contained between the upper and lower segmentation points, sequentially connecting the edge points of the closed region in the interval, and forming a segmentation path by the connecting edge points;
determining optimal segmentation paths, segmenting the adhered characters according to the segmentation paths, firstly calculating an evaluation score of overall distribution and recognition probability on each segmentation path, then selecting a group with the minimum cost in all path subsets as an optimal path set, and finally segmenting the adhered characters in sequence according to each segmentation path contained in the determined optimal segmentation path set to obtain segmented single character images;
step five, establishing and expanding a handwritten English character data set: collecting published handwritten English character data sets, adding handwritten English data of domestic writers of different age groups, and expanding the data set by applying image deformations to the collected handwriting data;
step six, character recognition, namely dividing a handwritten English character data set constructed in a data preparation stage into a training set and a testing set after screening and normalization, constructing a recognition model by utilizing a convolutional neural network, respectively sending single character images obtained by segmentation into the recognition model for recognition, and finally obtaining a recognition result of adhering the handwritten English characters.
2. The method for recognizing stuck handwritten english characters according to claim 1, wherein the method for extracting candidate cut points based on structural features in step one is as follows:
a. in the thinned character image, calculating the change of pixel values within the 8-neighbourhood of the current pixel and analysing it to obtain the stroke-connection state of the pixel, thereby obtaining the feature points of the image; letting the current pixel be P(x,y), x_i (i = 1, 2, …, 8) be its neighbourhood points, and P(x,y) = x_9, wherein x_i = 1 denotes a background pixel and x_i = 0 a black pixel; when N(P(x,y)) = 1, P(x,y) is an endpoint; when N(P(x,y)) ≥ 3, P(x,y) is a bifurcation point;
in the thinned adhered-character image, the pixels satisfying N(P(x,y)) ≥ 3 are added to the candidate segmentation points;
b. extracting a contour curve of an image, and extracting feature points according to a curvature value and a change direction on the contour line; and selecting a crest point of the upper contour projection graph and a trough point of the lower contour projection graph as projection extreme points to be added into the candidate cut points.
3. The method for recognizing adhered handwritten English characters according to claim 1, wherein the method for determining the segmentation path according to the segmentation rule in the second step is as follows:
a. traversing the determined candidate segmentation point list to obtain the current segmentation point n_0, and calculating the pixel values of the five neighbourhood pixels n_1, n_2, n_3, n_4, n_5, wherein 0 denotes a black pixel value and 1 a white pixel value; encoding the obtained pixel values in the order n_1 → n_5 and recording them in a list;
b. judging the numerical values contained in the list, and selecting the segmentation rules meeting the conditions to determine the next segmentation point according to different segmentation rules corresponding to different numerical values;
c. and recursively calculating and recording the segmentation points, ending the path when the next segmentation point reaches the marking end point, and connecting all recorded segmentation points to form a segmentation path.
4. The method for recognizing adhered handwritten English characters according to claim 1, wherein the method for multi-strategy character segmentation according to the difference of adhered types in the third step is as follows:
single point adhesion: there is only one adhesion between two characters:
a. traversing the candidate cut point list to obtain a current cut point;
b. dividing the area above the current dividing point by applying a designed dividing rule to obtain a dividing path 1;
c. vertically cutting the area below the cutting point to obtain a cutting path 2;
d. merging all obtained paths;
common stroke adhesion: two characters share the adhesion caused by the common pen segment part:
a. defining a pen section between an upper segmentation point and a lower segmentation point as a shared part;
b. dividing the upper and lower areas of the shared pen section by applying a designed dividing rule to obtain a dividing path 1;
c. for the shared pen segment, calculating in turn, for each row of the segment, starting from the current pixel point, the abscissas of the first background pixel points encountered while decrementing and incrementing (to the left and to the right), denoted x_{i′,1} and x_{i′,2}; then x̄ = (x_{i′,1} + x_{i′,2})/2 is the abscissa of the centre point of the row and y_{i′} is its ordinate; the calculated centre point is taken as the current pixel point and the calculation is recursed in turn; all calculated centre points are connected in order to form a centre line as segmentation path 2 of the shared pen segment;
d. combining the two segmentation paths as a segmentation path of the whole common adhesion part;
multipoint adhesion: two or more adhesion parts are arranged between two characters:
a. scanning the adhesion image to obtain a point coordinate set in the closed region, traversing a candidate segmentation point list, and defining the two segmentation points as the same segmentation point if the two segmentation points belong to the same closed region;
b. respectively dividing the upper and lower areas of the pair of dividing points by applying a designed dividing rule to obtain a dividing path 1;
c. for the region between two cutting points, positioning the edge part of the closed region between the upper cutting point and the lower cutting point, sequentially connecting the edge points of the closed region in the region, and connecting the points to form a cutting path 2;
d. and merging the two segmentation paths to be used as the segmentation path of the whole multi-point adhesion part.
5. The method for recognizing adhered handwritten English characters according to claim 1, wherein the method for determining the optimal segmentation path for segmentation in the fourth step is as follows:
a. determining minimum value points by the upper contour projection graph, calculating the average distance between wave valley points, and setting the average distance as a threshold value T;
b. starting from the first column of the adhered image, denoting all segmentation paths as C = {C_0, C_1, …, C_{n−1}} and defining the position of each path C_i as (x_{ci}, y_{ci}); searching in turn for segmentation paths with x_{ci} ∈ [x_{c0} + 0.3T, x_{c0} + 1.3T] as the next segmentation path, until the last searched path is the last column of the adhered image;
c. for the characters formed by combining each segmentation path, calculating the recognition probability of the characters and representing the recognition probability by using an average probability value, and simultaneously, on the assumption that the width of each character in the adhered characters is almost equal, calculating the segmentation score obtained by comparing the width difference value between the characters formed after segmentation with a threshold value;
d. selecting a segmentation path set with the highest score as an optimal segmentation path;
e. after the optimal segmentation path is obtained, each character is segmented according to the optimal segmentation path, so that single characters with complete character morphology are obtained.
6. The method for recognizing adhered handwritten english characters according to claim 1, wherein the method for creating and expanding the handwritten english character data set in step five is as follows:
a. collecting the published NIST and Chars74K data sets;
b. collecting handwriting English data of students and partial adults in different age groups;
c. based on the collected data, forming a 38-class handwritten English character data set covering uppercase and lowercase letters through digital image processing algorithms such as translation, rotation, scaling, erosion, dilation and noise addition, wherein letters whose uppercase and lowercase written forms are the same are merged into the same class.
7. The method for recognizing stuck handwritten english characters according to claim 1, wherein the character recognition method in step six is as follows:
a. the model is a recognition model trained with a convolutional neural network, with the following specific structure: the first layer is a convolution layer, the second a pooling layer, the third a convolution layer, the fourth a pooling layer, and the fifth and sixth layers are fully connected layers; all convolution layers use 5 × 5 convolution kernels, and each convolution layer is followed by a pooling layer with a 2 × 2 pooling window to reduce the feature dimension; to enhance the generalization performance of the model, a Dropout layer is added after the fully connected layer, a nonlinear activation unit ReLU is used as the activation function, and finally Softmax is used for classification;
b. dividing the collected data set into a training set and a testing set according to the ratio of 4:1, and sending the training set and the testing set into an identification model for training and verification;
c. and respectively sending the single character images obtained by segmentation into a recognition model for recognition, and finally obtaining a recognition result of adhering the handwritten English characters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110877654.2A CN113936181B (en) | 2021-08-01 | 2021-08-01 | Recognition method for adhering handwritten English characters |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110877654.2A CN113936181B (en) | 2021-08-01 | 2021-08-01 | Recognition method for adhering handwritten English characters |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113936181A CN113936181A (en) | 2022-01-14 |
CN113936181B true CN113936181B (en) | 2024-03-26 |
Family
ID=79274405
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110877654.2A Active CN113936181B (en) | 2021-08-01 | 2021-08-01 | Recognition method for adhering handwritten English characters |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113936181B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114612892B (en) * | 2022-03-17 | 2024-04-09 | 佛山市南海区广工大数控装备协同创新研究院 | A method for segmenting 3D characters on the surface of PCB components |
CN114863574A (en) * | 2022-04-11 | 2022-08-05 | China Construction Bank Corporation | Handwritten signature recognition method, apparatus, device, medium and program product |
CN115862045B (en) * | 2023-02-16 | 2023-05-26 | 中国人民解放军总医院第一医学中心 | Case automatic identification method, system, equipment and storage medium based on image-text identification technology |
CN117894030B (en) * | 2024-01-18 | 2024-10-18 | 广州宏途数字科技有限公司 | Text recognition method and system for campus smart pen |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102156865A (en) * | 2010-12-14 | 2011-08-17 | 上海合合信息科技发展有限公司 | Handwritten text line character segmentation method and identification method |
CN102496013A (en) * | 2011-11-11 | 2012-06-13 | 苏州大学 | Chinese character segmentation method for off-line handwritten Chinese character recognition |
JP2013101616A (en) * | 2011-11-09 | 2013-05-23 | Canon Inc | Method and system for dividing characters of text row having various character widths |
-
2021
- 2021-08-01 CN CN202110877654.2A patent/CN113936181B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102156865A (en) * | 2010-12-14 | 2011-08-17 | 上海合合信息科技发展有限公司 | Handwritten text line character segmentation method and identification method |
JP2013101616A (en) * | 2011-11-09 | 2013-05-23 | Canon Inc | Method and system for dividing characters of text row having various character widths |
CN102496013A (en) * | 2011-11-11 | 2012-06-13 | 苏州大学 | Chinese character segmentation method for off-line handwritten Chinese character recognition |
Non-Patent Citations (1)
Title |
---|
Fu Pengbin et al. Off-line handwritten arithmetic expression recognition based on multiple geometric features and CNN. Computer Systems & Applications, 2020, full text. *
Also Published As
Publication number | Publication date |
---|---|
CN113936181A (en) | 2022-01-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113936181B (en) | Recognition method for adhering handwritten English characters | |
CN113378632B (en) | Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method | |
CN107305630B (en) | Text sequence recognition method and device | |
CN108319972B (en) | An End-to-End Disparity Network Learning Method for Image Semantic Segmentation | |
Alaei et al. | A new scheme for unconstrained handwritten text-line segmentation | |
CN104751187B (en) | Meter reading automatic distinguishing method for image | |
CN104809481B (en) | A kind of natural scene Method for text detection based on adaptive Color-based clustering | |
Lee et al. | Binary segmentation algorithm for English cursive handwriting recognition | |
CN110659644B (en) | Automatic stroke extraction method of calligraphy words | |
Burie et al. | ICFHR2016 competition on the analysis of handwritten text in images of balinese palm leaf manuscripts | |
CN113537227A (en) | A structured text recognition method and system | |
Aouadi et al. | Word extraction and recognition in arabic. handwritten Text | |
US20010033694A1 (en) | Handwriting recognition by word separation into sillouette bar codes and other feature extraction | |
Rothacker et al. | Word hypotheses for segmentation-free word spotting in historic document images | |
CN105760891A (en) | Chinese character verification code recognition method | |
Jung et al. | Noisy and incomplete fingerprint classification using local ridge distribution models | |
CN110147841A (en) | The fine grit classification method for being detected and being divided based on Weakly supervised and unsupervised component | |
CN113723413B (en) | Handwriting Chinese text segmentation method based on greedy snake | |
CN110598581B (en) | Optical music score recognition method based on convolutional neural network | |
CN102332097A (en) | A Segmentation Method of Complex Background Text Image Based on Graph Cut | |
Valy et al. | Line segmentation for grayscale text images of khmer palm leaf manuscripts | |
CN113673384A (en) | Oracle character detection method for guiding texture feature autonomous learning by LM filter bank | |
Fan et al. | A run-length-coding-based approach to stroke extraction of Chinese characters | |
Guo et al. | Tree transformation and neural network based hand-written formula recognizer | |
CN110516638B (en) | Sign language recognition method based on track and random forest |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |