CN112507914A - OCR (optical character recognition) method and recognition system based on bankbook and bill characters - Google Patents


Info

Publication number
CN112507914A
CN112507914A (application CN202011482590.8A)
Authority
CN
China
Prior art keywords
image
passbook
bankbook
layer
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011482590.8A
Other languages
Chinese (zh)
Inventor
孔飞
张文强
褚建民
李卫国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Guoguang Electronic Information Technology Co Ltd
Original Assignee
Jiangsu Guoguang Electronic Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Guoguang Electronic Information Technology Co Ltd
Priority to CN202011482590.8A
Publication of CN112507914A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/34 Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/146 Aligning or centring of the image pick-up or image-field
    • G06V30/1475 Inclination or skew detection or correction of characters or of image to be recognised
    • G06V30/1478 Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Character Input (AREA)

Abstract

The invention discloses an OCR (optical character recognition) method and recognition system based on passbook and bill characters, belonging to the technical field of pattern recognition and computer vision. The method comprises the following steps: step 1, capture a passbook image at an arbitrary angle and preprocess it to obtain a new, angle-corrected passbook image; step 2, perform angle correction on the orientation of the passbook image and adjust it to the 0-degree state; step 3, locate the positions of the regions to be recognized in the corrected passbook image and construct the corresponding region labels; and step 4, recognize the regions with a variable-length OCR recognition model and output the recognition results. The invention extracts passbook information automatically through OCR recognition, reducing the time and labor costs of manually checking and entering information and greatly improving work efficiency. Performing OCR with a deep learning model increases recognition speed and accuracy and gives high robustness to different printed fonts.

Description

OCR (optical character recognition) method and recognition system based on bankbook and bill characters
Technical Field
The invention belongs to the technical field of pattern recognition and computer vision, and particularly relates to an OCR recognition method and a recognition system based on bankbook and bill characters.
Background
In recent years, computer vision technology has developed rapidly, and OCR (optical character recognition) of images and text has become a popular research direction. OCR in complex settings such as natural scenes and financial documents has been studied extensively and is already in mature use, and automatic recognition greatly improves work efficiency. The present system is designed to automate passbook information extraction, addressing problems such as the low efficiency and declining accuracy of manual information extraction.
Through long-term practice and research, the applicant has found that the prior art has at least the following problems: 1. Extracting passbook information manually is inefficient, and its accuracy declines as working time increases. 2. Existing OCR recognition systems have poor compatibility across application scenarios, impose strict requirements on the skew angle and quality of the text image to be recognized, and require the image to be captured manually in a fixed orientation.
Disclosure of Invention
The purpose of the invention is as follows: an OCR recognition method and recognition system based on passbook and bill characters are provided to solve the problems described in the Background section.
The technical scheme is as follows: an OCR recognition method based on passbook and bill characters comprises the following steps:
step 1, capturing a passbook image at an arbitrary angle and preprocessing it to obtain a new, angle-corrected passbook image;
step 2, performing angle correction on the orientation of the passbook image and adjusting it to the 0-degree state;
step 3, locating the positions of the regions to be recognized in the corrected passbook image and constructing the corresponding region labels;
and step 4, recognizing the regions with a variable-length OCR recognition model and outputting the recognition results.
Further, the preprocessing in step 1 comprises the following steps:
step 11, obtaining a passbook image captured at an arbitrary angle by a high-speed document camera;
step 12, sharpening and Gaussian-smoothing the passbook image to enhance the contrast between image edges and the surrounding background;
step 13, detecting image edges with a Sobel operator by convolving the image in the horizontal and vertical directions to approximate the gradient; computing the image contour and storing it as a point set; computing the convex hull of the contour point set and the hull's bounding rectangle; and thereby obtaining the four vertex coordinates of the contour;
step 14, correcting the angle-offset passbook image through perspective transformation: first determining the coordinates of the corrected new image, then constructing a perspective matrix, a 3 x 3 matrix, from the new coordinates and the coordinates of the original passbook image, thereby realizing the linear transformation, translation, and perspective transformation from the original image to the new image;
and step 15, obtaining the new angle-corrected image after the passbook image is transformed by the perspective matrix.
Further, the angle correction method in step 2 is as follows:
use the trained SVM classifier to detect the angle of the perspective-transformed image, and flip the image according to the detected angle to obtain a passbook image in the 0-degree state.
Further, the training method of the SVM classifier in step 2 comprises the following steps:
step 21, first collecting a preset number of text images of four classes, rotated by 0, 90, 180, and 270 degrees; classifying the four orientations with the SVM classifier and constructing images and their corresponding class labels, with labels 1, 2, 3, and 4 corresponding to 0, 90, 180, and 270 degrees, respectively;
step 22, extracting the histogram-of-oriented-gradients (HOG) features of the image, which reflect the gradient-change information of the image; text deflected at different angles has different gradient information;
step 23, reducing the dimension of the HOG features with principal component analysis (PCA), because the HOG feature dimension is high, which is unfavorable for classifier training;
and step 24, training the SVM classifier with the dimension-reduced HOG features as input features to obtain the trained SVM classifier.
Further, the principal component analysis (PCA) method comprises the following steps:
step 241, recording the s items of d-dimensional HOG feature data and combining them into a data matrix X with s rows and d columns;
step 242, computing the mean of each column of the matrix to form the 1-row, d-column mean vector x̄, and subtracting it from each row of X to obtain the zero-centered matrix, denoted X′;
step 243, computing the covariance matrix C = (1/s) · X′ᵀX′ of the new matrix X′ and calculating its eigenvalues and eigenvectors;
and step 244, arranging the eigenvectors into a matrix from top to bottom by decreasing eigenvalue, taking the first K rows to form a matrix P, and reducing the dimension to K to obtain the HOG feature matrix Y = PX′ᵀ.
Further, the positioning method is as follows: take the top-left vertex coordinate of the angle-corrected passbook image as a fixed point; because the positional offset between the fixed point and each region to be extracted in the passbook is fixed, each region's coordinates are located by adding the offset to the fixed-point coordinates, and the region is cropped out as the recognition region.
Further, the variable-length OCR recognition model is based on the DenseNet network structure, and the model input is the image to be recognized by OCR. The input is first batch-normalized by a BN layer and then fed into the first 3 x 3 convolutional layer, whose activation function is the ReLU function. The image features extracted by the convolutional layer are fed into dense blocks; the model has three dense blocks connected in the middle by transition layers. Each dense block comprises BN layers, ReLU activations, and 3 x 3 convolutional layers; the feature maps within a block have the same size, and each layer's input comes from the outputs of all preceding layers. A transition layer connects two dense blocks, reducing the feature-map size and compressing the model; it comprises a 1 x 1 convolutional layer and a 2 x 2 average-pooling layer. Finally, the features output by the third dense block are output through a BN layer and a fully connected layer.
Further, the training method of the variable-length OCR recognition model is as follows: first, a corpus is constructed according to the character types to be recognized; the corpus is used to generate a training data set and its label file, which contains the names of the training samples and the positions, in the corpus, of the Chinese characters appearing in each sample. The network structure is then trained with the training set: the model's weights are updated automatically from the forward-pass results of the training set in the model; after many iterations, when the recognition rate on the whole training set is high, training stops and the weights are saved to a model file.
Further, the recognition method of the variable-length OCR recognition model is as follows: during recognition, the network model and model file are loaded by a program; the softmax function computes and outputs the class with the highest probability, and the final recognition result is output after it is looked up in the label file.
The invention also provides a recognition system based on the passbook and bill character OCR recognition method, comprising an image preprocessing module, an orientation detection module, a positioning module, and an OCR recognition module.
The image preprocessing module captures the passbook image at an arbitrary angle and processes it to obtain a new, angle-corrected passbook image;
the orientation detection module performs angle correction on the orientation of the passbook image and adjusts it to the 0-degree state;
the positioning module locates the positions of the regions to be recognized in the corrected passbook image and constructs the corresponding region labels;
and the OCR recognition module recognizes the regions with a variable-length OCR recognition model and outputs the recognition results.
Beneficial effects: compared with the prior art, the OCR recognition method and recognition system based on passbook and bill characters have the following advantages:
1. The image preprocessing module sharpens and Gaussian-smooths the passbook image, enhancing the contrast between image edges and the surrounding background and improving the recognizable quality of the passbook image. Angle-offset passbook images are corrected through perspective transformation, so the skew angle of the text image is detected automatically, information can be extracted from passbook images captured at any angle, and images no longer need to be collected manually in a fixed orientation, which simplifies use.
2. HOG features are extracted from the image and reduced in dimension with principal component analysis (PCA), which facilitates SVM classifier training. In practical applications a bill may arrive rotated by 90 or 180 degrees when submitted for recognition, so this improves the speed and accuracy with which the classifier detects and corrects the text orientation.
3. The OCR recognition module automatically extracts passbook information, reducing the time and labor costs of manually checking and entering information and greatly improving work efficiency.
4. A deep learning model trained on self-labeled data performs the OCR. Adding self-labeled data from practical applications makes the model more accurate on bills and passbooks than conventional OCR recognition models while keeping a high recognition speed, and the added data diversity makes the trained model more robust to different printed forms of the same Chinese character.
5. The image-edge method adopted by the invention has a smoothing effect on noise, provides more accurate edge information, reduces the number of templates required, lowers computational complexity, and is more robust against noise.
In conclusion, using a deep learning model for OCR recognition reduces labor cost, simplifies use, increases recognition speed and accuracy, greatly improves work efficiency, and is robust to different printed fonts.
Drawings
FIG. 1 is a system flow diagram of the recognition system of the present invention.
FIG. 2 is a preprocessed passbook image of the present invention.
FIG. 3 is a passbook image of the present invention after contour extraction.
FIG. 4 is a passbook image of the present invention after perspective transformation.
FIG. 5 is a diagram of passbook information localization and cropping according to the present invention.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the invention.
As shown in FIG. 1, an OCR recognition system based on passbook and bill characters comprises an image preprocessing module, an orientation detection module, a positioning module, and an OCR recognition module.
The image preprocessing module captures the passbook image at an arbitrary angle and processes it to obtain a new, angle-corrected passbook image; the orientation detection module performs angle correction on the orientation of the passbook image and adjusts it to the 0-degree state; the positioning module locates the positions of the regions to be recognized in the corrected passbook image and constructs the corresponding region labels; and the OCR recognition module recognizes the regions with a variable-length OCR recognition model and outputs the recognition results.
The recognition method is further described below based on this passbook and bill character OCR recognition system; the method specifically comprises the following steps:
Step 1: capture a passbook image at an arbitrary angle and preprocess it to obtain a new, angle-corrected passbook image.
The preprocessing in step 1 comprises the following steps:
Step 11: obtain a passbook image captured at an arbitrary angle by a high-speed document camera.
Step 12: sharpen and Gaussian-smooth the passbook image to enhance the contrast between image edges and the surrounding background. Specifically, let the grayscale image be F₀(x, y), where (x, y) are the pixel coordinates in the image. To compute the Gaussian-smoothed image, a weight matrix is determined first. Taking any point (x₀, y₀) as an example, form the 3 x 3 matrix Z₀ of the image coordinates in its neighborhood, with (x₀, y₀) at the matrix center, and evaluate the Gaussian function
G(x, y) = (1 / (2πσ²)) · exp(−((x − x₀)² + (y − y₀)²) / (2σ²)),
where σ is the standard deviation, at each coordinate. Evaluating G at every point of the matrix yields a new 3 x 3 matrix; normalizing it so that its entries sum to 1 gives the Gaussian weight matrix Z₁. Let D₀ denote the 3 x 3 matrix of pixel values at the corresponding coordinates. The Gaussian-smoothed pixel value is then
F₁(x₀, y₀) = ΣΣ (Z₁ · D₀),
i.e., the sum of the element-wise products of Z₁ and D₀. Computing this value for every point in the image in the same way yields the new image F₁(x, y).
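For illustration, step 12 can be sketched in Python with OpenCV; the kernel size, σ = 1, the unsharp-mask weights, and the file name are illustrative assumptions rather than values taken from the patent:

```python
import cv2
import numpy as np

img = cv2.imread("passbook.jpg")                       # assumed input file
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)           # F0(x, y)

# Unsharp-mask sharpening: subtract a blurred copy to boost edge contrast.
blur = cv2.GaussianBlur(gray, (3, 3), 1.0)
sharp = cv2.addWeighted(gray, 1.5, blur, -0.5, 0)

# Manual 3x3 Gaussian weight matrix Z1 (sigma = 1), normalized to sum to 1,
# mirroring the formula above; filter2D applies the weighted average.
xs, ys = np.meshgrid([-1, 0, 1], [-1, 0, 1])
Z1 = np.exp(-(xs**2 + ys**2) / 2.0) / (2 * np.pi)
Z1 /= Z1.sum()
smoothed = cv2.filter2D(sharp, -1, Z1)                 # F1(x, y)
```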
Step 13: detect image edges with the Sobel operator by convolving the image in the horizontal and vertical directions to approximate the gradient; compute the image contour and store it as a point set; compute the convex hull of the contour point set and the hull's bounding rectangle; and thereby obtain the four vertex coordinates of the contour.
The convolution operator in the x direction is
Dx = [[−1, 0, 1], [−2, 0, 2], [−1, 0, 1]],
and the operator in the y direction is
Dy = [[−1, −2, −1], [0, 0, 0], [1, 2, 1]].
Compute Gx = Dx ∗ F₁ in the x direction and Gy = Dy ∗ F₁ in the y direction on the Gaussian-smoothed image; the edge-detected image value is |F₂| = |Gx| + |Gy|. Given a preset threshold f_max, a point with |F₂| > f_max may be considered a boundary point; its gray value is set to 255 and all other points are set to 0. The final edge-detected image is denoted F₂.
To compute the contour of image F₂, scan the whole image from the top-left corner, left to right and top to bottom. A scanned non-zero point with a 0-valued pixel in its 8-neighborhood is judged to be a boundary point; denote the 8-neighborhood of such a point (i, j) by F(i, j).
The boundary is then traced as follows: with (i, j) as the center and (i₂, j₂) as the starting direction, search the 8-neighborhood of (i, j) clockwise for a non-zero pixel; if one is found, let (i₁, j₁) be the first non-zero pixel in the clockwise direction. Then, with (i₃, j₃) as the center and (i₂, j₂) as the starting direction, search the 8-neighborhood of (i₃, j₃) counterclockwise, and let (i₄, j₄) be the first non-zero pixel in the counterclockwise direction. If (i₄, j₄) = (i, j) and (i₃, j₃) = (i₁, j₁), the trace has returned to the point where the boundary started, so the contour is closed and scanning continues for the next contour; otherwise set (i₂, j₂) ← (i₃, j₃) and (i₃, j₃) ← (i₄, j₄) and continue the operation. In this way all contour information is finally obtained and stored in point sets.
To compute the bounding rectangle of the contour point set, first take a point with the smallest abscissa in the contour as the starting point p₀. Connect p₀ to the other contour points to form line segments, compute the angle between each segment and the downward vertical direction x = 0, and sort the points by this angle from small to large, denoting them p₁, p₂, p₃, and so on. Processing the points in this order so that all contour points are enclosed while turning in a single direction, the polygon finally formed is the convex hull of the contour. Enumerate the edges of the polygon, construct a bounding rectangle on each edge, compare the rectangle areas, and select the smallest as the bounding rectangle of the contour. The four vertex coordinates of this rectangle, (m₀, n₀), (m₁, n₁), (m₂, n₂), (m₃, n₃), are the four vertex coordinates of the contour.
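Continuing the sketch, step 13 maps naturally onto standard OpenCV calls; the threshold f_max = 120 and the assumption that the passbook outline is the largest contour are illustrative:

```python
import cv2
import numpy as np

gx = cv2.Sobel(smoothed, cv2.CV_64F, 1, 0, ksize=3)    # Gx = Dx * F1
gy = cv2.Sobel(smoothed, cv2.CV_64F, 0, 1, ksize=3)    # Gy = Dy * F1
mag = np.abs(gx) + np.abs(gy)                          # |F2| = |Gx| + |Gy|
f_max = 120                                            # assumed threshold
edges = np.where(mag > f_max, 255, 0).astype(np.uint8) # binary edge image F2

contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
outline = max(contours, key=cv2.contourArea)           # assume largest = passbook
hull = cv2.convexHull(outline)                         # convex hull of contour
rect = cv2.minAreaRect(hull)                           # minimum-area rectangle
corners = cv2.boxPoints(rect)                          # four vertex coordinates
```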
Step 14: correct the angle-offset passbook image through perspective transformation. First determine the coordinates of the corrected new image, then construct a perspective matrix from the new coordinates and the coordinates of the original passbook image. The perspective matrix is a 3 x 3 matrix, denoted A:
A = [[a₁₁, a₁₂, a₁₃], [a₂₁, a₂₂, a₂₃], [a₃₁, a₃₂, a₃₃]],
where the [[a₁₁, a₁₂], [a₂₁, a₂₂]] block performs the linear transformation of the image, the [a₁₃, a₂₃] part performs the perspective transformation, and the [a₃₁, a₃₂] part performs the image translation. Since the computation is in the two-dimensional plane, a₃₃ defaults to 1. Substituting the known coordinate points into the perspective-transformation formula yields the values of the elements of matrix A, finally realizing the perspective correction from the original image to the new image.
The perspective transformation is computed as
[m, n, 1] · A = [M′, N′, u′],
where (m, n) is a coordinate point before transformation, (M, N) = (M′/u′, N′/u′) is the corresponding coordinate point after transformation, and u = 1.
After the values of the perspective matrix are computed from the original vertex coordinates, the transformed coordinates are finally computed by the formulas
M = (a₁₁m + a₂₁n + a₃₁) / (a₁₃m + a₂₃n + a₃₃)
and
N = (a₁₂m + a₂₂n + a₃₂) / (a₁₃m + a₂₃n + a₃₃),
yielding the rotated new image F₃.
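Steps 14 and 15 correspond to OpenCV's perspective-transform utilities, sketched below; the output size (w, h) is an illustrative assumption, and in practice the detected corners must be ordered consistently with the destination points:

```python
import cv2
import numpy as np

w, h = 1000, 600                                       # assumed output size
src = np.float32(corners)                              # (m0,n0) .. (m3,n3)
dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])     # corrected coordinates
A = cv2.getPerspectiveTransform(src, dst)              # 3x3 perspective matrix A
corrected = cv2.warpPerspective(img, A, (w, h))        # rotated new image F3
```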
Step 2: perform angle correction on the orientation of the passbook image and adjust it to the 0-degree state.
The specific orientation detection method is as follows:
Step 21: first collect a preset number of text images of four classes, rotated by 0, 90, 180, and 270 degrees. The SVM classifier classifies and identifies the four orientations; images and their corresponding class labels are constructed, with 0, 90, 180, and 270 degrees corresponding to labels 1, 2, 3, and 4, respectively.
Step 22: extract the histogram-of-oriented-gradients (HOG) features of the image; HOG features reflect the gradient-change information of the image, and text deflected at different angles has different gradient information.
Step 23: because the HOG feature dimension is high, which is unfavorable for classifier training, reduce the dimension with principal component analysis (PCA). Record the s items of d-dimensional HOG feature data and combine them into a data matrix X with s rows and d columns. Compute the mean of each column of the matrix to form the 1-row, d-column mean vector x̄, and subtract it from each row of X to obtain the zero-centered matrix, denoted X′. Compute the covariance matrix C = (1/s) · X′ᵀX′ of X′ and calculate its eigenvalues and eigenvectors. Arrange the eigenvectors into a matrix from top to bottom by decreasing eigenvalue, take the first K rows to form the matrix P, and reduce the HOG features to K dimensions: Y = PX′ᵀ.
Step 24: train the SVM classifier with the dimension-reduced HOG features as input features to obtain the trained SVM classifier.
Step 25: use the trained SVM classifier to detect the angle of the perspective-transformed image, and flip the image according to the detected angle to obtain a passbook image in the 0-degree state.
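A hedged sketch of this orientation classifier, using scikit-image HOG, scikit-learn PCA, and an SVM; the fixed input size, K = 64, the RBF kernel, the training data names, and the label-to-rotation mapping are illustrative assumptions:

```python
import cv2
import numpy as np
from skimage.feature import hog
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def hog_features(images):
    feats = []
    for im in images:
        g = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY) if im.ndim == 3 else im
        g = cv2.resize(g, (128, 128))      # fixed size -> fixed HOG length
        feats.append(hog(g, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)))
    return np.array(feats)                  # s x d HOG matrix X

X = hog_features(train_images)              # train_images/labels assumed given
pca = PCA(n_components=64)                  # reduce d-dim HOG to K dimensions
Y = pca.fit_transform(X)                    # zero-centers X internally
clf = SVC(kernel="rbf").fit(Y, train_labels)

# Detect the orientation of the perspective-corrected image and flip it back.
label = clf.predict(pca.transform(hog_features([corrected])))[0]
rotations = {2: cv2.ROTATE_90_COUNTERCLOCKWISE, 3: cv2.ROTATE_180,
             4: cv2.ROTATE_90_CLOCKWISE}    # assumed label -> fix-up mapping
if label in rotations:
    corrected = cv2.rotate(corrected, rotations[label])
```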
Step 3: locate the positions of the regions to be recognized in the corrected passbook image and construct the corresponding region labels. Specifically, take the top-left vertex coordinate of the angle-corrected passbook image as a fixed point; because the positional offset between this fixed point and each information region to be extracted in the passbook is fixed, each region's coordinates are located by adding the offset to the fixed-point coordinates, and the region is cropped out as the recognition region.
Specifically, denote the top-left vertex coordinate as (x, y) and any vertex of the rectangular region to be recognized as (x′, y′); the offsets between the two points are Vx = x′ − x and Vy = y′ − y. Once the document to be recognized (passbook, bill, etc.) is determined, the values of Vx and Vy are determined, so every region to be recognized can be computed from the vertex coordinate together with Vx and Vy, and cropped according to those coordinates.
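A minimal sketch of the fixed-offset localization; the region names, the offsets Vx and Vy, and the region sizes are illustrative values that would be measured once per passbook layout:

```python
# Regions relative to the top-left fixed point (x, y) of the corrected image.
# Each entry is (Vx, Vy, width, height); all values here are assumed.
REGIONS = {
    "account_no": (120, 80, 420, 40),
    "name":       (120, 140, 200, 40),
}
x, y = 0, 0                                   # fixed point of corrected image
rois = {label: corrected[y + vy:y + vy + hh, x + vx:x + vx + ww]
        for label, (vx, vy, ww, hh) in REGIONS.items()}
```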
Step 4: recognize the regions to be recognized using the variable-length OCR recognition model and output the recognition results.
Specifically, the regions cropped by the positioning module contain the text characters to be recognized, and since the number of characters varies in practice, a variable-length OCR recognition model is used. First the recognition model structure is constructed; in view of practical OCR recognition needs, the model improves on the DenseNet network structure. The model input is the image to be recognized by OCR. The input is first batch-normalized by a BN layer and then fed into the first 3 x 3 convolutional layer, whose activation function is the ReLU function. The image features extracted by the convolutional layer are fed into dense blocks; the model has three dense blocks connected in the middle by transition layers. Each dense block comprises BN layers, ReLU activations, and 3 x 3 convolutional layers; the feature maps within a block have the same size, and each layer's input comes from the outputs of all preceding layers. A transition layer connects two dense blocks, reducing the feature-map size and compressing the model; it comprises a 1 x 1 convolutional layer and a 2 x 2 average-pooling layer. Finally, the features output by the third dense block are output through a BN layer and a fully connected layer. The fully connected output is then passed through the softmax function, computed as
softmax(xᵢ) = exp(xᵢ) / Σⱼ exp(xⱼ), where the sum runs over j = 1, …, M,
in which xᵢ is the feature-vector value of class i output by the fully connected layer and M is the number of classes. Each output is mapped into (0, 1) and all mapped values sum to 1; the largest output value after this computation gives the class judged by the model, i.e., the specific Chinese character.
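A compact PyTorch sketch of a DenseNet-style recognizer of this shape (BN, then a 3 x 3 convolution with ReLU, three dense blocks joined by 1 x 1 convolution plus 2 x 2 average-pooling transitions, then BN and a fully connected layer); the growth rate, block depth, and channel widths are illustrative assumptions, while the 5,990-class output follows the text:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Dense block: each layer (BN -> ReLU -> 3x3 conv) receives the
    concatenated outputs of all preceding layers, as described above."""
    def __init__(self, in_ch, growth=8, n_layers=8):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(),
                nn.Conv2d(ch, growth, kernel_size=3, padding=1)))
            ch += growth
        self.out_ch = ch

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)
        return x

def transition(in_ch, out_ch):
    # Transition layer: 1x1 conv compresses channels, 2x2 average pooling
    # halves the feature-map size.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size=1),
                         nn.AvgPool2d(2))

class OCRNet(nn.Module):
    def __init__(self, n_classes=5990):
        super().__init__()
        self.stem = nn.Sequential(            # BN, then first 3x3 conv + ReLU
            nn.BatchNorm2d(1),
            nn.Conv2d(1, 64, kernel_size=3, padding=1), nn.ReLU())
        self.block1 = DenseBlock(64)
        self.trans1 = transition(self.block1.out_ch, 128)
        self.block2 = DenseBlock(128)
        self.trans2 = transition(self.block2.out_ch, 128)
        self.block3 = DenseBlock(128)
        self.head = nn.Sequential(            # BN, then fully connected layer
            nn.BatchNorm2d(self.block3.out_ch),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(self.block3.out_ch, n_classes))

    def forward(self, x):                     # x: N x 1 x H x W text image
        x = self.trans1(self.block1(self.stem(x)))
        x = self.trans2(self.block2(x))
        return self.head(self.block3(x))      # logits; softmax at inference
```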
After the model is constructed, it is trained. First a corpus is built according to the character types to be recognized; the corpus contains 5,990 classes of common Chinese characters, digits, letters, and symbols. The corpus is used to generate a training data set and its label file, which contains the names of the training samples and the positions, in the corpus, of the Chinese characters appearing in each sample. One million training samples and one thousand test samples are generated for model training, and the training accuracy is about 95%. During training, the model's weights are updated automatically from the forward-pass results of the training set in the model; after many iterations, when the recognition rate on the whole training set is high, training stops and the weights are saved to a model file, which represents the model's optimal weights in binary form. During recognition, the network model and model file are loaded by a program; the softmax function computes and outputs the class with the highest probability, and the final recognition result is output after it is looked up in the label file. The method has high recognition accuracy and strong robustness in complex environments.
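Recognition then reduces to a forward pass plus a label-file lookup, sketched below; the weight-file name and the one-character-per-line label-file format are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

model = OCRNet()
model.load_state_dict(torch.load("ocr_weights.pt"))   # assumed weight file
model.eval()

with open("labels.txt", encoding="utf-8") as f:        # assumed label file:
    charset = [line.strip() for line in f]             # one character per line

with torch.no_grad():
    # char_img: a 1 x 1 x H x W tensor holding one cropped character region
    probs = F.softmax(model(char_img), dim=1)          # softmax formula above
    result = charset[int(probs.argmax(dim=1))]         # highest-probability class
```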
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. Further details are omitted in order to avoid unnecessary repetition.

Claims (10)

1. An OCR recognition method based on passbook and bill characters, characterized by comprising the following steps:
step 1, capturing a passbook image at an arbitrary angle and preprocessing it to obtain a new, angle-corrected passbook image;
step 2, performing angle correction on the orientation of the passbook image and adjusting it to the 0-degree state;
step 3, locating the positions of the regions to be recognized in the corrected passbook image and constructing the corresponding region labels;
and step 4, recognizing the regions with a variable-length OCR recognition model and outputting the recognition results.
2. The OCR recognition method based on passbook and bill characters according to claim 1, wherein the preprocessing in step 1 comprises the following steps:
step 11, obtaining a passbook image captured at an arbitrary angle by a high-speed document camera;
step 12, sharpening and Gaussian-smoothing the passbook image to enhance the contrast between image edges and the surrounding background;
step 13, detecting image edges with an operator by convolving the image in the horizontal and vertical directions to approximate the gradient; computing the image contour and storing it as a point set; computing the convex hull of the contour point set and the hull's bounding rectangle; and thereby obtaining the four vertex coordinates of the contour;
step 14, correcting the angle-offset passbook image through perspective transformation: first determining the coordinates of the corrected new image, then constructing a perspective matrix, a 3 x 3 matrix, from the new coordinates and the coordinates of the original passbook image, thereby realizing the linear transformation, translation, and perspective transformation from the original image to the new image;
and step 15, obtaining the new angle-corrected image after the passbook image is transformed by the perspective matrix.
3. The OCR recognition method based on passbook and bill characters according to claim 1, wherein the angle correction method in step 2 is:
using the trained SVM classifier to detect the angle of the perspective-transformed image, and flipping the image according to the detected angle to obtain a passbook image in the 0-degree state.
4. The OCR recognition method based on passbook and bill characters according to claim 3, wherein the training method of the SVM classifier in step 2 comprises the following steps:
step 21, first collecting a preset number of text images of four classes, rotated by 0, 90, 180, and 270 degrees; classifying the four orientations with the SVM classifier and constructing images and their corresponding class labels, with labels 1, 2, 3, and 4 corresponding to 0, 90, 180, and 270 degrees, respectively;
step 22, extracting the histogram-of-oriented-gradients (HOG) features of the image, which reflect the gradient-change information of the image, the gradient information of text deflected at different angles being different;
step 23, reducing the dimension of the HOG features with principal component analysis (PCA);
and step 24, training the SVM classifier with the dimension-reduced HOG features as input features to obtain the trained SVM classifier.
5. The OCR recognition method based on passbook and bill characters according to claim 4, wherein the principal component analysis (PCA) method comprises the following steps:
step 241, recording the s items of d-dimensional HOG feature data and combining them into a data matrix X with s rows and d columns;
step 242, computing the mean of each column of the matrix to form the 1-row, d-column mean vector x̄, and subtracting it from each row of X to obtain the zero-centered matrix, denoted X′;
step 243, computing the covariance matrix C = (1/s) · X′ᵀX′ of the new matrix X′ and calculating its eigenvalues and eigenvectors;
and step 244, arranging the eigenvectors into a matrix from top to bottom by decreasing eigenvalue, taking the first K rows to form a matrix P, and reducing the dimension to K to obtain the HOG feature matrix Y = PX′ᵀ.
6. The OCR recognition method based on passbook and bill characters according to claim 1, wherein the positioning method is: taking the top-left vertex coordinate of the angle-corrected passbook image as a fixed point; the positional offset between the fixed point and each region to be extracted in the passbook being fixed, locating each region's coordinates by adding the offset to the fixed-point coordinates, and cropping the region out as the recognition region.
7. The OCR recognition method based on passbook and bill characters according to claim 1, wherein the variable-length OCR recognition model is based on the DenseNet network structure, and the model input is the image to be recognized by OCR; the input is first batch-normalized by a BN layer and then fed into the first 3 x 3 convolutional layer, whose activation function is the ReLU function; the image features extracted by the convolutional layer are fed into dense blocks, the model having three dense blocks connected in the middle by transition layers; each dense block comprises BN layers, ReLU activations, and 3 x 3 convolutional layers, the feature maps within a block having the same size and each layer's input coming from the outputs of all preceding layers; a transition layer connects two dense blocks, reducing the feature-map size and compressing the model, and comprises a 1 x 1 convolutional layer and a 2 x 2 average-pooling layer; finally, the features output by the third dense block are output through a BN layer and a fully connected layer.
8. The OCR recognition method based on passbook and bill characters according to claim 7, wherein the training method of the variable-length OCR recognition model comprises the following steps:
first, a corpus is constructed according to the character types to be recognized; the corpus is used to generate a training data set and its label file, which contains the names of the training samples and the positions, in the corpus, of the Chinese characters appearing in each sample; the network structure is then trained with the training set, the model's weights being updated automatically from the forward-pass results of the training set in the model; after many iterations, when the recognition rate on the whole training set is high, training stops and the weights are saved to a model file.
9. The OCR recognition method based on passbook and bill characters according to claim 8, wherein the recognition method of the variable-length OCR recognition model is: during recognition, the network model and model file are loaded by a program; the softmax function computes and outputs the class with the highest probability, and the final recognition result is output after it is looked up in the label file.
10. A recognition system based on the passbook and bill character OCR recognition method according to any one of claims 1 to 9, comprising:
an image preprocessing module, which captures the passbook image at an arbitrary angle and processes it to obtain a new, angle-corrected passbook image;
an orientation detection module, which performs angle correction on the orientation of the passbook image and adjusts it to the 0-degree state;
a positioning module, which locates the positions of the regions to be recognized in the corrected passbook image and constructs the corresponding region labels;
and an OCR recognition module, which recognizes the regions with a variable-length OCR recognition model and outputs the recognition results.
CN202011482590.8A 2020-12-15 2020-12-15 OCR (optical character recognition) method and recognition system based on bankbook and bill characters Pending CN112507914A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011482590.8A CN112507914A (en) 2020-12-15 2020-12-15 OCR (optical character recognition) method and recognition system based on bankbook and bill characters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011482590.8A CN112507914A (en) 2020-12-15 2020-12-15 OCR (optical character recognition) method and recognition system based on bankbook and bill characters

Publications (1)

Publication Number Publication Date
CN112507914A 2021-03-16

Family

ID=74972248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011482590.8A Pending CN112507914A (en) 2020-12-15 2020-12-15 OCR (optical character recognition) method and recognition system based on bankbook and bill characters

Country Status (1)

Country Link
CN (1) CN112507914A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446621A (en) * 2018-03-14 2018-08-24 平安科技(深圳)有限公司 Bank slip recognition method, server and computer readable storage medium
CN109242787A (en) * 2018-08-15 2019-01-18 南京光辉互动网络科技股份有限公司 It paints in a kind of assessment of middle and primary schools' art input method
CN110889402A (en) * 2019-11-04 2020-03-17 广州丰石科技有限公司 Business license content identification method and system based on deep learning
CN111428748A (en) * 2020-02-20 2020-07-17 重庆大学 Infrared image insulator recognition and detection method based on HOG characteristics and SVM

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516597A (en) * 2021-05-19 2021-10-19 中国工商银行股份有限公司 Image correction method and device and server
CN113516597B (en) * 2021-05-19 2024-05-28 中国工商银行股份有限公司 Image correction method, device and server
CN114792422A (en) * 2022-05-16 2022-07-26 合肥优尔电子科技有限公司 Optical character recognition method based on enhanced perspective
CN114792422B (en) * 2022-05-16 2023-12-12 合肥优尔电子科技有限公司 Optical character recognition method based on enhanced perspective

Similar Documents

Publication Publication Date Title
US10055660B1 (en) Arabic handwriting recognition utilizing bag of features representation
EP2383678A1 (en) Handwritten character recognition method and system
Palacios et al. A system for processing handwritten bank checks automatically
CN110458158B (en) Text detection and identification method for assisting reading of blind people
US8224072B2 (en) Method for normalizing displaceable features of objects in images
Ahmed et al. A novel dataset for English-Arabic scene text recognition (EASTR)-42K and its evaluation using invariant feature extraction on detected extremal regions
CN112507914A (en) OCR (optical character recognition) method and recognition system based on bankbook and bill characters
CN114359553B (en) Signature positioning method and system based on Internet of things and storage medium
CN113011426A (en) Method and device for identifying certificate
Singh et al. Handwritten words recognition for legal amounts of bank cheques in English script
CN108921006B (en) Method for establishing handwritten signature image authenticity identification model and authenticity identification method
US8340428B2 (en) Unsupervised writer style adaptation for handwritten word spotting
CN114005127A (en) Image optical character recognition method based on deep learning, storage device and server
CN112364863B (en) Character positioning method and system for license document
Verma et al. A novel approach for structural feature extraction: contour vs. direction
Aravinda et al. Template matching method for Kannada handwritten recognition based on correlation analysis
CN110766001B (en) Bank card number positioning and end-to-end identification method based on CNN and RNN
CN111213157A (en) Express information input method and system based on intelligent terminal
Patel et al. An impact of grid based approach in offline handwritten Kannada word recognition
CN113537216B (en) Dot matrix font text line inclination correction method and device
CN111612045B (en) Universal method for acquiring target detection data set
Kulkarni Handwritten character recognition using HOG, COM by OpenCV & Python
CN108509865B (en) Industrial injury information input method and device
McNeill et al. Coin recognition using vector quantization and histogram modeling
Choksi et al. Hindi optical character recognition for printed documents using fuzzy k-nearest neighbor algorithm: a problem approach in character segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination