CN109583423A - Method, apparatus and related components for handwritten digit recognition - Google Patents

Method, apparatus and related components for handwritten digit recognition

Info

Publication number
CN109583423A
Authority
CN
China
Prior art keywords
handwritten
handwritten numeral
liquid crystal
digital
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811554544.7A
Other languages
Chinese (zh)
Inventor
钟宝江
丁娜
顾平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN201811554544.7A
Publication of CN109583423A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/32: Digital ink
    • G06V30/333: Preprocessing; Feature extraction
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

This application discloses a method of handwritten digit recognition. The method includes: obtaining, from a first preset area on a writing analog board, a handwritten digit sample whose font format is the liquid crystal digit format, wherein contour lines of multiple liquid crystal digits are provided in the first preset area of the writing analog board so that the user writes digits in the liquid crystal digit format along the contour lines; extracting the feature information of each handwritten digit in the handwritten digit sample, and matching all the feature information against standard digit feature information to obtain matching results; and generating the recognition result of the handwritten digit sample according to all the matching results. The method can improve the recognition accuracy of handwritten digits. This application also discloses a system of handwritten digit recognition, a computer-readable storage medium and an electronic device, which have the same beneficial effects.

Description

Method, apparatus and related components for handwritten digit recognition
Technical field
This application relates to the technical field of character recognition, and in particular to a method and an apparatus of handwritten digit recognition, a computer-readable storage medium and an electronic device.
Background art
Character recognition is a research hotspot in pattern recognition, and handwritten digit recognition, as one of its important branches, has received extensive attention and development. Handwritten digit recognition is a technology that uses a computer and recognition equipment to automatically identify handwritten Arabic numerals. Common applications include automatic mail sorting, financial statements, inspection and processing of bank bills, and data entry. These tasks typically require very high recognition accuracy and very fast recognition speed. However, because the handwriting styles of different people differ widely, the format of handwritten digits is difficult to standardize, and accurate recognition is difficult.
In recent years, the development of deep learning has provided a new way to solve handwritten digit recognition, and the prior art recognizes handwritten digits with neural networks. Deep learning is based on a set of algorithms that model data with multiple nonlinear signal processing stages in order to form higher-level abstractions and selectively learn representations of the data. In the deep learning setting, pattern recognition tasks based on representation learning have been realized successfully. Owing to the availability of data and to their high accuracy and good generalization on classification tasks, neural networks have become the main method for solving this problem. CNNs (Convolutional Neural Networks) show outstanding recognition rates on character and digit recognition. The advantage of CNNs is that they automatically extract the invariant salient features of the input character. Although these models achieve high accuracy, fine-tuning the network weights to extract better features and coping with slow convergence toward the optimal solution come at a cost, namely a large amount of computation and the development of complex system structures.
At present, handwritten digit recognition is mostly trained and evaluated on the MNIST handwritten dataset. However, when a model trained on the MNIST database is tested on handwritten digit samples from outside that database, the recognition rate is very low. That is, a neural network model trained on a specific dataset has unsatisfactory accuracy when recognizing samples that are not in that dataset. Current CNN recognition systems are unstable in practical applications, and improving generalization performance is a bottleneck.
Therefore, how to improve the recognition accuracy of handwritten digits is a technical problem that those skilled in the art currently need to solve.
Summary of the invention
The purpose of this application is to provide a method and an apparatus of handwritten digit recognition, a computer-readable storage medium and an electronic device, which can improve the recognition accuracy of handwritten digits.
To solve the above technical problem, this application provides a method of handwritten digit recognition, the method comprising:
obtaining, from a first preset area on a writing analog board, a handwritten digit sample whose font format is the liquid crystal digit format; wherein contour lines of multiple liquid crystal digits are provided in the first preset area of the writing analog board, so that the user writes digits in the liquid crystal digit format along the contour lines;
extracting the feature information of each handwritten digit in the handwritten digit sample, and matching all the feature information against standard digit feature information to obtain matching results;
generating the recognition result of the handwritten digit sample according to all the matching results.
Optionally, the writing analog board includes a second preset area, and standard digit samples whose font format is the liquid crystal digit format are provided in the second preset area.
Optionally, extracting the feature information of each handwritten digit in the handwritten digit sample includes:
extracting the feature vector of each handwritten digit in the handwritten digit sample through the detection window corresponding to each liquid crystal digit stroke;
wherein each detection window is located on the contour line corresponding to its liquid crystal digit stroke, and the size of the detection window is within a preset size range.
Optionally, extracting the feature vector of each handwritten digit in the handwritten digit sample through the detection window corresponding to each liquid crystal digit stroke includes:
obtaining, through the detection window corresponding to each liquid crystal digit stroke, the gray values at multiple positions within a preset area, and taking the detection window at the position with the maximum gray value as the reference window;
judging whether the gray value corresponding to the reference window is less than a gray threshold; if so, determining that the liquid crystal digit stroke corresponding to the reference window is an effective stroke; if not, determining that the liquid crystal digit stroke corresponding to the reference window is an invalid stroke;
generating the feature vector of each handwritten digit according to the distribution of all effective strokes.
Optionally, matching all the feature information against standard digit feature information to obtain matching results includes:
determining, based on a Bayes classifier, the posterior probability of each handwritten digit from the distance between the feature vector of the handwritten digit and the feature vector of each standard digit class;
judging whether the maximum value of the posterior probability is greater than a preset value;
if so, taking the standard digit class corresponding to the maximum value of the posterior probability as the matching result of the handwritten digit.
Optionally, when the maximum value of the posterior probability is not greater than the preset value, the method further includes:
determining, based on the Bayes classifier, the fault-tolerant posterior probability of each handwritten digit from the distance between the feature vector of the handwritten digit and the feature vector of each fault-tolerant digit class;
judging whether the maximum value of the fault-tolerant posterior probability is greater than the preset value;
if so, determining the fault-tolerant digit class corresponding to the maximum value of the fault-tolerant posterior probability, and taking the standard digit class corresponding to that fault-tolerant digit class as the matching result of the handwritten digit;
if not, determining that the handwritten digit sample is written in a non-standard way.
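The two-stage matching above can be sketched as follows. This is a minimal illustration, not the application's own formula: the text does not state how distance is converted into a posterior, so the sketch assumes equal priors, a Gaussian likelihood on the Euclidean distance, and a constant "none of the classes" term so that a poor match receives a low posterior. The stroke order of the vectors, the fault-tolerant patterns, and the constants `SIGMA`, `REJECT` and `THRESHOLD` are all hypothetical.

```python
import math

# ASSUMED stroke order: R1 top, R2 middle, R3 bottom,
# R4 upper-left, R5 upper-right, R6 lower-left, R7 lower-right.
STANDARD = {
    0: (1, 0, 1, 1, 1, 1, 1), 1: (0, 0, 0, 0, 1, 0, 1),
    2: (1, 1, 1, 0, 1, 1, 0), 3: (1, 1, 1, 0, 1, 0, 1),
    4: (0, 1, 0, 1, 1, 0, 1), 5: (1, 1, 1, 1, 0, 0, 1),
    6: (1, 1, 1, 1, 0, 1, 1), 7: (1, 0, 0, 0, 1, 0, 1),
    8: (1, 1, 1, 1, 1, 1, 1), 9: (1, 1, 1, 1, 1, 0, 1),
}

# HYPOTHETICAL fault-tolerant classes: (stroke pattern, standard digit it maps to).
FAULT_TOLERANT = [
    ((1, 0, 0, 1, 1, 0, 1), 7),  # "7" written with an extra upper-left stroke
    ((1, 1, 0, 1, 1, 0, 1), 9),  # "9" written without the bottom stroke
    ((0, 1, 1, 1, 0, 1, 1), 6),  # "6" written without the top stroke
]

SIGMA = 0.3      # assumed spread of the distance-based likelihood
REJECT = 0.05    # assumed likelihood of "none of the classes"
THRESHOLD = 0.9  # assumed preset value for the posterior


def posteriors(x, patterns):
    """Posterior of each pattern given x, under the assumptions stated above."""
    likelihoods = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * SIGMA ** 2))
        for p in patterns
    ]
    total = sum(likelihoods) + REJECT
    return [value / total for value in likelihoods]


def classify(x):
    """Return the matched digit, or None if the writing is judged non-standard."""
    # Stage 1: match against the standard digit classes.
    digits = sorted(STANDARD)
    post = posteriors(x, [STANDARD[d] for d in digits])
    best = max(range(len(post)), key=post.__getitem__)
    if post[best] > THRESHOLD:
        return digits[best]
    # Stage 2: match against the fault-tolerant classes.
    post = posteriors(x, [pattern for pattern, _ in FAULT_TOLERANT])
    best = max(range(len(post)), key=post.__getitem__)
    if post[best] > THRESHOLD:
        return FAULT_TOLERANT[best][1]
    return None
```

Under these assumptions a clean standard pattern is accepted in the first stage, a pattern listed as fault-tolerant is accepted only in the second stage and mapped back to its standard digit, and anything else is rejected.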
Optionally, the standard digit feature information is specifically the feature information of standard digits whose font format is the liquid crystal digit format; the standard digits include 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9.
This application also provides an apparatus of handwritten digit recognition, the apparatus comprising:
a sample acquisition module, configured to obtain a handwritten digit sample whose font format is the liquid crystal digit format;
a feature matching module, configured to extract the feature information of each handwritten digit in the handwritten digit sample, and to match all the feature information against standard digit feature information to obtain matching results;
a recognition module, configured to generate the recognition result of the handwritten digit sample according to all the matching results.
This application also provides a computer-readable storage medium on which a computer program is stored; when executed, the computer program implements the steps of the above method of handwritten digit recognition.
This application also provides an electronic device including a memory and a processor, wherein a computer program is stored in the memory, and the processor implements the steps of the above method of handwritten digit recognition when it calls the computer program in the memory.
This application provides a method of handwritten digit recognition, including: obtaining, from a first preset area on a writing analog board, a handwritten digit sample whose font format is the liquid crystal digit format, wherein contour lines of multiple liquid crystal digits are provided in the first preset area of the writing analog board so that the user writes digits in the liquid crystal digit format along the contour lines; extracting the feature information of each handwritten digit in the handwritten digit sample, and matching all the feature information against standard digit feature information to obtain matching results; and generating the recognition result of the handwritten digit sample according to all the matching results.
This application obtains handwritten digit samples whose font format is the liquid crystal digit format. By unifying the font in which writers write digits, the obtained handwritten digit samples have a relatively standardized format, which significantly reduces the burden on the later recognition stage. Because the handwritten digit samples of this application are all written on the writing analog board, and the board carries the contour lines of liquid crystal digits, the contour lines effectively help the user write digits in the liquid crystal digit format. Once such a handwritten digit sample has been obtained, feature information matching can be performed on it to obtain the final recognition result. Because the format of the handwritten digit samples is relatively standardized, the complexity of the classification algorithm is reduced, and high recognition accuracy can be achieved without large-scale training. This application also provides a system of handwritten digit recognition, a computer-readable storage medium and an electronic device, which have the above beneficial effects and are not described again here.
Brief description of the drawings
To illustrate the embodiments of this application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method of handwritten digit recognition provided by an embodiment of this application;
Fig. 2 is a schematic layout of a writing analog board;
Fig. 3 shows handwritten digit samples obtained from the writing analog board;
Fig. 4 is a flowchart of a handwritten digit matching process provided by an embodiment of this application;
Fig. 5 is a schematic diagram of the distribution of detection windows in the digit template;
Fig. 6 is a schematic diagram of detection window sampling for the handwritten liquid crystal digit 3;
Fig. 7 shows examples of writing errors that the human eye can tolerate;
Fig. 8 is a schematic structural diagram of a system of handwritten digit recognition provided by an embodiment of this application.
Detailed description of the embodiments
To make the purposes, technical solutions and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, but not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this application without creative effort fall within the protection scope of this application.
Different handwritten digit samples differ in stroke darkness and stroke thickness, and redundant strokes form noise that degrades recognition. Traditional character recognition methods first apply a series of preprocessing operations to the input image (such as Gaussian blurring and binarization) and then extract features, aiming to reduce the recognition errors caused by these differences and by noise. These preprocessing steps often complicate the algorithm and may produce unstable recognition results. For example, in a computer marking system, student numbers must be recognized as accurately as possible. If a student number consists of 10 digits, all 10 digits must be recognized correctly for the student number record to be valid. That is, in this application scenario the cost of an error in recognizing a single digit is 10 times that of a general recognition task. The mainstream digit recognition methods at present are neural network algorithms. When a neural network model trained on the standard handwriting database (the MNIST database) is used for this recognition, the accuracy is only about 62%, which is far from satisfactory. Improving the accuracy requires expanding or replacing the training dataset and retraining the network model; the workflow is complicated and places relatively high demands on professional knowledge and hardware. Even so, the recognition rate for a single character can generally only be raised to about 99%, at which point the accuracy for a whole student number is only about 90%, which is still not ideal. In other similar application scenarios, digits likewise appear in the form of character strings. Evidently, the recognition accuracy of single handwritten digits needs further improvement.
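The relation between the 99% single-digit figure and the roughly 90% figure for a whole student number can be checked with a short calculation. The sketch below assumes that errors on the 10 digits are independent, which the text does not state; the function names are illustrative only.

```python
def string_accuracy(per_digit_accuracy: float, length: int) -> float:
    """Probability that every digit of a string is recognized correctly,
    assuming independent errors on each digit (an assumption of this sketch)."""
    return per_digit_accuracy ** length


def required_digit_accuracy(target_string_accuracy: float, length: int) -> float:
    """Per-digit accuracy needed to reach a target whole-string accuracy."""
    return target_string_accuracy ** (1.0 / length)


# A 10-digit student number at 99% per-digit accuracy.
print(round(string_accuracy(0.99, 10), 4))          # about 0.9044
# Per-digit accuracy needed for 99% of student numbers to be fully correct.
print(round(required_digit_accuracy(0.99, 10), 4))  # about 0.999
```

Under the independence assumption, raising whole-string accuracy to 99% would require roughly 99.9% accuracy on each digit, which illustrates why single-digit accuracy dominates in this scenario.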
In view of the above problems, this application solves the above technical problem through the following embodiments and achieves the purpose of improving the accuracy of handwritten digit recognition.
Referring to Fig. 1, Fig. 1 is a flowchart of a method of handwritten digit recognition provided by an embodiment of this application.
The specific steps may include:
S101: obtain, from the first preset area on the writing analog board, a handwritten digit sample whose font format is the liquid crystal digit format;
A digit is a symbol used to represent a number, and the digits in international use are Arabic numerals. Because different people have different writing styles, handwritten digits differ noticeably in form, size, structure, darkness and other respects, and some handwritten digits are hard to recognize even for the human eye. In a scenario such as a computer marking system, the recognition accuracy required for each single digit is very high, and the variability of traditional handwritten Arabic numerals makes the problem exceptionally difficult. Although answer card recognition systems have a high recognition rate, the filled-in result is not intuitive and is hard to check, so filling in the wrong number happens from time to time. This embodiment proposes a new way of recording handwritten digits, namely the "handwritten liquid crystal digit". A handwritten liquid crystal digit is a handwritten digit whose font format is the liquid crystal digit format; that is, the handwritten digit samples obtained through the writing analog board in this embodiment are handwritten liquid crystal digits.
By default, the handwritten digit sample obtained in this embodiment consists of digits written by the user on the writing analog board. Contour lines of multiple liquid crystal digits are provided in the first preset area of this board, so that the user writes digits in the liquid crystal digit format along the contour lines. Referring to Fig. 2, which is a schematic layout of a writing analog board, the first preset area contains the contour lines of the liquid crystal digit "8", and the user can write digits along them. It should be noted that the contour lines in this embodiment only assist the user in writing. They can be light dotted lines (for example, grey), so that they are not themselves recognized during handwritten digit recognition. The process of writing on the writing analog board can be understood as similar to displaying liquid crystal digits on a digital tube: because of the contour lines, the user is guided to write digits in the liquid crystal digit font on the board.
As an optional implementation, the contour lines of the liquid crystal digits on the writing analog board in this embodiment can be inclined toward the upper right, to match the writing habits of most people.
Referring to Fig. 2, as a further improvement, the writing analog board can include, in addition to the first preset area, a second preset area in which standard digit samples in the liquid crystal digit format are provided, so that the user can refer to the standard digit samples and write the digits correctly.
The writing analog board proposed in this embodiment changes the way handwritten digits are recorded. Recording digits in the liquid crystal format moves the key step of solving the problem forward to the writing stage: the collected digit samples to be recognized have a relatively standardized format, which significantly reduces the burden on the later recognition stage. The corresponding recognition method is based on a Bayes classifier, requires no prior training, and has low algorithmic complexity.
Referring to Fig. 3, which shows handwritten digit samples obtained from the writing analog board: compared with ordinary handwritten digit samples, the samples collected in this embodiment have a stable structure and a relatively uniform style, which helps ensure correct recognition. Compared with a traditional answer card system, the user can check what they have written at a glance, which avoids the filling errors that easily occur on traditional answer cards. In addition, images obtained with an ordinary scanner or a mobile phone camera can be processed and recognized directly in the system. No special reading machine is needed, nor any special answer sheet or pen, which makes the system considerably more convenient to use.
S102: extract the feature information of each handwritten digit in the handwritten digit sample, and match all the feature information against standard digit feature information to obtain matching results;
In the prior art, after the handwritten digit sample is collected, preprocessing operations (such as image blurring and binarization) are applied to it in order to improve the robustness of the algorithm to image noise. However, while these preprocessing steps improve robustness to noise, they also discard part of the useful information in the image. This is one of the reasons for the low accuracy of handwritten digit recognition in the prior art.
To ensure the stability of the recognition process and its results, this embodiment does not apply the preprocessing operations that traditional recognition methods apply to image samples. It extracts features directly from the image of the handwritten digit sample obtained in S101, so the process is stable and the computational complexity is low. Furthermore, the nature of the Bayesian model (classification based on posterior probability) also ensures the noise resistance of the algorithm.
The handwritten digit sample obtained in this embodiment may include multiple handwritten digits. The feature information of a handwritten digit is information describing the handwriting of that digit on the contour lines in the first preset area of the digit template. The standard digit feature information is information describing the handwriting of standard digits in the liquid crystal digit format, such as the handwriting feature information of the standard digit samples in the second preset area mentioned above with reference to Fig. 2.
The purpose of this step is to compare each handwritten digit in the obtained handwritten digit sample with all the standard digit samples (usually 10) one by one, and to take the standard digit sample with the highest matching degree or similarity as the matching result. The specific matching process is described in later embodiments.
S103: generate the recognition result of the handwritten digit sample according to all the matching results.
After the handwritten digit sample has been matched digit by digit in S102 and a matching result has been obtained for each handwritten digit, the recognition result of the handwritten digit sample can be obtained by arranging the matching results in the order in which the digits appear on the writing analog board.
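Step S103 can be sketched as follows. The data layout is an assumption of this sketch: each matching result is a hypothetical `(position, digit)` pair, where `position` is the index of the digit on the writing analog board and `digit` is `None` when no class was matched.

```python
from typing import Optional, Sequence, Tuple


def assemble_result(matches: Sequence[Tuple[int, Optional[int]]]) -> Optional[str]:
    """Join per-digit matching results into one string, in board order.

    Returns None if any digit failed to match, since (as in the student
    number example) the whole string is only valid when every digit is.
    """
    ordered = sorted(matches, key=lambda match: match[0])
    if any(digit is None for _, digit in ordered):
        return None
    return "".join(str(digit) for _, digit in ordered)


# Matching results may arrive in any order; positions restore the board order.
print(assemble_result([(2, 3), (0, 1), (1, 8)]))  # prints 183
```

Treating one unmatched digit as invalidating the whole sample is a design choice of the sketch; an application could instead report which position failed.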
This embodiment obtains handwritten digit samples whose font format is the liquid crystal digit format. By unifying the font in which writers write digits, the obtained handwritten digit samples have a relatively standardized format, which significantly reduces the burden on the later recognition stage. Because the handwritten digit samples of this embodiment are all written on the writing analog board, and the board carries the contour lines of liquid crystal digits, the contour lines effectively help the user write digits in the liquid crystal digit format. Once such a handwritten digit sample has been obtained, feature information matching can be performed on it to obtain the final recognition result. Because the format of the handwritten digit samples is relatively standardized, the complexity of the classification algorithm is reduced, and high recognition accuracy can be achieved without large-scale training. The recognition technique proposed in this embodiment extracts features directly from the input digit image to be recognized, which preserves the original image information as completely as possible while ensuring that the algorithm runs stably. This embodiment normalizes the writing styles of different people, so that the input samples to be recognized are converted into an essentially invariant digit representation, and finally achieves reliable classification and recognition of handwritten liquid crystal digits. Compared with conventional methods, the recognition process of this embodiment is simple and stable, and it can achieve better recognition results.
Referring to Fig. 4, Fig. 4 is a flowchart of a handwritten digit matching process provided by an embodiment of this application. This embodiment is a more specific description of the steps related to S102 in the embodiment corresponding to Fig. 1, and it can be combined with that embodiment to obtain a more preferable implementation. The specific steps of this embodiment may include:
S201: extract the feature vector of each handwritten digit in the handwritten digit sample through the detection window corresponding to each liquid crystal digit stroke;
Each detection window is located on the contour line corresponding to its liquid crystal digit stroke, and the size of the detection window is within a preset size range. A detection window is a window used to detect whether handwriting is present in the first preset area; the detection window corresponding to a liquid crystal digit stroke is the window that detects whether handwriting is present at the position of that stroke in the first preset area. Specifically, each liquid crystal digit has in theory at most 7 strokes, so this step in effect extracts the feature vector through the detection windows at the positions of the seven theoretical strokes.
Referring to Fig. 5, which is a schematic diagram of the distribution of detection windows in the digit template, R1, R2, R3, R4, R5, R6 and R7 in Fig. 5 are the 7 detection windows corresponding to one handwritten digit on the writing analog board. Detection windows can be placed at the contour line positions according to the distribution in Fig. 5 in order to sample the strokes that may appear. Referring to Fig. 6, which is a schematic diagram of detection window sampling for the handwritten liquid crystal digit 3, handwriting is collected in R1, R2, R3, R5 and R7. Based on the sampling result, each digit to be recognized can be represented as a 7-dimensional vector. Because lateral strokes and vertical strokes have different lengths, windows of two different sizes can be used to cover them: as shown in Fig. 5, the windows R1, R2 and R3 for lateral strokes and the windows R4, R5, R6 and R7 for vertical strokes are detection windows of two different sizes. If the window is too small, sampled noise points may be mistaken for a real stroke, giving a false impression that a stroke is present and causing misrecognition. A larger window can therefore be selected to ensure reliable stroke sampling.
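The 7-dimensional representation described above can be illustrated with the digit 3 of Fig. 6. The sketch below uses a simplified binary encoding (1 if handwriting was collected in a window, 0 otherwise); the window names come from Fig. 5, while the function name and the binary simplification are assumptions of this sketch.

```python
# Detection window names as labelled in Fig. 5; R1-R3 cover lateral strokes
# and R4-R7 cover vertical strokes.
WINDOWS = ("R1", "R2", "R3", "R4", "R5", "R6", "R7")


def encode(windows_with_handwriting):
    """Return a 7-dimensional 0/1 vector in the order R1..R7."""
    return tuple(1 if name in windows_with_handwriting else 0 for name in WINDOWS)


# Fig. 6: for the handwritten liquid crystal digit 3, handwriting is
# collected in R1, R2, R3, R5 and R7.
print(encode({"R1", "R2", "R3", "R5", "R7"}))  # (1, 1, 1, 0, 1, 0, 1)
```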
Because of writing errors, a window at its initially set position may cover not only the target stroke but also part of the stroke belonging to an adjacent window, in which case the target stroke is misrepresented. To solve this problem, the position of the detection window can be allowed to float within a certain range so that multiple sampling results are obtained, and whether the stroke exists is finally judged from the detection window with the largest sum of gray values (that is, the one containing the fewest black pixels). Therefore, as a further optimization of S201 in the embodiment corresponding to Fig. 4, the following steps may be included:
Step 1: obtain, through the detection window corresponding to each liquid crystal digit stroke, the gray values at multiple positions within a preset area, and take the detection window at the position with the maximum gray value as the reference window;
In this step, within the region where each liquid crystal digit stroke may exist, the gray values of window-sized regions at multiple positions are obtained, and the detection window at the position with the maximum gray value (that is, with the fewest black pixels) is taken as the reference window. The gray value of a detection window mentioned here is the sum of the gray values of all pixels in the region covered by that detection window.
Step 2: judge whether the gray value corresponding to the reference window is less than a gray threshold; if so, determine that the liquid crystal digit stroke corresponding to the reference window is an effective stroke; if not, determine that the liquid crystal digit stroke corresponding to the reference window is an invalid stroke;
If the sum of the gray values of the reference window is less than the given threshold, a stroke is considered to be present at the corresponding position; such a stroke is called an "effective stroke". Taking Fig. 5 as an example, for the digit "3", 5 windows (R1, R2, R3, R5 and R7) are considered to contain effective strokes.
Step 3: generate the feature vector of each handwritten digit according to the distribution of all effective strokes.
In this embodiment, the feature vector of each handwritten numeral can be defined as $x = (x_1, x_2, x_3, x_4, x_5, x_6, x_7)^T$, which denotes the numeral sample to be identified.
Here $x_j$ ($j = 1, 2, \dots, 7$) denotes the sum of gray values of the $j$-th of the 7 detection windows of a liquid crystal number in the image to be identified. Since different numeral samples differ in the thickness and darkness of the handwriting, and the lateral and vertical strokes of the same sample may also differ in thickness, a consistent sample representation can be obtained by normalizing the characteristic values of the strokes in the two directions by the mean gray value of the effective lateral windows and of the effective vertical windows, respectively. For ease of notation, the feature vector of the normalized sample is still denoted by the vector $x$.
Specifically, each $x_j$ has a unique corresponding detection window. Taking Fig. 4 and Fig. 5 as an example, $x_1$ corresponds to R1, $x_2$ to R2, $x_3$ to R3, $x_4$ to R4, $x_5$ to R5, $x_6$ to R6 and $x_7$ to R7. For the number "3", 5 windows (R1, R2, R3, R5 and R7) are confirmed to contain effective strokes, so the feature vector obtained for this handwritten numeral is $(1,1,1,0,1,0,1)^T$: $x_j$ takes the value 1 when an effective stroke exists in its detection window and 0 when it does not.
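As an illustration of Steps 1 to 3, the following is a minimal Python sketch, not the implementation of this embodiment. It assumes a grayscale image held in a NumPy array in which white is 255 and ink is dark, and it assumes each window lies inside the image at its nominal position. The window tuples `(top, left, height, width)`, the `max_shift` float range and all function names are hypothetical choices made for the example.

```python
import numpy as np


def window_gray_sum(image, top, left, height, width):
    """Sum of the gray values of all pixels inside one window."""
    return int(image[top:top + height, left:left + width].sum())


def reference_window_sum(image, top, left, height, width, max_shift=2):
    """Float the window by up to max_shift pixels and keep the largest sum.

    The largest sum belongs to the placement with the fewest dark pixels,
    i.e. the reference window of Step 1.
    """
    rows, cols = image.shape
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + height > rows or l + width > cols:
                continue  # skip placements that leave the image
            s = window_gray_sum(image, t, l, height, width)
            if best is None or s > best:
                best = s
    return best


def binary_feature_vector(image, windows, gray_threshold, max_shift=2):
    """Steps 2 and 3: 1 where the reference sum is below the threshold, else 0."""
    return [
        1 if reference_window_sum(image, *w, max_shift=max_shift) < gray_threshold else 0
        for w in windows
    ]
```

Because the decision uses the lightest placement, a window that only clips a neighbouring stroke at its nominal position is still reported as empty, while a window lying fully on a stroke stays dark at every placement. The threshold would have to be tuned to the window size.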
Further, the vectors $r_i = (r_1, r_2, r_3, r_4, r_5, r_6, r_7)^T$, $i = 0, 1, 2, \dots, 9$, can be used to denote the standard feature vectors of the 10 digit classes 0 to 9. Specifically, if an element of $r_i$ is 1, a stroke is present at the corresponding position of the template; if it is 0, there is no stroke at that position. For example, taking $r_2 = (1,1,1,0,1,1,0)^T$ as the class feature of the standard digit "2" means that the positions corresponding to the 4th and 7th components have no stroke and the remaining positions have strokes. The feature vectors of all standard digit classes are as follows:
Standard digit 0: $(1,0,1,1,1,1,1)^T$
Standard digit 1: $(0,0,0,1,0,1,0)^T$
Standard digit 2: $(1,1,1,0,1,1,0)^T$
Standard digit 3: $(1,1,1,0,1,0,1)^T$
Standard digit 4: $(0,1,0,1,1,0,1)^T$
Standard digit 5: $(1,1,1,1,0,0,1)^T$
Standard digit 6: $(1,1,1,1,0,1,1)^T$
Standard digit 7: $(1,0,0,0,1,0,1)^T$
Standard digit 8: $(1,1,1,1,1,1,1)^T$
Standard digit 9: $(1,1,1,1,1,0,1)^T$
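The list of standard vectors above can be held in a small lookup structure. The sketch below is only an illustration (names such as `STANDARD_TEMPLATES` are assumptions made here); it stores the ten vectors and computes the Euclidean distance from a binary sample vector to each of them, which is the quantity the classification in S202 starts from.

```python
import math

# Standard feature vectors r_0 .. r_9, in the order (x1, ..., x7) = (R1, ..., R7).
STANDARD_TEMPLATES = {
    0: (1, 0, 1, 1, 1, 1, 1),
    1: (0, 0, 0, 1, 0, 1, 0),
    2: (1, 1, 1, 0, 1, 1, 0),
    3: (1, 1, 1, 0, 1, 0, 1),
    4: (0, 1, 0, 1, 1, 0, 1),
    5: (1, 1, 1, 1, 0, 0, 1),
    6: (1, 1, 1, 1, 0, 1, 1),
    7: (1, 0, 0, 0, 1, 0, 1),
    8: (1, 1, 1, 1, 1, 1, 1),
    9: (1, 1, 1, 1, 1, 0, 1),
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def distances_to_templates(x):
    """Distance from sample vector x to every standard class vector."""
    return {digit: euclidean(x, r) for digit, r in STANDARD_TEMPLATES.items()}
```

For the sample vector of the number "3", the distance to template 3 is 0 and the distance to template 9 is 1, since the two templates differ only in $x_4$.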
S202: based on a Bayes classifier, determine the posterior probability of each handwritten numeral from the distance between the feature vector of each handwritten numeral and the feature vector of each standard digital class;
The Bayes classifier is a probabilistic classification model that determines the output attribute of a new sample based on Bayes' theorem and the maximum a posteriori probability. It computes probabilities under the assumption that the input features are mutually independent, classifies on that basis, and is suitable for multi-class problems. Therefore, this embodiment selects this classifier to identify liquid crystal numbers: its algorithm structure and implementation are simple, yet it achieves good results in recognition speed and accuracy.
First, for each digit class $\omega_i$, its prior probability can be calculated as follows:
$$P(\omega_i) = \frac{N_i}{N}$$
where $N_i$ denotes the number of digital images of the $i$-th class in the data set and $N$ denotes the total number of digital images in the data set. In the application scenario of student number identification, the prior probability of each of the 10 digit classes is simply 0.1, which is the case of equal priors. Unequal priors may arise in other scenarios. For example, when identifying ID card numbers, the first digit of the number can only take the seven values 1 to 6 and 8, so the priors of the ten digit classes are clearly not equal: the priors of these seven digits are greater than those of the other three. The prior probabilities of the 10 digit classes can be configured from statistical data to guarantee the accuracy of the final identification.
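The prior described above is the share of class-$i$ images in the data set. As a small worked example (the function name and the label list are hypothetical), it could be estimated from a list of class labels like this:

```python
from collections import Counter


def class_priors(labels, classes=range(10)):
    """Estimate P(w_i) = N_i / N from a list of class labels."""
    total = len(labels)
    if total == 0:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return {c: counts.get(c, 0) / total for c in classes}


# A balanced data set gives the equal-prior case of 0.1 per class.
balanced = [digit for digit in range(10) for _ in range(3)]
priors = class_priors(balanced)
```

Classes that never occur in the labels receive a prior of 0 in this sketch; whether to smooth such estimates is a design decision the text leaves open.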
Further, the likelihood $p_i$ that the numeral sample to be identified belongs to each class of number can be calculated from the distance between its feature vector and the feature vector of each digit class. The distance used is the Euclidean distance $\|x - r_i\|$ between the sample to be identified and the class feature vector. Through normalization, the $p_i$ are each mapped to a probability between 0 and 1, which gives the class-conditional probability density $p(X \mid \omega_i)$ of the sample $X$ to be identified.
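Finally, the posterior probability distribution of the sample to be identified is calculated according to the Bayesian formula:
$$P(\omega_i \mid X) = \frac{p(X \mid \omega_i)\,P(\omega_i)}{\sum_{j=0}^{9} p(X \mid \omega_j)\,P(\omega_j)}$$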
According to the Bayes classification criterion, the class with the largest posterior probability is the digit class to which the sample to be identified belongs, that is:
$$X \in \omega_k, \quad k = \arg\max_{i} P(\omega_i \mid X) \qquad \text{(formula 5)}$$
If the user does not write according to the required specification, the classification result becomes unreliable. To identify this situation, a classification method containing a "reject" decision can be considered, that is, if
$$\max_{i} P(\omega_i \mid X) \le \tau_P \qquad \text{(formula 6)}$$
then $X$ is rejected. Combining (formula 5) and (formula 6), the specific classification model of this embodiment is:
$$X \in \begin{cases} \omega_k, & P(\omega_k \mid X) > \tau_P \\ \text{reject}, & \text{otherwise} \end{cases} \qquad k = \arg\max_{i} P(\omega_i \mid X)$$
where the default value of $\tau_P$ is 0.16.
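The classification with a reject decision can be sketched as follows. This is an illustration under a stated assumption, not the formula of this embodiment: the likelihood is taken here to be $1/(1 + d_i)$, with $d_i$ the Euclidean distance to class $i$, normalized so that the values sum to 1. That choice, and the names `TEMPLATES`, `posteriors` and `classify`, are made up for the example; only the ten templates and the default threshold 0.16 come from the text.

```python
import math

TEMPLATES = {
    0: (1, 0, 1, 1, 1, 1, 1), 1: (0, 0, 0, 1, 0, 1, 0),
    2: (1, 1, 1, 0, 1, 1, 0), 3: (1, 1, 1, 0, 1, 0, 1),
    4: (0, 1, 0, 1, 1, 0, 1), 5: (1, 1, 1, 1, 0, 0, 1),
    6: (1, 1, 1, 1, 0, 1, 1), 7: (1, 0, 0, 0, 1, 0, 1),
    8: (1, 1, 1, 1, 1, 1, 1), 9: (1, 1, 1, 1, 1, 0, 1),
}
TAU_P = 0.16  # default reject threshold given in the text


def posteriors(x, priors=None):
    """Posterior P(w_i | X) for every class, using Bayes' formula."""
    if priors is None:
        priors = {d: 1.0 / len(TEMPLATES) for d in TEMPLATES}  # equal priors
    # ASSUMPTION: distance-based likelihood 1 / (1 + d_i), normalized to sum to 1.
    raw = {d: 1.0 / (1.0 + math.dist(x, r)) for d, r in TEMPLATES.items()}
    total = sum(raw.values())
    likelihood = {d: v / total for d, v in raw.items()}
    evidence = sum(likelihood[d] * priors[d] for d in TEMPLATES)
    return {d: likelihood[d] * priors[d] / evidence for d in TEMPLATES}


def classify(x, priors=None, tau_p=TAU_P):
    """Return the class with the largest posterior, or None to reject."""
    post = posteriors(x, priors)
    best = max(post, key=post.get)
    return best if post[best] > tau_p else None
```

Under this assumed likelihood, a vector that matches the template of "3" exactly is accepted, while a vector that is equally close to two templates (for example one stroke away from both "4" and "9") falls below the threshold and is rejected.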
S203: judge whether the maximum value of the posterior probability is greater than a preset value; if so, proceed to S204; if not, end the process.
S204: take the standard digital class corresponding to the maximum value of the posterior probability as the matching result of the handwritten numeral.
The above embodiment first sets 7 windows based on the coordinates of the writing analog board and thereby collects the information features of the digital image; it then matches the features of the sample to be identified against the standard feature vectors of the 10 digit classes according to the Bayesian decision method; finally it completes classification and identification according to the maximum a posteriori criterion. The computational complexity of the algorithm is low, so recognition is fast and meets the requirement of online real-time identification.
For a computer marking system, some users may not fill in numbers fully according to the digital template, so that actual handwritten liquid crystal numeral samples differ from the standard template numbers. Referring to Fig. 7, Fig. 7 is an example diagram of writing errors that the human eye can tolerate; some representative errors can be summarized from the collected numeral samples. As shown in Fig. 7, the three digit classes 1, 6 and 9 admit such tolerance, and the tolerant representation is called a "tolerant code". During classification, the feature vectors of these three digit classes can be supplemented: the feature vector of the sample to be identified is compared, by probability calculation, with both the standard code and the tolerant code of these three digit classes, and the sample is then classified according to the higher probability value to obtain the final recognition result.
In order to correctly identify these errors that the human eye can tolerate, the following fault-tolerant strategy can be added to the identification process to improve the efficiency of identification.
Step 1: when the maximum value of the posterior probability is not greater than the preset value, determine, based on the Bayes classifier, the fault-tolerant posterior probability of each handwritten numeral from the distance between the feature vector of each handwritten numeral and the feature vector of each fault tolerant digital class;
Correspondingly, each fault tolerant digital class has its own feature vector. For the situation in Fig. 7, the feature vectors of the fault tolerant digital classes may include:
Fault tolerant digital class "1": $(0,0,0,0,1,0,1)^T$
Fault tolerant digital class "6": $(0,1,1,1,0,1,1)^T$
Fault tolerant digital class "9": $(1,1,0,1,1,0,1)^T$
Step 2: judge whether the maximum value of the fault-tolerant posterior probability is greater than a preset value; if so, go to Step 3; if not, go to Step 4;
Step 3: determine the fault tolerant digital class corresponding to the maximum value of the fault-tolerant posterior probability, and take the standard digital class corresponding to that fault tolerant digital class as the matching result of the handwritten numeral;
Step 4: determine that the handwritten numeral sample is not written according to the specification.
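Steps 1 to 4 amount to a two-stage lookup: try the standard classes first and fall back to the tolerant codes only when the first stage rejects. The sketch below illustrates that control flow under several assumptions that are not in the text: the likelihood is taken as $1/(1 + d)$ with equal priors, the third tolerant code is assigned to the digit 9 (as the list of tolerant classes 1, 6 and 9 implies), and the second stage uses its own hypothetical threshold `TAU_TOLERANT = 0.5`, because with only three candidate classes the largest posterior is never below 1/3.

```python
import math

STANDARD = {
    0: (1, 0, 1, 1, 1, 1, 1), 1: (0, 0, 0, 1, 0, 1, 0),
    2: (1, 1, 1, 0, 1, 1, 0), 3: (1, 1, 1, 0, 1, 0, 1),
    4: (0, 1, 0, 1, 1, 0, 1), 5: (1, 1, 1, 1, 0, 0, 1),
    6: (1, 1, 1, 1, 0, 1, 1), 7: (1, 0, 0, 0, 1, 0, 1),
    8: (1, 1, 1, 1, 1, 1, 1), 9: (1, 1, 1, 1, 1, 0, 1),
}
# Tolerant codes keyed by the standard digit they stand for.
TOLERANT = {
    1: (0, 0, 0, 0, 1, 0, 1),
    6: (0, 1, 1, 1, 0, 1, 1),
    9: (1, 1, 0, 1, 1, 0, 1),
}
TAU_P = 0.16        # default threshold from the text
TAU_TOLERANT = 0.5  # ASSUMPTION: separate threshold for the three-class stage


def best_match(x, templates, threshold):
    """Return (label, posterior) of the best class, or (None, posterior) if rejected."""
    # ASSUMPTION: likelihood 1 / (1 + distance) with equal priors.
    raw = {label: 1.0 / (1.0 + math.dist(x, r)) for label, r in templates.items()}
    total = sum(raw.values())
    label = max(raw, key=raw.get)
    posterior = raw[label] / total
    return (label if posterior > threshold else None), posterior


def recognize(x):
    """Standard match first, tolerant-code fallback second, None if non-standard."""
    label, _ = best_match(x, STANDARD, TAU_P)
    if label is not None:
        return label
    label, _ = best_match(x, TOLERANT, TAU_TOLERANT)
    return label  # None means the writing is judged non-standard (Step 4)
```

With these assumed settings, the three tolerant codes are rejected by the first stage and recovered as 1, 6 and 9 by the second, while an empty vector is rejected by both stages.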
Without the above fault-tolerant strategy in the algorithm, the three kinds of "erroneous" numeral samples in Fig. 7 would be misclassified or rejected, and the recognition rate would be unsatisfactory. The strategy therefore effectively increases the reliability of the recognition algorithm and the availability of the identification system, and improves recognition accuracy.
Referring to Fig. 8, Fig. 8 is a structural schematic diagram of a system for handwritten digit recognition provided by an embodiment of the present application;
The system may include:
a sample acquisition module 100, for obtaining a handwritten numeral sample whose font format is the liquid crystal number;
a characteristic matching module 200, for extracting the characteristic information of each handwritten numeral in the handwritten numeral sample, and matching all the characteristic information with standard digital characteristic information to obtain matching results;
an identification module 300, for generating the recognition result of the handwritten numeral sample according to all the matching results.
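The division into three modules can be pictured as a simple pipeline. The following sketch is only an illustration: the function name, the callable parameters and the choice to return `None` for the whole sample when any single numeral has no matching result are assumptions, since the text does not specify how the identification module combines the matching results.

```python
def recognize_sample(digit_images, extract_features, match_digit):
    """Combine per-numeral matching results into one recognition result.

    digit_images     -- per-numeral images from the sample acquisition module
    extract_features -- callable returning a feature vector for one image
    match_digit      -- callable returning a digit, or None when unmatched
    """
    results = [match_digit(extract_features(img)) for img in digit_images]
    if any(r is None for r in results):
        return None  # ASSUMPTION: one unmatched numeral invalidates the sample
    return "".join(str(r) for r in results)
```

Passing the feature extraction and matching steps in as callables keeps the sketch independent of how those steps are implemented.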
This embodiment obtains handwritten numeral samples whose font format is the liquid crystal number. By unifying the font in which numbers are written, the handwritten numeral samples obtained have a relatively standardized format, which significantly reduces the burden on the later identification stage. Since the handwritten numeral samples of this embodiment are all written on the writing analog board, on which the contour lines of liquid crystal numbers are provided, the contour lines can effectively assist the user in writing numbers in liquid crystal number format. Once handwritten numeral samples in liquid crystal number font format are obtained, characteristic information matching can be performed on them to obtain the recognition result. Because the relatively standardized format of the handwritten numeral samples reduces the complexity of the classification algorithm, high recognition accuracy can be achieved without extensive training. Further, the writing analog board includes a second preset region, in which a standard numeral sample whose font format is the liquid crystal number is provided.
Further, the characteristic matching module 200 includes:
a vector extraction unit, for extracting the feature vector of each handwritten numeral in the handwritten numeral sample through the detection window corresponding to each liquid crystal digital stroke;
a matching unit, for matching all the characteristic information with the standard digital characteristic information to obtain matching results;
wherein the detection window is located on the contour line corresponding to each liquid crystal digital stroke, and the size of the detection window is within a preset size range.
Further, the vector extraction unit is used to obtain, through the detection window corresponding to each liquid crystal digital stroke, the gray values at multiple positions in a preset region, and to take the detection window at the position with the largest gray value as the reference window; it is also used to judge whether the gray value corresponding to the reference window is less than a gray threshold; if so, to determine that the liquid crystal digital stroke corresponding to the reference window is an effective stroke; if not, to determine that the liquid crystal digital stroke corresponding to the reference window is an invalid stroke; and it is also used to generate the feature vector of each handwritten numeral according to the distribution of all the effective strokes.
Further, the matching unit is used to determine, based on a Bayes classifier, the posterior probability of each handwritten numeral from the distance between the feature vector of each handwritten numeral and the feature vector of each standard digital class; it is also used to judge whether the maximum value of the posterior probability is greater than a preset value; if so, to take the standard digital class corresponding to the maximum value of the posterior probability as the matching result of the handwritten numeral.
Further, for the case where the maximum value of the posterior probability is not greater than the preset value, the system further includes:
a fault-tolerant processing module, for determining, based on the Bayes classifier, the fault-tolerant posterior probability of each handwritten numeral from the distance between the feature vector of each handwritten numeral and the feature vector of each fault tolerant digital class; it is also used to judge whether the maximum value of the fault-tolerant posterior probability is greater than a preset value; if so, to determine the fault tolerant digital class corresponding to the maximum value of the fault-tolerant posterior probability and take the standard digital class corresponding to that fault tolerant digital class as the matching result of the handwritten numeral; if not, to determine that the handwritten numeral sample is not written according to the specification.
Further, the standard digital characteristic information is specifically the characteristic information corresponding to standard numbers whose font format is the liquid crystal number; wherein the standard numbers include 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9.
Since the embodiments of the system part correspond to the embodiments of the method part, for the embodiments of the system part please refer to the description of the method embodiments; they are not repeated here.
The present application also provides a computer readable storage medium on which a computer program is stored; when executed, the computer program can implement the steps provided by the above embodiments. The storage medium may include various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disc.
The present application also provides an electronic device, which may include a memory and a processor; a computer program is stored in the memory, and when the processor calls the computer program in the memory, the steps provided by the above embodiments can be implemented. Of course, the electronic device may also include components such as various network interfaces and a power supply.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method. It should be pointed out that those skilled in the art can make some improvements and modifications to the present application without departing from its principle, and these improvements and modifications also fall within the protection scope of the claims of the present application.
It should also be noted that, in this specification, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.

Claims (10)

1. A method of handwritten digit recognition, characterized by comprising:
obtaining a handwritten numeral sample whose font format is the liquid crystal number from a first preset region on a writing analog board; wherein contour lines of multiple liquid crystal numbers are provided in the first preset region of the writing analog board, so that a user writes numbers in liquid crystal number format according to the contour lines;
extracting the characteristic information of each handwritten numeral in the handwritten numeral sample, and matching all the characteristic information with standard digital characteristic information to obtain matching results;
generating the recognition result of the handwritten numeral sample according to all the matching results.
2. The method according to claim 1, characterized in that the writing analog board includes a second preset region, in which a standard numeral sample whose font format is the liquid crystal number is provided.
3. The method according to claim 1, characterized in that extracting the characteristic information of each handwritten numeral in the handwritten numeral sample comprises:
extracting the feature vector of each handwritten numeral in the handwritten numeral sample through the detection window corresponding to each liquid crystal digital stroke;
wherein the detection window is located on the contour line corresponding to each liquid crystal digital stroke, and the size of the detection window is within a preset size range.
4. The method according to claim 3, characterized in that extracting the feature vector of each handwritten numeral in the handwritten numeral sample through the detection window corresponding to each liquid crystal digital stroke comprises:
obtaining, through the detection window corresponding to each liquid crystal digital stroke, the gray values at multiple positions in a preset region, and taking the detection window at the position with the largest gray value as the reference window;
judging whether the gray value corresponding to the reference window is less than a gray threshold; if so, determining that the liquid crystal digital stroke corresponding to the reference window is an effective stroke; if not, determining that the liquid crystal digital stroke corresponding to the reference window is an invalid stroke;
generating the feature vector of each handwritten numeral according to the distribution of all the effective strokes.
5. The method according to claim 3, characterized in that matching all the characteristic information with the standard digital characteristic information to obtain matching results comprises:
determining, based on a Bayes classifier, the posterior probability of each handwritten numeral from the distance between the feature vector of each handwritten numeral and the feature vector of each standard digital class;
judging whether the maximum value of the posterior probability is greater than a preset value;
if so, taking the standard digital class corresponding to the maximum value of the posterior probability as the matching result of the handwritten numeral.
6. The method according to claim 5, characterized in that, when the maximum value of the posterior probability is not greater than the preset value, the method further comprises:
determining, based on the Bayes classifier, the fault-tolerant posterior probability of each handwritten numeral from the distance between the feature vector of each handwritten numeral and the feature vector of each fault tolerant digital class;
judging whether the maximum value of the fault-tolerant posterior probability is greater than a preset value;
if so, determining the fault tolerant digital class corresponding to the maximum value of the fault-tolerant posterior probability, and taking the standard digital class corresponding to that fault tolerant digital class as the matching result of the handwritten numeral;
if not, determining that the handwritten numeral sample is not written according to the specification.
7. The method according to any one of claims 1 to 6, characterized in that the standard digital characteristic information is specifically the characteristic information corresponding to standard numbers whose font format is the liquid crystal number; wherein the standard numbers include 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9.
8. A device for handwritten digit recognition, characterized by comprising:
a sample acquisition module, for obtaining a handwritten numeral sample whose font format is the liquid crystal number;
a characteristic matching module, for extracting the characteristic information of each handwritten numeral in the handwritten numeral sample, and matching all the characteristic information with standard digital characteristic information to obtain matching results;
an identification module, for generating the recognition result of the handwritten numeral sample according to all the matching results.
9. An electronic device, characterized by comprising:
a memory, for storing a computer program;
a processor, for implementing the steps of the method of handwritten digit recognition according to any one of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium, characterized in that a computer program is stored on the computer readable storage medium, and the computer program, when executed by a processor, implements the steps of the method of handwritten digit recognition according to any one of claims 1 to 7.
CN201811554544.7A 2018-12-18 2018-12-18 A kind of method, apparatus and associated component of Handwritten Digit Recognition Pending CN109583423A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811554544.7A CN109583423A (en) 2018-12-18 2018-12-18 A kind of method, apparatus and associated component of Handwritten Digit Recognition

Publications (1)

Publication Number Publication Date
CN109583423A true CN109583423A (en) 2019-04-05

Family

ID=65930017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811554544.7A Pending CN109583423A (en) 2018-12-18 2018-12-18 A kind of method, apparatus and associated component of Handwritten Digit Recognition

Country Status (1)

Country Link
CN (1) CN109583423A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination