CN103235945A - Method for recognizing handwritten mathematical formulas and generating MathML (Mathematical Markup Language) based on Android system - Google Patents


Info

Publication number
CN103235945A
Authority
CN
China
Prior art keywords
character
node
mathml
ternary tree
dimensional array
Legal status
Granted
Application number
CN2013101001859A
Other languages
Chinese (zh)
Other versions
CN103235945B (en)
Inventor
王少青
胡龙灿
孙怀义
樊爱军
钟琼茹
夏国庆
陆科成
Current Assignee
Chongqing Academy of Science and Technology
Original Assignee
Chongqing Academy of Science and Technology
Application filed by Chongqing Academy of Science and Technology
Priority to CN201310100185.9A
Publication of CN103235945A
Application granted
Publication of CN103235945B
Current legal status: Active

Landscapes

  • Character Discrimination (AREA)

Abstract

The invention provides a method for recognizing handwritten mathematical formulas and generating MathML (Mathematical Markup Language) based on the Android system. The method comprises: collecting the discrete coordinate sequences of handwritten characters on a drawing board to obtain the boundary information of the characters; generating an initial image in a processor, drawing the collected coordinate sequences into it, and cropping it into character images that contain only the characters; applying grayscale and binarization processing to the character images; extracting feature values; reducing the dimensions of the rows and columns of the resulting two-dimensional array separately; performing rough classification according to the feature values so that the character to be recognized is assigned to one class, then using a BP (back-propagation) neural network to match characters and obtain the best-matching character; using a ternary tree structure to locate the spatial relationships in the mathematical formula and traversing the ternary tree in preorder to obtain MathML; and displaying the MathML in a browser. The method reduces the misclassification rate and improves system performance.

Description

Method for recognizing handwritten mathematical formulas and generating MathML based on the Android system
Technical field
The invention belongs to the field of pattern recognition and concerns the analysis of the spatial structure between characters in mathematical formulas; specifically, it relates to a method for recognizing handwritten mathematical formulas and generating MathML based on the Android system.
Background art
Android is an open-source mobile operating system based on the Linux platform. It consists of an operating system, middleware, a user interface and application software, and uses a layered software-stack architecture with three main parts: the bottom layer runs on the Linux kernel, the middle layer contains the function libraries (Library) and the virtual machine, and the top layer contains the applications. As Android smartphones have become widespread, handwriting recognition is becoming a major input method on smartphones. In recent years, recognition of handwritten mathematical formulas has made great progress, for example Microsoft's formula recognition and Hanwang's "e chalk", but these programs mainly run on the Windows operating system.
Handwritten character recognition is a branch of optical character recognition (OCR) and has been a research focus in recent years. Its history goes back to the 1950s: with the appearance of handwriting tablets, researchers began to study online handwritten character recognition. As semiconductors and computers developed and pattern recognition theory and methods matured, research on online handwritten character recognition moved toward practical use by the late 1980s; in particular, whole-sentence recognition of unconstrained English handwriting began to be studied. In China, the earliest handwritten character recognition work dates back to the early 1970s: researchers at the Institute of Automation, Chinese Academy of Sciences, including academician Hu Qiheng, began developing a handwritten digit recognition system and in 1974 successfully built an automatic mail-sorting system for the postal service. In the 1980s, universities such as the Department of Automation of Tsinghua University and Peking University began systematic research on character recognition and achieved results.
Because strokes differ from writer to writer, the same character written by different people can look very different owing to differences in letterforms and writing habits. Stroke thickness, character size, slant of the handwriting, nonlinear distortion of strokes and variations in stroke weight all directly affect the final recognition result, so handwritten character recognition is one of the most challenging problems in pattern recognition.
The basic goal of handwritten character recognition is to achieve the highest possible recognition rate at an acceptable speed; the two criteria are therefore speed and recognition rate. Both classical and newer methods inevitably have recognition "blind spots" they cannot overcome, and it is very difficult to raise the recognition rate with a single method. The research trend is therefore toward multi-stage matching with combined classifiers: several methods are combined during feature extraction so that they complement each other and yield optimized features, which reduces the misclassification rate and improves the performance of the OCR system.
In addition, structural analysis of mathematical formulas currently often uses minimum spanning trees or LL(1) grammars. An LL(1) grammar is parsed top-down with a stack: starting from the grammar's start symbol, productions are applied repeatedly until the input symbol string is derived. The minimum spanning tree method builds a binary tree according to operator precedence: the character set of the formula is first preprocessed and a binary tree is then built by precedence. This saves memory but places very high demands on the precedence rules.
Summary of the invention
To overcome the shortcomings of the prior art described above, the object of the invention is to provide a method for recognizing handwritten mathematical formulas and generating MathML based on the Android system that reduces the misclassification rate and improves system performance.
To achieve this object, the invention provides a method for recognizing handwritten mathematical formulas and generating MathML based on the Android system, characterized in that it comprises the following steps:
S1: Collect the discrete coordinate sequence of the handwritten characters on the drawing board, compute the minimum and maximum values of the sequence along the X axis and the Y axis, and obtain the boundary information of the characters;
S2: Generate an initial image in the processor, draw the collected coordinate sequence into the initial image, and, using the boundary information obtained in step S1, crop the initial image into character images that contain only the characters;
S3: Apply grayscale and binarization processing to the character images to obtain a two-dimensional array of 0s and 1s;
S4: Extract feature values using thinning and contour extraction;
S5: Reduce the dimensions of the rows and columns of the two-dimensional array separately to obtain a fixed dimension;
S6: Perform rough classification on the feature values extracted in step S4 to determine which class contains the character to be recognized, then use a BP neural network to recognize the character within that class's character set and obtain the best-matching character;
S7: Use a ternary tree structure to locate the spatial relationships of the mathematical formula, traverse the ternary tree in preorder, and obtain MathML;
S8: Display the MathML in a browser.
The method of the invention for recognizing handwritten mathematical formulas and generating MathML based on the Android system reduces the misclassification rate and improves system performance.
Additional aspects and advantages of the invention are set out in part in the following description, will in part become apparent from it, or may be learned by practicing the invention.
Description of drawings
The above and/or additional aspects and advantages of the invention will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of the method of the invention for recognizing handwritten mathematical formulas and generating MathML based on the Android system;
Fig. 2 shows a handwritten mathematical formula in a preferred embodiment of the invention;
Fig. 3 shows the 3×3 neighborhood of a point P in a preferred embodiment of the invention;
Fig. 4 shows the construction process of the ternary tree in a preferred embodiment of the invention;
Fig. 5 shows the levels of the ternary tree in a preferred embodiment of the invention;
Fig. 6 shows the MathML generated by preorder traversal of the ternary tree in a preferred embodiment of the invention;
Fig. 7 shows the browser's display of the MathML.
Embodiment
Embodiments of the invention are described in detail below and illustrated in the drawings, where identical or similar reference numerals denote, throughout, identical or similar elements or elements with identical or similar functions. The embodiments described with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting it.
In the description of the invention, unless otherwise specified and limited, the terms "mounted", "linked" and "connected" are to be understood broadly: for example, a connection may be mechanical or electrical, may be internal communication between two elements, and may be direct or indirect through an intermediary. Those of ordinary skill in the art can understand the specific meaning of these terms according to the context.
The invention provides a method for recognizing handwritten mathematical formulas and generating MathML based on the Android system which, as shown in Fig. 1, comprises the following steps:
Step 1: As shown in Fig. 2, a mathematical formula is handwritten on the drawing board. The processor collects the discrete coordinate sequence of the handwritten characters on the drawing board and then computes the minimum and maximum values of the sequence along the X axis and the Y axis, obtaining the boundary information of each character.
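The bounding-box computation of step 1 can be sketched in Java, the language the description itself uses for the ternary tree node. This is a minimal sketch, not the patent's code: the class `StrokeBounds` and its methods are hypothetical, and the sampled points are assumed to have been collected already (on Android, for example, from `MotionEvent.getX()`/`getY()`) and rounded to integer pixels.

```java
/** Hypothetical sketch of step 1: the bounding box of one character's sampled points. */
final class StrokeBounds {
    final int minX, minY, maxX, maxY;

    private StrokeBounds(int minX, int minY, int maxX, int maxY) {
        this.minX = minX;
        this.minY = minY;
        this.maxX = maxX;
        this.maxY = maxY;
    }

    /** xs[k], ys[k] is the k-th sampled point, assumed already rounded to integer pixels. */
    static StrokeBounds of(int[] xs, int[] ys) {
        if (xs.length == 0 || xs.length != ys.length) {
            throw new IllegalArgumentException("need non-empty coordinate arrays of equal length");
        }
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = Integer.MIN_VALUE, maxY = Integer.MIN_VALUE;
        for (int k = 0; k < xs.length; k++) {
            minX = Math.min(minX, xs[k]);   // minimum and maximum along the X axis
            maxX = Math.max(maxX, xs[k]);
            minY = Math.min(minY, ys[k]);   // minimum and maximum along the Y axis
            maxY = Math.max(maxY, ys[k]);
        }
        return new StrokeBounds(minX, minY, maxX, maxY);
    }

    int width()  { return maxX - minX + 1; }
    int height() { return maxY - minY + 1; }
}
```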
Step 2: An initial image is generated in the processor and the collected coordinate sequence is drawn into it. Using the boundary information obtained in step 1, the image is cropped into a character image that contains only the character; this character image is smaller than the initial image.
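A minimal sketch of step 2, assuming the initial image is a plain `int[][]` of gray levels (255 = white background, 0 = ink) rather than an Android `Bitmap`, and that consecutive samples are joined by straight segments. `CharacterRaster`, `drawStroke` and `crop` are hypothetical names. On Android a similar effect could be obtained with `Canvas.drawLine` and `Bitmap.createBitmap(source, x, y, width, height)`; plain Java is used here so the sketch runs anywhere.

```java
import java.util.Arrays;

/** Hypothetical sketch of step 2: draw the sampled stroke, then crop to the character's bounding box. */
final class CharacterRaster {
    static final int WHITE = 255, INK = 0;

    /** Draws a polyline through the sampled points; all points are assumed to lie inside the canvas. */
    static int[][] drawStroke(int[] xs, int[] ys, int canvasW, int canvasH) {
        int[][] img = new int[canvasH][canvasW];
        for (int[] row : img) Arrays.fill(row, WHITE);
        for (int k = 0; k < xs.length; k++) {
            int x0 = xs[Math.max(k - 1, 0)], y0 = ys[Math.max(k - 1, 0)];
            int dx = xs[k] - x0, dy = ys[k] - y0;
            int steps = Math.max(Math.abs(dx), Math.abs(dy));
            // Simple DDA interpolation fills the gap between consecutive samples.
            for (int t = 0; t <= steps; t++) {
                int x = steps == 0 ? xs[k] : x0 + Math.round(dx * (float) t / steps);
                int y = steps == 0 ? ys[k] : y0 + Math.round(dy * (float) t / steps);
                img[y][x] = INK;
            }
        }
        return img;
    }

    /** Keeps rows minY..maxY and columns minX..maxX (inclusive), i.e. only the character. */
    static int[][] crop(int[][] img, int minX, int minY, int maxX, int maxY) {
        int[][] out = new int[maxY - minY + 1][];
        for (int r = minY; r <= maxY; r++) {
            out[r - minY] = Arrays.copyOfRange(img[r], minX, maxX + 1);
        }
        return out;
    }
}
```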
Step 3: Grayscale and binarization processing are applied to the character image to obtain a two-dimensional array of 0s and 1s. In this embodiment the array is then denoised, i.e. flaw points in the array are removed. A point is judged to be a flaw point if the values of its four adjacent points (up, down, left and right) all differ from its own value; a flaw point is removed by inverting its value, changing 0 to 1 or 1 to 0.
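The binarization and flaw-point rule of step 3 might look as follows. The patent gives no threshold, so `threshold` is an assumed parameter; flaw decisions are taken on the unmodified input, and border points (which lack four neighbours) are left untouched, both of which are also assumptions. Class and method names are hypothetical.

```java
/** Hypothetical sketch of step 3: threshold binarization plus flaw-point removal. */
final class Binarizer {
    /** Gray levels below the (assumed) threshold become 1 (ink), all others 0. */
    static int[][] binarize(int[][] gray, int threshold) {
        int[][] b = new int[gray.length][gray[0].length];
        for (int i = 0; i < gray.length; i++)
            for (int j = 0; j < gray[0].length; j++)
                b[i][j] = gray[i][j] < threshold ? 1 : 0;
        return b;
    }

    /** Inverts interior points whose up, down, left and right neighbours all differ from them. */
    static int[][] removeFlaws(int[][] b) {
        int h = b.length, w = b[0].length;
        int[][] out = new int[h][];
        for (int i = 0; i < h; i++) out[i] = b[i].clone();
        for (int i = 1; i < h - 1; i++) {
            for (int j = 1; j < w - 1; j++) {
                int v = b[i][j];
                if (b[i - 1][j] != v && b[i + 1][j] != v && b[i][j - 1] != v && b[i][j + 1] != v) {
                    out[i][j] = 1 - v;   // 0 becomes 1, 1 becomes 0
                }
            }
        }
        return out;
    }
}
```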
Step 4: Feature values are extracted using thinning and contour extraction. In this embodiment, the specific steps are:
S31: Thin the two-dimensional array with a thinning algorithm. In this embodiment the Hilditch thinning algorithm can be used, as follows. One pass over all points of the array, from left to right and from top to bottom, forms one iteration cycle. In each cycle, every point p that satisfies all of the conditions below is marked; at the end of the cycle the values of all marked points are set to 0. If no point is marked in a cycle, the algorithm terminates. As shown in Fig. 3, the 8 points of the 3×3 neighborhood of p are numbered x1, x2, x3, x4, x5, x6, x7, x8 counterclockwise, starting from the point to the right of p. The conditions are the following six:
1) The value of p is 1;
2) The values of the four points above, below, left and right of p are not all 1, i.e. in this embodiment x1, x3, x5 and x7 are not all 1;
3) At least 2 points in the 3×3 neighborhood of p have the value 1, i.e. in this embodiment at least 2 of x1, x2, x3, x4, x5, x6, x7, x8 have the value 1;
4) The 8-connectivity connection number of p is 1. The connection number is the number of connected components adjacent to p within its 3×3 neighborhood. In Fig. 3, gray points have the value 1 and white points the value 0; for the left figure the 4-connectivity connection number is 2 and the 8-connectivity connection number is 1, while for the right figure both are 2. In this embodiment, the 4-connectivity connection number is computed as:
$$N_c^4(p) = \sum_{i=1}^{4} \left( x_{2i-1} - x_{2i-1}\, x_{2i}\, x_{2i+1} \right)$$
and the 8-connectivity connection number as:
$$N_c^8(p) = \sum_{i=1}^{4} \left( \bar{x}_{2i-1} - \bar{x}_{2i-1}\, \bar{x}_{2i}\, \bar{x}_{2i+1} \right)$$
where $\bar{x} = 1 - x$, each $x_k$ is the value of neighbor point $x_k$, and the indices wrap around so that $x_9 = x_1$;
5) If x3 has been marked for deletion, the 8-connectivity connection number of p is still 1 when x3 is set to 0;
6) If x5 has been marked for deletion, the 8-connectivity connection number of p is still 1 when x5 is set to 0.
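The six conditions can be combined into a Hilditch-style thinning loop. In this sketch (hypothetical names, not the patent's code) points marked during the current cycle keep the value 1 until the cycle ends, except when conditions 5) and 6) test what happens if x3 or x5 is removed; points outside the array are treated as 0. `connectivity8` implements the N_c^8 formula with x9 = x1.

```java
/** Hypothetical sketch of the Hilditch-style thinning in S31; 1 = stroke, outside points count as 0. */
final class HilditchThinning {
    // Offsets of x1..x8: start at the right-hand neighbour and go counterclockwise.
    private static final int[] DI = {0, -1, -1, -1, 0, 1, 1, 1};
    private static final int[] DJ = {1, 1, 0, -1, -1, -1, 0, 1};

    private static int at(int[][] g, int i, int j) {
        return (i < 0 || j < 0 || i >= g.length || j >= g[0].length) ? 0 : g[i][j];
    }

    /** N_c^8 for neighbours x[0..7] = x1..x8, using complements and the wrap-around x9 = x1. */
    static int connectivity8(int[] x) {
        int n = 0;
        for (int k = 0; k < 4; k++) {
            int a = 1 - x[2 * k];
            int b = 1 - x[2 * k + 1];
            int c = 1 - x[(2 * k + 2) % 8];
            n += a - a * b * c;
        }
        return n;
    }

    /** Conditions 5) and 6): if neighbour k is already marked, p must stay 8-connected without it. */
    private static boolean survivesRemoval(int[] x, int k, int ni, int nj, boolean[][] marked) {
        if (ni < 0 || nj < 0 || !marked[ni][nj]) return true;
        int[] y = x.clone();
        y[k] = 0;
        return connectivity8(y) == 1;
    }

    static int[][] thin(int[][] src) {
        int h = src.length, w = src[0].length;
        int[][] g = new int[h][];
        for (int i = 0; i < h; i++) g[i] = src[i].clone();
        boolean changed = true;
        while (changed) {                                   // one iteration cycle per pass
            changed = false;
            boolean[][] marked = new boolean[h][w];
            for (int i = 0; i < h; i++) {
                for (int j = 0; j < w; j++) {
                    if (g[i][j] != 1) continue;             // condition 1)
                    int[] x = new int[8];
                    int ones = 0;
                    for (int k = 0; k < 8; k++) {
                        x[k] = at(g, i + DI[k], j + DJ[k]);
                        ones += x[k];
                    }
                    if (x[0] + x[2] + x[4] + x[6] == 4) continue;          // condition 2)
                    if (ones < 2) continue;                                 // condition 3)
                    if (connectivity8(x) != 1) continue;                    // condition 4)
                    if (!survivesRemoval(x, 2, i - 1, j, marked)) continue; // condition 5), x3 above
                    if (!survivesRemoval(x, 4, i, j - 1, marked)) continue; // condition 6), x5 left
                    marked[i][j] = true;
                    changed = true;
                }
            }
            for (int i = 0; i < h; i++)                     // delete marked points at the end of the cycle
                for (int j = 0; j < w; j++)
                    if (marked[i][j]) g[i][j] = 0;
        }
        return g;
    }
}
```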
Thinning the two-dimensional array yields the following features:
End points: points with exactly 1 neighbor of value 1 in their 8-neighborhood;
Three-way crossing points: points with 3 neighbors of value 1 in their 8-neighborhood;
Four-way crossing points: points with 4 neighbors of value 1 in their 8-neighborhood.
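Counting these points on the thinned array is a direct neighbour count. This is a minimal sketch with hypothetical names; with such a simple criterion, skeleton points right next to a crossing can also reach a count of 3 or 4, so a practical implementation might merge adjacent crossing points.

```java
/** Hypothetical sketch: counts end points and crossing points on a thinned 0/1 array. */
final class SkeletonPoints {
    /** Number of 8-neighbours of (i, j) with value 1; out-of-range neighbours count as 0. */
    static int neighbours(int[][] g, int i, int j) {
        int n = 0;
        for (int di = -1; di <= 1; di++) {
            for (int dj = -1; dj <= 1; dj++) {
                if (di == 0 && dj == 0) continue;
                int r = i + di, c = j + dj;
                if (r >= 0 && c >= 0 && r < g.length && c < g[0].length) n += g[r][c];
            }
        }
        return n;
    }

    /** Skeleton points with exactly k neighbours: k = 1 end points, k = 3 or 4 crossing points. */
    static int countWithNeighbours(int[][] g, int k) {
        int count = 0;
        for (int i = 0; i < g.length; i++)
            for (int j = 0; j < g[0].length; j++)
                if (g[i][j] == 1 && neighbours(g, i, j) == k) count++;
        return count;
    }
}
```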
S32: Extract the contour of the two-dimensional array; the contour is determined by the following formula:
$$p = \begin{cases} |p - p_3|, & |p - p_3| \ge |p_1 - p_2| \\ |p_1 - p_2|, & |p - p_3| < |p_1 - p_2| \end{cases}$$
Let the two-dimensional array after thinning be G[h][w]; the positions of p, p_1, p_2 and p_3 are:
p = G[i][j], p_1 = G[i][j+1], p_2 = G[i+1][j], p_3 = G[i+1][j+1],
where h is the number of rows and w the number of columns, 0 ≤ i < h, 0 ≤ j < w. Contour extraction on the two-dimensional array yields the number of inner loops, which is 0, 1 or 2. In this embodiment an inner loop need not be a standard circle: it is any ring formed by adjacent points of value 1, and the points enclosed by the ring have the value 0.
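The contour formula in S32 takes the larger of the two diagonal differences (the same form as the Roberts cross operator), so it reduces to one `Math.max`. The patent does not say how the number of inner loops is derived; the hole count below uses an assumed flood-fill approach (4-connected regions of 0 that do not touch the border), which matches the description of an inner loop as a ring of 1-points enclosing 0-points. All names are hypothetical, and points beyond the last row or column are taken as 0.

```java
import java.util.ArrayDeque;

/** Hypothetical sketch of S32: diagonal-difference contour and an assumed inner-loop count. */
final class ContourFeatures {
    private static int at(int[][] g, int i, int j) {
        return (i < g.length && j < g[0].length) ? g[i][j] : 0;
    }

    /** max(|p - p3|, |p1 - p2|), which equals the two-case formula in S32. */
    static int[][] contour(int[][] g) {
        int h = g.length, w = g[0].length;
        int[][] out = new int[h][w];
        for (int i = 0; i < h; i++) {
            for (int j = 0; j < w; j++) {
                int p = g[i][j], p1 = at(g, i, j + 1), p2 = at(g, i + 1, j), p3 = at(g, i + 1, j + 1);
                out[i][j] = Math.max(Math.abs(p - p3), Math.abs(p1 - p2));
            }
        }
        return out;
    }

    /** Assumed method: number of 4-connected 0-regions that do not touch the array border. */
    static int countHoles(int[][] g) {
        int h = g.length, w = g[0].length;
        boolean[][] seen = new boolean[h][w];
        for (int i = 0; i < h; i++)                       // first flood the outer background
            for (int j = 0; j < w; j++)
                if ((i == 0 || j == 0 || i == h - 1 || j == w - 1) && g[i][j] == 0 && !seen[i][j])
                    flood(g, seen, i, j);
        int holes = 0;
        for (int i = 0; i < h; i++)                       // every remaining 0-region is an inner loop
            for (int j = 0; j < w; j++)
                if (g[i][j] == 0 && !seen[i][j]) {
                    holes++;
                    flood(g, seen, i, j);
                }
        return holes;
    }

    private static void flood(int[][] g, boolean[][] seen, int si, int sj) {
        ArrayDeque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[]{si, sj});
        seen[si][sj] = true;
        int[][] dirs = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        while (!stack.isEmpty()) {
            int[] cur = stack.pop();
            for (int[] d : dirs) {
                int r = cur[0] + d[0], c = cur[1] + d[1];
                if (r >= 0 && c >= 0 && r < g.length && c < g[0].length && g[r][c] == 0 && !seen[r][c]) {
                    seen[r][c] = true;
                    stack.push(new int[]{r, c});
                }
            }
        }
    }
}
```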
S33: Ray-crossing count feature: the thinned two-dimensional array is sampled along four rows and columns whose positions are given as formula images in the original (Figures BDA00002971944800071 to BDA00002971944800074, not reproduced here), and the number of points with value 1 in each extracted row or column is counted;
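Because the exact sampling positions are not reproduced, the sketch below takes them as parameters. `RayFeatures` and its methods are hypothetical; following the text, each feature is simply the number of 1-valued points in the chosen row or column.

```java
/** Hypothetical sketch of S33: counts of 1-points along chosen rows and columns. */
final class RayFeatures {
    static int onesInRow(int[][] g, int r) {
        int n = 0;
        for (int v : g[r]) n += v;
        return n;
    }

    static int onesInColumn(int[][] g, int c) {
        int n = 0;
        for (int[] row : g) n += row[c];
        return n;
    }

    /** rows/cols stand in for the sampling positions, which are elided in this copy of the text. */
    static int[] features(int[][] g, int[] rows, int[] cols) {
        int[] f = new int[rows.length + cols.length];
        for (int k = 0; k < rows.length; k++) f[k] = onesInRow(g, rows[k]);
        for (int k = 0; k < cols.length; k++) f[rows.length + k] = onesInColumn(g, cols[k]);
        return f;
    }
}
```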
S34: Density feature: the two-dimensional array is divided into 6 blocks, in three rows and two columns.
Count the number N of points with value 1 in the whole array;
Count the number N_i of points with value 1 in each block, i = 1, 2, ..., 6;
The density of each block is then:
$$\rho(i) = \begin{cases} 0, & N_i / N < 0.1 \\ 1, & N_i / N \ge 0.1 \end{cases} \qquad N > 0,\; i = 1, \dots, 6.$$
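A sketch of the density feature. It tests 10·N_i ≥ N in integer arithmetic, which is equivalent to N_i/N ≥ 0.1 and avoids floating-point rounding. Block boundaries by integer division, row-by-row block numbering, and an all-zero result for N = 0 (where the formula is undefined) are assumptions; the class name is hypothetical.

```java
/** Hypothetical sketch of S34: 3 x 2 block density flags. */
final class DensityFeature {
    static int[] density(int[][] g) {
        int h = g.length, w = g[0].length;
        int total = 0;                                        // N
        for (int[] row : g) for (int v : row) total += v;
        int[] rho = new int[6];
        if (total == 0) return rho;                           // formula assumes N > 0
        for (int br = 0; br < 3; br++) {
            for (int bc = 0; bc < 2; bc++) {
                int n = 0;                                    // N_i
                for (int i = br * h / 3; i < (br + 1) * h / 3; i++)
                    for (int j = bc * w / 2; j < (bc + 1) * w / 2; j++)
                        n += g[i][j];
                rho[br * 2 + bc] = 10 * n >= total ? 1 : 0;   // N_i / N >= 0.1
            }
        }
        return rho;
    }
}
```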
Step 5: The rows and columns of the two-dimensional array are reduced separately to a fixed dimension. In this embodiment this comprises the following steps:
S51: Reduce the two-dimensional array to a fixed dimension. In this embodiment the rows and columns of the array are divided proportionally into 35 blocks, 5 blocks across each row and 7 blocks down each column, so the reduced array has a fixed size of 5 × 7;
S52: Generate a new two-dimensional array and assign a value to each of its blocks: if the number of 1s in a block is greater than 1/3 of the total number of points in that block, the block's value is 1, otherwise 0;
S53: Concatenate the values of the new array into a template of fixed length; the template length is determined by the dimensions of the reduced array, and in this embodiment it is 35.
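Steps S51 to S53 can be sketched as a block-wise downsample followed by row-major flattening. The orientation of the 35 blocks (7 block rows by 5 block columns, as in a 5×7 dot-matrix glyph), integer-division block boundaries, and row-major order are assumptions; the class name is hypothetical.

```java
/** Hypothetical sketch of S51 to S53: reduce a 0/1 array to a 35-element template. */
final class TemplateReducer {
    static final int BLOCK_ROWS = 7, BLOCK_COLS = 5;   // assumed orientation of the 35 blocks

    static int[] template(int[][] g) {
        int h = g.length, w = g[0].length;
        int[] t = new int[BLOCK_ROWS * BLOCK_COLS];
        for (int br = 0; br < BLOCK_ROWS; br++) {
            for (int bc = 0; bc < BLOCK_COLS; bc++) {
                int r0 = br * h / BLOCK_ROWS, r1 = (br + 1) * h / BLOCK_ROWS;
                int c0 = bc * w / BLOCK_COLS, c1 = (bc + 1) * w / BLOCK_COLS;
                int ones = 0, size = (r1 - r0) * (c1 - c0);
                for (int i = r0; i < r1; i++)
                    for (int j = c0; j < c1; j++)
                        ones += g[i][j];
                // S52: 1 if the 1-points exceed one third of the block (empty blocks give 0).
                t[br * BLOCK_COLS + bc] = (size > 0 && 3 * ones > size) ? 1 : 0;
            }
        }
        return t;   // S53: row-major concatenation, length 35
    }
}
```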
Step 6: Rough classification is performed on the feature values extracted in step 4. The trained character set is partitioned according to the characteristics of the feature values, the class that contains the character to be recognized is determined, and the character is first assigned to that class; a BP neural network is then applied to that class's character set to obtain the best-matching character. Specifically, the BP neural network is trained on the templates of the character set obtained by rough classification and identifies the character with the greatest similarity. In this embodiment, rough classification uses parameters such as the number of end points after thinning, the number of inner loops obtained by contour extraction, and the aspect ratio of the character.
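The patent does not specify the BP network's topology or training parameters. The sketch below is a generic one-hidden-layer back-propagation network with sigmoid units and squared-error loss, trained online on 35-element templates; the layer sizes, learning rate and seed are assumptions, and `RoughClassifier.roughKey` is a hypothetical bucketing of the three rough-classification parameters named above (end points, inner loops, aspect ratio) with assumed bucket limits.

```java
import java.util.Random;

/** Generic one-hidden-layer BP network (sizes, learning rate and seed are assumptions). */
final class BpNetwork {
    private final int nIn, nHid, nOut;
    private final double lr;
    private final double[][] w1;   // hidden weights; last column is the bias
    private final double[][] w2;   // output weights; last column is the bias

    BpNetwork(int nIn, int nHid, int nOut, double lr, long seed) {
        this.nIn = nIn; this.nHid = nHid; this.nOut = nOut; this.lr = lr;
        Random rnd = new Random(seed);
        w1 = new double[nHid][nIn + 1];
        w2 = new double[nOut][nHid + 1];
        for (double[] row : w1) for (int k = 0; k < row.length; k++) row[k] = rnd.nextDouble() - 0.5;
        for (double[] row : w2) for (int k = 0; k < row.length; k++) row[k] = rnd.nextDouble() - 0.5;
    }

    private static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    /** Forward pass: writes hidden activations into hid and returns the output activations. */
    private double[] forward(double[] x, double[] hid) {
        for (int h = 0; h < nHid; h++) {
            double z = w1[h][nIn];
            for (int i = 0; i < nIn; i++) z += w1[h][i] * x[i];
            hid[h] = sigmoid(z);
        }
        double[] out = new double[nOut];
        for (int o = 0; o < nOut; o++) {
            double z = w2[o][nHid];
            for (int h = 0; h < nHid; h++) z += w2[o][h] * hid[h];
            out[o] = sigmoid(z);
        }
        return out;
    }

    /** One epoch of online training with one-hot targets; errors are propagated back layer by layer. */
    void trainEpoch(double[][] xs, int[] labels) {
        double[] hid = new double[nHid];
        for (int s = 0; s < xs.length; s++) {
            double[] x = xs[s];
            double[] out = forward(x, hid);
            double[] dOut = new double[nOut];
            for (int o = 0; o < nOut; o++) {
                double target = (o == labels[s]) ? 1.0 : 0.0;
                dOut[o] = (out[o] - target) * out[o] * (1 - out[o]);
            }
            double[] dHid = new double[nHid];            // computed before w2 is updated
            for (int h = 0; h < nHid; h++) {
                double sum = 0;
                for (int o = 0; o < nOut; o++) sum += dOut[o] * w2[o][h];
                dHid[h] = sum * hid[h] * (1 - hid[h]);
            }
            for (int o = 0; o < nOut; o++) {
                for (int h = 0; h < nHid; h++) w2[o][h] -= lr * dOut[o] * hid[h];
                w2[o][nHid] -= lr * dOut[o];
            }
            for (int h = 0; h < nHid; h++) {
                for (int i = 0; i < nIn; i++) w1[h][i] -= lr * dHid[h] * x[i];
                w1[h][nIn] -= lr * dHid[h];
            }
        }
    }

    /** Index of the most activated output unit, i.e. the best-matching character in the class. */
    int predict(double[] x) {
        double[] out = forward(x, new double[nHid]);
        int best = 0;
        for (int o = 1; o < nOut; o++) if (out[o] > out[best]) best = o;
        return best;
    }
}

final class RoughClassifier {
    /** Hypothetical coarse key: end points / inner loops / aspect bucket (bucket limits assumed). */
    static String roughKey(int endPoints, int holes, int width, int height) {
        String aspect = 2 * width < height ? "tall" : (2 * height < width ? "wide" : "square");
        return endPoints + "/" + holes + "/" + aspect;
    }
}
```

In use, one network would be trained per rough-classification key on that class's templates, and `predict` applied to the template of the character being recognized.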
Step 7: A ternary tree structure is used to locate the spatial relationships of the mathematical formula, and the ternary tree is traversed in preorder to obtain MathML. A ternary tree node is a data structure that holds stored information (info) and three child nodes; in recent years ternary trees have been applied in many fields. Storing the formula in a ternary tree reduces the dependence on operator precedence and makes building the tree simpler. The Java representation of a ternary tree node is as follows:
[Java listing of the ternary tree node, shown as an image in the original (Figure BDA00002971944800081)]
In this embodiment, the link structure of a ternary tree node is shown in Table 1, where info is the node's information and llink, mlink and rlink are three pointers: the left pointer points to the node's left child, the middle pointer to its middle child, and the right pointer to its right child.
Table 1. Link structure of a ternary tree node
info llink mlink rlink
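Because the original Java listing is not reproduced, here is a minimal reconstruction that follows Table 1 field by field. Only the field names come from the patent; the class name `TernaryNode`, the `String` type of `info`, and the constructor are assumptions.

```java
/** Hypothetical reconstruction of the node in Table 1: info plus left, middle and right links. */
final class TernaryNode {
    String info;          // stored information, e.g. the recognized character
    TernaryNode llink;    // left pointer  -> left child
    TernaryNode mlink;    // middle pointer -> middle child
    TernaryNode rlink;    // right pointer -> right child

    TernaryNode(String info) {
        this.info = info;
    }
}
```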
In this embodiment, when the ternary tree is used to locate the spatial relationships of the mathematical formula, the pointers of the ternary tree are assigned as follows:
1) When the spatial relationships of the formula include an enclosing structure, i.e. the boundary of an enclosing symbol contains other characters: if there are enclosed characters on both the left and right sides of the enclosing symbol, the left pointer of the enclosing symbol's ternary node points to the node of the character on the left side (its left child), and the middle pointer points to the node of the character on the right side (its middle child). For example, for the radical-sign structure (shown as an image in the original, Figure BDA00002971944800091), the rectangular boundary of the radical contains the characters "5" and "3"; the left pointer of the radical's ternary node points to the node of "5", and the middle pointer points to the node of "3". Note that this embodiment only describes the case in which enclosed characters lie on both sides of the enclosing symbol; if enclosed characters lie on only one side, for example only on the right side, the right pointer of the enclosing symbol's ternary node points to the enclosed character's node as its right child.
2) When the spatial relationships of the formula include an up-down structure divided by a separator, the left pointer of the separator's ternary node points to the node of the leftmost character above the separator (its left child), and the middle pointer points to the node of the leftmost character below the separator (its middle child). Examples are the fraction bar "-" and the summation sign "∑": the left pointer of the fraction bar's node points to the node of the leftmost character above the bar, and the middle pointer points to the node of the leftmost character below it.
3) When the spatial relationships of the formula include a superscript/subscript structure, the left pointer of the base character's ternary node points to the superscript's node (its left child), and the middle pointer points to the subscript's node (its middle child); the base character is the character that carries the superscript and subscript. For example, in the expression shown as an image in the original (Figure BDA00002971944800101), the base character is "a": the left pointer of the node "a" points to the node of the superscript "2", and the middle pointer points to the node of the subscript "2". In this embodiment, if there is only a superscript or only a subscript, only the left pointer or only the middle pointer is assigned according to the method above.
4) When the spatial relationships of the formula include a left-right structure, the right pointer of the left character's ternary node points to the right character's node (its right child). For example, in "ab" the right pointer of the node "a" points to the node "b".
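Rules 1) to 4) amount to a mapping from a detected spatial relation to one of the three pointers. In this sketch the `Relation` enum and the `attach` method are hypothetical, the spatial analysis that detects each relation is assumed to exist elsewhere, and the one-sided enclosing case of rule 1) is left out.

```java
/** Hypothetical sketch of pointer rules 1) to 4): relation -> left, middle or right pointer. */
final class PointerRules {
    enum Relation { ENCLOSED_LEFT, ENCLOSED_RIGHT, ABOVE_SEPARATOR, BELOW_SEPARATOR, SUPERSCRIPT, SUBSCRIPT, RIGHT_NEIGHBOUR }

    static final class Node {
        final String info;
        Node llink, mlink, rlink;
        Node(String info) { this.info = info; }
    }

    static void attach(Node parent, Relation rel, Node child) {
        switch (rel) {
            case ENCLOSED_LEFT:      // rule 1): left-side enclosed character
            case ABOVE_SEPARATOR:    // rule 2): leftmost character above the separator
            case SUPERSCRIPT:        // rule 3): superscript
                parent.llink = child;
                break;
            case ENCLOSED_RIGHT:     // rule 1): right-side enclosed character
            case BELOW_SEPARATOR:    // rule 2): leftmost character below the separator
            case SUBSCRIPT:          // rule 3): subscript
                parent.mlink = child;
                break;
            case RIGHT_NEIGHBOUR:    // rule 4): left-right structure
                parent.rlink = child;
                break;
        }
    }
}
```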
Finding the root node is essential for building the ternary tree. In this embodiment the root node is determined as follows:
If the leftmost character of the mathematical expression is not part of an up-down structure or an enclosing structure, that leftmost character is the root node;
If the leftmost character of the expression is part of an up-down structure or an enclosing structure, it is first marked as the root node. The up-down or enclosing structure containing it is then checked for a specific character, namely a fraction bar, a radical sign or a summation sign; if one is present, the root node is changed to that specific character, otherwise the root node remains unchanged.
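One way to sketch the root-finding rule, assuming each recognized symbol carries its horizontal extent and that "part of an up-down or enclosing structure" can be approximated by a fraction bar, radical or summation sign whose horizontal span covers the leftmost character. The `Sym` class, its `kind` labels and the covering test are all assumptions.

```java
import java.util.List;

/** Hypothetical sketch of root selection for the ternary tree. */
final class RootFinder {
    static final class Sym {
        final String text, kind;   // kind: "char", "frac", "root" or "sum" (assumed labels)
        final int minX, maxX;      // horizontal extent of the symbol's bounding box

        Sym(String text, String kind, int minX, int maxX) {
            this.text = text; this.kind = kind; this.minX = minX; this.maxX = maxX;
        }

        boolean isStructural() {
            return kind.equals("frac") || kind.equals("root") || kind.equals("sum");
        }
    }

    static Sym findRoot(List<Sym> syms) {
        Sym leftmost = syms.get(0);
        for (Sym s : syms) if (s.minX < leftmost.minX) leftmost = s;   // first mark the leftmost character
        // If a fraction bar, radical or summation spans the leftmost character, it becomes the root.
        for (Sym s : syms) {
            if (s != leftmost && s.isStructural() && s.minX <= leftmost.minX && leftmost.maxX <= s.maxX) {
                return s;
            }
        }
        return leftmost;
    }
}
```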
After the recognized characters have been analyzed structurally, the root node found, the spatial relationships between characters determined, and the root node and the pointers of every node fixed, a ternary tree storing the spatial relationships of the characters is built. In this embodiment, as shown in Fig. 4, the ternary tree is built as follows:
S81: Initialize an empty queue Q and append the root node at its tail; Q now contains a single ternary node;
S82: Take a ternary node from the head of Q, denote it N, and analyze the spatial relationships of the character stored in N. If an enclosing structure, an up-down structure or a superscript/subscript structure exists, go to step S83; otherwise, if a left-right structure exists, go to step S84; otherwise go to step S85;
S83: Following the method of claim 6, point the left pointer and the middle pointer of node N to its child nodes and enqueue these children at the tail of Q in order; then analyze the spatial relationships of the character stored in N again: if a left-right structure exists, go to step S84, otherwise go to step S85;
S84: Following the pointer rule of claim 6, point the right pointer of node N to its right child node, enqueue that child at the tail of Q, and return to step S82;
S85: Check whether Q is empty. If it is not empty, return to step S82; if it is empty, destroy the queue and end the algorithm. The resulting ternary tree, shown level by level in Fig. 5, clearly presents the child and parent nodes at each level.
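The queue-driven construction of S81 to S85 can be sketched as a breadth-first loop. The spatial analysis is assumed to have been done already and is supplied as a hypothetical map from each node to its left, middle and right neighbours (`Links`); the loop only wires the pointers and enqueues the children in the order described.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch of S81 to S85: breadth-first construction of the ternary tree. */
final class TreeBuilder {
    static final class Node {
        final String info;
        Node llink, mlink, rlink;
        Node(String info) { this.info = info; }
    }

    /** Assumed result of the spatial analysis for one node; null means "no such neighbour". */
    static final class Links {
        final Node left, middle, right;
        Links(Node left, Node middle, Node right) { this.left = left; this.middle = middle; this.right = right; }
    }

    static Node build(Node root, Map<Node, Links> analysis) {
        Queue<Node> q = new ArrayDeque<>();
        q.add(root);                                   // S81: the queue holds only the root
        while (!q.isEmpty()) {                         // S85: stop when the queue is empty
            Node n = q.poll();                         // S82: take the node at the head
            Links l = analysis.get(n);
            if (l == null) continue;                   // no structure around this character
            if (l.left != null || l.middle != null) {  // S83: enclosing, up-down or script structure
                n.llink = l.left;
                n.mlink = l.middle;
                if (l.left != null) q.add(l.left);
                if (l.middle != null) q.add(l.middle);
            }
            if (l.right != null) {                     // S84: left-right structure
                n.rlink = l.right;
                q.add(l.right);
            }
        }
        return root;
    }
}
```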
In this embodiment, MathML is obtained by preorder traversal of the ternary tree as follows:
If a ternary node has a left child (its left pointer is not null) or a middle child (its middle pointer is not null), the type of the character stored in the node, such as a fraction, radical or summation, is checked first, and a different MathML markup element is generated for each type. The node's left and middle children are placed inside that MathML element, while its right child is placed after (outside) the element. In this embodiment, the MathML tag for the fraction bar "-" is <mfrac></mfrac> and the tag for the radical is <mroot></mroot>. Fig. 6 shows the final MathML file generated in a preferred embodiment of the invention.
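A minimal sketch of the preorder MathML generation. The `Kind` enum, the choice of `<mi>`/`<mn>`/`<mo>` for single tokens, `<msqrt>` for a radical without an index, and `<munderover>`/`<msub>`/`<msup>`/`<msubsup>` for summation and scripts are assumptions beyond the two tags the text names; child ordering follows the MathML element definitions (`<mroot>` takes base then index, `<msubsup>` takes base, subscript, superscript). Each child subtree, including its right chain, is wrapped in `<mrow>`.

```java
/** Hypothetical sketch: preorder traversal of the ternary tree into MathML text. */
final class MathMLWriter {
    enum Kind { TOKEN, FRACTION, RADICAL, SUMMATION }   // assumed character types

    static final class Node {
        final String info;
        final Kind kind;
        Node llink, mlink, rlink;
        Node(String info, Kind kind) { this.info = info; this.kind = kind; }
    }

    static String toMathML(Node root) {
        StringBuilder sb = new StringBuilder("<math>");
        emit(root, sb);
        return sb.append("</math>").toString();
    }

    /** Preorder: the node's element (left/middle children inside it), then its right chain outside it. */
    private static void emit(Node n, StringBuilder sb) {
        for (Node cur = n; cur != null; cur = cur.rlink) {
            if (cur.llink == null && cur.mlink == null) {
                sb.append(token(cur.info));
                continue;
            }
            switch (cur.kind) {
                case FRACTION:   // left child = numerator, middle child = denominator
                    sb.append("<mfrac>");
                    row(cur.llink, sb);
                    row(cur.mlink, sb);
                    sb.append("</mfrac>");
                    break;
                case RADICAL:    // <mroot> takes base then index: middle (radicand) first, left (index) second
                    if (cur.llink == null) {
                        sb.append("<msqrt>");
                        row(cur.mlink, sb);
                        sb.append("</msqrt>");
                    } else {
                        sb.append("<mroot>");
                        row(cur.mlink, sb);
                        row(cur.llink, sb);
                        sb.append("</mroot>");
                    }
                    break;
                case SUMMATION:  // <munderover> takes base, under, over: middle (below), then left (above)
                    sb.append("<munderover>").append(token(cur.info));
                    row(cur.mlink, sb);
                    row(cur.llink, sb);
                    sb.append("</munderover>");
                    break;
                default:         // token with scripts: left child = superscript, middle child = subscript
                    scripts(cur, sb);
            }
        }
    }

    private static void scripts(Node n, StringBuilder sb) {
        String base = token(n.info);
        if (n.llink != null && n.mlink != null) {      // <msubsup> takes base, subscript, superscript
            sb.append("<msubsup>").append(base);
            row(n.mlink, sb);
            row(n.llink, sb);
            sb.append("</msubsup>");
        } else if (n.llink != null) {
            sb.append("<msup>").append(base);
            row(n.llink, sb);
            sb.append("</msup>");
        } else {
            sb.append("<msub>").append(base);
            row(n.mlink, sb);
            sb.append("</msub>");
        }
    }

    /** Wraps a child subtree, including its right chain, in <mrow>. */
    private static void row(Node n, StringBuilder sb) {
        sb.append("<mrow>");
        if (n != null) emit(n, sb);
        sb.append("</mrow>");
    }

    private static String token(String s) {
        if (!s.isEmpty() && s.chars().allMatch(Character::isDigit)) return "<mn>" + s + "</mn>";
        if (!s.isEmpty() && s.chars().allMatch(Character::isLetter)) return "<mi>" + s + "</mi>";
        return "<mo>" + s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;") + "</mo>";
    }
}
```

For a fraction bar whose left child is the chain a, +, b and whose middle child is c, this produces `<math><mfrac><mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow><mrow><mi>c</mi></mrow></mfrac></math>`.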
Step 8: As shown in Fig. 7, the MathML is displayed in a browser.
The method of the invention for recognizing handwritten mathematical formulas and generating MathML based on the Android system reduces the misclassification rate and improves system performance.
In this specification, references to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" mean that a particular feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the invention have been shown and described, those of ordinary skill in the art will appreciate that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the invention; the scope of the invention is defined by the claims and their equivalents.

Claims (9)

1. A method for recognizing handwritten mathematical formulas and generating MathML based on the Android system, characterized in that it comprises the following steps:
S1: collect the discrete coordinate sequence of a handwritten character on the drawing board, and compute the minimum and maximum values of that sequence along the X axis and the Y axis to obtain the boundary information of the character;
S2: generate an initial image in the processor, draw the collected discrete coordinate sequence into the initial image, and, using the boundary information obtained in step S1, crop the initial image into a character image that contains only the character;
S3: apply grayscale processing and binarization to the character image to obtain a two-dimensional array of 0s and 1s;
S4: extract feature values using thinning and contour extraction;
S5: apply dimension reduction to the rows and the columns of the two-dimensional array respectively, producing a fixed dimension;
S6: perform a coarse classification on the feature values extracted in step S4 to determine which category the character to be recognized belongs to, then use a BP neural network to recognize the character within that category's character set, obtaining the best-matching character;
S7: use a ternary tree structure to locate the spatial relationships of the mathematical formula, and traverse the ternary tree in preorder to obtain MathML;
S8: display the MathML in a browser.
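Steps S1 to S3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: strokes are assumed to arrive as lists of integer (x, y) points, consecutive points are joined with Bresenham's line algorithm, the canvas is drawn directly at the size of the bounding box (equivalent to drawing into a larger initial image and then cropping it), and the gray threshold of 128 is a hypothetical choice.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def bounding_box(points: List[Point]) -> Tuple[int, int, int, int]:
    """S1: minimum and maximum of the coordinate sequence on the X and Y axes."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def draw_line(img: List[List[int]], x0: int, y0: int, x1: int, y1: int, value: int = 0) -> None:
    """Bresenham's line algorithm: set every pixel between two consecutive stroke points."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        img[y0][x0] = value
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def strokes_to_binary(strokes: List[List[Point]], threshold: int = 128) -> List[List[int]]:
    """S2 + S3: draw the strokes on a cropped gray canvas, then binarize (1 = ink)."""
    min_x, min_y, max_x, max_y = bounding_box([p for s in strokes for p in s])
    width, height = max_x - min_x + 1, max_y - min_y + 1
    gray = [[255] * width for _ in range(height)]  # 255 = white background (assumed)
    for stroke in strokes:
        pts = [(x - min_x, y - min_y) for x, y in stroke]  # shift into the crop
        if len(pts) == 1:
            gray[pts[0][1]][pts[0][0]] = 0
        for (xa, ya), (xb, yb) in zip(pts, pts[1:]):
            draw_line(gray, xa, ya, xb, yb, value=0)  # 0 = black ink
    return [[1 if v < threshold else 0 for v in row] for row in gray]
```

For example, an L-shaped stroke `[[(10, 10), (10, 12), (12, 12)]]` becomes the 3×3 array `[[1, 0, 0], [1, 0, 0], [1, 1, 1]]`, which is the two-dimensional 0/1 array that the later steps operate on.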
2. The method for recognizing handwritten mathematical formulas and generating MathML based on the Android system according to claim 1, characterized in that, after the two-dimensional array of 0s and 1s is obtained in step S3, the method further comprises: denoising the two-dimensional array to remove the flaw points in it.
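Claim 2 does not say how flaw points are identified. One simple choice, assumed here purely for illustration, is to treat a 1-valued point that has no 1-valued point among its eight neighbours as noise and clear it:

```python
from typing import List


def remove_isolated_points(grid: List[List[int]]) -> List[List[int]]:
    """Clear every 1-valued point with no 1-valued 8-neighbour (assumed noise rule)."""
    h, w = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # read from the input, write to a copy
    for i in range(h):
        for j in range(w):
            if grid[i][j] != 1:
                continue
            has_neighbour = any(
                0 <= i + di < h and 0 <= j + dj < w and grid[i + di][j + dj] == 1
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
            )
            if not has_neighbour:
                out[i][j] = 0
    return out
```

Reading from the original array while writing to a copy keeps the result independent of scan order.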
3. The method for recognizing handwritten mathematical formulas and generating MathML based on the Android system according to claim 1, characterized in that, in step S4, extracting feature values using thinning and contour extraction comprises the following steps:
S31: thin the two-dimensional array using a thinning algorithm;
S32: extract the contour of the two-dimensional array, where each contour value is determined by the following formula:
$$p = \begin{cases} |p - p_3|, & |p - p_3| \ge |p_1 - p_2| \\ |p_1 - p_2|, & |p - p_3| < |p_1 - p_2| \end{cases}$$
where G[h][w] is the two-dimensional array after thinning, and p, p1, p2, p3 are located at:
p = G[i][j], p1 = G[i][j+1], p2 = G[i+1][j], p3 = G[i+1][j+1],
with 0 ≤ i < h and 0 ≤ j < w;
S33: count the ray-penetration features: on the thinned two-dimensional array, take the data of four specified rows and columns (their indices were given as formula images that are not reproduced in this text), and count the number of points with value 1 in each extracted row or column;
S34: count the density features: divide the two-dimensional array into 6 blocks (three rows by two columns);
count the number N of points with value 1 in the whole two-dimensional array;
count the number N_i of points with value 1 in each block, i = 1, 2, ..., 6;
the density of each block is then:
$$\rho(i) = \begin{cases} 0, & N_i / N < 0.1 \\ 1, & N_i / N \ge 0.1 \end{cases} \qquad N > 0,\; i = 1, \dots, 6$$
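Steps S32 to S34 can be sketched as below. This is an illustrative sketch under assumptions, not the patented code: contour neighbours that fall outside the array are treated as 0, the `rows` and `cols` arguments stand in for the four row/column indices of step S33 (not reproduced in this text), the 3×2 blocks are cut with integer division and numbered row by row, and an all-zero array (where the density formula is undefined) yields all zeros.

```python
from typing import List, Sequence

Grid = List[List[int]]


def contour(g: Grid) -> Grid:
    """S32: the two-case formula selects max(|p - p3|, |p1 - p2|) at each point.
    Neighbours outside the array are taken as 0 (assumption)."""
    h, w = len(g), len(g[0])

    def at(i: int, j: int) -> int:
        return g[i][j] if i < h and j < w else 0

    return [
        [max(abs(at(i, j) - at(i + 1, j + 1)), abs(at(i, j + 1) - at(i + 1, j)))
         for j in range(w)]
        for i in range(h)
    ]


def ray_penetration(g: Grid, rows: Sequence[int], cols: Sequence[int]) -> List[int]:
    """S33: number of 1-valued points in each selected row, then each selected column."""
    return [sum(g[r]) for r in rows] + [sum(row[c] for row in g) for c in cols]


def density(g: Grid) -> List[int]:
    """S34: split into 3 x 2 blocks (row-major order) and threshold N_i / N at 0.1."""
    h, w = len(g), len(g[0])
    total = sum(map(sum, g))
    result = []
    for r in range(3):
        for c in range(2):
            n_i = sum(
                g[i][j]
                for i in range(r * h // 3, (r + 1) * h // 3)
                for j in range(c * w // 2, (c + 1) * w // 2)
            )
            result.append(1 if total > 0 and n_i / total >= 0.1 else 0)
    return result
```

A block that holds only 1 of 17 ink points (about 5.9%) falls below the 0.1 threshold and gets density 0, while a block holding 4 of 17 (about 23.5%) gets density 1.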
4. The method for recognizing handwritten mathematical formulas and generating MathML based on the Android system according to claim 3, characterized in that the two-dimensional array is thinned with the thinning algorithm as follows: one iteration cycle visits every point of the two-dimensional array from left to right and from top to bottom; in each cycle, every point p that satisfies all of the following threshold conditions is marked; at the end of the cycle, all marked points are set to 0; if no point is marked in a cycle, the algorithm terminates. The threshold conditions are:
1) the value of point p is 1;
2) the values of the four points above, below, to the left of and to the right of p are not all 1;
3) at least 2 of the points in the 3×3 neighbourhood of p have value 1, where the 8 points of the 3×3 neighbourhood of p are numbered x1, x2, x3, x4, x5, x6, x7, x8 in counterclockwise order, starting from the point to the right of p;
4) the 8-connectivity number of p is 1, where the connectivity number is the number of connected components adjacent to p within its 3×3 neighbourhood; the 8-connectivity number is computed as:
$$N_c^{(8)}(p) = \sum_{i=1}^{4} \left( \bar{x}_{2i-1} - \bar{x}_{2i-1}\,\bar{x}_{2i}\,\bar{x}_{2i+1} \right)$$
where $\bar{x}_k = 1 - x_k$, x_k is the value (0 or 1) of the k-th neighbour of p, and the indices wrap around so that x_9 = x_1;
5) if x3 is marked for deletion, the 8-connectivity number of p computed with x3 set to 0 is 1;
6) if x5 is marked for deletion, the 8-connectivity number of p computed with x5 set to 0 is 1.
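The thinning loop of claim 4 can be sketched as follows. This follows the six conditions as listed (the scheme resembles Hilditch thinning), under two assumptions of this sketch: points outside the array count as 0, and conditions 5 and 6 are checked separately, each with only the one marked neighbour set to 0.

```python
from typing import List

Grid = List[List[int]]
# x1..x8: right, upper-right, up, upper-left, left, lower-left, down, lower-right
# (counterclockwise from the right neighbour; row indices grow downward)
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def neighbours(img: Grid, i: int, j: int) -> List[int]:
    """Values x1..x8 of the 3x3 neighbourhood; outside points count as 0 (assumption)."""
    h, w = len(img), len(img[0])
    return [img[i + di][j + dj] if 0 <= i + di < h and 0 <= j + dj < w else 0
            for di, dj in OFFSETS]


def connectivity8(x: List[int]) -> int:
    """Nc8(p) = sum_{i=1..4} (xb[2i-1] - xb[2i-1]*xb[2i]*xb[2i+1]), xb = 1 - x, x9 = x1."""
    xb = [1 - v for v in x] + [1 - x[0]]  # xb[k] holds x-bar_{k+1}; xb[8] is x-bar_9
    return sum(xb[2 * k] - xb[2 * k] * xb[2 * k + 1] * xb[2 * k + 2] for k in range(4))


def thin(img: Grid) -> Grid:
    img = [row[:] for row in img]
    while True:
        marked = set()  # points marked during this iteration cycle
        for i in range(len(img)):
            for j in range(len(img[0])):
                if img[i][j] != 1:                      # condition 1
                    continue
                x = neighbours(img, i, j)
                if x[0] and x[2] and x[4] and x[6]:     # condition 2: interior point
                    continue
                if sum(x) < 2:                          # condition 3: keep end points
                    continue
                if connectivity8(x) != 1:               # condition 4
                    continue
                ok = True
                for k in (2, 4):                        # conditions 5 and 6 (x3, x5)
                    di, dj = OFFSETS[k]
                    if (i + di, j + dj) in marked:
                        y = x.copy()
                        y[k] = 0
                        ok = ok and connectivity8(y) == 1
                if ok:
                    marked.add((i, j))
        if not marked:                                  # no mark in this cycle: stop
            return img
        for i, j in marked:                             # delete at the end of the cycle
            img[i][j] = 0
```

Because deletions are applied only at the end of each cycle, every neighbourhood is read from the array as it stood at the start of the cycle. A one-pixel-wide line is left unchanged, while a solid 3×3 block shrinks to its centre point.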
5. The method for recognizing handwritten mathematical formulas and generating MathML based on the Android system according to claim 1, characterized in that applying dimension reduction to the rows and the columns of the two-dimensional array respectively to produce a fixed dimension comprises the following steps:
S51: reduce the dimension of the two-dimensional array so that it is represented with a fixed dimension;
S52: generate a new two-dimensional array and assign a value to each block of the new two-dimensional array;
S53: concatenate the values of the two-dimensional array into a fixed-length template.
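Steps S51 to S53 can be sketched as a block reduction. The fixed 8×8 size and the rule that a block is 1 whenever any of its source points is 1 are hypothetical choices made for this sketch; the claim does not specify either.

```python
from typing import List, Tuple

Grid = List[List[int]]


def reduce_to_template(grid: Grid, rows: int = 8, cols: int = 8) -> Tuple[Grid, List[int]]:
    """S51/S52: map the array onto a fixed rows x cols grid; S53: flatten it.
    The 8 x 8 default and the 'any ink in the block' rule are assumptions."""
    h, w = len(grid), len(grid[0])
    reduced = []
    for r in range(rows):
        r0 = r * h // rows
        r1 = max((r + 1) * h // rows, r0 + 1)  # at least one source row per block
        new_row = []
        for c in range(cols):
            c0 = c * w // cols
            c1 = max((c + 1) * w // cols, c0 + 1)
            has_ink = any(grid[i][j] for i in range(r0, r1) for j in range(c0, c1))
            new_row.append(1 if has_ink else 0)
        reduced.append(new_row)
    template = [v for row in reduced for v in row]  # fixed length rows * cols
    return reduced, template
```

Whatever the size of the cropped character, the template always has rows × cols entries, which is what allows it to be fed to a fixed-size classifier input.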
6. The method for recognizing handwritten mathematical formulas and generating MathML based on the Android system according to claim 1, characterized in that, in step S7, when the ternary tree structure is used to locate the spatial relationships of the mathematical formula, the pointers of the ternary tree are assigned as follows:
1) when the spatial relationships of the mathematical formula include an enclosing structure, in which the boundary of the enclosing symbol contains other characters, and enclosed characters exist on both the left and the right side of the enclosing symbol, the left pointer of the ternary tree node storing the enclosing symbol points to its left child, the node storing the character on the left side of the enclosing symbol, and the middle pointer of that node points to its middle child, the node storing the character on the right side of the enclosing symbol;
2) when the spatial relationships of the mathematical formula include an up-down structure divided by a separator, the left pointer of the ternary tree node storing the separator points to its left child, the node storing the leftmost character above the separator, and its middle pointer points to its middle child, the node storing the leftmost character below the separator;
3) when the spatial relationships of the mathematical formula include a superscript/subscript structure, the left pointer of the ternary tree node storing the base (scripted) character points to its left child, the node storing the superscript, and its middle pointer points to its middle child, the node storing the subscript;
4) when the spatial relationships of the mathematical formula include a left-right structure, the right pointer of the ternary tree node storing the left character points to its right child, the node storing the right character.
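The pointer rules of claim 6 can be sketched with a small node type. The structure labels (`"enclosing"`, `"up_down"`, `"scripts"`, `"left_right"`) and the helper name `attach` are hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A ternary tree node; `char` is the stored character information."""
    char: str
    left: Optional["Node"] = None
    middle: Optional["Node"] = None
    right: Optional["Node"] = None


def attach(node: Node, structure: str, first: Optional[Node] = None,
           second: Optional[Node] = None) -> Node:
    """Assign pointers following rules 1) to 4) (labels are hypothetical):
    'enclosing'  : first = character left of the enclosing symbol, second = right of it
    'up_down'    : first = leftmost character above the separator, second = leftmost below
    'scripts'    : first = superscript, second = subscript
    'left_right' : first = the character to the right of `node`
    """
    if structure in ("enclosing", "up_down", "scripts"):
        node.left, node.middle = first, second
    elif structure == "left_right":
        node.right = first
    else:
        raise ValueError(f"unknown structure: {structure!r}")
    return node
```

For example, `x` with superscript `2` and subscript `i`, followed by `+`, becomes `attach(attach(Node("x"), "scripts", Node("2"), Node("i")), "left_right", Node("+"))`. The left and middle pointers carry two-dimensional nesting, and the right pointer carries the horizontal reading order.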
7. The method for recognizing handwritten mathematical formulas and generating MathML based on the Android system according to claim 6, characterized in that the root node of the ternary tree is determined as follows:
when the leftmost character of the mathematical expression is not part of an up-down structure or an enclosing structure, the leftmost character is the root node;
when the leftmost character of the mathematical expression is part of an up-down structure or an enclosing structure, the leftmost character is first marked as the root node; then the up-down or enclosing structure containing the leftmost character is checked for a specific character, the kinds of specific character comprising the fraction bar, the radical sign and the summation symbol; if such a character is present, the root node is changed to that specific character; otherwise the root node remains unchanged.
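The root rule of claim 7 can be sketched as below. The symbol record (`char`, `x_min`, optional `group`) and the labels `"frac"`, `"sqrt"` and `"sum"` are hypothetical; the sketch assumes an earlier spatial analysis has already assigned each up-down or enclosing structure a group id.

```python
from typing import Dict, List

# Hypothetical labels for the fraction bar, radical sign and summation symbol.
SPECIAL = {"frac", "sqrt", "sum"}


def choose_root(symbols: List[Dict]) -> Dict:
    """Each symbol: 'char', 'x_min' (left edge of its bounding box) and, if it belongs
    to an up-down or enclosing structure, a 'group' id (assumed precomputed)."""
    root = min(symbols, key=lambda s: s["x_min"])  # first mark the leftmost character
    group = root.get("group")
    if group is None:                               # not in any up-down/enclosing structure
        return root
    for s in symbols:                               # look for a specific character
        if s.get("group") == group and s["char"] in SPECIAL:
            return s
    return root                                     # otherwise the root is unchanged
```

This matters for handwriting because a numerator is often written slightly to the left of its fraction bar; the rule still promotes the bar, which is the node that must own the numerator and denominator.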
8. The method for recognizing handwritten mathematical formulas and generating MathML based on the Android system according to claim 6 or 7, characterized in that the ternary tree is built as follows:
S81: initialize an empty queue Q and append the root node found to the tail of Q;
S82: take a ternary tree node from the head of Q, denoted N, and determine the spatial relationships of the character stored in node N: if an enclosing structure, an up-down structure or a superscript/subscript structure exists, go to step S83; otherwise, check whether a left-right structure exists; if it does, go to step S84; if not, go to step S85;
S83: according to the method of claim 6, point the left pointer and the middle pointer of node N to its child nodes, and append those child nodes to the tail of Q in order; then determine whether the character stored in node N has a left-right structure; if it does, go to step S84; if not, go to step S85;
S84: according to the pointer-assignment rule of claim 6 for the left-right structure, point the right pointer of node N to its child node, append that child node to the tail of Q, and return to step S82;
S85: determine whether Q is empty; if it is not empty, return to step S82; if it is empty, destroy the queue and terminate the algorithm.
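The queue-driven construction of steps S81 to S85 can be sketched with `collections.deque`. The `find_children` callback is a hypothetical stand-in for the spatial-relationship analysis: it returns the (left, middle, right) children of a node, or `None` where no such relationship exists.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Node:
    char: str
    left: Optional["Node"] = None
    middle: Optional["Node"] = None
    right: Optional["Node"] = None


Children = Tuple[Optional[Node], Optional[Node], Optional[Node]]


def build_tree(root: Node, find_children: Callable[[Node], Children]) -> Node:
    queue = deque([root])                            # S81: Q holds the root node
    while queue:                                     # S85: stop once Q is empty
        node = queue.popleft()                       # S82: take N from the head of Q
        left, middle, right = find_children(node)    # hypothetical spatial analysis
        if left is not None or middle is not None:   # S83: enclosing/up-down/scripts
            node.left, node.middle = left, middle
            queue.extend(c for c in (left, middle) if c is not None)
        if right is not None:                        # S84: left-right structure
            node.right = right
            queue.append(right)
    return root
```

Usage, for a/b + c with the fraction bar as root:

```python
nodes = {c: Node(c) for c in ["frac", "a", "b", "+", "c"]}
relations = {"frac": ("a", "b", "+"), "+": (None, None, "c")}
root = build_tree(nodes["frac"], lambda n: tuple(
    nodes[c] if c else None for c in relations.get(n.char, (None, None, None))))
```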
9. The method for recognizing handwritten mathematical formulas and generating MathML based on the Android system according to claim 6 or 7, characterized in that, in step S7, MathML is obtained by preorder traversal of the ternary tree as follows:
if a ternary tree node has a left child or a middle child, first check the type of the character stored in that node and generate the MathML markup corresponding to that type; the left child and the middle child of the node are placed inside that node's MathML markup, and the right child of the node is placed outside that node's MathML markup.
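A preorder serializer in the spirit of claim 9 is sketched below. The type labels `"frac"` and `"sqrt"`, the treatment of any other node with children as a base with superscript (left) and subscript (middle), and the token rule (digits to `<mn>`, letters to `<mi>`, everything else to `<mo>`) are assumptions of this sketch. The MathML elements themselves (`mfrac`, `msqrt`, `mroot`, `msup`, `msub`, `msubsup`, `mrow`) are standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    char: str
    left: Optional["Node"] = None
    middle: Optional["Node"] = None
    right: Optional["Node"] = None


def token(ch: str) -> str:
    if ch.isdigit():
        return f"<mn>{ch}</mn>"
    if ch.isalpha():
        return f"<mi>{ch}</mi>"
    return f"<mo>{ch}</mo>"


def row(node: Optional[Node]) -> str:
    """A child subtree may expand to several siblings, so wrap it in <mrow>."""
    return f"<mrow>{to_mathml(node)}</mrow>"


def to_mathml(node: Optional[Node]) -> str:
    if node is None:
        return ""
    if node.left is None and node.middle is None:      # plain character
        markup = token(node.char)
    elif node.char == "frac":                           # left = numerator, middle = denominator
        markup = f"<mfrac>{row(node.left)}{row(node.middle)}</mfrac>"
    elif node.char == "sqrt":                           # left = optional index, middle = radicand
        markup = (f"<mroot>{row(node.middle)}{row(node.left)}</mroot>"
                  if node.left is not None else f"<msqrt>{to_mathml(node.middle)}</msqrt>")
    else:                                               # base with superscript/subscript
        base = token(node.char)
        if node.left is not None and node.middle is not None:
            markup = f"<msubsup>{base}{row(node.middle)}{row(node.left)}</msubsup>"
        elif node.left is not None:
            markup = f"<msup>{base}{row(node.left)}</msup>"
        else:
            markup = f"<msub>{base}{row(node.middle)}</msub>"
    return markup + to_mathml(node.right)               # right child goes outside


def formula_to_mathml(root: Node) -> str:
    return f'<math xmlns="http://www.w3.org/1998/Math/MathML">{to_mathml(root)}</math>'
```

For x² + 1, `to_mathml(Node("x", left=Node("2"), right=Node("+", right=Node("1"))))` returns `<msup><mi>x</mi><mrow><mn>2</mn></mrow></msup><mo>+</mo><mn>1</mn>`. The left and middle children are nested inside `msup`, and the right child follows it.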

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310100185.9A CN103235945B (en) 2013-03-27 2013-03-27 A kind of method of hand-written mathematical formulae identification based on android system and generation MathML

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310100185.9A CN103235945B (en) 2013-03-27 2013-03-27 A kind of method of hand-written mathematical formulae identification based on android system and generation MathML

Publications (2)

Publication Number Publication Date
CN103235945A true CN103235945A (en) 2013-08-07
CN103235945B CN103235945B (en) 2016-03-23

Family

ID=48883984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310100185.9A Active CN103235945B (en) 2013-03-27 2013-03-27 A kind of method of hand-written mathematical formulae identification based on android system and generation MathML

Country Status (1)

Country Link
CN (1) CN103235945B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104750667A (en) * 2015-03-12 2015-07-01 广东欧珀移动通信有限公司 Image content processing method and mobile terminal
CN104992173A (en) * 2015-06-03 2015-10-21 北京好运到信息科技有限公司 Symbol recognition method and system used for medical report
CN108319724A (en) * 2018-02-28 2018-07-24 北京仁和汇智信息技术有限公司 A kind of Homepage Publishing method and device with formula file
CN110135425A (en) * 2018-02-09 2019-08-16 北京世纪好未来教育科技有限公司 Sample mask method and computer storage medium
CN113095314A (en) * 2021-04-07 2021-07-09 科大讯飞股份有限公司 Formula identification method and device, storage medium and equipment
CN113743315A (en) * 2021-09-07 2021-12-03 电子科技大学 Handwritten elementary mathematical formula recognition method based on structure enhancement

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060062470A1 (en) * 2004-09-22 2006-03-23 Microsoft Corporation Graphical user interface for expression recognition
CN101329731A (en) * 2008-06-06 2008-12-24 南开大学 Automatic recognition method of mathematical formula in image
CN101388068A (en) * 2007-09-12 2009-03-18 汉王科技股份有限公司 Mathematical formula identifying and coding method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
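He Qiang (贺强): "Research on Related Methods of Character Recognition" (字符识别的相关方法研究), China Master's Theses Full-text Database, Information Science and Technology, no. 8, 15 August 2010 (2010-08-15) *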

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104750667A (en) * 2015-03-12 2015-07-01 广东欧珀移动通信有限公司 Image content processing method and mobile terminal
CN104992173A (en) * 2015-06-03 2015-10-21 北京好运到信息科技有限公司 Symbol recognition method and system used for medical report
CN104992173B (en) * 2015-06-03 2018-08-17 北京拍医拍智能科技有限公司 Symbol Recognition and system for medical report list
CN110135425A (en) * 2018-02-09 2019-08-16 北京世纪好未来教育科技有限公司 Sample mask method and computer storage medium
CN108319724A (en) * 2018-02-28 2018-07-24 北京仁和汇智信息技术有限公司 A kind of Homepage Publishing method and device with formula file
CN108319724B (en) * 2018-02-28 2019-04-09 北京仁和汇智信息技术有限公司 A kind of Homepage Publishing method and device with formula file
CN113095314A (en) * 2021-04-07 2021-07-09 科大讯飞股份有限公司 Formula identification method and device, storage medium and equipment
CN113743315A (en) * 2021-09-07 2021-12-03 电子科技大学 Handwritten elementary mathematical formula recognition method based on structure enhancement
CN113743315B (en) * 2021-09-07 2023-07-14 电子科技大学 Handwriting elementary mathematical formula identification method based on structure enhancement

Also Published As

Publication number Publication date
CN103235945B (en) 2016-03-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant