CN105184292A

CN105184292A - Method for analyzing and recognizing structure of handwritten mathematical formula in natural scene image

Info

Publication number: CN105184292A
Application number: CN201510531070.4A
Authority: CN
Inventors: 陈李江; 刘宁; 刘辉
Original assignee: Beijing Yun Jiang Science And Technology Ltd
Current assignee: Beijing Yun Jiang Science And Technology Ltd
Priority date: 2015-08-26
Filing date: 2015-08-26
Publication date: 2015-12-23
Anticipated expiration: 2035-08-26
Also published as: CN105184292B

Abstract

The invention provides a method for analyzing and recognizing the structure of a handwritten mathematical formula in a natural scene image. The method comprises the steps of: S1, converting the gray matrix of a natural scene image into a local contrast matrix, and binarizing the local contrast matrix with the Otsu method to obtain a binary matrix; S2, analyzing the connected domains of the binary matrix obtained in step S1, and removing non-character connected domains to obtain character connected domains; S3, detecting special structural elements of formulas in the character connected domains with the correlation coefficient method, and separately marking all detected special structural elements; S4, dividing the binary matrix obtained in step S1 into rows with the horizontal projection method; S5, recognizing each character connected domain with a convolutional neural network; S6, defining an output order and outputting the recognition results in that order in the LaTeX typesetting format. The method effectively solves the problem of representing elementary mathematical formulas in OCR.

Description

Method for structure analysis and recognition of handwritten mathematical formulas in natural scene images

Technical field

The present invention relates to image processing and pattern recognition, and in particular to a method for structure analysis and recognition of handwritten mathematical formulas in natural scene images.

Background art

OCR (Optical Character Recognition) technology has a wide range of applications, and OCR for Chinese and English text is relatively mature. However, current OCR technology does not handle content with complicated structure, such as mathematical formulas, well. The present invention focuses on this problem, for which there is strong application demand.

Summary of the invention

The method for structure analysis and recognition of handwritten mathematical formulas in natural scene images provided by the invention can effectively solve the problem of representing elementary mathematical formulas in OCR.

The method for structure analysis and recognition of handwritten mathematical formulas in natural scene images of the present invention comprises:

Step S1: convert the gray matrix of the natural scene image into a local contrast matrix, and binarize the resulting local contrast matrix with the Otsu (Otsu threshold) method to obtain a binary matrix;

Step S2: perform connected domain analysis on the binary matrix of step S1 and reject non-character connected domains to obtain character connected domains;

Step S3: detect special structural elements of formulas among the character connected domains of step S2 with the correlation coefficient method, and mark all detected special structural elements separately;

Step S4: divide the binary matrix of step S1 into rows with the horizontal projection method;

Step S5: recognize each character connected domain with a convolutional neural network;

Step S6: define an output order, and output the recognition results in that order in the LaTeX (a typesetting system based on TeX) typesetting format.

Preferably, the local contrast Con(i, j) of the point with coordinates (i, j) in the local contrast matrix is computed as:

Con(i, j) = α C(i, j) + (1 - α)(I_{\max}(i, j) - I_{\min}(i, j))

where

C(i, j) = \frac{I_{\max}(i, j) - I_{\min}(i, j)}{I_{\max}(i, j) + I_{\min}(i, j) + ε},

I_{\max}(i, j) and I_{\min}(i, j) are respectively the maximum and minimum gray values within the neighborhood centered on the point (i, j) in the gray matrix of the image, the neighborhood radius being set to 5 here;

std denotes the standard deviation of the gray matrix, and γ = 1;

ε is an infinitesimal quantity that prevents the denominator from being 0.
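
The contrast formula can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that are not in the text: the neighborhood is taken to be a square window clipped at the image border, the weight `alpha` is passed in by the caller because its defining expression is not reproduced here, and the function name `local_contrast` and the default `eps` are hypothetical.

```python
def local_contrast(gray, alpha, radius=5, eps=1e-6):
    """Return the local contrast Con(i, j) for a 2-D list of gray values.

    alpha is supplied by the caller (assumption); radius=5 follows the text.
    """
    rows, cols = len(gray), len(gray[0])
    con = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Square neighborhood centered on (i, j), clipped at the border.
            window = [
                gray[r][c]
                for r in range(max(0, i - radius), min(rows, i + radius + 1))
                for c in range(max(0, j - radius), min(cols, j + radius + 1))
            ]
            i_max, i_min = max(window), min(window)
            # C(i, j): normalized contrast; eps keeps the denominator nonzero.
            c_ij = (i_max - i_min) / (i_max + i_min + eps)
            con[i][j] = alpha * c_ij + (1 - alpha) * (i_max - i_min)
    return con
```

On a uniform region the maximum and minimum coincide, so both terms vanish and the contrast is 0; near a stroke edge the second term grows with the raw gray-level difference.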

Preferably, the local contrast matrix is binarized with the Otsu method as follows: take the maximum and minimum values of the local contrast matrix, divide the range between them into n equal sub-intervals, and assign each element to its corresponding sub-interval to form a histogram; Otsu thresholding is performed on this histogram; points below the selected threshold are background points, and points above the selected threshold are character points.
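
The histogram-based thresholding described above might look like the following sketch. The function names, the choice of the upper edge of the best bin as the threshold, and the handling of a constant matrix are assumptions; the default of 1000 bins is the value given in the embodiment.

```python
def otsu_threshold(values, n_bins=1000):
    """Pick a threshold by maximizing between-class variance over n_bins bins."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant input: nothing to separate
        return lo
    width = (hi - lo) / n_bins
    hist = [0] * n_bins
    for v in values:
        # The maximum value falls into the last bin.
        hist[min(int((v - lo) / width), n_bins - 1)] += 1

    total = len(values)
    sum_all = sum(k * h for k, h in enumerate(hist))
    best_bin, best_score = 0, -1.0
    count0, sum0 = 0, 0
    for k in range(n_bins):
        count0 += hist[k]
        sum0 += k * hist[k]
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue
        mean0 = sum0 / count0
        mean1 = (sum_all - sum0) / count1
        score = count0 * count1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_score, best_bin = score, k
    # Upper edge of the best bin, mapped back to the contrast scale.
    return lo + (best_bin + 1) * width


def binarize(con, n_bins=1000):
    """Return 1 for character points (above threshold), 0 for background."""
    threshold = otsu_threshold([v for row in con for v in row], n_bins)
    return [[1 if v > threshold else 0 for v in row] for row in con]
```

Binning first makes the method independent of the numeric range of the contrast values, which are real numbers rather than 8-bit gray levels.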

Preferably, connected domain analysis is performed on the binary matrix of step S1 and non-character connected domains are rejected to obtain character connected domains as follows:

Step S201: obtain the minimum bounding rectangle of each connected domain, record the coordinates of its four vertices, and compute the length and height of the minimum bounding rectangle;

Step S202: compute the average length and average height over all connected domains;

Step S203: reject non-character connected domains:

If both the length and the height of a connected domain are less than 1/4 of the average length and average height, it is regarded as a noise point and the connected domain is rejected;

If both the length and the height of a connected domain are greater than 4 times the average length and average height, it is regarded as a non-character part of the image and the connected domain is rejected;

Step S204: keep the remaining connected domains as character connected domains.
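
Steps S202 to S204 can be sketched as a simple size filter. The sketch assumes the bounding rectangles of step S201 are already available as `(left, top, right, bottom)` tuples with inclusive pixel coordinates; that representation and the function name are hypothetical.

```python
def filter_character_domains(boxes):
    """Keep bounding boxes that are neither noise nor oversized non-text."""
    if not boxes:
        return []
    lengths = [right - left + 1 for left, top, right, bottom in boxes]
    heights = [bottom - top + 1 for left, top, right, bottom in boxes]
    avg_length = sum(lengths) / len(boxes)
    avg_height = sum(heights) / len(boxes)

    kept = []
    for box, length, height in zip(boxes, lengths, heights):
        # Noise: both dimensions below 1/4 of the averages.
        is_noise = length < avg_length / 4 and height < avg_height / 4
        # Non-character region: both dimensions above 4 times the averages.
        is_non_character = length > 4 * avg_length and height > 4 * avg_height
        if not (is_noise or is_non_character):
            kept.append(box)
    return kept
```

Because both dimensions must violate the bound, a long thin component such as a fraction line survives the filter.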

Preferably, the special structural elements of formulas in step S3 comprise braces, radical signs and fraction lines;

Fraction line connected domains are detected by rule matching: a connected domain whose length-to-width ratio is greater than 5 and which has adjacent connected domains both above and below it is marked as a fraction line connected domain;
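
The fraction line rule can be sketched as below. Reading "adjacent" as "horizontally overlapping the candidate and lying entirely above or below it" is an assumption, as are the `(left, top, right, bottom)` box layout and the function name.

```python
def is_fraction_line(box, others, min_ratio=5):
    """Rule-based test: long, thin, with neighbors both above and below."""
    left, top, right, bottom = box
    length = right - left + 1
    thickness = bottom - top + 1
    if length / thickness <= min_ratio:
        return False

    def overlaps_columns(other):
        # True when the two boxes share at least one column.
        return other[0] <= right and other[2] >= left

    has_above = any(o[3] < top and overlaps_columns(o) for o in others)
    has_below = any(o[1] > bottom and overlaps_columns(o) for o in others)
    return has_above and has_below
```

The neighbor test is what separates a fraction line from a minus sign, which has the same elongated shape but nothing directly above or below it.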

Brace connected domains and radical sign connected domains are detected by template matching:

Step S301: select standard binary templates for the brace connected domain and the radical sign connected domain;

Step S302: normalize the size of the current connected domain so that it is the same size as the standard template;

Step S303: match each standard binary template against the current connected domain.

The matching formula is the correlation coefficient:

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \cdot \sum_{i=1}^{n} (y_i - \bar{y})^2}}

where x_i and y_i denote the value of the i-th element of the current template and of the standard template respectively, and \bar{x} and \bar{y} denote the means of the current template and of the standard template respectively; r ∈ (0, 1), and the match succeeds when r is greater than 0.5.
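
Steps S302 and S303 can be sketched as follows. Nearest-neighbor resampling is an assumed normalization method (the text does not say how the size is normalized), and the function names are hypothetical; the 0.5 threshold is taken from the text.

```python
import math


def correlation(x, y):
    """Correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    # A constant block or template has no variance; treat it as no match.
    return numerator / denominator if denominator else 0.0


def resize_nearest(block, size):
    """Resample a binary matrix to size x size by nearest-neighbor lookup."""
    rows, cols = len(block), len(block[0])
    return [
        [block[i * rows // size][j * cols // size] for j in range(size)]
        for i in range(size)
    ]


def matches_template(block, template, threshold=0.5):
    """True when the normalized block correlates with the square template."""
    resized = resize_nearest(block, len(template))
    x = [v for row in resized for v in row]
    y = [v for row in template for v in row]
    return correlation(x, y) > threshold
```

In use, each candidate connected domain would be tested against the brace template and the radical sign template in turn, and marked with the first structure whose template it matches.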

Preferably, the binary matrix is divided into rows with the horizontal projection method in step S4 as follows:

A waveform is obtained by horizontally projecting the binary matrix of step S1; the abscissa of the waveform is the row number in the original image, and the ordinate is the number of character points contained in that row;

Starting from each peak of the waveform, expand to both sides and stop when the value falls below 0.1 times the peak value; if two adjacent peaks overlap during expansion, the two corresponding rows are merged into one row;

Record the start and end positions of each row: the abscissa at the left end of the expanded peak is the start row coordinate of the current row, and the abscissa at the right end is the end row coordinate of the current row.
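
The projection, expansion and merge steps can be sketched as below. Treating every local maximum of the profile as a peak is an assumption (the embodiment adds further peak filtering), and the function name is hypothetical.

```python
def split_rows(binary, stop_ratio=0.1):
    """Return merged (start_row, end_row) spans found by horizontal projection."""
    profile = [sum(row) for row in binary]   # character points per image row
    n = len(profile)
    # Local maxima of the profile; a plateau contributes its last index.
    peaks = [
        i for i in range(n)
        if profile[i] > 0
        and (i == 0 or profile[i] >= profile[i - 1])
        and (i == n - 1 or profile[i] > profile[i + 1])
    ]

    spans = []
    for p in peaks:
        limit = stop_ratio * profile[p]
        start = end = p
        # Expand to both sides until the profile drops below the limit.
        while start > 0 and profile[start - 1] >= limit:
            start -= 1
        while end < n - 1 and profile[end + 1] >= limit:
            end += 1
        spans.append((start, end))

    rows = []
    for start, end in sorted(spans):
        if rows and start <= rows[-1][1]:    # overlapping spans form one row
            rows[-1] = (rows[-1][0], max(rows[-1][1], end))
        else:
            rows.append((start, end))
    return rows
```

Merging overlapping spans is what keeps a tall structure, such as a fraction whose numerator and denominator each produce a peak, in a single text row.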

Preferably, after the start and end positions of each row are obtained, each character connected domain is associated with a row as follows:

Compute the distance between the abscissa (row coordinate) of the center of each character connected domain and that of the center of each text row, and assign the character connected domain to the row with the smallest distance.
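
The nearest-row assignment can be sketched as follows, assuming boxes are `(left, top, right, bottom)` tuples and rows are `(start_row, end_row)` spans; both layouts and the function name are hypothetical.

```python
def assign_to_rows(boxes, rows):
    """Return, for each box, the index of the text row whose center is nearest."""
    row_centers = [(start + end) / 2 for start, end in rows]
    assignment = []
    for left, top, right, bottom in boxes:
        center = (top + bottom) / 2          # row coordinate of the box center
        distances = [abs(center - c) for c in row_centers]
        assignment.append(distances.index(min(distances)))
    return assignment
```

Using centers rather than containment means that a component lying in the gap between two rows, such as a stray dot, is still assigned to exactly one row.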

Preferably, the convolutional neural network in step S5 has the LeNet-5 structure, consisting of one input layer, two convolution and down-sampling layers, one fully connected hidden layer and one output layer;

The training data of the convolutional neural network are samples of normalized character connected domains;

The character connected domains of step S2 are normalized and fed into the convolutional neural network to obtain the character corresponding to each character connected domain.

Preferably, the output order defined in step S6 comprises three layers:

The first-layer order is the row order: according to the correspondence between character connected domains and rows, the character connected domains are output row by row;

The second-layer order is the column order: within each row, all character connected domains are sorted in ascending order of their left-end column coordinate;

The third-layer order is the order within special formula structures: the elements of a system of equations are output equation by equation; the elements of a fraction are output numerator first, then denominator.
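
The first two layers amount to a two-key sort, sketched below under the assumption that each box is a `(left, top, right, bottom)` tuple and that `row_of[k]` holds the row index already assigned to box `k`; these names are hypothetical. The third layer, which reorders elements inside special structures, is not covered by this sketch.

```python
def reading_order(boxes, row_of):
    """Return box indices sorted by row first, then by left-end column."""
    return sorted(range(len(boxes)), key=lambda k: (row_of[k], boxes[k][0]))
```

For example, two boxes in row 0 with left edges 30 and 5, followed by one box in row 1, are emitted as the second box, then the first, then the third.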

Preferably, for the order within special formula structures, the character blocks contained in each special structural element must be determined;

A brace represents the special structure of a system of equations, and the column coordinate at which the system ends must be determined in order to determine all the character blocks it contains. According to its position within the current row, each character block is classified as "top, middle or bottom"; every character block located in the top or bottom part is regarded as an element of the system of equations. All such character blocks are found, and the end column of the rightmost one is taken as the end column of the whole system of equations. All character blocks located between the brace and the end column of the system are assigned to the current system structure. The interior of the system structure is then divided into rows again to determine how many equations it contains, and the character blocks inside the system are output equation by equation;

For a fraction line, all numerator and denominator elements of the current fraction must be determined. Every character block whose start ordinate is greater than the start ordinate of the fraction line and whose end ordinate is less than the end ordinate of the fraction line is assigned to the current fraction structure. Each character block in the fraction structure must further be determined to be numerator or denominator according to its abscissa: if the abscissa of the bottom of the character block is less than the abscissa of the center of the fraction line, it belongs to the numerator; if the abscissa of the top of the character block is greater than the abscissa of the center of the fraction line, it belongs to the denominator;
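
The fraction rule can be sketched as below. The sketch reads "ordinate" as the column coordinate and "abscissa" as the row coordinate (increasing downward), consistent with the projection step where the abscissa is the image row number; that reading, the dictionary layout of a character block, and the function name are assumptions.

```python
def fraction_to_latex(line, blocks):
    """Build \\frac{numerator}{denominator} from blocks spanned by a fraction line.

    line and blocks are dicts with 'left', 'right' (columns) and 'top',
    'bottom' (rows); each block also carries its recognized 'char'.
    """
    center_row = (line["top"] + line["bottom"]) / 2
    # Blocks whose column extent lies strictly inside that of the fraction line.
    inside = [
        b for b in blocks
        if b["left"] > line["left"] and b["right"] < line["right"]
    ]
    # Entirely above the line center: numerator; entirely below: denominator.
    numerator = sorted(
        (b for b in inside if b["bottom"] < center_row), key=lambda b: b["left"]
    )
    denominator = sorted(
        (b for b in inside if b["top"] > center_row), key=lambda b: b["left"]
    )
    num = "".join(b["char"] for b in numerator)
    den = "".join(b["char"] for b in denominator)
    return "\\frac{" + num + "}{" + den + "}"
```

Blocks outside the column extent of the line are ignored here and would be emitted by the ordinary row and column ordering.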

For a radical sign, the character blocks located inside the radical sign must be determined. Every character block whose start ordinate is greater than the start ordinate of the radical sign and whose end ordinate is less than the end ordinate of the radical sign is assigned to the current radical structure;

According to the row order, the column order and the order within special formula structures described above, the output of the final formula structure is determined and output in the LaTeX (a typesetting system based on TeX) typesetting format.

The present invention effectively solves the problem of representing elementary mathematical formulas in OCR and achieves accurate recognition of formulas.

Brief description of the drawings

Fig. 1 is a flowchart of the method for structure analysis and recognition of handwritten mathematical formulas in natural scenes provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of the structure of the convolutional neural network used for character recognition in an embodiment of the present invention.

Detailed description of the embodiments

The method for structure analysis and recognition of handwritten mathematical formulas in natural scenes provided by the embodiment of the present invention is described in detail below with reference to the accompanying drawings.

As shown in Fig. 1, the method for structure analysis and recognition of handwritten mathematical formulas in natural scenes provided by the embodiment of the present invention comprises the following steps:

Step S1, convert the gray matrix of the natural scene image into a local contrast matrix, and binarize the resulting local contrast matrix with the Otsu method to obtain a binary matrix;

In the present embodiment, the local contrast Con(i, j) of the point with coordinates (i, j) in the local contrast matrix is computed as:

Con(i, j) = α C(i, j) + (1 - α)(I_{\max}(i, j) - I_{\min}(i, j))

where

C(i, j) = \frac{I_{\max}(i, j) - I_{\min}(i, j)}{I_{\max}(i, j) + I_{\min}(i, j) + ε},

std denotes the standard deviation of the gray matrix, and γ = 1;

ε is an infinitesimal quantity that prevents the denominator from being 0.

In the present embodiment, the local contrast matrix is binarized with the Otsu method as follows: take the maximum and minimum values of the local contrast matrix, divide the range between them into 1000 equal sub-intervals, and assign each element to its corresponding sub-interval to form a statistical histogram of length 1000; the Otsu method is applied to this histogram; points below the selected threshold are background points, and points above the selected threshold are character points.

Step S2, perform connected domain analysis on the binary matrix of step S1 and reject non-character connected domains to obtain character connected domains, as follows:

Step S202: compute the average length and average height over all connected domains;

Step S203: reject non-character connected domains:

Step S204: keep the remaining connected domains as character connected domains, and obtain the character block of each character connected domain from its minimum bounding rectangle.

Step S3, detect special structural elements of formulas among the character connected domains of step S2 with the correlation coefficient method, and mark all detected special structural elements separately;

The special structural elements of formulas in the present embodiment comprise braces, radical signs and fraction lines;

Brace connected domains and radical sign connected domains are detected by template matching. The standard template is a 32*32 matrix, and the character block of the character connected domain to be tested is likewise normalized to a 32*32 matrix. The correlation coefficient of the two matrices is computed, and the match succeeds if it is greater than 0.5. The specific steps are:

The matching formula is the correlation coefficient:

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \cdot \sum_{i=1}^{n} (y_i - \bar{y})^2}}

All detected special structural elements are marked separately for the subsequent structure analysis.

Step S4, divide the binary matrix of step S1 into rows with the horizontal projection method;

A waveform is obtained by horizontally projecting the binary matrix of step S1; the abscissa of the waveform is the row number in the original image, and the ordinate is the number of character points contained in that row. Row information is obtained from the peaks;

The distance between adjacent peaks is required to be greater than 10: of two peaks less than 10 apart, only the one with the higher peak value is retained. The height of a peak is required to be greater than 1/20 of the image length;

For each peak satisfying the above conditions, expansion proceeds outward from both sides of the peak simultaneously and stops when the value falls below 0.01 times the peak height;

Starting from each peak of the waveform, expand to both sides and stop when the value falls below 0.1 times the peak value; if two adjacent peaks overlap during expansion, the two corresponding rows are merged into one row;

After the start and end positions of each row are obtained, each character connected domain is associated with a row as follows: compute the distance between the abscissa (row coordinate) of the center of each character connected domain and that of the center of each text row, and assign the character connected domain to the row with the smallest distance.

Because a system of equations may sometimes be mistakenly divided into several rows, it is stipulated that the rows spanned by a brace must not be divided into several rows.

Step S5, recognize each character connected domain with a convolutional neural network;

As shown in Fig. 2, the convolutional neural network has the LeNet-5 structure, consisting of one input layer, two convolution and down-sampling layers, one fully connected hidden layer and one output layer;

The input sample size is 32*32, the first convolutional layer has 6 feature maps, and the second convolutional layer has 16 feature maps. The down-sampling layers use max pooling, which halves the number of rows and columns. The hidden layer has 120 nodes, and the output layer has 84 nodes;
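
The layer sizes above can be traced with a short shape calculation. The 5*5 kernel with stride 1 and no padding is the classic LeNet-5 choice and is an assumption here, since the text does not state the kernel size; the function name is hypothetical.

```python
def lenet5_feature_sizes(input_size=32, kernel=5):
    """Trace (layer, feature maps, side length) through both conv/pool stages."""
    trace = [("input", 1, input_size)]
    size = input_size
    for index, maps in enumerate((6, 16), start=1):
        size = size - kernel + 1          # valid convolution, stride 1 (assumed)
        trace.append((f"conv{index}", maps, size))
        size //= 2                        # max pooling halves rows and columns
        trace.append((f"pool{index}", maps, size))
    flattened = trace[-1][1] * size * size
    return trace, flattened
```

Under these assumptions the side length goes 32, 28, 14, 10, 5, so 16 maps of 5*5 give 400 values feeding the 120-node hidden layer, followed by the 84-node output layer stated in the text.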

The training samples are normalized character connected domain samples obtained with the binarization method described above; that is, training samples and prediction samples are obtained and normalized in the same way, which improves recognition accuracy.

Step S6, define an output order, and output the recognition results in that order in the LaTeX typesetting format;

The defined output order comprises three layers:

For the order within special formula structures, the character blocks contained in each special structural element must be determined;

According to the row order, the column order and the order within special formula structures described above, the output of the final formula structure is determined and output in the LaTeX typesetting format.

The above embodiment effectively solves the problem of representing elementary mathematical formulas in OCR and achieves accurate recognition of formulas.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method for structure analysis and recognition of handwritten mathematical formulas in natural scene images, characterized in that the method comprises:

Step S1: convert the gray matrix of the natural scene image into a local contrast matrix, and binarize the resulting local contrast matrix with the Otsu method to obtain a binary matrix;

Step S6: define an output order, and output the recognition results in that order in the LaTeX typesetting format.

2. The method according to claim 1, characterized in that the local contrast Con(i, j) of the point with coordinates (i, j) in the local contrast matrix is computed as:

Con(i, j) = α C(i, j) + (1 - α)(I_{\max}(i, j) - I_{\min}(i, j))

where

C(i, j) = \frac{I_{\max}(i, j) - I_{\min}(i, j)}{I_{\max}(i, j) + I_{\min}(i, j) + ε},

std denotes the standard deviation of the gray matrix, and γ = 1;

ε is an infinitesimal quantity that prevents the denominator from being 0.

3. The method according to claim 2, characterized in that the local contrast matrix is binarized with the Otsu method as follows: take the maximum and minimum values of the local contrast matrix, divide the range between them into n equal sub-intervals, and assign each element to its corresponding sub-interval to form a histogram; Otsu thresholding is performed on this histogram; points below the selected threshold are background points, and points above the selected threshold are character points.

4. The method according to claim 3, characterized in that connected domain analysis is performed on the binary matrix of step S1 and non-character connected domains are rejected to obtain character connected domains as follows:

Step S202: compute the average length and average height over all connected domains;

Step S203: reject non-character connected domains:

Step S204: keep the remaining connected domains as character connected domains.

5. The method according to claim 4, characterized in that the special structural elements of formulas in step S3 comprise braces, radical signs and fraction lines;

The matching formula is the correlation coefficient:

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \cdot \sum_{i=1}^{n} (y_i - \bar{y})^2}}

6. The method according to claim 5, characterized in that the binary matrix is divided into rows with the horizontal projection method in step S4 as follows:

7. The method according to claim 6, characterized in that, after the start and end positions of each row are obtained, each character connected domain is associated with a row as follows:

8. The method according to claim 7, characterized in that the convolutional neural network in step S5 has the LeNet-5 structure, consisting of one input layer, two convolution and down-sampling layers, one fully connected hidden layer and one output layer;

9. The method according to claim 8, characterized in that the output order defined in step S6 comprises three layers:

10. The method according to claim 9, characterized in that, for the order within special formula structures, the character blocks contained in each special structural element must be determined;