CN111767889A

CN111767889A - Formula recognition method, electronic device and computer readable medium

Info

Publication number: CN111767889A
Application number: CN202010653266.1A
Authority: CN
Inventors: 明卫鹏; 田意翔; 刘子韬
Original assignee: Beijing Century TAL Education Technology Co Ltd
Current assignee: Beijing Century TAL Education Technology Co Ltd
Priority date: 2020-07-08
Filing date: 2020-07-08
Publication date: 2020-10-13

Abstract

An embodiment of the invention discloses a formula recognition method, comprising: preprocessing a picture containing a formula, and performing formula symbol detection on the preprocessed picture to obtain category information and position information of the formula symbols contained in the formula; constructing a mixed feature vector based on the category information and the position information of the formula symbols; and recognizing and converting the formula symbols based on the mixed feature vector to obtain the character string corresponding to the formula contained in the picture. Because the detected category information and position information of the formula symbols are combined into the mixed feature vector, the formula symbols are recognized and converted with high accuracy.

Description

Formula recognition method, electronic device and computer readable medium

Technical Field

The embodiment of the invention relates to the technical field of text recognition, in particular to a formula recognition method in a natural scene, electronic equipment and a computer readable medium.

Background

Formula recognition in a natural scene is the process of obtaining a picture containing a formula through operations such as photographing or scanning in the natural scene, and then recognizing the formula in the picture as a LaTeX character string.

At present, although the formula in the picture can be identified by means of various algorithms and neural network models, the formula structure is complex, and the formula presentation forms in the natural scene are extremely rich: for example, the formulas may have different sizes, fonts, colors, brightness, contrast, etc., and there may be bending, rotating, twisting, etc. Therefore, the recognition precision of the formula in the picture is not high, and the recognition result is not accurate enough.

Disclosure of Invention

The present invention provides a formula identification scheme to at least partially address the above-mentioned problems.

According to a first aspect of the embodiments of the present invention, there is provided a formula identification method, including: preprocessing the picture containing the formula to obtain a preprocessed picture; carrying out formula symbol detection on the preprocessed picture to obtain the category information and the position information of the formula symbols contained in the formula; constructing a mixed feature vector based on the category information and the position information of the formula symbol; and identifying and converting the formula symbols based on the mixed feature vector to obtain the character strings corresponding to the formula contained in the picture.

According to a second aspect of embodiments of the present invention, there is provided an electronic apparatus, the apparatus including: one or more processors; a computer readable medium configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the formula identification method as described in the first aspect.

According to a third aspect of embodiments of the present invention, there is provided a computer-readable medium, on which a computer program is stored, which when executed by a processor, implements a formula identification method as described in the first aspect.

According to the scheme provided by the embodiment of the invention: preprocessing a picture containing a formula, and detecting a formula symbol of the preprocessed picture to obtain category information and position information of the formula symbol contained in the formula; constructing a mixed feature vector based on the category information and the position information of the formula symbol; and identifying and converting the formula symbols based on the mixed feature vector to obtain the character string corresponding to the formula contained in the picture. The mixed feature vector constructed by the scheme comprises the position information and the category information of the formula symbol, the category of the formula symbol can be accurately determined through the category information, and the position of the formula symbol can be clearly indicated through the position information, so that the information for recognizing and converting the formula symbol is more comprehensive and complete, the formula symbol can be recognized more accurately, and the accuracy and the efficiency for recognizing and converting the formula symbol are higher.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:

FIG. 1 is a flowchart illustrating steps of a formula recognition method according to a first embodiment of the present invention;

FIG. 2 is a schematic diagram of a neural network model based on a Yolo structure according to a first embodiment of the present invention;

FIG. 3 is a diagram of an attention-based sequence-to-sequence model according to a first embodiment of the present invention;

FIG. 4 is a flowchart of a formula identification method according to a second embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein merely illustrate the invention and do not restrict it. It should be noted that, for convenience of description, only the portions related to the invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.

Example one

Referring to fig. 1, a flowchart illustrating steps of a formula identification method according to a first embodiment of the present invention is shown.

The formula identification method of the embodiment comprises the following steps:

Step 101, preprocessing the picture containing the formula to obtain a preprocessed picture.

Pictures containing formulas come in different scales, and both the proportion of the picture occupied by the formula symbol area and the size of the formula symbols themselves vary. To reduce the influence of these factors on the recognition and positioning of the formula symbols, the picture containing the formula can be preprocessed.

In this embodiment, optionally, the picture containing the formula may be preprocessed in the following manner:

firstly, carrying out binarization processing on a picture containing a formula to obtain a binarized picture; then determining the picture area where the formula is located from the binary picture; and finally, obtaining a preprocessed picture according to the picture area cut out from the binary picture. In the preprocessed picture obtained by the method, on one hand, a large amount of unnecessary information, particularly non-formula information, is removed from the binarized picture; on the other hand, the area where the formula is located in the picture can be reserved, and the redundant picture area can be removed, so that the detection efficiency can be improved in the subsequent formula symbol detection.

The binarization of the picture containing the formula may be implemented by a person skilled in the art in any appropriate manner, which is not limited in this embodiment of the present invention. In the binarized picture obtained in this embodiment, the pixel value of a formula symbol pixel is 1, and the pixel value of a non-formula symbol pixel is 0.
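The source leaves the binarization method open. As one possible illustration only, the sketch below applies a fixed global threshold; the function name binarize, the threshold of 128, and the assumption that formula strokes are darker than the background are all hypothetical and are not taken from this embodiment.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 0/1 pixels.

    Assumption: formula strokes are darker than the background, so pixels
    below the threshold become 1 (formula symbol) and the rest become 0.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


# A tiny 3x4 grayscale picture with a few dark "ink" pixels.
gray = [
    [255, 250, 30, 255],
    [255, 20, 25, 240],
    [245, 255, 255, 255],
]
binary = binarize(gray)
```

Any other thresholding scheme (adaptive, Otsu-style, and so on) would serve equally well, as long as formula pixels end up as 1 and all other pixels as 0.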

Then, a plurality of coordinate points with a pixel value of 1 may be obtained from the binarized picture, and the cutting range of the binarized picture may be determined as a picture region where the formula is located according to the plurality of coordinate points. Therefore, the positions of the pixel points with the pixel values of 1 can be obtained, the approximate area of the formula in the picture is further determined, and it can be understood that the more coordinate points are obtained, the more accurate the area of the formula in the picture is determined.

Optionally, the vertex coordinates of the circumscribed quadrangle of the formula may be obtained according to a maximum abscissa, a minimum abscissa, a maximum ordinate, and a minimum ordinate of the plurality of coordinate points; and then determining a cutting range according to the vertex coordinates, and cutting the binary image. Therefore, the picture of the area where the formula is located can be obtained efficiently.

For example, the minimum abscissa x1, the maximum abscissa x2, the minimum ordinate y1, and the maximum ordinate y2 may be obtained from all coordinate points with a pixel value of 1 obtained from the binarized picture; thereby obtaining an upper left corner coordinate point (x1, y2), a lower left corner coordinate point (x1, y1), an upper right corner coordinate point (x2, y2), and a lower right corner coordinate point (x2, y1) of the picture region having the pixel value of 1. Finally, the binarized picture may be cut using a quadrangle formed by the 4 coordinate points as a cutting range. The binarization picture is cut through the four coordinate points, and the minimum abscissa, the maximum abscissa, the minimum ordinate and the maximum ordinate of the pixel value of 1 are included, so that the picture area where the formula is located can be quickly and completely acquired.
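The cutting step described above can be sketched as follows. This is a minimal illustration, assuming the binarized picture is a list of rows, the abscissa is the column index, and the ordinate is the row index; the function names formula_bounding_box and crop are hypothetical.

```python
def formula_bounding_box(binary):
    """Return (x1, y1, x2, y2): min/max column and min/max row of pixels equal to 1."""
    points = [
        (x, y)
        for y, row in enumerate(binary)
        for x, value in enumerate(row)
        if value == 1
    ]
    if not points:
        raise ValueError("no formula pixels found in the binarized picture")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def crop(binary, box):
    """Cut out the rectangle spanned by the four corner points (inclusive)."""
    x1, y1, x2, y2 = box
    return [row[x1:x2 + 1] for row in binary[y1:y2 + 1]]


binary = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box = formula_bounding_box(binary)
region = crop(binary, box)
```

Because the box is built from the extreme coordinates of all pixels with value 1, every formula pixel is inside the cut region and the all-zero margin is discarded.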

Optionally, the image region cut out from the binarized image may be scaled according to a preset scale, and then edge-filling processing is performed to obtain a preprocessed image. The preset proportion can be set by a person skilled in the art according to actual conditions, and the requirement of subsequent processing can be met.

In one possible approach, the picture region cut out from the binarized picture may be scaled by the scaling factor (nx/(x2-x1), ny/(y2-y1)), where nx represents the product of n and x, and ny represents the product of n and y. Here n is the number of components corresponding to the formula in the binarized picture, specifically the number of connected domains corresponding to the formula, which can be obtained by any appropriate connected domain detection method. In the formula, one connected domain corresponds to one component, and one or more components form one part of the formula. For example, in the formula 6+3=9 the number of components is 6: "6", "3", "9", and "+" each correspond to one component, and "=" has two connected domains, corresponding to two components. The parts of the formula are formed by its symbols, and this formula includes 5 parts, namely "6", "3", "9", "+", and "=".
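The source allows any connected domain detection method for obtaining n. The following is a minimal sketch using a breadth-first flood fill; the function name count_components and the choice of 8-connectivity are assumptions made for illustration.

```python
from collections import deque


def count_components(binary):
    """Count connected domains of 1-pixels (8-connectivity, an assumption)."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Found an unvisited formula pixel: flood-fill its whole domain.
            count += 1
            seen[y][x] = True
            queue = deque([(x, y)])
            while queue:
                cx, cy = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        px, py = cx + dx, cy + dy
                        if (0 <= px < width and 0 <= py < height
                                and binary[py][px] == 1 and not seen[py][px]):
                            seen[py][px] = True
                            queue.append((px, py))
    return count


# A vertical stroke followed by an "=" sign made of two separate bars.
binary = [
    [1, 0, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 1, 1, 1],
]
n = count_components(binary)
```

The two bars of "=" are counted separately, which matches the statement above that "=" contributes two components.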

The values of x and y may be obtained through big data statistics. For example, a preset number of formulas may be selected, the size of the formula symbols in each formula is calculated, and the average of the sizes of all the formula symbols is then taken; the preset number is not limited in this embodiment. Here x may represent the height of a formula symbol and y may represent its width, so (x, y) may represent the pixel size of a formula symbol.

In this embodiment, the picture regions cut out from the binarized picture are scaled by the scaling factor (nx/(x2-x1), ny/(y2-y1)), so that the sizes of the formula symbols contained in the pictures can be similar.
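As a worked illustration of the scaling factor (nx/(x2-x1), ny/(y2-y1)), the sketch below evaluates it for one cut region. The function name scale_factors and all numeric values (n = 6, x = y = 32, and the box coordinates) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def scale_factors(n, x, y, box):
    """Evaluate (n*x / (x2 - x1), n*y / (y2 - y1)) for a cut region.

    n    : number of connected domains of the formula
    x, y : statistically obtained symbol size, as described in the text
    box  : (x1, y1, x2, y2) extreme coordinates of the formula pixels
    """
    x1, y1, x2, y2 = box
    if x2 == x1 or y2 == y1:
        raise ValueError("degenerate region: width and height must be non-zero")
    return (n * x) / (x2 - x1), (n * y) / (y2 - y1)


# Hypothetical values: 6 components, average symbol size 32 x 32 pixels.
factors = scale_factors(n=6, x=32, y=32, box=(10, 20, 394, 84))
```

With these values the first factor is 192/384 = 0.5 and the second is 192/64 = 3.0, so the region would be shrunk along one axis and enlarged along the other.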

Edge-filling processing may then be performed on the scaled picture to obtain the preprocessed picture, which may be a (1024, 256)-sized picture. This size is only an example; in practical applications, a person skilled in the art can process the preprocessed picture into the required size according to actual needs. Performing edge-filling on the scaled picture ensures that the outermost pixels of the preprocessed picture can be completely detected during the subsequent formula symbol detection.
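The edge-filling step can be sketched as padding the scaled picture with background pixels up to a fixed target size. The source does not say where the padding goes or which value is used; centring the picture and filling with 0 are assumptions, as is the function name pad_to.

```python
def pad_to(binary, target_width, target_height, fill=0):
    """Centre the picture on a target_width x target_height canvas of fill pixels."""
    height, width = len(binary), len(binary[0])
    if width > target_width or height > target_height:
        raise ValueError("picture is larger than the target; scale it down first")
    left = (target_width - width) // 2
    top = (target_height - height) // 2
    right = target_width - width - left
    bottom = target_height - height - top
    padded = [[fill] * target_width for _ in range(top)]
    for row in binary:
        padded.append([fill] * left + list(row) + [fill] * right)
    padded.extend([fill] * target_width for _ in range(bottom))
    return padded


# Small demonstration; the embodiment's example target would be 1024 by 256.
padded = pad_to([[1, 1], [1, 1]], target_width=6, target_height=4)
```

After padding, no formula pixel lies on the picture border, which is the property the text relies on for detecting the outermost pixels.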

In this embodiment, preprocessing the picture containing the formula reduces the influence that the scale of the picture, the proportion of the picture occupied by the formula symbol area, and the size of the formula symbols have on the recognition and positioning of the formula symbols, so that the recognition precision and the positioning precision of the formula can be improved.

Step 102, performing formula symbol detection on the preprocessed picture to obtain the category information and the position information of the formula symbols contained in the formula.

In this embodiment, the category information of the formula symbol is used to indicate the category of the formula symbol, including but not limited to a number category, a letter category, an operation symbol category, a punctuation category, and the like.

Optionally, the preprocessed picture may be input into a first neural network model for performing formula symbol detection, so as to obtain category information and position information of the formula symbol included in the formula. Formula symbol detection is carried out through the first neural network model, and the detection accuracy rate is high.

Specifically, the preprocessed picture may be input into a first neural network model for formula symbol detection, and multi-scale feature extraction and symbol detection are performed on the preprocessed picture through the first neural network model to obtain category information and position information of the formula symbol included in the formula. The preprocessed picture is subjected to multi-scale feature extraction through the first neural network model, and the higher precision of formula symbol detection can be ensured.

In this embodiment, the first neural network model may be a neural network model based on a Yolo structure. Taking Yolo_v3 as an example, the existing Yolo_v3 uses feature maps of 3 different scales for object detection. To prevent the loss of shallow information as the number of network layers increases from affecting the accuracy of formula recognition and positioning, the neural network model based on the Yolo structure in this embodiment may perform feature extraction of at least four scales to obtain corresponding feature maps of at least four scales, where the feature extraction of at least four scales includes low-scale feature extraction. Then, based on the feature maps of at least four scales, symbol frame detection and symbol recognition are performed to obtain the category information and the position information of the formula symbols contained in the formula. Low-scale feature extraction is feature extraction between pixels; the extracted low-scale features are basic features that can be extracted automatically from an image without any shape or spatial relationship information. By adding low-scale feature extraction, this embodiment can effectively extract features for symbols with unconventional shape proportions, such as small target symbols, for example the dot symbol "·" and the long horizontal line symbol "-".

In the existing Yolo_v3, 9 anchor frames of different scales are set, namely: (10, 13), (16, 30), (33, 23), (30, 61), (62, 45), (59, 119), (116, 90), (156, 198), (373, 326), which are not accurate enough for recognizing certain target objects, for example certain mathematical symbols. In this embodiment, the neural network model based on the Yolo structure may be provided with 12 anchor frames of different scales, including anchor frames for detecting set symbols; the sizes of the anchor frames can be set or adjusted according to actual requirements. As a set of example data, the anchor frame sizes may be: (10, 10), (6, 40), (40, 40), (40, 6), (40, 80), (80, 40), (80, 80), (12, 100), (12, 120), (120 ), (100, 12), (228 ). The set symbols include at least one of: symbols whose size is within a preset size range, such as "=", "+", "x", and other common symbols; and symbols whose length-to-width ratio is greater than a preset ratio, and the like. The preset ratio can be set by a person skilled in the art according to actual needs, which is not limited in the embodiment of the present invention. Setting the anchor frame sizes in this way in the neural network model based on the Yolo structure can improve the recognition accuracy and positioning accuracy of unusual formula symbols, for example symbols such as omega, "-", and the integral sign.
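To illustrate why elongated anchor frames help with symbols whose length-to-width ratio is large, the sketch below picks, for a given symbol frame, the anchor whose shape overlaps it best. The shape-only IoU used here is the common way anchors are matched in Yolo-style detectors, but its use in this embodiment is an assumption. The names shape_iou and best_anchor are hypothetical, sizes are assumed to be (width, height), and only the ten anchor sizes that are fully legible in the example data are used.

```python
# The ten fully specified example anchor sizes, assumed to be (width, height).
ANCHORS = [
    (10, 10), (6, 40), (40, 40), (40, 6), (40, 80),
    (80, 40), (80, 80), (12, 100), (12, 120), (100, 12),
]


def shape_iou(box_wh, anchor_wh):
    """IoU of two boxes aligned at a common corner, so only shape matters."""
    bw, bh = box_wh
    aw, ah = anchor_wh
    intersection = min(bw, aw) * min(bh, ah)
    union = bw * bh + aw * ah - intersection
    return intersection / union


def best_anchor(box_wh, anchors=ANCHORS):
    """Return the anchor whose shape overlaps the symbol frame the most."""
    return max(anchors, key=lambda anchor: shape_iou(box_wh, anchor))


long_bar = best_anchor((90, 10))   # e.g. a long horizontal line
small_dot = best_anchor((9, 11))   # e.g. a small dot-like symbol
```

A 90 by 10 frame matches the (100, 12) anchor with an IoU of 0.75, whereas the roughly square anchors overlap it far less, which is the motivation for including elongated anchor sizes.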

FIG. 2 is a schematic diagram of the above-mentioned neural network model based on a Yolo structure; for an input picture, the model maps it to feature maps of multiple scales. DBL is the basic component of Yolo_v3, namely convolution + BN + Leaky ReLU; in Yolo_v3, BN and Leaky ReLU are inseparable from the convolution layer and together with it form the minimum component. The n in resn in FIG. 2 represents a number (res1, res2, ..., res8, and so on) and indicates how many res_units are contained in the res_block. The res_block is a large component of Yolo_v3: Yolo_v3 borrows the residual structure of ResNet, which allows the network to be made deeper, and the basic component of the res_block is also DBL. Concat performs tensor splicing: an intermediate layer of Darknet is spliced with the upsampled output of a later layer. Splicing differs from the add operation of the residual layer: splicing expands the dimensionality of the tensor, whereas add sums the tensors directly and leaves the dimensionality unchanged.
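The difference between concat and add can be shown on toy feature maps. In this sketch a feature map is a plain list of channels, each channel a list of rows; this representation and the function names add and concat are assumptions for illustration and do not reflect how a deep learning framework stores tensors.

```python
def add(a, b):
    """Residual-style add: element-wise sum, channel count unchanged."""
    if len(a) != len(b):
        raise ValueError("add requires the same number of channels")
    return [
        [[pa + pb for pa, pb in zip(row_a, row_b)] for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(a, b)
    ]


def concat(a, b):
    """Channel-wise splicing: channels of b are appended to those of a."""
    return a + b


# Two feature maps, each with 2 channels of size 2x2.
a = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
b = [[[10, 20], [30, 40]], [[50, 60], [70, 80]]]

summed = add(a, b)       # still 2 channels
spliced = concat(a, b)   # now 4 channels
```

The summed map keeps 2 channels while the spliced map has 4, which is the expansion of tensor dimensionality referred to above.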

Some formula symbols have unconventional shape proportions, such as the dot symbol "·" and the long horizontal line symbol "-", and are not easily detected by the conventional Yolo_v3. To ensure the recognition accuracy of these symbols, a low-scale feature extraction channel is provided in this embodiment. FIG. 2 shows the neural network model based on the Yolo structure performing feature extraction of four scales; the dashed box in the figure shows the 4 feature maps, where y4 is the low-scale feature map. For an input preprocessed picture, the neural network model based on the Yolo structure can map the picture to feature maps of 4 scales, and symbol recognition and symbol frame detection can be performed based on these feature maps and the 12 anchor frames, so that the category information and the position information of the formula symbols contained in the input preprocessed picture are obtained.

In this embodiment, multi-scale feature extraction and symbol detection are performed on the preprocessed picture through the neural network model based on the Yolo structure, so that missed detection of unconventional target symbols can be avoided and the subsequent translation stage is not affected. This improves the accuracy of formula recognition and gives better robustness and generalization for pictures containing formulas in different scenes.

Step 103, constructing a mixed feature vector based on the category information and the position information of the formula symbol.

The position information of the formula symbol obtained in the above step may include the coordinates of the four points at the upper left corner, the lower left corner, the upper right corner, and the lower right corner of the symbol frame. Optionally, the four coordinates together consist of 8 numerical values between 0 and 1, which can form a vector representing the position information of the formula symbol.
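One way to obtain 8 values between 0 and 1 from a symbol frame is to divide each corner coordinate by the picture size. The source does not state how the values are normalized, so this is only a sketch under that assumption; the function name box_to_position_vector is hypothetical, and the origin is assumed to be the top-left corner of the picture with the ordinate increasing downward.

```python
def box_to_position_vector(x_min, y_min, x_max, y_max, width, height):
    """Return 8 normalized values: upper-left, lower-left, upper-right, lower-right."""
    corners = [
        (x_min, y_min),  # upper left
        (x_min, y_max),  # lower left
        (x_max, y_min),  # upper right
        (x_max, y_max),  # lower right
    ]
    vector = []
    for x, y in corners:
        vector.append(x / width)
        vector.append(y / height)
    return vector


# A symbol frame inside a picture 1024 pixels wide and 256 pixels high.
position = box_to_position_vector(256, 64, 512, 128, width=1024, height=256)
```

Normalizing by the picture size keeps the position values in the same 0 to 1 range as the category values, whatever the picture resolution.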

In this embodiment, vectorization may be performed on the category information of the formula symbol to obtain a category vector; and splicing the category vector and the position information of the formula symbol to obtain a mixed feature vector. For example, the class information of the formula symbol may be vectorized to generate a 122-dimensional class vector, and the value of each dimension is between 0 and 1. The category vector may then be concatenated with the location information to generate a 130-dimensional hybrid feature vector containing the category information and the location information of the formula symbol. Among the 130-dimensional mixed feature vectors, 122 dimensions represent the category information of the formula symbol, and the remaining 8 dimensions represent the position information of the formula symbol. It should be noted that the above specific values are only exemplary, and in practical applications, a person skilled in the art may set the hybrid feature vector to be a dimension of other values according to practical needs.
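The splicing described above can be sketched as follows. The 122 and 8 dimensions come from the example in the text; using a one-hot encoding for the category vector is an assumption (the text only requires each value to lie between 0 and 1), and the names one_hot and hybrid_feature are hypothetical.

```python
NUM_CLASSES = 122  # example category dimension from the text


def one_hot(class_index, num_classes=NUM_CLASSES):
    """Vectorize a category as a one-hot vector (an assumed encoding)."""
    if not 0 <= class_index < num_classes:
        raise ValueError("class index out of range")
    vector = [0.0] * num_classes
    vector[class_index] = 1.0
    return vector


def hybrid_feature(class_index, position):
    """Splice the category vector with the 8 position values."""
    if len(position) != 8:
        raise ValueError("position information must contain 8 values")
    return one_hot(class_index) + list(position)


position = [0.25, 0.25, 0.25, 0.5, 0.5, 0.25, 0.5, 0.5]
feature = hybrid_feature(class_index=5, position=position)
```

The result has 130 dimensions: the first 122 carry the category information and the last 8 carry the position information of one formula symbol.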

In this embodiment, optionally, the mixed feature vector may also be obtained in other manners, for example, two sets of data, that is, the category vector and the position information of the formula symbol, are vertically stacked, that is, the position information of the formula symbol may be stacked in the next rows of the category vector. The mixed feature vector can also be obtained by means of superposition combination, i.e. one of the two groups of data of the category vector and the position information of the formula symbol is supplemented into the other group of data, and the like.

In this embodiment, because the constructed hybrid feature vector includes the position information and the category information of the formula symbol, compared with the prior art in which the formula symbol is identified based on the category information only, the positioning of the formula symbol will be faster and more accurate, the situations of position disorder and the like will not occur, and the accuracy rate is higher.

Step 104, recognizing and converting the formula symbols based on the mixed feature vector to obtain the character string corresponding to the formula contained in the picture.

In this embodiment, the character string may be a LaTeX character string. Specifically, the mixed feature vector may be input into the second neural network model for recognition and conversion of the formula symbols, so as to obtain the character string corresponding to the formula contained in the picture.

Alternatively, the second neural network model may be a sequence-to-sequence model based on an attention mechanism.

FIG. 3 is a schematic diagram of the attention-based sequence-to-sequence model, which is a neural network with an Encoder-Decoder structure to which an attention mechanism is added. The input sequence of the encoder is the above-mentioned mixed feature vector. If the formula represented by the mixed feature vector is x²+y²=125÷5, the mixed feature vector is processed by the encoder to generate an encoded vector, the encoded vector then enters the Attention module for feature extraction, and the decoder finally outputs a LaTeX character string, namely the LaTeX character string corresponding to the formula x²+y²=125÷5.
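The source does not specify which attention variant the Attention module uses. As a generic illustration only, the sketch below computes dot-product attention: a decoder query is scored against each encoder state, the scores are normalized with softmax, and the weighted sum of the encoder states forms the context. The function names softmax and attend and the toy two-dimensional vectors are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attend(query, encoder_states):
    """Dot-product attention: return (weights, context vector)."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(weight * state[i] for weight, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Two toy encoder states (one per formula symbol) and one decoder query.
encoder_states = [[1.0, 0.0], [0.0, 1.0]]
weights, context = attend([1.0, 0.0], encoder_states)
```

Here the query resembles the first encoder state, so that state receives the larger weight; in the real model each encoder state would be derived from the mixed feature vector of a symbol.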

The traditional method for converting the mathematical symbols in a formula into a LaTeX character string uses a position-based context-free grammar, which is computationally inefficient when many mathematical symbols are recognized. In this embodiment, the attention-based sequence-to-sequence model is used, the conversion process is treated as a translation process, and the model input is the mixed feature vector constructed from the category information and the position information of the formula symbols.

According to the embodiment, the picture containing the formula is preprocessed, and the preprocessed picture is subjected to formula symbol detection to obtain the category information and the position information of the formula symbol contained in the formula; constructing a mixed feature vector based on the category information and the position information of the formula symbol; and identifying and converting the formula symbols based on the mixed feature vector to obtain the character string corresponding to the formula contained in the picture. The mixed feature vector constructed by the scheme comprises the position information and the category information of the formula symbol, the category of the formula symbol can be accurately determined through the category information, and the position of the formula symbol can be clearly indicated through the position information, so that the information for recognizing and converting the formula symbol is more comprehensive and complete, the formula symbol can be recognized more accurately, and the accuracy and the efficiency for recognizing and converting the formula symbol are higher.

The formula identification method of the present embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: server, mobile terminal (such as mobile phone, PAD, etc.), PC, etc.

Example two

Fig. 4 is a flowchart illustrating a formula identification method according to a second embodiment of the present invention.

Exemplarily, if the formula in the preprocessed picture is x²+y²=125÷5, the preprocessed picture is taken as the input of the neural network model based on the Yolo structure, and multi-scale feature extraction and symbol detection are performed on it through the model to obtain corresponding feature maps of multiple scales. Then, based on these feature maps, symbol frame detection and symbol recognition are performed to obtain the category information and the position information of the formula symbols contained in the formula x²+y²=125÷5. As shown by the dotted-line frame in FIG. 4, each formula symbol in the preprocessed picture is finally obtained: "x", "2", "+", "y", "2", "=", "1", "2", "5", "÷", "5", together with the position of each formula symbol in the preprocessed picture, for example the coordinates of the four corners of each symbol frame in FIG. 4. The category information and the position information of the formula symbols output by the neural network model based on the Yolo structure can then be constructed into a mixed feature vector, which is used as the input of the seq2seq model, that is, the attention-based sequence-to-sequence model. The seq2seq model recognizes and converts the mixed feature vector, and the character string corresponding to the formula contained in the picture is finally obtained: "x", "^", "2", "+", "y", "^", "2", "=", "1", "2", "5", "\div", "5".
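The output tokens of the example can be joined into a single LaTeX character string. The sketch below is a hypothetical post-processing step, not something described in the source; the function name tokens_to_latex is an assumption, and a space is inserted after each LaTeX command so that it is not fused with the token that follows.

```python
def tokens_to_latex(tokens):
    """Join decoder output tokens into one LaTeX string."""
    parts = []
    for token in tokens:
        parts.append(token)
        if token.startswith("\\"):
            # A command such as \div must be separated from what follows.
            parts.append(" ")
    return "".join(parts).strip()


tokens = ["x", "^", "2", "+", "y", "^", "2", "=", "1", "2", "5", "\\div", "5"]
latex = tokens_to_latex(tokens)
```

For the example tokens this yields x^2+y^2=125\div 5, which renders as the formula in the preprocessed picture.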

The process of the preprocessing and the specific processing process of each model can be referred to the description of the relevant parts in the foregoing embodiment one, and are not described herein again.

According to the embodiment, when formula recognition is carried out, particularly when the formula recognition comprises an unconventional symbol such as a small target symbol or a symbol with the length-width ratio larger than a preset ratio, the picture comprising the formula can be preprocessed in advance to reduce the influence of the pictures comprising the formula with different scales on formula symbol recognition and positioning, so that the formula recognition precision and the positioning precision can be improved; then, formula symbol detection is carried out on the preprocessed picture through a neural network model based on a Yolo structure to obtain the category information and the position information of the formula symbols contained in the formula, so that missing detection of target symbols can be avoided, further the influence on the next stage is avoided, and the accuracy of formula identification containing small target symbols is improved; finally, constructing a mixed feature vector based on the category information and the position information of the formula symbol; and then based on the mixed feature vector, recognizing and converting the formula symbol to obtain a character string corresponding to the formula contained in the picture.

It should be noted that the formula identification scheme of the embodiment of the present invention can be widely applied to various scenarios, including but not limited to: formula recognition is performed on pictures that include a pure printed font formula, formula recognition is performed on pictures that include a pure handwritten font formula, formula recognition is performed on images that include both a printed font formula and a handwritten font formula, and so forth. Therefore, the formula identification scheme of the embodiment of the invention can be widely applied to scenes of various formula pictures, and the compatibility is better.

Example three

FIG. 5 shows the hardware structure of an electronic device according to a third embodiment of the present invention. As shown in FIG. 5, the electronic device may include: a processor 301, a communication interface 302, a memory 303, and a communication bus 304.

Wherein:

the processor 301, the communication interface 302, and the memory 303 communicate with each other via a communication bus 304.

A communication interface 302 for communicating with other electronic devices or servers.

The processor 301 is configured to execute the program 305, and may specifically execute relevant steps in the above formula identification method embodiment.

In particular, program 305 may include program code comprising computer operating instructions.

The processor 301 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the present invention. The electronic device comprises one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.

The memory 303 stores the program 305. The memory 303 may comprise a high-speed RAM, and may further comprise a non-volatile memory, such as at least one disk memory.

The program 305 may specifically be configured to cause the processor 301 to perform the following operations: preprocessing the picture containing the formula to obtain a preprocessed picture; carrying out formula symbol detection on the preprocessed picture to obtain the category information and the position information of the formula symbol contained in the formula; constructing a mixed feature vector based on the category information and the position information of the formula symbol; and identifying and converting formula symbols based on the mixed feature vector to obtain a character string corresponding to the formula contained in the picture.

In an alternative embodiment, the program 305 is further configured to cause the processor 301, when constructing the mixed feature vector based on the category information and the position information of the formula symbol: vectorizing the category information of the formula symbols to obtain category vectors; and splicing the category vector and the position information of the formula symbol to obtain the mixed feature vector.
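By way of illustration only, the vectorizing and splicing described above may be sketched as follows. The embodiment does not specify how the category information is vectorized; the one-hot encoding, the symbol vocabulary `VOCAB`, the box layout, and the function names used here are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# Hypothetical symbol vocabulary; the source does not list the categories.
VOCAB = ["x", "+", "2", "=", "\\frac"]

Box = Tuple[float, float, float, float]  # assumed layout: x_min, y_min, x_max, y_max


def one_hot(category: str, vocab: Sequence[str] = VOCAB) -> List[float]:
    """Vectorize a category label as a one-hot vector (an assumed encoding)."""
    vec = [0.0] * len(vocab)
    vec[vocab.index(category)] = 1.0
    return vec


def mixed_feature_vector(category: str, box: Box,
                         vocab: Sequence[str] = VOCAB) -> List[float]:
    """Splice (concatenate) the category vector with the position information."""
    return one_hot(category, vocab) + list(box)


# Two detected symbols, each with a category and a normalized bounding box.
detections = [("x", (0.10, 0.40, 0.20, 0.60)), ("+", (0.25, 0.45, 0.35, 0.55))]
features = [mixed_feature_vector(cat, box) for cat, box in detections]
```

Each resulting vector carries both what the symbol is and where it sits, which is the information the subsequent recognition and conversion step consumes.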

In an alternative embodiment, the program 305 is further configured to enable the processor 301, when preprocessing the picture containing the formula to obtain a preprocessed picture: carrying out binarization processing on the picture containing the formula to obtain a binarized picture; determining a picture area where a formula is located from the binary picture; and obtaining a preprocessed picture according to the picture area cut out from the binary picture.

In an alternative embodiment, the program 305 is further configured to enable the processor 301, when determining, from the binarized picture, a picture region where the formula is located: acquiring a plurality of coordinate points with the pixel value of 1 from the binarized picture; and determining the cutting range of the binarized picture according to the plurality of coordinate points, and taking the cutting range as the picture region where the formula is located.

In an alternative embodiment, the program 305 is further configured to enable the processor 301, when obtaining the preprocessed picture according to the picture region cut out from the binarized picture: scaling the picture region cut out from the binarized picture according to a preset proportion, and then performing edge filling processing to obtain the preprocessed picture.
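By way of illustration only, the scaling and edge filling described above may be sketched as follows. The nearest-neighbour interpolation, the centered placement, the zero fill value, and the function names are assumptions; the embodiment only states that the region is scaled by a preset proportion and then edge-filled.

```python
from typing import List

Image = List[List[int]]  # a binarized picture as rows of 0/1 pixel values


def scale_nearest(img: Image, ratio: float) -> Image:
    """Scale a picture by a preset ratio using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * ratio)), max(1, round(w * ratio))
    return [
        [img[min(h - 1, int(r / ratio))][min(w - 1, int(c / ratio))]
         for c in range(new_w)]
        for r in range(new_h)
    ]


def pad_to(img: Image, height: int, width: int, fill: int = 0) -> Image:
    """Edge-fill the picture so that it sits centered in a height x width canvas."""
    h, w = len(img), len(img[0])
    if h > height or w > width:
        raise ValueError("picture is larger than the target canvas")
    top, left = (height - h) // 2, (width - w) // 2
    out = [[fill] * width for _ in range(height)]
    for r in range(h):
        out[top + r][left:left + w] = img[r]
    return out


region = [[1, 0],
          [0, 1]]
scaled = scale_nearest(region, 2.0)   # 4 x 4 picture
preprocessed = pad_to(scaled, 6, 6)   # 6 x 6 picture with a one-pixel border
```

Padding to a fixed canvas keeps the input size constant for a downstream detection model while preserving the aspect ratio of the formula.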

In an alternative embodiment, the program 305 is further configured to cause the processor 301 to, when determining the cutting range of the binarized picture as the picture region where the formula is located according to the plurality of coordinate points: obtaining the vertex coordinates of the circumscribed quadrangle of the formula according to the maximum abscissa, the minimum abscissa, the maximum ordinate and the minimum ordinate in the plurality of coordinate points; and determining a cutting range according to the vertex coordinates, and cutting the binarized picture.
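By way of illustration only, the binarization, the collection of coordinate points with pixel value 1, and the cutting along the circumscribed quadrangle may be sketched as follows. The fixed threshold of 128, the assumption of dark symbols on a light background, and the function names are illustrative choices not stated in the embodiment.

```python
from typing import List, Tuple

Image = List[List[int]]


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map dark (symbol) pixels to 1 and light (background) pixels to 0.

    Assumption: dark ink on a light background with a fixed threshold.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def formula_bounds(binary: Image) -> Tuple[int, int, int, int]:
    """Return (x_min, y_min, x_max, y_max) over all points with pixel value 1."""
    points = [(x, y)
              for y, row in enumerate(binary)
              for x, value in enumerate(row) if value == 1]
    if not points:
        raise ValueError("no pixel with value 1 was found")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def crop(binary: Image, bounds: Tuple[int, int, int, int]) -> Image:
    """Cut the binarized picture along the circumscribed quadrangle."""
    x_min, y_min, x_max, y_max = bounds
    return [row[x_min:x_max + 1] for row in binary[y_min:y_max + 1]]


gray = [
    [255, 255, 255, 255, 255],
    [255,   0, 255, 255, 255],
    [255, 255,   0,   0, 255],
    [255, 255, 255, 255, 255],
]
binary = binarize(gray)
bounds = formula_bounds(binary)
region = crop(binary, bounds)
```

The four extreme coordinates define the vertices (x_min, y_min), (x_max, y_min), (x_min, y_max) and (x_max, y_max) of the circumscribed quadrangle.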

In an alternative embodiment, the program 305 is further configured to enable the processor 301 to, when performing formula symbol detection on the pre-processed picture to obtain category information and position information of a formula symbol included in the formula: and inputting the preprocessed picture into a first neural network model for formula symbol detection to obtain the category information and the position information of the formula symbols contained in the formula.

In an alternative embodiment, the program 305 is further configured to enable the processor 301, when inputting the preprocessed picture into the first neural network model for formula symbol detection, and obtaining category information and position information of formula symbols included in the formula: inputting the preprocessed picture into a first neural network model for formula symbol detection, and performing multi-scale feature extraction and symbol detection on the preprocessed picture through the first neural network model to obtain category information and position information of formula symbols contained in the formula.

In an alternative embodiment, the first neural network model is a neural network model based on a Yolo structure; the program 305 is further configured to enable the processor 301 to, when performing multi-scale feature extraction and symbol detection on the preprocessed picture through the first neural network model to obtain category information and position information of a formula symbol included in the formula: performing feature extraction of at least four scales through the neural network model based on the Yolo structure to obtain corresponding feature maps of at least four scales, wherein the feature extraction of at least four scales comprises low-scale feature extraction; and carrying out symbol identification and symbol frame detection based on the feature maps of at least four scales to correspondingly obtain the category information and the position information of the formula symbols contained in the formula.

In an alternative embodiment, 12 anchor boxes of different scales are set in the neural network model, and the anchor boxes include anchor boxes for detecting set symbols.

In an alternative embodiment, the set symbols comprise at least one of: a symbol whose size is within a preset size range, and a symbol whose length-to-width ratio is greater than a preset ratio.
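By way of illustration only, one possible arrangement of the 12 anchor boxes is sketched below. The embodiment does not give the anchor dimensions or state how the anchors are distributed; the anchor sizes, the even split of three anchors per feature-map scale, the area threshold, and the ratio threshold are all assumptions made for the example.

```python
from typing import List, Tuple

Anchor = Tuple[int, int]  # (width, height) in pixels

# Hypothetical anchor sizes: small, medium, large, and elongated boxes.
ANCHORS: List[Anchor] = [
    (4, 4), (6, 8), (8, 6), (12, 12), (16, 24), (24, 16),
    (32, 32), (48, 64), (64, 48), (96, 96), (120, 16), (16, 120),
]


def group_by_scale(anchors: List[Anchor], num_scales: int = 4) -> List[List[Anchor]]:
    """Sort anchors by area and split them evenly across the feature-map scales."""
    ordered = sorted(anchors, key=lambda wh: wh[0] * wh[1])
    per_scale = len(ordered) // num_scales
    return [ordered[i * per_scale:(i + 1) * per_scale] for i in range(num_scales)]


def targets_set_symbol(anchor: Anchor, max_area: int = 64,
                       min_ratio: float = 3.0) -> bool:
    """True if the anchor suits a small symbol or a strongly elongated symbol."""
    w, h = anchor
    return w * h <= max_area or max(w, h) / min(w, h) > min_ratio


groups = group_by_scale(ANCHORS)
special = [a for a in ANCHORS if targets_set_symbol(a)]
```

Under these assumptions, the smallest anchors serve small symbols such as dots, and the elongated anchors serve symbols such as fraction bars.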

In an alternative embodiment, the program 305 is further configured to enable the processor 301, when performing recognition and conversion of a formula symbol based on the mixed feature vector to obtain a character string corresponding to a formula contained in the picture: and inputting the mixed feature vector into a second neural network model for identifying and converting formula symbols to obtain a character string corresponding to a formula contained in the picture, wherein the second neural network model is a sequence-to-sequence model based on an attention mechanism.
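By way of illustration only, a single attention step of such a sequence-to-sequence model may be sketched as follows. The embodiment states only that the second neural network model is attention-based; the dot-product scoring function, the use of the encoded mixed feature vectors as attention keys and values, and the function names are assumptions.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: List[float],
           encoder_states: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Dot-product attention: weight each encoder state by its match to the query."""
    scores = [sum(q * k for q, k in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


# Two encoder states (one per detected symbol) and one decoder query.
states = [[1.0, 0.0], [0.0, 1.0]]
weights, context = attend([2.0, 0.0], states)
```

At each decoding step the context vector summarizes the symbols most relevant to the next output token of the character string.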

For specific implementation of each step in the program 305, reference may be made to corresponding descriptions in corresponding steps in the above embodiment of the formula identification method, which is not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.

By the electronic equipment, the picture containing the formula is preprocessed, and the preprocessed picture is subjected to formula symbol detection to obtain the category information and the position information of the formula symbol contained in the formula; constructing a mixed feature vector based on the category information and the position information of the formula symbol; and identifying and converting the formula symbols based on the mixed feature vector to obtain the character string corresponding to the formula contained in the picture. The mixed feature vector constructed by the scheme comprises the position information and the category information of the formula symbol, the category of the formula symbol can be accurately determined through the category information, and the position of the formula symbol can be clearly indicated through the position information, so that the information for recognizing and converting the formula symbol is more comprehensive and complete, the formula symbol can be recognized more accurately, and the accuracy and the efficiency for recognizing and converting the formula symbol are higher.

In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code configured to perform the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program performs the above-described functions defined in the method in the embodiment of the present invention when executed by a Central Processing Unit (CPU). It should be noted that the computer readable medium in the embodiments of the present invention may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer readable medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the invention, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

In an embodiment of the invention, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code configured to carry out operations for embodiments of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (e.g., through the internet using an internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions configured to implement the specified logical function(s). In the above embodiments, specific precedence relationships are provided, but these precedence relationships are only exemplary, and in particular implementations, the steps may be fewer, more, or the execution order may be modified. That is, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes an access module and a transmit module. Wherein the names of the modules do not in some cases constitute a limitation of the module itself.

As another aspect, an embodiment of the present invention further provides a computer-readable medium, on which a computer program is stored, which when executed by a processor implements the formula identification method described in the above embodiment.

As another aspect, an embodiment of the present invention further provides a computer-readable medium, which may be included in the apparatus described in the above embodiment; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: preprocessing the picture containing the formula to obtain a preprocessed picture; carrying out formula symbol detection on the preprocessed picture to obtain the category information and the position information of the formula symbol contained in the formula; constructing a mixed feature vector based on the category information and the position information of the formula symbol; and identifying and converting formula symbols based on the mixed feature vector to obtain a character string corresponding to the formula contained in the picture.

The expressions "first", "second", "said first" or "said second" used in various embodiments of the invention may modify various components without relation to order and/or importance, but these expressions do not limit the respective components. These expressions are used only for the purpose of distinguishing one element from another.

The foregoing description is only exemplary of the preferred embodiments of the invention and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention according to the embodiments of the present invention is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept described above. For example, a technical solution formed by replacing the above features with (but not limited to) features having similar functions disclosed in the embodiments of the present invention is also encompassed.

Claims

1. A formula identification method, the method comprising:

preprocessing the picture containing the formula to obtain a preprocessed picture;

carrying out formula symbol detection on the preprocessed picture to obtain the category information and the position information of the formula symbols contained in the formula;

constructing a mixed feature vector based on the category information and the position information of the formula symbol;

and identifying and converting the formula symbols based on the mixed feature vector to obtain the character strings corresponding to the formula contained in the picture.

2. The method of claim 1, wherein constructing a hybrid feature vector based on the category information and the location information of the formula symbol comprises:

vectorizing the category information of the formula symbols to obtain category vectors;

and splicing the category vector and the position information of the formula symbol to obtain a mixed feature vector.

3. The method of claim 1, wherein the pre-processing the picture containing the formula to obtain a pre-processed picture comprises:

carrying out binarization processing on the picture containing the formula to obtain a binarized picture;

determining a picture area where the formula is located from the binarized picture;

and obtaining the preprocessed picture according to the picture region cut out from the binarized picture.

4. The method according to claim 3, wherein the determining the picture region where the formula is located from the binarized picture comprises:

acquiring a plurality of coordinate points with the pixel value of 1 from the binarized picture;

and determining the cutting range of the binarized picture according to the plurality of coordinate points to be used as the picture area where the formula is located.

5. The method according to claim 3, wherein the obtaining the preprocessed picture according to the picture region cut out from the binarized picture comprises:

and after the picture area cut out from the binarized picture is scaled according to a preset proportion, performing edge filling processing to obtain a preprocessed picture.

6. The method according to claim 4, wherein the determining the cutting range of the picture according to the plurality of coordinate points as the picture region where the formula is located comprises:

obtaining the vertex coordinates of the circumscribed quadrangle of the formula according to the maximum abscissa, the minimum abscissa, the maximum ordinate and the minimum ordinate in the plurality of coordinate points;

and determining a cutting range according to the vertex coordinates, and cutting the binarized picture.

7. The method according to claim 1, wherein the detecting formula symbols in the preprocessed picture to obtain category information and position information of the formula symbols included in the formula comprises:

and inputting the preprocessed picture into a first neural network model for formula symbol detection to obtain the category information and the position information of the formula symbols contained in the formula.

8. The method according to claim 7, wherein the inputting the preprocessed picture into a first neural network model for formula symbol detection to obtain category information and location information of formula symbols included in the formula comprises:

inputting the preprocessed picture into a first neural network model for formula symbol detection, and performing multi-scale feature extraction and symbol detection on the preprocessed picture through the first neural network model to obtain category information and position information of formula symbols contained in the formula.

9. The method of claim 8, wherein the first neural network model is a Yolo structure-based neural network model;

the obtaining of the category information and the position information of the formula symbol contained in the formula by performing multi-scale feature extraction and symbol detection on the preprocessed picture through the first neural network model comprises:

performing feature extraction of at least four scales through the neural network model based on the Yolo structure to obtain corresponding feature maps of at least four scales, wherein the feature extraction of at least four scales comprises low-scale feature extraction;

and carrying out symbol identification and symbol frame detection based on the feature maps of at least four scales, and correspondingly obtaining the category information and the position information of the formula symbols contained in the formula.

10. The method according to claim 9, wherein 12 anchor boxes of different scales are set in the neural network model, and the anchor boxes comprise anchor boxes for detecting set symbols.

11. The method of claim 10, wherein the set symbols comprise at least one of: a symbol whose size is within a preset size range, and a symbol whose length-to-width ratio is greater than a preset ratio.

12. The method according to claim 1, wherein the identifying and converting the formula symbol based on the mixed feature vector to obtain a character string corresponding to the formula contained in the picture comprises:

and inputting the mixed feature vector into a second neural network model for recognition and conversion of the formula symbols to obtain a character string corresponding to the formula contained in the picture, wherein the second neural network model is a sequence-to-sequence model based on an attention mechanism.

13. An electronic device, characterized in that the device comprises:

one or more processors;

a computer readable medium configured to store one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the formula identification method of any one of claims 1-12.

14. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the formula identification method as claimed in any one of claims 1 to 12.