CN115205623A - Formula identification method and device - Google Patents

Formula identification method and device

Info

Publication number
CN115205623A
CN115205623A
Authority
CN
China
Prior art keywords
formula
images
image
formulas
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210713591.1A
Other languages
Chinese (zh)
Inventor
Shi Ruijiao (石瑞姣)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BOE Technology Group Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd filed Critical BOE Technology Group Co Ltd
Priority to CN202210713591.1A priority Critical patent/CN115205623A/en
Publication of CN115205623A publication Critical patent/CN115205623A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/32 Digital ink

Abstract

The embodiments of the disclosure provide a formula identification method and device, relate to the technical field of computer vision, and are used for improving the accuracy of formula identification. The formula identification method comprises the following steps: obtaining a training sample set, wherein the training sample set comprises K first images and K target-format formulas, each first image contains a first formula, the K first formulas correspond one-to-one with the K target-format formulas, and K is an integer greater than or equal to 1; processing M of the first images with a data enhancement algorithm to obtain M second images, where 1 ≤ M ≤ K and M is an integer; and training a formula recognition model on the M second images, the remaining K-M first images, and the K target-format formulas to obtain a trained formula recognition model, which is used to recognize a written formula and output the corresponding target-format formula.

Description

Formula identification method and device
Technical Field
The disclosure relates to the technical field of computer vision, in particular to a formula identification method and device.
Background
Users often need to input mathematical formulas when editing text. Compared with other text, a mathematical formula can take many forms and the spatial relationships between its characters are complex, so entry is cumbersome and costs the user considerable time and effort.

With the development of computer vision technology, a user can write a formula directly by hand on an electronic device and have it converted into electronic text. However, some common writing habits of users cannot be recognized well, resulting in low recognition accuracy and a poor recognition effect.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a formula identification method and apparatus that can improve the accuracy of formula identification.

To achieve this purpose, the embodiments of the disclosure adopt the following technical solutions:

In one aspect, a formula identification method is provided. The formula identification method comprises the following steps: obtaining a training sample set, wherein the training sample set comprises K first images and K target-format formulas, each first image contains a first formula, the K first formulas correspond one-to-one with the K target-format formulas, and K is an integer greater than or equal to 1; processing M of the first images with a data enhancement algorithm to obtain M second images, where 1 ≤ M ≤ K and M is an integer; and training a formula recognition model on the M second images, the remaining K-M first images, and the K target-format formulas to obtain a trained formula recognition model, which is used to recognize a written formula and output the corresponding target-format formula.
With this technical solution, the formula images in the training samples of the formula recognition model undergo data enhancement, and the model is trained on the enhanced formula images, so the trained formula recognition model can accommodate some common writing habits of users, improving the accuracy of formula recognition.
In some embodiments, the data enhancement algorithm includes a dilation algorithm and/or an erosion algorithm.
In some embodiments, the data enhancement algorithm further includes an identifier recognition algorithm, and the processing the first image using the identifier recognition algorithm includes: identifying whether a first formula in a first image contains at least one preset identifier; each preset identifier in the at least one preset identifier corresponds to a designated operation, and the designated operation comprises an adding operation and/or a deleting operation. If the first formula in the first image contains the preset identifier, executing the designated operation corresponding to the preset identifier on the first formula in the first image.
In some embodiments, the data enhancement algorithm further includes a rotation correction algorithm, and the processing the first image using the rotation correction algorithm includes: the first image is rotated by a preset angle.
In some embodiments, the formula identification method further comprises: and acquiring a third image, wherein the third image comprises a formula to be identified. And inputting the third image into the trained formula recognition model to obtain a formula of a target format corresponding to the formula to be recognized.
In some embodiments, the trained formula recognition model includes an encoder and a decoder. The encoder adopts a CNN network with a DenseNet structure, and the decoder comprises a first GRU module, a second GRU module and an ATTENTION module.
In some embodiments, the acquiring the third image includes: and acquiring the track points of the formula to be identified. And generating a third image according to the track points of the formula to be recognized.
In another aspect, a formula recognition apparatus is provided. The formula recognition apparatus includes a transceiver and a processor. The processor is configured to: obtain a training sample set through the transceiver, wherein the training sample set comprises K first images and K target-format formulas, each first image contains a first formula, the K first formulas correspond one-to-one with the K target-format formulas, and K is an integer greater than or equal to 1; process M of the first images with a data enhancement algorithm to obtain M second images, where 1 ≤ M ≤ K and M is an integer; and train a formula recognition model on the M second images, the remaining K-M first images, and the K target-format formulas to obtain a trained formula recognition model, which is used to recognize a written formula and output the corresponding target-format formula.
The formula identification apparatus has the same beneficial technical effects as the formula identification method provided in the above embodiments, and details are not repeated herein.
In yet another aspect, a non-transitory computer-readable storage medium is provided, which stores computer program instructions that, when executed by a formula recognition apparatus, implement a formula recognition method as in any one of the above embodiments.
In a further aspect, there is provided a computer program product stored on a non-transitory computer readable storage medium, the computer program product comprising computer program instructions for causing a computer to perform a formula identification method as in any one of the above embodiments.
In yet another aspect, an electronic device is provided, including: a processor and a memory. The memory is coupled to the processor. The memory is for storing computer program code; the computer program code comprises computer program instructions which, when executed by a processor, cause the electronic device to perform the formula identification method described in the above embodiments.
Drawings
In order to describe the technical solutions in the present disclosure more clearly, the drawings needed in some embodiments of the present disclosure are briefly introduced below. Obviously, the drawings in the following description are only drawings of some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them. In addition, the drawings in the following description may be regarded as schematic diagrams and do not limit the actual size of the products, the actual flow of the methods, the actual timing of the signals, and the like involved in the embodiments of the present disclosure.
FIG. 1 is a block diagram of a formula identification system in accordance with some embodiments;
FIG. 2 is a flow diagram of a formula identification method according to some embodiments;
FIG. 3 is a schematic illustration of broken-stroke and traced-over-stroke phenomena according to some embodiments;
FIG. 4 is a schematic illustration of an identifier phenomenon according to some embodiments;
FIG. 5 is a schematic illustration of a tilt phenomenon according to some embodiments;
FIG. 6 is a network architecture diagram of a decoder according to some embodiments;
FIG. 7 is a network architecture diagram of another decoder according to some embodiments;
FIG. 8 is a diagram of a written formula and a LaTex formula, according to some embodiments;
FIG. 9 is a flow diagram of another formula identification method according to some embodiments;
FIG. 10 is a diagram of a LaTex format formula and a common format formula in accordance with some embodiments;
FIG. 11 is a schematic diagram of a scenario generated by a LaTex formula in accordance with some embodiments;
FIG. 12 is a block diagram of a formula identification apparatus according to some embodiments.
Detailed Description
The technical solutions in some embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings, and it is to be understood that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present disclosure are within the scope of protection of the present disclosure.
Throughout the specification and claims, the term "comprising" is to be interpreted in an open, inclusive sense, i.e., as "including, but not limited to," unless the context requires otherwise. In the description herein, the terms "one embodiment," "some embodiments," "an example embodiment," "an example" or "some examples" or the like are intended to indicate that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the disclosure. The schematic representations of the above terms are not necessarily referring to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be included in any suitable manner in any one or more embodiments or examples.
In the following, the terms "first", "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present disclosure, "a plurality" means two or more unless otherwise specified.
"at least one of A, B and C" has the same meaning as "at least one of A, B or C" and includes the following combinations of A, B and C: a alone, B alone, C alone, a combination of A and B, A and C in combination, B and C in combination, and A, B and C in combination.
"A and/or B" includes the following three combinations: a alone, B alone, and a combination of A and B.
The use of "adapted to" or "configured to" herein is meant to be an open and inclusive language that does not exclude devices adapted to or configured to perform additional tasks or steps.
Additionally, the use of "based on" means open and inclusive, as a process, step, calculation, or other action that is "based on" one or more stated conditions or values may in practice be based on additional conditions or values beyond those stated.
Generally, to save time when editing a formula, a user may write the mathematical formula on an electronic device, and the device then converts the written formula into electronic text, such as LaTex-format text. However, different users have different writing habits, and irregular writing may occur, such as broken strokes, traced-over strokes, deletions, additions, or tilted writing. When such irregularities appear, the electronic device cannot recognize the formula accurately, so recognition accuracy is low and the recognition effect is poor.
Therefore, the embodiments of the disclosure provide a formula identification method that performs data enhancement on the samples in the training sample set of a formula recognition model, so that the processed samples cover irregular writing. A formula recognition model trained on these samples can then recognize formulas written irregularly by users, improving the accuracy of formula recognition.
The formula identification method provided by some embodiments of the present disclosure may be used in a formula identification system 10 as shown in fig. 1. The formula recognition system 10 includes a formula recognition apparatus 11 and an electronic device 12. The formula identifying device 11 comprises an image converter 111 and a formula detecting device 112, and the electronic device 12 comprises a data collector 121 and a formula display 122.
Illustratively, the electronic device 12 includes a touch screen or handwriting pad that supports handwriting input. The electronic device 12 may include, but is not limited to, a mobile terminal, a computer, a tablet computer, a mobile phone, and the like; the embodiments of the disclosure do not limit the type of the electronic device 12.
As shown in fig. 1, when a user writes a formula on a touch screen or a handwriting pad of the electronic device 12, the data collector 121 collects trace points of the formula input by the user, and sends the collected trace points to the formula identifying device 11. The image converter 111 in the formula recognition apparatus 11 receives the trace points, converts the trace points into a formula image, and sends the formula image to the formula detection apparatus 112. The formula detecting device 112 receives the formula image, inputs the formula image into a trained formula recognition model (the trained formula recognition model may be a model obtained by training in advance according to a training sample set), obtains a LaTex formula corresponding to a formula written by a user, and sends the LaTex formula to the electronic device 12. The formula display 122 in the electronic device 12 displays the LaTex formula, or the formula display 122 may also convert the LaTex formula into a general formula format for display, so as to meet the different format requirements of the user on the written formula.
In some embodiments, the formula identifying apparatus 11 may further include a model training apparatus (not shown in fig. 1) for training the formula identifying model according to the training sample set, so as to obtain a trained formula identifying model.
Illustratively, the formula detection device 112 may include an encoder, a decoder, and a parser. The encoder is used for encoding the formula image input to the formula detection device 112. The decoder is used for decoding the image coded by the coder. The parser is used for converting the output information of the decoder into LaTex format text.
In some embodiments, the formula identifying apparatus 11 may be integrated with the electronic device 12, or may be disposed on a cloud server. For example, when the formula recognition apparatus 11 is disposed on the cloud server, after the user writes the formula on the electronic device 12, the cloud server may generate a formula image according to the trace points of the written formula sent by the electronic device 12, input the formula image into the trained formula recognition model to complete recognition of the written formula, and return the recognition result to the electronic device 12.
Some embodiments of the present disclosure provide a formula identification method, as shown in fig. 2, which includes steps 210 to 230. The formula identifying method can be implemented by the formula identifying apparatus 11 described above.
Step 210, a training sample set is obtained. The training sample set comprises K first images and K target-format formulas; each first image includes a first formula, the K first formulas correspond one-to-one with the K target-format formulas, and K is an integer greater than or equal to 1.
Illustratively, the formula of the target format includes a formula of a LaTex format, and the embodiments of the present disclosure do not limit the type of the target format, and the following embodiments exemplify the target format as the LaTex format.
Illustratively, the first image is an image generated from a first formula, the first formula being a formula written by a user. For example, the first image may be generated from the trace points of a first formula, with one LaTex formula for each first formula. Since the first formulas correspond one-to-one with the LaTex formulas, the K first images correspond one-to-one with the K LaTex formulas.
In some embodiments, the K first images in the training sample set have the same pixel size. The embodiments of the present disclosure do not limit the pixel size of the first images; the following embodiments take first images 200 pixels high and 800 pixels wide as an example.
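The generation of such a fixed-size image from trace points can be sketched as follows. This is an illustrative NumPy stand-in, not the patent's implementation; the function name `render_strokes` is hypothetical, and the 200 × 800 canvas follows the example above:

```python
import numpy as np

def render_strokes(strokes, height=200, width=800, pad=10):
    """Rasterise pen strokes (lists of (x, y) trace points) into a fixed-size
    grayscale image, a stand-in for the patent's first-image generation."""
    pts = np.array([p for s in strokes for p in s], dtype=float)
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    span = np.maximum(maxs - mins, 1e-6)
    # Fit the ink bounding box into the canvas, preserving aspect ratio.
    scale = min((width - 2 * pad) / span[0], (height - 2 * pad) / span[1])
    img = np.zeros((height, width), dtype=np.uint8)
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            # Draw each segment by dense linear interpolation (no cv2 needed).
            n = int(max(abs(x1 - x0), abs(y1 - y0)) * scale) + 2
            cols = ((np.linspace(x0, x1, n) - mins[0]) * scale + pad).astype(int)
            rows = ((np.linspace(y0, y1, n) - mins[1]) * scale + pad).astype(int)
            img[rows.clip(0, height - 1), cols.clip(0, width - 1)] = 255
    return img

strokes = [[(0, 0), (5, 5)], [(5, 0), (0, 5)]]   # two strokes forming an "x"
image = render_strokes(strokes)
```

In practice the rasteriser would also reproduce stroke width and anti-aliasing; the sketch only places single-pixel ink on a blank canvas.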
It should be noted that, unlike recognition of Chinese and English characters, formula recognition involves two-dimensional structural information, such as exponents, fractions, logarithms, and radicals. The training label therefore needs to include both the character information and the structural information between characters. For this reason, LaTex text formulas are used as the labels for training and testing the formula recognition model, since a LaTex text formula can better express the information of a written formula.
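As an illustrative example (not taken from the patent), a handwritten formula containing a fraction under a radical could be paired with a label in which LaTex commands encode the two-dimensional structure:

```latex
% Hypothetical training label for a handwritten formula (illustrative only):
% \sqrt and \frac encode the two-dimensional structure (radical, fraction),
% while x, 2, 1 carry the character information.
\sqrt{x^{2} + \frac{1}{2}}
```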
The training sample set of the formula recognition model comprises K first images and K LaTex text formulas, where K is an integer greater than or equal to 1. The larger the value of K, the higher the quality of the trained formula recognition model and the higher its recognition accuracy. However, if K is too large, the training performance of the formula recognition model is also affected; therefore, an appropriate number of samples K can be selected for training. The number of training samples is not specifically limited in the embodiments of the present disclosure.
And step 220, processing the M first images by adopting a data enhancement algorithm to obtain M second images, wherein M is more than or equal to 1 and less than or equal to K, and M is an integer.
Because users' writing habits differ and irregularities can occur during writing, and the first images in the training sample set may not cover these writing habits and irregularities, a formula recognition model trained on such a sample set may fail to recognize, or may misrecognize, formulas containing them.
Therefore, in the present embodiment a data enhancement algorithm is applied to the first images in the training sample set so that they cover as many user writing habits and irregular writing conditions as possible. A formula recognition model trained on the sample set processed by the data enhancement algorithm can then recognize formulas containing these habits and irregularities, improving the accuracy of formula recognition.
And processing the M first images in the training sample set by adopting a data enhancement algorithm, namely processing part or all of the K first images by adopting the data enhancement algorithm. The present disclosure does not limit the specific number of first images processed using the data enhancement algorithm.
In some embodiments, the data enhancement algorithm includes a dilation algorithm and/or an erosion algorithm. Aimed at the broken strokes and traced-over strokes that frequently occur while users write, the first image is processed with a dilation and/or erosion algorithm to simulate these phenomena.

After model training on the sample set processed by the dilation and/or erosion algorithm, the resulting formula recognition model can recognize broken and/or traced-over strokes in the user's writing. Illustratively, processing a first image in the sample set with the dilation and/or erosion algorithm includes: processing the first image with a dilation algorithm so that the processed image contains traced-over strokes; and processing the first image with an erosion algorithm so that the processed image contains broken strokes.
Illustratively, a preset 50 × 50 pixel image block randomly walks over the whole area of a first image with a pixel size of 200 × 800, and a dilation or erosion algorithm (e.g., with a kernel size drawn at random from 1-3) adjusts the line width of the handwriting covered by the block.
For example, taking the first image shown in (a) of FIG. 3, processing it with the erosion algorithm reduces the line width of the formula's stroke points so that broken strokes appear, yielding the image shown in (b) of FIG. 3. Processing the first image shown in (a) of FIG. 3 with the dilation algorithm enlarges the line width of the formula's stroke points so that traced-over strokes appear, yielding the image shown in (c) of FIG. 3.
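A minimal sketch of this dilation/erosion augmentation, using plain NumPy in place of an image-processing library such as OpenCV. The names `morph` and `random_local_morph` are hypothetical, and mapping the patent's kernel of 1-3 to an odd window size is an assumption:

```python
import numpy as np

def morph(img, k, mode):
    """Grayscale dilation ("dilate") or erosion ("erode") with a k x k window,
    a plain-NumPy stand-in for cv2.dilate / cv2.erode."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            window = padded[i:i + k, j:j + k]
            out[i, j] = window.max() if mode == "dilate" else window.min()
    return out

def random_local_morph(img, block=50, kmax=3, rng=None):
    """Move a block x block patch to a random position and thicken or thin the
    strokes it covers, mimicking traced-over and broken pen strokes."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = img.copy()
    r = int(rng.integers(0, max(img.shape[0] - block, 1)))
    c = int(rng.integers(0, max(img.shape[1] - block, 1)))
    k = 2 * int(rng.integers(1, kmax + 1)) + 1      # odd window: 3, 5 or 7
    mode = rng.choice(["dilate", "erode"])
    out[r:r + block, c:c + block] = morph(out[r:r + block, c:c + block], k, mode)
    return out
```

Dilation takes the local maximum (thickening white-on-black strokes), erosion the local minimum (thinning them until thin strokes break), which is why the two operations simulate traced-over and broken pens respectively.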
In some embodiments, the data enhancement algorithm further comprises an identifier recognition algorithm. The process of processing the samples in the sample set by using the identifier recognition algorithm comprises two steps.
In the first step, it is identified whether the first formula contains at least one preset identifier, wherein each preset identifier corresponds to a designated operation, and the designated operation comprises an adding operation and/or a deleting operation.

Before training the formula recognition model, it may be determined whether the first formula in the first image includes a preset identifier. Each preset identifier may correspond to its own designated operation, or multiple preset identifiers may correspond to the same designated operation. The designated operations may include, for example, adding and/or deleting operations. For instance, a parallel-bar identifier may be set to correspond to a deletion operation, or three identifiers, namely a parallel-bar, a single bar, and a cross, may all correspond to the deletion operation. The preset identifiers may be set by the user or may be default settings in the formula recognition apparatus 11. The present disclosure does not specifically limit how the preset identifiers are set or the correspondence between preset identifiers and designated operations.
And secondly, if the first formula in the first image contains the preset identifier, executing the specified operation corresponding to the preset identifier on the first formula in the first image.
In the formula identification process, if the first formula contains a preset identifier, the formula identification apparatus 11 performs the corresponding designated operation. For example, if the deletion operation is set to correspond to all three identifiers, then when the apparatus identifies any one of them, it performs the deletion operation on the character at the position of the identifier in the first formula; that is, the character at that position is not recognized.
For example, as shown in FIG. 4, taking the parallel-bar identifier as corresponding to the deletion operation: the first formula in the first image shown in (a) of FIG. 4 is "arctan0054", with a parallel-bar identifier superimposed on the first "0". If the formula identification apparatus 11 identifies that the first formula contains a parallel-bar identifier, it performs a deletion operation on the "0" at the identifier's position, yielding the processed first formula "arctan054" shown in (b) of FIG. 4.
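The deletion operation can be illustrated at the level of recognized characters and bounding boxes. This is a simplified sketch under the assumption that per-character boxes and detected identifier boxes are available; it is not the patent's actual mechanism:

```python
def overlap(a, b):
    """Intersection area of boxes a, b = (x0, y0, x1, y1), divided by the
    smaller box's area."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    smaller = min((a[2] - a[0]) * (a[3] - a[1]), (b[2] - b[0]) * (b[3] - b[1]))
    return ix * iy / smaller if smaller else 0.0

def apply_delete_identifiers(chars, identifier_boxes, thresh=0.5):
    """Drop every character whose box is sufficiently covered by a
    strike-through identifier box; chars is a list of (char, box) pairs."""
    return "".join(c for c, box in chars
                   if all(overlap(box, ib) < thresh for ib in identifier_boxes))

# "arctan0054" with a parallel-bar identifier over the first "0":
chars = [(c, (i * 10, 0, i * 10 + 10, 20)) for i, c in enumerate("arctan0054")]
strike = [(60, 5, 70, 15)]                 # box of the detected identifier
result = apply_delete_identifiers(chars, strike)   # -> "arctan054"
```

An adding operation would work analogously, inserting content at the identifier's position instead of dropping the covered character.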
In some embodiments, the data enhancement algorithm further comprises a rotation correction algorithm, which is employed to rotate the first image by a preset angle.
For the tilted first image shown in FIG. 5, tilt-correction methods such as obtaining the rotation angle by straight-line fitting or the minimum-bounding-rectangle method are usually adopted, but these methods are not robust enough and are complex to implement. Unlike such correction of tilted images during recognition, the embodiments of the disclosure randomly rotate the horizontally written first images in the training sample set by a preset angle, so that the trained formula recognition model learns tilt information on its own and can handle tilted writing. Illustratively, the M first images in the training sample set may be randomly rotated by an angle between -20° and 20°.
In some embodiments, the data enhancement algorithms applied to the M first images may be the same or different. The data enhancement algorithm includes at least one of a dilation algorithm, an erosion algorithm, an identifier recognition algorithm, and a rotation correction algorithm; processing the M first images in the training sample set with a data enhancement algorithm therefore means processing each first image with at least one of these algorithms. When multiple data enhancement algorithms are applied to the M first images, the present disclosure does not specifically limit the processing order. For example, a second image may be generated by processing the first image with the dilation and erosion algorithms and then with the identifier recognition algorithm. For another example, a second image may be generated by processing the first image with just one of the dilation algorithm, the erosion algorithm, the identifier recognition algorithm, or the rotation correction algorithm. The present disclosure does not limit the type of data enhancement algorithm applied to a first image.
And step 230, performing model training on the formula recognition model according to the M second images, the K-M first images and the K target format formulas to obtain a trained formula recognition model, wherein the trained formula recognition model is used for recognizing the written formula and obtaining the target format formulas.
The M second images obtained after the data enhancement processing of step 220 and the K-M first images not subjected to data enhancement are input into the formula recognition model for training. The trained formula recognition model can automatically recognize writing habits and irregular writing in the user's formula writing, improving formula recognition accuracy.
As previously mentioned, the formula recognition model is generated by training of a model training device, which, like the formula detection device 112, may also include an encoder, a decoder, and a parser.
The encoder may employ a Convolutional Neural Network (CNN) to perform an encoding process on the image input to the model training apparatus, which illustratively includes M second images and K-M first images.
In some embodiments, the encoder may employ a CNN with the DenseNet (Densely Connected Convolutional Networks) structure. Note that a DenseNet-structured CNN uses dense connectivity: each layer receives the outputs of all previous layers as additional input. Compared with CNNs such as VGG and ResNet, the DenseNet structure can therefore make fuller use of each layer's features and realize feature reuse. The DenseNet structure also strengthens feature extraction, promotes gradient propagation, and facilitates training the formula recognition model. Furthermore, because the DenseNet structure does not need to relearn redundant feature maps, it requires fewer parameters than a traditional CNN, improving the performance of the formula recognition model.
The following takes DenseNet-121 as the encoder for an exemplary explanation. DenseNet-121 performs feature extraction and encoding on an input image of the model training apparatus to obtain a high-dimensional feature vector corresponding to the input image. The high-dimensional feature vector includes the formula features of the written formula in the input image, such as digits, characters, operation symbols, and position information.
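The dense connectivity described above can be made concrete by tracking channel counts inside a dense block: each layer sees the concatenation of all earlier feature maps, so the channel count grows linearly. The growth rate of 32 and the 64-channel input below are the standard DenseNet-121 configuration, not figures given in this text.

```python
def dense_block_channels(c_in, growth_rate, num_layers):
    """Channel bookkeeping for one DenseNet dense block: every layer
    receives the concatenation of all previous outputs, and appends
    growth_rate new feature maps of its own."""
    inputs = []
    c = c_in
    for _ in range(num_layers):
        inputs.append(c)   # channels this layer receives
        c += growth_rate   # its new feature maps are concatenated on
    return inputs, c
```

For the first dense block of DenseNet-121 (6 layers, growth rate 32, 64 input channels) this gives layer inputs 64, 96, ..., 224 and a 256-channel block output, which is the feature-reuse property the text credits for the reduced parameter count.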
In some embodiments, the decoder may include GRU (Gated Recurrent Unit) modules and an ATTENTION module. Illustratively, as shown in fig. 6, the decoder may include a first GRU module 610, a second GRU module 630, and an ATTENTION module 620.
The input to the first GRU module 610 consists of two vectors, a decoder_hidden vector 611 and a decoder_input vector 612; after passing through the first GRU module 610, they produce an output vector st.
The ATTENTION module 620 prompts the second GRU module 630 to focus on the local features of each input image, making the decoding process more targeted and improving the decoding capability of the decoder. As shown in fig. 7, the input to the ATTENTION module 620 consists of two vectors, an attention_sum vector 621 and a decoder_attention vector 622. The attention_sum vector 621 and the decoder_attention vector 622 are summed to obtain an attention_sum_next vector 623. The attention_sum_next vector 623, the encoder_output vector 601 output by the encoder, and the output vector st of the first GRU module 610 each pass through fully connected (FC) layers, which adjust the vectors to the same dimension; a summation then yields the vector et. The vector et undergoes a series of operations to obtain a decoder_attention_next vector 624, which is the weight coefficient output by the ATTENTION module 620. Illustratively, these operations include, but are not limited to, convolution (e.g., conv 3×3), batch normalization (BN), the hyperbolic tangent function (tanh), and FC.
The encoder_output vector 601 output by the encoder is multiplied by the decoder_attention_next vector 624 output by the ATTENTION module 620 to obtain a vector ct. Illustratively, ct and st may be two vectors of identical dimension.
With continued reference to fig. 6, the vector st output by the first GRU module 610 and the vector ct are input together into the second GRU module 630, which processes them to obtain a decoder_hidden_next vector 631, the prediction vector for the next character output by the second GRU module 630. Then the decoder_hidden_next vector 631, the weighted encoder_output vector 601, and the decoder_input vector 612 of the first GRU module 610 each pass through FC layers that adjust them to the same dimension, and then through the FC+softmax module 640, yielding the decoder's final decoder_output vector, which is the weight vector of the character output by the decoder. softmax is an activation function whose output is the classification probability of each character in the input image.
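The attention step above, turning the unnormalized alignment scores et into weights and forming the context vector ct as a weighted sum of encoder outputs, can be sketched in plain Python. The learned FC/conv/BN layers are replaced here by given score values, so this shows only the softmax-and-weight arithmetic, not the patent's actual modules.

```python
import math

def attention_context(encoder_outputs, scores):
    """Normalise alignment scores e_t with a softmax, then form the
    context vector c_t as the weighted sum of encoder output vectors
    (the 'weighted encoder_output' mentioned in the text)."""
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(encoder_outputs[0])
    ct = [sum(w * v[d] for w, v in zip(weights, encoder_outputs))
          for d in range(dim)]
    return weights, ct
```

Equal scores spread attention evenly over the feature positions, while a dominant score makes ct approach the single encoder vector at that position, which is how the module focuses the second GRU on local image features.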
In some embodiments, the number of characters the model training apparatus supports recognizing may be 37; accordingly, FC may compress the number of output channels to 37 so that softmax outputs weight information for the 37 characters (e.g., characters 0-36). Illustratively, the FC may be implemented as a linear layer.
Therefore, after the decoder finishes recognition, it outputs a character string in which each character is one of the 37 preset characters. The preset 37 characters may be, for example, the characters (ids) 0 to 36 in Table 1 below. In some embodiments, the parser may convert the character string output by the decoder into LaTeX format according to the mapping between character id and LaTeX in Table 1, obtaining a predicted LaTeX formula.
In some embodiments, the loss function of the formula recognition model can be determined from the K LaTeX formulas in the training sample set and the predicted LaTeX formulas output by the formula recognition device 11. The relevant parameters of the formula recognition model are then adjusted according to the loss function; when the loss function converges, the resulting model is the trained formula recognition model.
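The text does not name the loss function, but a choice consistent with a softmax-per-character decoder is the average negative log-likelihood of the target character ids under the decoder's per-step distributions; the sketch below is that assumption, not the patent's specified loss.

```python
import math

def sequence_nll(char_probs, target_ids):
    """Average negative log-likelihood of the target character ids
    under the decoder's per-step softmax distributions.
    char_probs[t] is the probability vector at step t."""
    assert len(char_probs) == len(target_ids)
    return -sum(math.log(p[t])
                for p, t in zip(char_probs, target_ids)) / len(target_ids)
```

A perfectly confident correct prediction gives loss 0, and uniform guessing over the 37 characters gives log 37 per step, so driving this quantity down is what "adjusting the parameters until the loss converges" accomplishes.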
Table 1 below maps each character id to its formula symbol and its LaTeX representation.
TABLE 1 Character id / symbol / LaTeX mapping

id  symbol        LaTeX     |  id  symbol        LaTeX
0   end symbol    <eol>     |  18  3             3
1   (             (         |  19  4             4
2   )             )         |  20  5             5
3   %             \%        |  21  6             6
4   .             .         |  22  7             7
5   +             +         |  23  8             8
6   -             -         |  24  9             9
7   ×             \times    |  25  sin           \sin
8   ÷             \div      |  26  cos           \cos
9   √             \sqrt     |  27  tan           \tan
10  fraction      \frac     |  28  cot           \cot
11  log           \log      |  29  arcsin        \arcsin
12  ln            \ln       |  30  arccos        \arccos
13  lg            \lg       |  31  arctan        \arctan
14  exponent      ^         |  32  arccot        \arccot
15  0             0         |  33  π             \pi
16  1             1         |  34  Φ             \phi
17  2             2         |  35  e             e
                            |  36  start symbol  <sos>
As shown in Table 1, if, for example, the character id string output by the decoder is 17, 15, 5, 18, 7, 33, the LaTeX formula obtained after parsing by the parser is 20+3\times\pi (i.e., 20+3π).
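The parser's id-to-LaTeX conversion in this example amounts to a table lookup with special handling of the start and end symbols; the sketch below uses only the subset of Table 1 needed for the worked example.

```python
# Subset of Table 1 (character id -> LaTeX token); the full table
# has 37 entries, ids 0 (<eol>) through 36 (<sos>).
ID_TO_LATEX = {0: "<eol>", 5: "+", 7: "\\times", 15: "0", 16: "1",
               17: "2", 18: "3", 33: "\\pi", 36: "<sos>"}

def parse_ids(ids):
    """Convert the decoder's id string to a LaTeX string, stopping at
    the end symbol and skipping the start symbol."""
    out = []
    for i in ids:
        tok = ID_TO_LATEX[i]
        if tok == "<eol>":
            break
        if tok == "<sos>":
            continue
        out.append(tok)
    return "".join(out)
```

Running it on the example id string 17, 15, 5, 18, 7, 33 (followed by the end symbol) reproduces the LaTeX formula `20+3\times\pi` from the text.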
The formula recognition method provided by the embodiments of the present disclosure can obtain the LaTeX formula corresponding to a written formula. The LaTeX formula contains the symbols in the written formula, such as digits, letters, special mathematical symbols, and operators, as well as the spatial relationships in the written formula, such as left-right, above-below, upper-left, upper-right, lower-left, lower-right, and spacing. As shown in fig. 8, after images of written formulas are input into the trained formula recognition model, the LaTeX formula corresponding to each written formula can be obtained.
The trained formula recognition model is used to recognize formulas containing writing habits and irregular writing and to obtain the LaTeX formula. The trained formula recognition model can be applied in the formula recognition device 11 shown in fig. 1 to recognize written formulas.
In some embodiments, as shown in FIG. 9, the process of identifying a user written formula using the formula identification device 11 includes steps 910 through 960.
Step 910, a third image is obtained. The third image includes a written formula to be recognized.
In some embodiments, before acquiring the third image, the method further includes collecting the trajectory points of the formula written by the user. The user writes the formula to be recognized in the writing area of the electronic device 12, and the electronic device 12 collects the trajectory points of the formula to be recognized. Collecting the trajectory points may, for example, include collecting their coordinates. Illustratively, the trajectory points may be split into strokes by recording whether the pen leaves the writing area of the electronic device 12.
A third image is generated from the collected trajectory points of the formula to be recognized. Users may differ in the character size, style, and so on used while writing a formula. To eliminate these differences and improve the performance of the formula recognition device 11, in some embodiments the trajectory of the written formula is uniformly converted, by mapping, into third images of the same size before recognition. For example, the collected raw trajectory points of the formula to be recognized are uniformly mapped into a 200×800 image, and the trajectory points within each stroke are then connected with lines 2 pixels wide. In this way, every input is converted into a third image 200 pixels high and 800 pixels wide in which the formula's line width is uniformly 2 pixels. The mapped height, width, and line width in pixels may be set by the user; the present disclosure does not limit them.
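The mapping of raw trajectory points into the fixed 200×800 canvas might be sketched as a min-max scaling per axis; whether the real implementation preserves aspect ratio or pads is not stated, so this uniform per-axis scaling is an assumption (stroke rendering at 2-pixel width is omitted).

```python
def normalize_points(points, height=200, width=800):
    """Min-max map raw (x, y) trajectory points into a fixed
    width x height canvas so every formula image has the same size."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    sx = (width - 1) / (x1 - x0) if x1 > x0 else 0.0
    sy = (height - 1) / (y1 - y0) if y1 > y0 else 0.0
    return [(round((x - x0) * sx), round((y - y0) * sy)) for x, y in points]
```

After this step the normalized points within each stroke would be connected into 2-pixel-wide lines to produce the third image described above.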
After the third image is obtained, it may be input into the trained formula recognition model to obtain the LaTeX formula corresponding to the written formula to be recognized. Inputting the third image into the trained formula recognition model to obtain the LaTeX formula corresponding to the written formula to be recognized may include steps 920 to 960.
Step 920, encoding the third image.
The third image generated in step 910 is input to the encoder for encoding. The encoder may be, for example, a CNN with the DenseNet-121 structure described above. Illustratively, DenseNet-121 may downsample the third image by a factor of 16 and output the result. Downsampling by the DenseNet-121 structure turns the third image into a feature map of width × height × number of channels. For example, the 200×800 third image is downsampled 16× by DenseNet-121 into a 12×50×684 feature map, i.e., the encoder outputs 684 feature maps of size 12×50.
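The output geometry follows from integer 16× downsampling of the spatial dimensions; a small helper makes the arithmetic explicit (the 684-channel figure is taken from the text, not derived here).

```python
def encoder_output_shape(h, w, downsample=16, channels=684):
    """Spatial size after the encoder's 16x downsampling (floor
    division), with the channel count the text reports for the
    DenseNet-121 encoder."""
    return h // downsample, w // downsample, channels
```

For the 200×800 third image this yields the 12×50×684 feature map cited above (200/16 and 800/16 floored to 12 and 50).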
Step 930, initializing the input information of the decoder.
In some embodiments, when the formula recognition model decodes for the first time, the input information of the decoder must be randomly initialized, that is, randomly assigned.
As shown in fig. 6, the input information of the decoder includes the decoder_hidden vector 611 and decoder_input vector 612 of the first GRU module 610, and the attention_sum vector 621 and decoder_attention vector 622 of the ATTENTION module 620. Illustratively, random initialization includes configuring each input as an all-0 or all-1 vector. For example, the decoder_input vector 612 is initialized to 36, corresponding to the start symbol in Table 1, and then mapped to a 256-dimensional vector by an embedding layer; the decoder_hidden vector 611 is initialized to a 256-dimensional all-zero vector; and the attention_sum vector 621 and decoder_attention vector 622 are each initialized to an all-zero tensor of dimensions [1, 12, 50].
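The initialization just described might be sketched as follows; nested lists stand in for tensors, the embedding of the start id is omitted, and the dimension defaults come from the example values in the text.

```python
def init_decoder_state(hidden_dim=256, att_shape=(1, 12, 50), sos_id=36):
    """First-step decoder state from the text: decoder_input is the
    <sos> id (36 in Table 1), the hidden state is a 256-dim zero
    vector, and both attention maps are all-zero [1, 12, 50] tensors."""
    def zeros(*dims):
        if len(dims) == 1:
            return [0.0] * dims[0]
        return [zeros(*dims[1:]) for _ in range(dims[0])]
    return {
        "decoder_input": sos_id,
        "decoder_hidden": zeros(hidden_dim),
        "attention_sum": zeros(*att_shape),
        "decoder_attention": zeros(*att_shape),
    }
```

The [1, 12, 50] attention shape matches the 12×50 spatial size of the encoder's feature map, so each attention weight corresponds to one spatial position of the encoded image.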
Step 940, decoding processing.
The output information of the encoder and the randomly initialized input information of the decoder are fed into the decoder for decoding. The decoder cyclically decodes the feature map output in the encoding stage of step 920, outputting one character per cycle. The characters output by the decoder include the end symbol of Table 1.
Step 950: determine whether the character output by the decoder is the end symbol, or whether the number of decoder cycles has reached the maximum number of cycles (max_len).
If the character output by the decoder is not the end symbol and the number of cycles has not reached the maximum, execution continues at step 940. If the character is the end symbol or the number of cycles reaches the maximum, recognition is finished and a recognition result, a character string, is obtained.
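The loop of steps 940-950 amounts to greedy decoding with two stopping conditions; the sketch below stubs the actual GRU/attention decoder step with a plain callable so only the control flow is shown.

```python
EOL_ID = 0  # end symbol, per Table 1

def greedy_decode(step_fn, max_len=50):
    """Repeatedly call the decoder step until it emits the end symbol
    or max_len characters have been produced (steps 940-950).
    step_fn(ids_so_far) -> next character id."""
    ids = []
    for _ in range(max_len):
        nxt = step_fn(ids)
        if nxt == EOL_ID:
            break
        ids.append(nxt)
    return ids
```

The max_len guard is what prevents an endless loop when the model never emits the end symbol; the value 50 is an illustrative assumption, as the text does not give a number.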
Step 960: parse the recognition result to obtain the LaTeX formula.
After recognition finishes, the recognition result, namely the character string output by the decoder, is output. Illustratively, the string may be converted into a LaTeX formula by the parser.
In some embodiments, the process of recognizing the user-written formula further includes converting the LaTeX formula generated by the formula recognition device 11 into a formula in a common format for display. For example, this may be implemented using the formula display 122 shown in fig. 1. As shown in fig. 10, the formula display 122 can convert the LaTeX-format text output by the formula recognition device into a formula in a common format.
In some embodiments, the formula recognition method of the above embodiments may be applied to scenarios such as scientific paper writing. LaTeX tools are often used for automatic typesetting when writing scientific papers. Science and engineering papers in particular usually contain many formulas, and converting them into LaTeX format and editing them takes considerable time. The formula recognition method provided by the present disclosure can automatically generate LaTeX-format formulas.
Therefore, when a user needs a LaTeX-format formula while writing a paper, there is no need to memorize LaTeX syntax or edit it manually; the user only needs to input the handwritten formula in the text editor, and the text editor produces the LaTeX output using the formula recognition method of the above embodiments, saving formula-editing time and improving efficiency.
Fig. 11 is a schematic view of a LaTeX formula generation scenario. As shown in fig. 11, the formula recognition process in this scenario is as follows.
In response to the user clicking the "start" button in the text editor, the handwriting recognition function is started. The user writes the formula to be edited in the "handwriting area". Illustratively, the "handwriting area" supports both writing with a mouse and writing with a tablet connected to a PC. The "handwriting area" also supports uploading pictures, that is, the user can run recognition on an uploaded picture containing a formula.
In response to the user clicking the "recognize" button in the text editor, the text editor recognizes the formula written by the user using the formula recognition method of the above embodiments and displays the corresponding LaTeX formula in the "display area". Illustratively, the LaTeX recognition result supports copying, so the user can directly copy and paste the output LaTeX formula into the region being edited.
As shown in fig. 12, some embodiments of the present disclosure provide a formula identifying apparatus 11, where the formula identifying apparatus 11 can implement the formula identifying method provided in any of the above embodiments. As shown in fig. 12, the formula identifying apparatus 11 includes a transceiver 113 and a processor 114.
In some embodiments, the processor 114 is configured to:
a training sample set is obtained through the transceiver 113, where the training sample set includes K first images and K formulas in target formats, the first images include first formulas, the K first formulas are in one-to-one correspondence with the formulas in the K target formats, and K is an integer greater than or equal to 1.
And processing the M first images by adopting a data enhancement algorithm to obtain M second images, wherein M is more than or equal to 1 and less than or equal to K, and M is an integer.
And performing model training on the formula recognition model according to the M second images, the K-M first images and the K target format formulas to obtain a trained formula recognition model, wherein the trained formula recognition model is used for recognizing the written formula and obtaining the target format formulas.
In some embodiments, the data enhancement algorithm includes a dilation algorithm and/or an erosion algorithm.
In some embodiments, the data enhancement algorithm further includes an identifier recognition algorithm. The processor 114 is configured to process the first image using the identifier recognition algorithm, including: identifying whether the first formula in the first image contains at least one preset identifier, where each preset identifier corresponds to a designated operation, the designated operation including an adding operation and/or a deleting operation; and, if the first formula in the first image contains a preset identifier, executing the designated operation corresponding to that identifier on the first formula in the first image.
In some embodiments, the data enhancement algorithm further includes a rotation correction algorithm, and the processor 114 is configured to process the first image using the rotation correction algorithm, including rotating the first image by a preset angle.
In some embodiments, the processor 114 is further configured to: a third image is acquired by the transceiver 113, the third image comprising the formula to be identified.
And inputting the third image into the trained formula recognition model to obtain a formula of a target format corresponding to the formula to be recognized.
In some embodiments, the trained formula recognition model comprises an encoder and a decoder; the encoder adopts a CNN network with a DenseNet structure; the decoder includes a first GRU module, a second GRU module, and an ATTENTION module.
In some embodiments, the processor 114 is further configured to: and acquiring the track points of the formula to be identified through the transceiver 113. And generating the third image according to the track points of the formula to be recognized.
Illustratively, the processor 114 is a central processing unit (CPU), and may also be an application-specific integrated circuit (ASIC) or one or more integrated circuits configured to implement embodiments of the present disclosure, such as one or more digital signal processors (DSPs) or one or more field-programmable gate arrays (FPGAs).
The transceiver 113 is used for communication with other communication devices, for example, the electronic device 12. Of course, the transceiver 113 may also be used for communicating with a communication network, such as an ethernet, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN), etc. The transceiver 113 may include a receiving unit implementing a receiving function and a transmitting unit implementing a transmitting function.
In some embodiments, the formula recognition apparatus 11 may further include a memory 115 for storing the program code and data used by the formula recognition apparatus 11 to execute any of the formula recognition methods provided above. The memory 115 may be a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random-access memory (RAM), or the like.
Some embodiments of the present disclosure provide a computer-readable storage medium (e.g., a non-transitory computer-readable storage medium) having computer program instructions stored therein, which, when executed on a computer (e.g., a formula recognition apparatus), cause the computer to perform a formula recognition method as described in any one of the above embodiments.
By way of example, such computer-readable storage media may include, but are not limited to: magnetic storage devices (e.g., hard disks, floppy disks, magnetic tape), optical discs (e.g., CD (Compact Disc), DVD (Digital Versatile Disc)), smart cards, and flash memory devices (e.g., EPROM (Erasable Programmable Read-Only Memory), cards, sticks, key drives). Various computer-readable storage media described in this disclosure can represent one or more devices and/or other machine-readable storage media for storing information. The term "machine-readable storage medium" can include, without being limited to, wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.
Some embodiments of the present disclosure also provide a computer program product, for example, stored on a non-transitory computer-readable storage medium. The computer program product includes computer program instructions which, when executed on a computer (e.g. a formula recognition apparatus), cause the computer to perform the formula recognition method as described in the embodiments above.
Some embodiments of the present disclosure also provide a computer program. When the computer program is executed on a computer (e.g., a formula recognition apparatus), the computer program causes the computer to execute the formula recognition method as described in the above embodiments.
The beneficial effects of the above computer-readable storage medium, the computer program product, and the computer program are the same as those of the formula identification method described in some embodiments above, and are not described herein again.
Some embodiments of the present disclosure also provide an electronic device, including: a processor and a memory. The memory is coupled with the processor. The memory is for storing computer program code; the computer program code comprises computer program instructions which, when executed by a processor, cause the electronic device to perform the formula identification method described in the above embodiments.
The above description is only for the specific implementation method of the present disclosure, but the protection scope of the present disclosure is not limited thereto, and any person skilled in the art can appreciate that changes or substitutions within the technical scope of the present disclosure should be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A method of formula identification, comprising:
acquiring a training sample set, wherein the training sample set comprises K first images and K target format formulas, the first images comprise first formulas, the K first formulas are in one-to-one correspondence with the K target format formulas, and K is an integer greater than or equal to 1;
processing the M first images by adopting a data enhancement algorithm to obtain M second images, wherein M is more than or equal to 1 and less than or equal to K, and M is an integer;
and performing model training on the formula recognition model according to the M second images, the K-M first images and the K target format formulas to obtain a trained formula recognition model, wherein the trained formula recognition model is used for recognizing a written formula and obtaining a target format formula.
2. The method of claim 1, wherein the data enhancement algorithm comprises a dilation algorithm and/or an erosion algorithm.
3. The method of claim 1, wherein the data enhancement algorithm further comprises an identifier recognition algorithm, and wherein processing the first image with the identifier recognition algorithm comprises:
identifying whether the first formula in the first image contains at least one preset identifier, wherein each preset identifier in the at least one preset identifier corresponds to a designated operation, and the designated operation comprises an adding operation and/or a deleting operation;
if the first formula in the first image comprises the preset identifier, executing a specified operation corresponding to the preset identifier on the first formula in the first image.
4. The method of claim 1, wherein the data enhancement algorithm further comprises a rotation correction algorithm, and wherein processing the first image using the rotation correction algorithm comprises:
and rotating the first image by a preset angle.
5. The method according to any one of claims 1-4, further comprising:
acquiring a third image, wherein the third image comprises a formula to be identified;
and inputting the third image into the trained formula recognition model to obtain a formula of a target format corresponding to the formula to be recognized.
6. The method of claim 1, wherein the trained formula recognition model comprises an encoder and a decoder; the encoder adopts a CNN network with a DenseNet structure, and the decoder comprises a first GRU module, a second GRU module, and an ATTENTION module.
7. The method of claim 5, wherein the acquiring a third image comprises:
acquiring track points of the formula to be identified;
and generating the third image according to the track points of the formula to be recognized.
8. An apparatus for formula recognition, comprising: the formula identification device comprises a transceiver and a processor;
the processor configured to:
acquiring a training sample set through the transceiver, wherein the training sample set comprises K first images and K target format formulas, the first images comprise first formulas, the K first formulas are in one-to-one correspondence with the K target format formulas, and K is an integer greater than or equal to 1;
processing the M first images by adopting a data enhancement algorithm to obtain M second images, wherein M is more than or equal to 1 and less than or equal to K, and M is an integer;
and performing model training on the formula recognition model according to the M second images, the K-M first images and the K target format formulas to obtain a trained formula recognition model, wherein the trained formula recognition model is used for recognizing a written formula and obtaining a target format formula.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer program instructions which, when executed by the formula identification apparatus, implement the formula identification method of any one of claims 1-7.
10. An electronic device, comprising: a processor and a memory; the memory is coupled with the processor; the memory for storing computer program code; the computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform the formula identification method of any one of claims 1-7.
CN202210713591.1A 2022-06-22 2022-06-22 Formula identification method and device Pending CN115205623A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210713591.1A CN115205623A (en) 2022-06-22 2022-06-22 Formula identification method and device


Publications (1)

Publication Number Publication Date
CN115205623A true CN115205623A (en) 2022-10-18

Family

ID=83575426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210713591.1A Pending CN115205623A (en) 2022-06-22 2022-06-22 Formula identification method and device

Country Status (1)

Country Link
CN (1) CN115205623A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination