CN116386062A - Formula identification method, device, electronic equipment and storage medium - Google Patents

Formula identification method, device, electronic equipment and storage medium

Info

Publication number
CN116386062A
Authority
CN
China
Prior art keywords
feature
formula
image
feature map
formula image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310362262.1A
Other languages
Chinese (zh)
Inventor
李泊翰
吴亮
吕鹏原
章成全
姚锟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310362262.1A
Publication of CN116386062A
Current legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/22 - Character recognition characterised by the type of writing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N 3/0442 - Recurrent networks, e.g. Hopfield networks, characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 - Geometric image transformations in the plane of the image
    • G06T 3/04 - Context-preserving transformations, e.g. by using an importance map
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/18 - Extraction of features or characteristics of the image

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision and deep learning, and specifically to a formula recognition method and apparatus, an electronic device, and a storage medium. The specific implementation scheme is as follows: performing feature extraction on a handwritten formula image through a first feature extraction network to obtain a first feature map; converting, through an image conversion network, the first feature map into a second feature map based on a mapping relationship between handwritten formula image features and printed formula image features, and generating a printed formula image based on the second feature map; performing feature extraction on the printed formula image through a second feature extraction network to obtain a third feature map; and performing recognition based on the third feature map to obtain a formula recognition result. Image conversion serves as a pre-task of formula recognition: the model is trained to map complex and variable handwritten formulas into printed formulas, which are then recognized, improving both the accuracy and the efficiency of handwritten formula recognition.

Description

Formula identification method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision and deep learning, and specifically to a formula recognition method and apparatus, an electronic device, and a storage medium.
Background
Mathematical formulas are ubiquitous in education, office work, library archives, and other fields, and they differ from conventional text, so recognizing handwritten mathematical formulas suffers from inaccurate recognition and low recognition efficiency. A handwritten mathematical formula has a complex spatial structure and diverse writing styles; the complex spatial structure mainly arises from structures unique to mathematical formulas, such as superscripts and subscripts, radical signs, and other special components. This diversity of structure and writing style means that existing handwritten formula recognition methods achieve less than ideal results.
Disclosure of Invention
The present disclosure provides a formula recognition method and apparatus, an electronic device, a storage medium, and a computer program product.
According to a first aspect of the present disclosure, there is provided a formula recognition method, including:
performing feature extraction on a handwritten formula image through a first feature extraction network to obtain a first feature map;
converting, through an image conversion network, the first feature map into a second feature map based on a mapping relationship between handwritten formula image features and printed formula image features, and generating a printed formula image based on the second feature map;
performing feature extraction on the printed formula image through a second feature extraction network to obtain a third feature map; and
performing recognition based on the third feature map to obtain a formula recognition result.
According to a second aspect of the present disclosure, there is provided a formula recognition apparatus, including:
a first feature extraction module configured to perform feature extraction on a handwritten formula image through a first feature extraction network to obtain a first feature map;
an image conversion module configured to convert, through an image conversion network, the first feature map into a second feature map based on a mapping relationship between handwritten formula image features and printed formula image features, and to generate a printed formula image based on the second feature map;
a second feature extraction module configured to perform feature extraction on the printed formula image through a second feature extraction network to obtain a third feature map; and
a recognition module configured to perform recognition based on the third feature map to obtain a formula recognition result.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the above technical solutions.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of the above technical solutions.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of any of the above technical solutions.
The present disclosure provides a formula recognition method and apparatus, an electronic device, a storage medium, and a computer program product, which can improve both the accuracy and the efficiency of formula recognition.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of the steps of a formula recognition method in an embodiment of the present disclosure;
FIG. 2 is a diagram of a model architecture for a formula recognition method in an embodiment of the present disclosure;
FIG. 3 is a functional block diagram of a formula recognition apparatus in an embodiment of the present disclosure;
FIG. 4 is a block diagram of an electronic device for implementing the formula recognition method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Current handwritten mathematical formula recognition methods can be divided into three categories. The first category is traditional handwritten formula recognition, which generally comprises three steps: symbol segmentation, symbol recognition, and structural analysis; in the symbol recognition step, hidden Markov models, elastic matching algorithms, and support vector machines are frequently used. The second category is sequence-based handwritten formula recognition with deep learning, which treats recognition as an image-to-sequence translation task and decodes each symbol of the formula in turn using an attention mechanism; because the attention trajectory over the image is relatively unconstrained, attention drift easily occurs when decoding more complex formulas, causing symbols to be recognized repeatedly or missed. The third category is structured handwritten formula recognition with deep learning, which treats each formula as a syntax tree, decodes the tree from parent nodes to child nodes, and then converts the tree decoding result into a formula according to predefined rules.
To address the problems of low accuracy and low efficiency of formula recognition in the prior art, the present disclosure provides a formula recognition method which, as shown in fig. 1, includes the following steps:
step S101, feature extraction is carried out on the handwriting formula image through a first feature extraction network to obtain a first feature map. Illustratively, as shown in fig. 2, the first feature extraction network may employ a DenseNet (Densely Connected Convolutional Networks, densely connected convolutional network), input the handwriting formula image 201 into the first feature extraction network 202, and perform feature extraction on the handwriting formula image 201 to obtain the first feature map 203.
Step S102: the first feature map is converted into a second feature map based on the mapping relationship between handwritten formula image features and printed formula image features through the image conversion network, and a printed formula image is generated based on the second feature map. As shown in fig. 2, the image conversion network includes a feature conversion unit 205, which maps the first feature map 203 into the second feature map 204, i.e., into printed formula features, based on the mapping relationship between handwritten formula features and printed formula features; the printed formula image 206 is then generated from these features. The feature conversion unit 205 may employ a 1x1 convolution kernel.
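As a minimal sketch of this step, the feature conversion unit can be expressed as a single 1x1 convolution that remaps handwritten-formula features to printed-formula features channel by channel while leaving the spatial layout unchanged. The channel count of 684 is taken from the feature map dimensions given later in the description, and keeping the input and output channel counts equal is an assumption:

```python
import torch.nn as nn

# Feature conversion unit: a 1x1 convolution over the first feature map.
# Spatial size (W/16 x H/16) is unchanged; only channels are remapped.
feature_conversion = nn.Conv2d(in_channels=684, out_channels=684, kernel_size=1)
# second_feature_map = feature_conversion(first_feature_map)
```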
Step S103: feature extraction is performed on the printed formula image through a second feature extraction network to obtain a third feature map. As shown in fig. 2, the second feature extraction network 207 may have the same network structure as the first feature extraction network; for example, it may also use a DenseNet for feature extraction. The second feature extraction network 207 performs feature extraction on the printed formula image 206 to obtain the third feature map 208.
Step S104: recognition is performed based on the third feature map 208 to obtain the formula recognition result 210. After the third feature map corresponding to the printed formula image 206 has been extracted by the second feature extraction network 207, the formula recognition result 210 may be obtained by decoding with the decoder 209.
Through the above technical scheme, image conversion serves as a pre-task of formula recognition: the model is trained to map complex and variable handwritten formulas into printed formulas, and the printed formula images are then recognized. This greatly reduces the difficulty of recognizing complex handwritten formulas and improves both the accuracy and the efficiency of handwritten formula recognition.
As an alternative embodiment, converting, through the image conversion network, the first feature map corresponding to the handwritten formula into the second feature map corresponding to the printed formula, and generating the printed formula image based on the second feature map, includes: converting the first feature map into the second feature map through the feature conversion unit of the image conversion network; and, as shown in fig. 2, generating the printed formula image 206 through the generator 211 based on the second feature map.
The styles of handwritten mathematical formulas vary widely and their structures are complex, which poses a great challenge for recognition. To address this, the present embodiment performs formula image conversion using a generative adversarial network (GAN) architecture. Specifically, given a handwritten mathematical formula image as input, the generator 211 converts it into a printed formula image with a fixed font and a regular spatial structure. The subsequent recognition model then only needs to recognize the printed formula image, which greatly reduces the difficulty of recognizing complex handwritten mathematical formulas.
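A hedged sketch of the generator 211 under these assumptions: it upsamples the converted (printed-style) feature map back to image resolution. The transposed-convolution design, the channel widths, and the single-channel Tanh output are illustrative choices; the disclosure states only that a generator produces the printed formula image from the second feature map.

```python
import torch.nn as nn

class Generator(nn.Module):
    """Upsamples the second feature map into a printed formula image."""

    def __init__(self, in_channels: int = 684):
        super().__init__()
        layers, channels = [], in_channels
        for out in (256, 128, 64, 32):  # four x2 upsamples: /16 -> full size
            layers += [
                nn.ConvTranspose2d(channels, out, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm2d(out),
                nn.ReLU(inplace=True),
            ]
            channels = out
        layers += [nn.Conv2d(channels, 1, kernel_size=3, padding=1), nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, feat):   # feat: (N, 684, H/16, W/16)
        return self.net(feat)  # (N, 1, H, W) printed formula image
```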
As an alternative embodiment, the first feature extraction network includes a first convolution layer, a first max-pooling layer, a plurality of second convolution layers, and a plurality of third convolution layers; and performing feature extraction on the handwritten formula image through the first feature extraction network to obtain the first feature map includes the following steps:
and carrying out convolution processing on the handwritten formula image by adopting a first convolution layer. The first convolution layer may be a convolution layer with kernel 7 and stride 2, for example, the handwriting formula image is a three-channel image with width W and height H, and the feature images with width W/2 and height H/2 are output after convolution processing of the first convolution layer.
Downsampling is then performed using the first max-pooling layer. The first max-pooling layer may be a max-pooling layer with a kernel of 3 and a stride of 2, which downsamples the feature map of width W/2 and height H/2.
Convolution processing is then performed using a first convolution module formed by connecting a plurality of second convolution layers and a plurality of third convolution layers in series, to obtain the first feature map. The kernel of the second convolution layers may be 1 and the kernel of the third convolution layers may be 3. The feature map is further processed by this combination of convolution layers with kernels of 1 and 3 to obtain the final first feature map, whose width and height are W/16 and H/16, respectively, and whose channel number is 684.
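The following is a minimal PyTorch sketch of a feature extraction stem matching the dimensions above (conv 7x7/stride 2, max-pool 3x3/stride 2, stacked 1x1/3x3 convolutions, output at W/16 x H/16 with 684 channels). The plain sequential blocks stand in for the DenseNet-style connectivity, and the intermediate channel widths and the placement of the remaining stride-2 stages are assumptions; the description fixes only the kernel sizes, the overall /16 downsampling, and the 684 output channels.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    def __init__(self, out_channels: int = 684):
        super().__init__()
        # First convolution layer: kernel 7, stride 2 -> W/2 x H/2
        self.stem = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
        # First max-pooling layer: kernel 3, stride 2 -> W/4 x H/4
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

        # Stacked 1x1 / 3x3 convolutions; two assumed stride-2 stages
        # bring the feature map to W/16 x H/16.
        def block(cin: int, cout: int, stride: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(cin, cin, kernel_size=1),
                nn.BatchNorm2d(cin), nn.ReLU(inplace=True),
                nn.Conv2d(cin, cout, kernel_size=3, stride=stride, padding=1),
                nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
            )

        self.blocks = nn.Sequential(
            block(64, 128, stride=2),   # -> W/8 x H/8
            block(128, 256, stride=2),  # -> W/16 x H/16
            block(256, out_channels, stride=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 3, H, W) handwritten formula image
        return self.blocks(self.pool(self.stem(x)))  # (N, 684, H/16, W/16)
```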
As an alternative embodiment, the second feature extraction network includes a fourth convolution layer, a second max-pooling layer, a plurality of fifth convolution layers, and a plurality of sixth convolution layers; and performing feature extraction on the printed formula image through the second feature extraction network to obtain the third feature map includes: performing convolution processing on the printed formula image using the fourth convolution layer; downsampling using the second max-pooling layer; and performing convolution processing using a second convolution module formed by connecting a plurality of fifth convolution layers and a plurality of sixth convolution layers in series, to obtain the third feature map.
Specifically, the fourth convolution layer may also be a convolution layer with a kernel of 7 and a stride of 2; the second max-pooling layer may be a max-pooling layer with a kernel of 3 and a stride of 2; the kernel of the fifth convolution layers may be 1 and the kernel of the sixth convolution layers may be 3. The second feature extraction network may have the same network structure as the first feature extraction network, and its feature extraction process is the same; only the object of feature extraction becomes the printed formula image, so it is not described again in detail.
As an alternative embodiment, performing recognition based on the third feature map to obtain the formula recognition result includes:
and carrying out feature extraction on the third feature map through a seventh convolution layer to obtain the first intermediate layer feature. The seventh convolution layer may be a 1x1 convolution kernel by which the number of channels of the third feature map is changed, with the widths and heights being unchanged.
Position encoding is performed on the first intermediate layer feature, and the position encoding is added to the first intermediate layer feature to obtain a second intermediate layer feature. Adding the position encoding to the first intermediate layer feature enhances the model's ability to distinguish between positions in the image.
The second intermediate layer feature is decoded to sequentially obtain a plurality of symbols corresponding to the handwritten formula image as the formula recognition result.
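A minimal sketch of the position encoding addition above, using standard sinusoidal encodings over the flattened feature positions. The sinusoidal form and the flattening are assumptions; the disclosure states only that a position encoding is added to the first intermediate layer feature.

```python
import math
import torch

def add_position_encoding(feat: torch.Tensor) -> torch.Tensor:
    """feat: (N, L, D) flattened first intermediate layer feature, D even."""
    n, length, dim = feat.shape
    pos = torch.arange(length, dtype=torch.float32).unsqueeze(1)        # (L, 1)
    div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / dim))                       # (D/2,)
    pe = torch.zeros(length, dim)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    # Addition yields the second intermediate layer feature.
    return feat + pe.unsqueeze(0)
```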
As an alternative embodiment, decoding the second intermediate layer feature to sequentially obtain the plurality of symbols corresponding to the handwritten formula image as the formula recognition result includes: extracting, at each decoding step, the feature code corresponding to the symbol decoded in the previous step; inputting the feature code into a gated recurrent unit (GRU) network to obtain the current hidden layer state; and decoding based on the current hidden layer state to obtain the next symbol.
In this embodiment, to strengthen the model's attention to the symbol decoded in the previous step and thereby accurately locate the current symbol, each decoding step uses the previously decoded symbol: its feature code is taken and input into the GRU network to obtain the current hidden layer state. The model's current attention result can then be computed by combining the intermediate layer features, the position encoding, and the current hidden layer state. The attention result indicates the local region of the image attended to at each decoding step; this local feature is extracted and fed into a fully connected layer to obtain the symbol predicted at the current decoding step. In short, decoding each symbol uses the previously decoded symbol to obtain the model's current hidden layer state, which also serves as input when the next symbol is decoded; by combining the current hidden layer state, the decoder can predict the next symbol more accurately.
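The per-step decoding loop described above can be sketched as follows: the embedding of the previously decoded symbol updates a GRU hidden state, additive attention over the position-encoded features selects a local context, and a fully connected layer predicts the current symbol. The dimensions and the additive attention form are assumptions rather than details from the disclosure.

```python
import torch
import torch.nn as nn

class DecoderStep(nn.Module):
    def __init__(self, vocab_size: int, feat_dim: int = 256, hidden_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)  # feature code of last symbol
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)      # current hidden layer state
        self.attn = nn.Linear(feat_dim + hidden_dim, 1)    # additive attention score
        self.out = nn.Linear(feat_dim + hidden_dim, vocab_size)

    def forward(self, prev_symbol, hidden, feats):
        # prev_symbol: (N,) last decoded symbol ids
        # feats: (N, L, feat_dim) flattened second intermediate layer features
        hidden = self.gru(self.embed(prev_symbol), hidden)
        expanded = hidden.unsqueeze(1).expand(-1, feats.size(1), -1)
        weights = torch.softmax(
            self.attn(torch.cat([feats, expanded], dim=-1)).squeeze(-1), dim=1)
        # Attention picks out the local feature for the current symbol.
        context = (weights.unsqueeze(-1) * feats).sum(dim=1)
        logits = self.out(torch.cat([context, hidden], dim=-1))
        return logits, hidden
```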
As an alternative embodiment, after the printed formula image is generated based on the second feature map, the method further includes: as shown in fig. 2, judging the conversion quality of the printed formula image through the discriminator 212 and updating the image conversion network according to the conversion quality.
This embodiment combines the generator 211 and the discriminator 212 into a generative model. In practice it has been found that converting images with only the generator 211 is not very effective, because correctly converting a formula image requires the model to understand the image content to some extent. Simple image conversion is a low-semantic, image-level task and cannot give the model high-level semantic understanding. This embodiment therefore combines the two tasks of formula recognition and image generation, which has two benefits: 1) if the printed formula image produced by the generator is poor, the decoding stage of formula recognition will struggle to predict the correct result, which in turn pushes the generator to produce formula images that are as realistic as possible; 2) by sharing the first feature extraction network and the feature conversion unit, the image generation and formula recognition tasks jointly promote feature learning and yield more robust features, improving the performance of both tasks.
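A hedged sketch of the joint objective implied by these two benefits: a recognition (cross-entropy) loss over the decoded symbols plus a generator-side adversarial loss from the discriminator, backpropagated through the shared first feature extraction network and feature conversion unit. The loss weighting lambda_adv is an assumption.

```python
import torch
import torch.nn.functional as F

def joint_loss(disc_fake_logits, sym_logits, sym_targets, lambda_adv=0.1):
    """Recognition loss plus generator-side adversarial loss."""
    # Generator side of the GAN objective: the discriminator should score
    # the generated printed image as real.
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    # Cross-entropy over the decoded symbol sequence.
    # sym_logits: (N, T, vocab), sym_targets: (N, T)
    rec = F.cross_entropy(sym_logits.flatten(0, 1), sym_targets.flatten())
    return rec + lambda_adv * adv
```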
The present disclosure also provides a formula recognition apparatus 300, as shown in fig. 3, including:
the first feature extraction module 301 is configured to perform feature extraction on the handwriting formula image through a first feature extraction network to obtain a first feature map. Illustratively, as shown in fig. 2, the first feature extraction network may employ DenseNet, input the handwritten formula image 201 into the first feature extraction network 202, and perform feature extraction on the handwritten formula image 201 to obtain the first feature map 203.
The image conversion module 302 is configured to convert the first feature map into a second feature map based on a mapping relationship between the handwriting formula image features and the printing formula image features through the image conversion network, and generate a printing formula image based on the second feature map. As shown in fig. 2, the first feature map 203 may be mapped and converted into the second feature map 204, i.e., into the feature of the printing formula, based on the mapping relationship between the feature of the handwriting formula and the feature of the printing formula by the feature conversion unit 205, and the printing formula image 206 may be generated based on the feature.
The second feature extraction module 303 is configured to perform feature extraction on the printing formula image through the second feature extraction network to obtain a third feature map. As shown in fig. 2, the second feature extraction network 207 may have the same network structure as the first feature extraction network, or may perform feature extraction using a DenseNet network. The second feature extraction network 207 is configured to perform feature extraction on the print formula image 206 to obtain a third feature map 208.
The recognition module 304 is configured to recognize based on the third feature map to obtain a formula recognition result. After extracting the third feature map corresponding to the print formula image 206 based on the second feature extraction network 207, the formula recognition result 210 may be obtained by decoding by the decoder 209.
Through the above technical scheme, the formula recognition apparatus uses image conversion as a pre-task of formula recognition: a trained model maps complex and variable handwritten formulas into printed formulas, and the printed formula images are then recognized. This greatly reduces the difficulty of recognizing complex handwritten formulas and improves both the accuracy and the efficiency of handwritten formula recognition.
As an alternative embodiment, the image conversion module includes: a feature conversion unit configured to convert the first feature map into the second feature map; and a generator configured to generate the printed formula image based on the second feature map. As shown in fig. 2, the printed formula image 206 is generated through the generator 211 based on the second feature map.
The styles of handwritten mathematical formulas vary widely and their structures are complex, which poses a great challenge for recognition. To address this, the present embodiment performs formula image conversion using a generative adversarial network (GAN) architecture. Specifically, given a handwritten mathematical formula image as input, the generator 211 converts it into a printed formula image with a fixed font and a regular spatial structure. The subsequent recognition model then only needs to recognize the printed formula image, which greatly reduces the difficulty of recognizing complex handwritten mathematical formulas.
As an alternative embodiment, the first feature extraction network includes: a first convolution layer configured to perform convolution processing on the handwritten formula image; a first max-pooling layer configured to downsample the resulting feature map; and a first convolution module, comprising a plurality of second convolution layers and a plurality of third convolution layers connected in series, configured to obtain the first feature map after convolution processing. The first convolution layer may be a convolution layer with a kernel of 7 and a stride of 2; for example, if the handwritten formula image is a three-channel image of width W and height H, a feature map of width W/2 and height H/2 is output after the convolution processing of the first convolution layer. A max-pooling layer with a kernel of 3 and a stride of 2 then downsamples this feature map. Finally, the feature map is further processed by a combination of convolution layers with kernels of 1 and 3 to obtain the final first feature map, whose width and height are W/16 and H/16, respectively, and whose channel number is 684.
As an alternative embodiment, the second feature extraction network includes: a fourth convolution layer configured to perform convolution processing on the printed formula image; a second max-pooling layer configured to downsample the resulting feature map; and a second convolution module, comprising a plurality of fifth convolution layers and a plurality of sixth convolution layers connected in series, configured to obtain the third feature map after convolution processing. The fourth convolution layer may also be a convolution layer with a kernel of 7 and a stride of 2; the second max-pooling layer may be a max-pooling layer with a kernel of 3 and a stride of 2; the kernel of the fifth convolution layers may be 1 and the kernel of the sixth convolution layers may be 3. The second feature extraction network may have the same network structure as the first feature extraction network, and its feature extraction process is the same; only the object of feature extraction becomes the printed formula image, so it is not described again in detail.
As an alternative embodiment, the recognition module includes:
a seventh convolution layer configured to perform feature extraction on the third feature map to obtain a first intermediate layer feature. The seventh convolution layer may use a 1x1 convolution kernel, which changes the number of channels of the third feature map while leaving its width and height unchanged;
a position encoding unit configured to perform position encoding on the first intermediate layer feature;
a position adding unit configured to add the position encoding to the first intermediate layer feature to obtain a second intermediate layer feature. Adding the position encoding to the first intermediate layer feature enhances the model's ability to distinguish between positions in the image; and
a decoding unit configured to decode the second intermediate layer feature to sequentially obtain a plurality of symbols corresponding to the handwritten formula image as the formula recognition result.
As an alternative embodiment, decoding the second intermediate layer feature to sequentially obtain the plurality of symbols corresponding to the handwritten formula image as the formula recognition result includes: extracting, at each decoding step, the feature code corresponding to the symbol decoded in the previous step; inputting the feature code into a GRU network to obtain the current hidden layer state; and decoding based on the current hidden layer state to obtain the next symbol.
In this embodiment, to strengthen the model's attention to the symbol decoded in the previous step and thereby accurately locate the current symbol, each decoding step uses the previously decoded symbol: its feature code is taken and input into the GRU network to obtain the current hidden layer state. The model's current attention result can then be computed by combining the intermediate layer features, the position encoding, and the current hidden layer state. The attention result indicates the local region of the image attended to at each decoding step; this local feature is extracted and fed into a fully connected layer to obtain the symbol predicted at the current decoding step. In short, decoding each symbol uses the previously decoded symbol to obtain the model's current hidden layer state, which also serves as input when the next symbol is decoded; by combining the current hidden layer state, the decoder can predict the next symbol more accurately.
As an alternative embodiment, the formula recognition apparatus further includes a discriminator configured to judge the conversion quality of the printed formula image and to update the image conversion network according to the conversion quality. This embodiment combines the generator 211 and the discriminator 212 into a generative model. In practice it has been found that converting images with only the generator 211 is not very effective, because correctly converting a formula image requires the model to understand the image content to some extent. Simple image conversion is a low-semantic, image-level task and cannot give the model high-level semantic understanding. This embodiment therefore combines the two tasks of formula recognition and image generation, which has two benefits: 1) if the printed formula image produced by the generator is poor, the decoding stage of formula recognition will struggle to predict the correct result, which in turn pushes the generator to produce formula images that are as realistic as possible; 2) by sharing the first feature extraction network and the feature conversion unit, the image generation and formula recognition tasks jointly promote feature learning and yield more robust features, improving the performance of both tasks.
In the technical scheme of the present disclosure, the collection, storage, and use of any user personal information involved comply with the relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 4 illustrates a schematic block diagram of an example electronic device 400 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 4, the device 400 includes a computing unit 401 that can perform various suitable actions and processes according to a computer program stored in a read-only memory (ROM) 402 or a computer program loaded from a storage unit 408 into a random access memory (RAM) 403. In the RAM 403, various programs and data required for the operation of the device 400 may also be stored. The computing unit 401, the ROM 402, and the RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
Various components in device 400 are connected to I/O interface 405, including: an input unit 406 such as a keyboard, a mouse, etc.; an output unit 407 such as various types of displays, speakers, and the like; a storage unit 408, such as a magnetic disk, optical disk, etc.; and a communication unit 409 such as a network card, modem, wireless communication transceiver, etc. The communication unit 409 allows the device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 401 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 401 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 401 performs the methods and processes described above, such as the formula recognition method. For example, in some embodiments, the formula recognition method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into the RAM 403 and executed by the computing unit 401, one or more steps of the formula recognition method described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured to perform the formula recognition method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed herein can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (17)

1. A formula recognition method, comprising:
performing feature extraction on a handwritten formula image through a first feature extraction network to obtain a first feature map;
converting, through an image conversion network, the first feature map into a second feature map based on a mapping relationship between handwritten formula image features and printed formula image features, and generating a printed formula image based on the second feature map;
performing feature extraction on the printed formula image through a second feature extraction network to obtain a third feature map; and
performing recognition based on the third feature map to obtain a formula recognition result.
2. The method of claim 1, wherein converting, through the image conversion network, the first feature map into the second feature map based on the mapping relationship between handwritten formula image features and printed formula image features, and generating the printed formula image based on the second feature map, comprises:
converting the first feature map into the second feature map through a feature conversion unit of the image conversion network; and
generating the printed formula image through a generator based on the second feature map.
3. The method of claim 1, wherein the first feature extraction network comprises a first convolution layer, a first max-pooling layer, a plurality of second convolution layers, and a plurality of third convolution layers; and performing feature extraction on the handwritten formula image through the first feature extraction network to obtain the first feature map comprises:
performing convolution processing on the handwritten formula image using the first convolution layer;
downsampling the handwritten formula image using the first max-pooling layer; and
performing convolution processing on the handwritten formula image using a first convolution module formed by connecting the plurality of second convolution layers and the plurality of third convolution layers in series, to obtain the first feature map.
4. The method according to any one of claims 1-3, wherein the second feature extraction network comprises a fourth convolution layer, a second max-pooling layer, a plurality of fifth convolution layers, and a plurality of sixth convolution layers; and performing feature extraction on the printed formula image through the second feature extraction network to obtain the third feature map comprises:
performing convolution processing on the printed formula image using the fourth convolution layer;
downsampling the printed formula image using the second max-pooling layer; and
performing convolution processing on the printed formula image using a second convolution module formed by connecting the plurality of fifth convolution layers and the plurality of sixth convolution layers in series, to obtain the third feature map.
5. The method of claim 1, wherein performing recognition based on the third feature map to obtain the formula recognition result comprises:
performing feature extraction on the third feature map through a seventh convolution layer to obtain a first intermediate layer feature;
performing position encoding on the first intermediate layer feature;
adding the position encoding to the first intermediate layer feature to obtain a second intermediate layer feature; and
decoding the second intermediate layer feature to sequentially obtain a plurality of symbols corresponding to the handwritten formula image as the formula recognition result.
6. The method of claim 5, wherein decoding the second intermediate layer feature to sequentially obtain the plurality of symbols corresponding to the handwritten formula image as the formula recognition result comprises:
extracting, at each decoding step, the feature code corresponding to the symbol decoded in the previous step;
inputting the feature code into a gated recurrent unit network to obtain a current hidden layer state; and
decoding based on the current hidden layer state to obtain the next symbol.
7. The method of any one of claims 1-6, further comprising, after generating the printed formula image based on the second feature map:
judging the conversion quality of the printed formula image through a discriminator, and updating the image conversion network according to the conversion quality.
8. A formula recognition apparatus, comprising:
a first feature extraction module configured to perform feature extraction on a handwritten formula image through a first feature extraction network to obtain a first feature map;
an image conversion module configured to convert, through an image conversion network, the first feature map into a second feature map based on a mapping relationship between handwritten formula image features and printed formula image features, and to generate a printed formula image based on the second feature map;
a second feature extraction module configured to perform feature extraction on the printed formula image through a second feature extraction network to obtain a third feature map; and
a recognition module configured to perform recognition based on the third feature map to obtain a formula recognition result.
9. The apparatus of claim 8, wherein the image conversion module comprises:
a feature conversion unit configured to convert the first feature map into the second feature map; and
a generator configured to generate the printed formula image based on the second feature map.
10. The apparatus of claim 8, wherein the first feature extraction network comprises:
a first convolution layer configured to perform convolution processing on the handwritten formula image;
a first max-pooling layer configured to downsample the handwritten formula image; and
a first convolution module, comprising a plurality of second convolution layers and a plurality of third convolution layers connected in series, configured to obtain the first feature map after performing convolution processing on the handwritten formula image.
11. The apparatus according to any one of claims 8-10, wherein the second feature extraction network comprises:
a fourth convolution layer configured to perform convolution processing on the printed formula image;
a second max-pooling layer configured to downsample the printed formula image; and
a second convolution module, comprising a plurality of fifth convolution layers and a plurality of sixth convolution layers connected in series, configured to obtain the third feature map after performing convolution processing on the printed formula image.
12. The apparatus of claim 8, wherein the recognition module comprises:
a seventh convolution layer configured to perform feature extraction on the third feature map to obtain a first intermediate layer feature;
a position encoding unit configured to perform position encoding on the first intermediate layer feature;
a position adding unit configured to add the position encoding to the first intermediate layer feature to obtain a second intermediate layer feature; and
a decoding unit configured to decode the second intermediate layer feature to sequentially obtain a plurality of symbols corresponding to the handwritten formula image as the formula recognition result.
13. The apparatus of claim 12, wherein the decoding unit decoding the second intermediate layer feature to sequentially obtain the plurality of symbols corresponding to the handwritten formula image as the formula recognition result comprises:
extracting, at each decoding step, the feature code corresponding to the symbol decoded in the previous step;
inputting the feature code into a gated recurrent unit network to obtain a current hidden layer state; and
decoding based on the current hidden layer state to obtain the next symbol.
14. The apparatus of any one of claims 8-13, further comprising:
a discriminator configured to judge the conversion quality of the printed formula image and to update the image conversion network according to the conversion quality.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-7.
CN202310362262.1A 2023-04-06 2023-04-06 Formula identification method, device, electronic equipment and storage medium Pending CN116386062A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310362262.1A CN116386062A (en) 2023-04-06 2023-04-06 Formula identification method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310362262.1A CN116386062A (en) 2023-04-06 2023-04-06 Formula identification method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116386062A true CN116386062A (en) 2023-07-04

Family

ID=86976396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310362262.1A Pending CN116386062A (en) 2023-04-06 2023-04-06 Formula identification method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116386062A (en)

Similar Documents

Publication Publication Date Title
CN113657390B (en) Training method of text detection model and text detection method, device and equipment
CN113129870B (en) Training method, device, equipment and storage medium of speech recognition model
CN112926306B (en) Text error correction method, device, equipment and storage medium
CN113642583B (en) Deep learning model training method for text detection and text detection method
CN113901907A (en) Image-text matching model training method, image-text matching method and device
CN112559885B (en) Training model determining method and device for map interest points and electronic equipment
CN113657274B (en) Table generation method and device, electronic equipment and storage medium
CN113792526B (en) Training method of character generation model, character generation method, device, equipment and medium
CN113204615A (en) Entity extraction method, device, equipment and storage medium
CN112966744A (en) Model training method, image processing method, device and electronic equipment
CN114511743B (en) Detection model training, target detection method, device, equipment, medium and product
CN114022887B (en) Text recognition model training and text recognition method and device, and electronic equipment
CN113095421A (en) Method for generating font database, and training method and device of neural network model
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN115565177A (en) Character recognition model training method, character recognition device, character recognition equipment and medium
CN114495101A (en) Text detection method, and training method and device of text detection network
CN117746125A (en) Training method and device of image processing model and electronic equipment
CN113361523A (en) Text determination method and device, electronic equipment and computer readable storage medium
CN114973333A (en) Human interaction detection method, human interaction detection device, human interaction detection equipment and storage medium
CN116386062A (en) Formula identification method, device, electronic equipment and storage medium
JP2023039891A (en) Training method for character generation model, character generating method, device, and apparatus
CN114463361A (en) Network model training method, device, equipment, medium and program product
CN113947195A (en) Model determination method and device, electronic equipment and memory
CN113657353B (en) Formula identification method and device, electronic equipment and storage medium
CN114840656B (en) Visual question-answering method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination