CN116206314A - Model training method, formula identification method, device, medium and equipment - Google Patents


Info

Publication number
CN116206314A
CN116206314A (application number CN202310216179.3A)
Authority
CN
China
Prior art keywords
formula
training
sample
encoder
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310216179.3A
Other languages
Chinese (zh)
Inventor
刘腾龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Oriental Education Technology Group Co ltd
Original Assignee
New Oriental Education Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New Oriental Education Technology Group Co ltd filed Critical New Oriental Education Technology Group Co ltd
Priority to CN202310216179.3A
Publication of CN116206314A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/18: Extraction of features or characteristics of the image
    • G06V30/1801: Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/19: Recognition using electronic means
    • G06V30/191: Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173: Classification techniques
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to the technical field of computers, and provides a model training method, a formula identification method, a device, a medium and equipment, so as to improve the accuracy of the trained formula identification model. The formula identification model comprises an encoding layer and a decoding layer, and the model training method comprises the following steps: acquiring a sample image, wherein the sample image comprises a sample formula annotated with a sample formula label; inputting the sample image into the encoding layer, which is provided with first training parameters, to obtain sample formula image features output by the encoding layer; inputting the sample formula image features into the decoding layer, which is provided with second training parameters, to obtain a sample formula identification result output by the decoding layer; calculating a target loss value according to the sample formula identification result and the sample formula label; and adjusting the first training parameters and the second training parameters of the formula identification model according to the target loss value.

Description

Model training method, formula identification method, device, medium and equipment
Technical Field
The disclosure relates to the field of computer technology, and in particular to a model training method, a formula identification method, a device, a medium and equipment.
Background
In intelligent education, intelligent technology is generally used to assist teaching; for example, formula recognition technology is used to assist in developing AI learning machines and to perform intelligent recording and playback, intelligent correction, and the like. Formulas in mathematics, physics, chemistry and other subjects, such as matrices, differential expressions and radicals in mathematics, or chemical reaction equations, generally have complex structural features, are costly to label, and are difficult for a model to learn, so formula recognition has always been one of the difficult problems in this technical field.
The related art may employ conventional methods and deep learning methods for formula recognition. Conventional methods require complex preprocessing, incur high labeling costs, and yield weak model generalization. Deep learning methods generally adopt algorithms such as CRNN+CTC and Seq2Seq, which cannot recognize formulas with complex structures and suffer from the long-term dependency problem. As can be seen, the related art has low accuracy in formula recognition.
Disclosure of Invention
The disclosure aims to provide a model training method, a formula identification method, a device, a medium and equipment, so as to solve the problems in the related art.
To achieve the above object, according to a first aspect of embodiments of the present disclosure, there is provided a model training method for training a formula recognition model including an encoding layer and a decoding layer, the method comprising:
acquiring a sample image, wherein the sample image comprises a sample formula, and the sample formula is marked with a sample formula label;
inputting the sample image into the coding layer provided with a first training parameter to obtain sample formula image characteristics output by the coding layer, wherein an initial value of the first training parameter is obtained by pre-training based on a mask self-encoder;
inputting the sample formula image characteristics into the decoding layer provided with second training parameters to obtain a sample formula identification result output by the decoding layer, wherein the initial value of the second training parameter is obtained by pre-training based on a bidirectional self-encoder;
calculating a target loss value according to the sample formula identification result and the sample formula label;
and adjusting the first training parameter and the second training parameter of the formula identification model according to the target loss value.
Optionally, the mask self-encoder includes a visual deformation encoder and a deformation decoder, and the pre-training process of the mask self-encoder includes:
Acquiring a first sample image, the first sample image comprising a first sample formula;
carrying out random masking processing on the first sample image to obtain a masked sub-image block and an unmasked sub-image block;
inputting the unmasked sub-blocks into the visual deformation encoder to obtain initial vector features of the unmasked sub-blocks;
inputting mask vector features and the initial vector features of the mask sub-blocks into the deformation decoder to obtain a prediction sample image;
a first loss value is calculated from the predicted sample image and the first sample image, and a first pre-training parameter of the mask self-encoder is adjusted according to the first loss value.
Optionally, the first pre-training parameters include coding training parameters of the visual deformation encoder, and the values of the coding training parameters of the visual deformation encoder in the pre-trained mask self-encoder serve as the initial values of the first training parameters of the coding layer.
Optionally, the pre-training process of the bidirectional self-encoder includes:
acquiring a sample formula sequence;
carrying out random mask processing on the sample formula sequence to obtain a random mask sample formula sequence;
Determining a character vector, a segmentation vector and a position vector of the random mask sample formula sequence, and calculating to obtain an addition vector of the character vector, the segmentation vector and the position vector;
inputting the addition vector into the bidirectional self-encoder to obtain a predicted sample formula sequence;
and calculating a second loss value according to the predicted sample formula sequence and the sample formula sequence, and adjusting a second pre-training parameter of the bidirectional self-encoder according to the second loss value.
Optionally, the method further comprises:
determining a target network layer according to the network architecture of the deformation decoder and the network architecture of the bidirectional self-encoder, wherein the network architecture of the deformation decoder comprises the target network layer, and the network architecture of the bidirectional self-encoder does not comprise the target network layer;
randomly initializing training parameters of the target network layer to obtain initialization parameters;
and taking the initialization parameter and a second pre-training parameter of the bi-directional self-encoder with the pre-training as initial values of the second training parameter of the decoding layer.
According to a second aspect of embodiments of the present disclosure, there is provided a formula identification method, the method comprising:
Acquiring an image to be identified, wherein the image to be identified comprises a formula to be identified;
inputting the image to be identified into a formula identification model to obtain a formula identification result output by the formula identification model, wherein the formula identification model is obtained by training by the model training method according to any one of the first aspects.
According to a third aspect of embodiments of the present disclosure, there is provided a model training apparatus for training a formula recognition model including an encoding layer and a decoding layer, the model training apparatus comprising:
the first acquisition module is used for acquiring a sample image, wherein the sample image comprises a sample formula, and the sample formula is marked with a sample formula label;
the first input module is used for inputting the sample image into the coding layer provided with a first training parameter to obtain sample formula image characteristics output by the coding layer, wherein an initial value of the first training parameter is obtained by pre-training based on a mask self-encoder;
the second input module is used for inputting the sample formula image characteristics into the decoding layer provided with second training parameters to obtain a sample formula identification result output by the decoding layer, wherein the initial value of the second training parameter is obtained by pre-training based on a bidirectional self-encoder;
The calculation module is used for calculating a target loss value according to the sample formula identification result and the sample formula label;
and the adjusting module is used for adjusting the first training parameter and the second training parameter of the formula identification model according to the target loss value.
According to a fourth aspect of embodiments of the present disclosure, there is provided a formula identification apparatus including:
the second acquisition module is used for acquiring an image to be identified, wherein the image to be identified comprises a formula to be identified;
and the third input module is used for inputting the image to be identified into a formula identification model to obtain a formula identification result output by the formula identification model, wherein the formula identification model is obtained by training by the model training method in any one of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of any one of the first or second aspects described above.
According to a sixth aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
a memory having a computer program stored thereon;
A processor for executing the computer program in the memory to implement the steps of the method of any one of the first or second aspects above.
Through the above technical solution, the encoding layer and the decoding layer of the formula recognition model are trained. During training, the initial values of the first training parameters of the encoding layer can be obtained by pre-training a mask self-encoder, and the initial values of the second training parameters of the decoding layer can be obtained by pre-training a bidirectional self-encoder. Prior knowledge can thus be added to the formula recognition model through these initial values, so that the model performs supervised learning on the basis of a preliminary formula recognition capability. Because the model already has this preliminary capability, its first and second training parameters can be fine-tuned with a small amount of labeled data, which reduces labeling costs and improves the accuracy of the trained formula recognition model.
Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification, illustrate the disclosure and together with the description serve to explain, but do not limit the disclosure. In the drawings:
FIG. 1 is a flow chart of a model training method according to an exemplary embodiment of the present disclosure.
FIG. 2 is a schematic diagram of training a formula recognition model according to an exemplary embodiment of the present disclosure.
FIG. 3 is a flow chart of a formula identification method according to an exemplary embodiment of the present disclosure.
FIG. 4 is a block diagram of a model training apparatus according to an exemplary embodiment of the present disclosure.
Fig. 5 is a block diagram of a formula identification device according to an exemplary embodiment of the present disclosure.
Fig. 6 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the disclosure, are not intended to limit the disclosure.
The related art may employ a conventional method and a deep learning method for formula recognition.
The conventional method generally obtains the LaTeX representation corresponding to a formula image according to the position information and semantic information of the characters in the formula image, where LaTeX is a typesetting system based on TeX that can be used to generate formulas. However, the conventional method relies on complex preprocessing of the formula image and places high demands on the quality, precision and granularity of the labeling data, which leads to high character-level labeling costs and cannot guarantee the generalization capability of the model or the robustness of formula identification.
The deep learning method generally adopts algorithms such as CRNN+CTC and Seq2Seq for formula recognition. The CRNN+CTC method achieves high accuracy on formulas with left-right structures, but performs poorly on formulas with complex structures, such as root signs with surrounding structures and fractions with upper-lower structures. Seq2Seq adopts a convolutional neural network to extract formula image features in the encoding stage, and adopts sequence recognition structures such as RNN and LSTM to extract semantic features of the formula in the decoding stage, so as to improve the recognition of formulas with complex structural features. However, after many time steps of computation, the nodes of neural networks such as RNN and LSTM may overwrite features from much earlier time slices; that is, they lose the ability to connect remote information, which is the long-term dependency problem, so the recognition of formulas with complex structural features is still not good enough.
In view of this, the present disclosure provides a model training method, a formula recognition method, a device, a medium and equipment, in which the encoding layer and the decoding layer of a formula recognition model are trained. During training, the initial values of the first training parameters of the encoding layer can be obtained by pre-training a mask self-encoder, and the initial values of the second training parameters of the decoding layer can be obtained by pre-training a bidirectional self-encoder. Prior knowledge can thus be added to the formula recognition model through these initial values, so that the model performs supervised learning on the basis of a preliminary formula recognition capability. Because the model already has this preliminary capability, its first and second training parameters can be fine-tuned with a small amount of labeled data, which reduces labeling costs and improves the accuracy of the trained formula recognition model.
The following describes the embodiments of the present disclosure in detail.
FIG. 1 is a flow chart of a model training method according to an exemplary embodiment of the present disclosure. As shown in FIG. 1, the method is used to train a formula identification model including an encoding layer and a decoding layer, and comprises:
s101, acquiring a sample image, wherein the sample image comprises a sample formula, and the sample formula is marked with a sample formula label.
The sample formula label may be a Latex sequence representation corresponding to the sample formula.
H_{L C} = \int_{0}^{l} \mathcal{H}_{L C}
For example, the Latex sequence representation corresponding to the above formula may be:
H_{L C}=\int_{0}^{l}\mathcal{H}_{L C}.
the specific meaning of H and LC may be determined according to practical situations, which is not specifically limited in the present disclosure.
S102, inputting the sample image into a coding layer provided with a first training parameter to obtain sample formula image characteristics output by the coding layer, wherein an initial value of the first training parameter is obtained by pre-training based on a mask self-encoder.
Note that the mask self-encoder may refer to the MAE (Masked Autoencoder). The MAE model learns image representations by corrupting the input signal (masking parts of it) and reconstructing the original signal. The initial values of the first training parameters can be obtained by pre-training the MAE; on this basis, prior knowledge can be added to the formula recognition model through these initial values, so that the model has a preliminary capability of extracting image features.
Illustratively, the sample image is input to the coding layer provided with the first training parameter, and the sample formula image feature output by the coding layer can be obtained.
S103, inputting the image features of the sample formula into a decoding layer provided with a second training parameter to obtain a sample formula identification result output by the decoding layer, wherein the initial value of the second training parameter is obtained based on the pre-training of the bidirectional self-encoder.
Note that the bidirectional self-encoder may refer to BERT (Bidirectional Encoder Representations from Transformers). The BERT model learns text representation capabilities by running a self-supervised learning method over massive corpora. The initial values of the second training parameters can be obtained by pre-training BERT; on this basis, prior knowledge can be added to the formula recognition model through these initial values, so that the model has a preliminary capability of extracting semantic features.
Illustratively, the sample formula image features are input to a decoding layer provided with the second training parameters, and a sample formula identification result output by the decoding layer can be obtained. The sample formula identification result may be a Latex sequence representation.
In addition, when the mask self-encoder and the bidirectional self-encoder are pre-trained, large-scale unlabeled data can be used for unsupervised/self-supervised learning, so that the pre-trained models acquire preliminary recognition capabilities; for example, the bidirectional self-encoder acquires a preliminary text recognition capability. On this basis, the encoding layer and the decoding layer of the formula recognition model can be initialized from the pre-trained mask self-encoder and bidirectional self-encoder, that is, the initial values of the training parameters of the formula recognition model (including the first training parameters and the second training parameters) are set accordingly. Using these values as the initial first and second training parameters adds prior knowledge to the formula recognition model, giving it a preliminary formula recognition capability.
S104, calculating a target loss value according to the sample formula identification result and the sample formula label.
S105, adjusting the first training parameters and the second training parameters of the formula identification model according to the target loss value.
Because the formula recognition model already has preliminary capabilities of extracting image features and semantic features, that is, a preliminary formula recognition capability, the model can be further trained on a small amount of labeled data to fine-tune the first training parameters and the second training parameters, where the labeled data may be sample images annotated with sample formula labels. For example, the target loss value may be calculated from the sample formula recognition result and the sample formula label, and the first and second training parameters of the formula recognition model may be adjusted according to the target loss value. The loss function used to calculate the target loss value may be the loss function of a Transformer model, which is not described in detail here.
In addition, it should be noted that, in the process of fine-tuning the training parameters of the formula recognition model with small-scale data, the encoding layer of the formula recognition model may be used to extract image features, which are then converted into sequence features, so that the decoding layer extracts semantic features from them and obtains the sample formula recognition result from the extracted semantic features.
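As a minimal, non-authoritative sketch of this supervised fine-tuning stage (assuming PyTorch; the toy model below only mirrors a ViT-style encoding layer feeding a Transformer-style decoding layer and is not the exact architecture of this disclosure):

```python
import torch
import torch.nn as nn

PAD_ID, VOCAB = 0, 600  # hypothetical padding id and Latex vocabulary size

class TinyFormulaModel(nn.Module):
    """Toy stand-in: a ViT-style encoding layer plus a Transformer decoding layer.
    Positional encodings are omitted for brevity."""
    def __init__(self, d=256):
        super().__init__()
        self.patch = nn.Conv2d(1, d, kernel_size=16, stride=16)  # patch embedding
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, nhead=8, batch_first=True), num_layers=4)
        self.embed = nn.Embedding(VOCAB, d)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d, nhead=8, batch_first=True), num_layers=4)
        self.head = nn.Linear(d, VOCAB)

    def forward(self, img, tgt):
        # Encoding layer: image -> sample formula image features (as a sequence).
        mem = self.encoder(self.patch(img).flatten(2).transpose(1, 2))
        # Decoding layer: teacher-forced prediction of the Latex token sequence.
        mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        return self.head(self.decoder(self.embed(tgt), mem, tgt_mask=mask))

model = TinyFormulaModel()  # in practice, initialized from the MAE/BERT pre-training
criterion = nn.CrossEntropyLoss(ignore_index=PAD_ID)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

images = torch.randn(2, 1, 64, 256)        # sample images containing formulas
labels = torch.randint(1, VOCAB, (2, 20))  # sample formula labels as token ids
logits = model(images, labels[:, :-1])     # sample formula identification result
loss = criterion(logits.reshape(-1, VOCAB), labels[:, 1:].reshape(-1))  # target loss value
optimizer.zero_grad()
loss.backward()
optimizer.step()  # adjust the first and second training parameters
```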
Through the above technical solution, the encoding layer and the decoding layer of the formula recognition model are trained. During training, the initial values of the first training parameters of the encoding layer can be obtained by pre-training a mask self-encoder, and the initial values of the second training parameters of the decoding layer can be obtained by pre-training a bidirectional self-encoder. Prior knowledge can thus be added to the formula recognition model through these initial values, so that the model performs supervised learning on the basis of a preliminary formula recognition capability. Because the model already has this preliminary capability, its first and second training parameters can be fine-tuned with a small amount of labeled data, which reduces labeling costs and improves the accuracy of the trained formula recognition model.
Optionally, the mask self-encoder includes a visual deformation encoder and a deformation decoder, and the pre-training process of the mask self-encoder includes:
acquiring a first sample image, wherein the first sample image comprises a first sample formula;
carrying out random masking processing on the first sample image to obtain a masked sub-block and an unmasked sub-block;
inputting the unmasked sub-blocks into a visual deformation encoder to obtain initial vector features of the unmasked sub-blocks;
inputting mask vector features and initial vector features of the mask sub-blocks into a deformation decoder to obtain a prediction sample image;
a first loss value is calculated from the predicted sample image and the first sample image, and a first pre-training parameter of the mask self-encoder is adjusted according to the first loss value.
It is understood that the first sample image may be unlabeled data. Performing random masking on the first sample image may refer to dividing the first sample image into non-overlapping sub-blocks (patches) and, following the distribution of the sub-blocks in the first sample image, uniformly and non-repeatedly selecting some of them for the masking operation, thereby obtaining masked sub-blocks (masked patches) and unmasked sub-blocks (unmasked patches).
It should be noted that the visual deformation encoder may adopt the ViT (Vision Transformer) architecture and encode the unmasked sub-blocks, while the deformation decoder may adopt a Transformer architecture. For example, the visual deformation encoder may encode the unmasked sub-blocks by linear mapping, add position encoding, and then perform Transformer encoding to obtain the initial vector features of the unmasked sub-blocks. Thereafter, the initial vector features and the mask vector features corresponding to the masked sub-blocks may be input to the deformation decoder, where a mask vector feature indicates content that the deformation decoder is to predict. The last layer of the deformation decoder may be a linear mapping layer whose number of output channels is equal to the number of pixels of the first sample image. On this basis, the deformation decoder may perform a reshape operation on the predicted content output by the last layer, so that a predicted sample image with the same shape as the first sample image is obtained.
It will be appreciated that, since the first sample image is randomly masked and the masked sub-blocks are reconstructed by the deformation decoder, the first loss value may be calculated from the predicted sample image output by the deformation decoder and the first sample image, and the first pre-training parameters of the mask self-encoder may be adjusted according to the first loss value. The formula for calculating the first loss value may be the loss function of the MAE model, which is known in the prior art and is not described here.
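The following is a simplified sketch of one such pre-training step, assuming an MAE-style setup in PyTorch; `vit_encoder` and `deformation_decoder` are hypothetical stand-ins for the visual deformation encoder and deformation decoder:

```python
import torch
import torch.nn.functional as F

def gather_patches(patches, idx):
    """Select sub-blocks by index along the sequence dimension."""
    return torch.gather(
        patches, 1, idx.unsqueeze(-1).expand(-1, -1, patches.size(-1)))

def mae_pretrain_step(img, vit_encoder, deformation_decoder, mask_ratio=0.75, p=16):
    """One simplified pre-training step of the mask self-encoder (MAE-style sketch).
    vit_encoder and deformation_decoder are assumed callables; position encodings
    and the optimizer step are omitted for brevity."""
    B, C, H, W = img.shape
    # Divide the first sample image into non-overlapping sub-blocks (patches).
    patches = (img.unfold(2, p, p).unfold(3, p, p)  # (B, C, H/p, W/p, p, p)
                  .reshape(B, C, -1, p * p)
                  .transpose(1, 2)
                  .reshape(B, -1, C * p * p))       # (B, N, C*p*p)
    # Uniform, non-repeating random masking of the sub-blocks.
    N = patches.size(1)
    keep = int(N * (1 - mask_ratio))
    order = torch.rand(B, N).argsort(dim=1)
    unmasked_idx, masked_idx = order[:, :keep], order[:, keep:]
    # Encode only the unmasked sub-blocks to obtain their initial vector features.
    initial_feats = vit_encoder(gather_patches(patches, unmasked_idx))
    # The deformation decoder combines the initial vector features with learned
    # mask vector features (inside the stand-in) and predicts the masked pixels.
    pred = deformation_decoder(initial_feats, masked_idx)
    # First loss value: reconstruction error against the first sample image.
    return F.mse_loss(pred, gather_patches(patches, masked_idx))
```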
Optionally, the first pre-training parameters include the coding training parameters of the visual deformation encoder, and the values of the coding training parameters of the visual deformation encoder in the pre-trained mask self-encoder serve as the initial values of the first training parameters of the encoding layer.
It should be noted that the encoding layer of the formula identification model may adopt the network architecture of the visual deformation encoder, i.e., the ViT network architecture. On this basis, the values of the coding training parameters of the visual deformation encoder in the pre-trained mask self-encoder can be used as the initial values of the first training parameters of the encoding layer.
By pre-training the mask self-encoder, it can learn general feature representations from a large amount of unlabeled data; on this basis, the encoding layer of the formula recognition model can be initialized from the pre-trained mask self-encoder, so that the formula recognition model has a preliminary capability of extracting image features.
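A minimal sketch of this initialization, assuming both modules share the same ViT architecture so their state dicts match key-for-key (the attribute names are hypothetical):

```python
# Copy the coding training parameters of the pre-trained visual deformation
# encoder into the formula recognition model's encoding layer; both are assumed
# to use the same ViT architecture, so the state dicts align key-for-key.
formula_model.coding_layer.load_state_dict(mae_model.vit_encoder.state_dict())
```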
Optionally, the pre-training process of the bi-directional self-encoder may include:
acquiring a sample formula sequence;
carrying out random mask processing on the sample formula sequence to obtain a random mask sample formula sequence;
determining character vectors, segmentation vectors and position vectors of the random mask sample formula sequence, and calculating to obtain addition vectors of the character vectors, the segmentation vectors and the position vectors;
Inputting the addition vector into a bidirectional self-encoder to obtain a predicted sample formula sequence;
and calculating a second loss value according to the predicted sample formula sequence and the sample formula sequence, and adjusting a second pre-training parameter of the bidirectional self-encoder according to the second loss value.
It is understood that the sample formula sequence may be an unlabeled LaTeX sequence. Performing random mask processing on the sample formula sequence may refer to randomly selecting part of the sample formula sequence and masking it, so as to obtain a random-masked sample formula sequence, which includes a first sample formula sequence that is masked and a second sample formula sequence that is not masked. Then, the character vector (token embedding), segmentation vector (segment embedding) and position vector (position embedding) of the random-masked sample formula sequence can be determined through preset weight matrices, and the addition vector of the character vector, segmentation vector and position vector can be calculated, where the weight matrices are training parameters that can be adjusted according to the results of iterative model training. On this basis, the addition vector may be input into the bidirectional self-encoder to predict the first sample formula sequence, so as to obtain a predicted sample formula sequence that includes the prediction result corresponding to the first sample formula sequence together with the second sample formula sequence.
It will be appreciated that, since the sample formula sequence is randomly masked and the masked first sample formula sequence is predicted, the second loss value may be calculated from the predicted sample formula sequence and the original sample formula sequence, and the second pre-training parameters of the bidirectional self-encoder may be adjusted according to the second loss value. The formula for calculating the second loss value may be the loss function of the BERT model, which is known in the prior art and is not described here.
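A condensed sketch of this pre-training step in PyTorch; the vocabulary size, mask token id and 15% masking ratio are illustrative assumptions, and BERT's next-sentence objective is omitted:

```python
import torch
import torch.nn as nn

VOCAB, MASK_ID, MAX_LEN, D = 600, 1, 512, 256  # hypothetical sizes and mask token id

tok_emb = nn.Embedding(VOCAB, D)    # character vector (token embedding)
seg_emb = nn.Embedding(2, D)        # segmentation vector (segment embedding)
pos_emb = nn.Embedding(MAX_LEN, D)  # position vector (position embedding)
bert = nn.TransformerEncoder(       # bidirectional self-encoder
    nn.TransformerEncoderLayer(D, nhead=8, batch_first=True), num_layers=6)
head = nn.Linear(D, VOCAB)

seq = torch.randint(2, VOCAB, (4, 64))  # unlabeled Latex token sequences
# Random mask processing: mask roughly 15% of the positions.
mask = torch.rand(seq.shape) < 0.15
masked_seq = seq.masked_fill(mask, MASK_ID)

pos = torch.arange(seq.size(1)).expand_as(seq)
seg = torch.zeros_like(seq)  # single-segment sequences here
# Addition vector of the character, segmentation and position vectors.
x = tok_emb(masked_seq) + seg_emb(seg) + pos_emb(pos)
logits = head(bert(x))  # predicted sample formula sequence (per-token logits)
# Second loss value: cross-entropy on the masked positions only.
loss = nn.functional.cross_entropy(logits[mask], seq[mask])
loss.backward()  # adjust the second pre-training parameters
```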
Optionally, the technical solution provided by the embodiment of the present disclosure may further include:
determining a target network layer according to the network architecture of the deformation decoder and the network architecture of the bidirectional self-encoder, wherein the network architecture of the deformation decoder comprises the target network layer, and the network architecture of the bidirectional self-encoder does not comprise the target network layer;
randomly initializing training parameters of a target network layer to obtain initialization parameters;
and taking the initialization parameter and the second pre-training parameter of the pre-trained bidirectional self-encoder as initial values of the second training parameter of the decoding layer.
It should be noted that the decoding layer of the formula identification model may adopt the network architecture of the deformation decoder, i.e., the Transformer architecture. On this basis, a target network layer may be determined from the network architecture of the deformation decoder and the network architecture of the bidirectional self-encoder; the target network layer is included in the network architecture of the deformation decoder but not in that of the bidirectional self-encoder, and may, for example, be an interactive attention layer of the deformation decoder. Then, the training parameters of the target network layer are randomly initialized to obtain initialization parameters, and the initialization parameters together with the second pre-training parameters of the pre-trained bidirectional self-encoder are used as the initial values of the second training parameters of the decoding layer.
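As a sketch, assuming the decoding layer is built from torch.nn.TransformerDecoderLayer blocks, in which the cross-attention submodule is named multihead_attn and plays the role of the interactive attention layer (the surrounding attribute names are hypothetical):

```python
# Initialize the decoding layer: parameters that also exist in the pre-trained
# bidirectional self-encoder reuse its second pre-training parameters; the
# target network layer (interactive/cross attention, absent from BERT) keeps
# its random initialization parameters.
bert_state = bert_model.state_dict()
decoder = formula_model.decoding_layer
init_state = {
    name: tensor if ("multihead_attn" in name or name not in bert_state)
    else bert_state[name]
    for name, tensor in decoder.state_dict().items()
}
decoder.load_state_dict(init_state)
```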
Through pre-training the bidirectional self-encoder, the bidirectional self-encoder can learn general feature expression from a large amount of unlabeled data, and on the basis, a decoding layer of the formula recognition model can be initialized according to the bidirectional self-encoder, so that the formula recognition model has the capability of preliminarily extracting semantic features.
In addition, a supervised learning method is adopted when fine-tuning the training parameters of the formula identification model, that is, a small number of labeled samples are used to fine-tune the model parameters. Because the initial values of the first training parameters and the second training parameters obtained through pre-training add prior knowledge to the model, the model has a preliminary formula recognition capability; fine-tuning the training parameters on this basis yields a formula recognition model with higher accuracy.
FIG. 2 is a schematic diagram of training a formula recognition model, as shown in an exemplary embodiment of the present disclosure. As shown in fig. 2, the pre-training stage may pre-train the mask self-encoder model 10 and the bidirectional self-encoder model 20, initialize the encoding layer 30 of the formula identification model from the pre-trained mask self-encoder model 10, and initialize the decoding layer 40 of the formula identification model from the pre-trained bidirectional self-encoder model 20. The encoding layer 30 may adopt a ViT architecture, which may include a first multi-head attention layer (Multi-Head Attention) 31 and a first feed-forward neural network layer (Feed Forward) 32. The decoding layer 40 may adopt a Transformer architecture, which may include a masked multi-head attention layer (Masked Multi-Head Attention) 41, a second multi-head attention layer (Multi-Head Attention) 42 and a second feed-forward neural network layer (Feed Forward) 43. Furthermore, the decoding layer 40 may also include an interactive attention layer, which may be used to convert image features into sequence features. The ground truth may be the sample formula label. In the fine-tuning stage, the sample image is input into the encoding layer 30 to obtain sample formula image features, the sample formula image features and the sample formula label are then input into the decoding layer 40, a target loss value is calculated according to the obtained sample formula recognition result and the sample formula label, and the first training parameters and the second training parameters of the formula recognition model are adjusted according to the target loss value.
Through the above technical solution, the encoding layer and the decoding layer of the formula recognition model are trained. During training, the initial values of the first training parameters of the encoding layer can be obtained by pre-training a mask self-encoder, and the initial values of the second training parameters of the decoding layer can be obtained by pre-training a bidirectional self-encoder. Prior knowledge can thus be added to the formula recognition model through these initial values, so that the model performs supervised learning on the basis of a preliminary formula recognition capability. Because the model already has this preliminary capability, its first and second training parameters can be fine-tuned with a small amount of labeled data, which reduces labeling costs and improves the accuracy of the trained formula recognition model.
FIG. 3 is a flow chart of a formula identification method shown in an exemplary embodiment of the present disclosure. As shown in fig. 3, the method includes:
s201, acquiring an image to be identified, wherein the image to be identified comprises a formula to be identified;
s202, inputting the image to be identified into a formula identification model to obtain a formula identification result output by the formula identification model.
The formula recognition model is obtained through training by the model training method.
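A greedy-decoding sketch of steps S201 to S202, reusing the hypothetical TinyFormulaModel interface from the fine-tuning sketch above; the begin/end-of-sequence token ids are assumptions:

```python
import torch

@torch.no_grad()
def recognize_formula(model, image, bos_id=2, eos_id=3, max_len=128):
    """Greedy decoding sketch: image to be identified -> Latex token ids."""
    model.eval()
    tokens = torch.full((1, 1), bos_id, dtype=torch.long)
    for _ in range(max_len):
        logits = model(image, tokens)  # (1, T, VOCAB)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_id], dim=1)
        if next_id.item() == eos_id:   # formula fully decoded
            break
    return tokens[0, 1:]  # formula identification result as a Latex token sequence
```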
It will be appreciated that, during training, the initial values of the first training parameters of the encoding layer are obtained by pre-training a mask self-encoder and the initial values of the second training parameters of the decoding layer are obtained by pre-training a bidirectional self-encoder, which adds prior knowledge to the formula recognition model and lets it perform supervised learning on the basis of a preliminary formula recognition capability; its first and second training parameters can therefore be fine-tuned with a small amount of labeled data, reducing labeling costs and improving accuracy. Accordingly, the formula recognition model obtained by the above model training method offers better accuracy and robustness in formula recognition.
It should be noted that, after the trained formula recognition model is obtained, the formula recognition model may be applied to the AI learning machine scene. Specifically, the formula content selected by the user in the AI learning machine can be accurately identified through the formula identification model, and the identification result is used for intelligent correction service, or similar problem recommendation is performed according to the identification result, so as to assist learning. In addition, the recognition dictionary adopted by the formula recognition model can comprise a large number of formula types, so that the applicability of the formula recognition model to different formula recognition scenes can be greatly improved.
Based on the same inventive concept, the present disclosure further provides a model training apparatus, referring to fig. 4, and fig. 4 is a block diagram of a model training apparatus according to an exemplary embodiment of the present disclosure. As shown in fig. 4, the model training apparatus 300 is for training a formula recognition model including an encoding layer and a decoding layer, and the model training apparatus 300 includes:
a first obtaining module 301, configured to obtain a sample image, where the sample image includes a sample formula, and the sample formula is labeled with a sample formula label;
the first input module 302 is configured to input the sample image into the encoding layer provided with a first training parameter, and obtain a sample formula image feature output by the encoding layer, where an initial value of the first training parameter is obtained by pre-training based on a mask self-encoder;
the second input module 303 is configured to input the sample formula image feature to the decoding layer provided with a second training parameter, and obtain a sample formula recognition result output by the decoding layer, where an initial value of the second training parameter is obtained based on bi-directional self-encoder pre-training;
a calculating module 304, configured to calculate a target loss value according to the sample formula identification result and the sample formula label;
An adjustment module 305 is configured to adjust the first training parameter and the second training parameter of the formula identification model according to the target loss value.
The encoding layer and the decoding layer of the formula recognition model are trained using the model training apparatus 300 described above. During training, the initial values of the first training parameters of the encoding layer can be obtained by pre-training a mask self-encoder, and the initial values of the second training parameters of the decoding layer can be obtained by pre-training a bidirectional self-encoder. Prior knowledge can thus be added to the formula recognition model through these initial values, so that the model performs supervised learning on the basis of a preliminary formula recognition capability. Because the model already has this preliminary capability, its first and second training parameters can be fine-tuned with a small amount of labeled data, which reduces labeling costs and improves the accuracy of the trained formula recognition model.
Optionally, the mask self-encoder includes a visual deformation encoder and a deformation decoder, and the model training apparatus 300 further includes a first pre-training module, where the first pre-training module is configured to pre-train the mask self-encoder, and the first pre-training module is configured to:
Acquiring a first sample image, the first sample image comprising a first sample formula;
carrying out random masking processing on the first sample image to obtain a masked sub-image block and an unmasked sub-image block;
inputting the unmasked sub-blocks into the visual deformation encoder to obtain initial vector features of the unmasked sub-blocks;
inputting mask vector features and the initial vector features of the mask sub-blocks into the deformation decoder to obtain a prediction sample image;
a first loss value is calculated from the predicted sample image and the first sample image, and a first pre-training parameter of the mask self-encoder is adjusted according to the first loss value.
Optionally, the first pre-training parameters include the coding training parameters of the visual deformation encoder, and the values of the coding training parameters of the visual deformation encoder in the pre-trained mask self-encoder serve as the initial values of the first training parameters of the coding layer.
Optionally, the model training apparatus 300 further includes a second pre-training module, where the second pre-training module is configured to pre-train the bidirectional self-encoder, and the second pre-training module is configured to:
Acquiring a sample formula sequence;
carrying out random mask processing on the sample formula sequence to obtain a random mask sample formula sequence;
determining a character vector, a segmentation vector and a position vector of the random mask sample formula sequence, and calculating to obtain an addition vector of the character vector, the segmentation vector and the position vector;
inputting the addition vector into the bidirectional self-encoder to obtain a predicted sample formula sequence;
and calculating a second loss value according to the predicted sample formula sequence and the sample formula sequence, and adjusting a second pre-training parameter of the bidirectional self-encoder according to the second loss value.
Optionally, the model training apparatus 300 further comprises an execution module, where the execution module is configured to:
determining a target network layer according to the network architecture of the deformation decoder and the network architecture of the bidirectional self-encoder, wherein the network architecture of the deformation decoder comprises the target network layer, and the network architecture of the bidirectional self-encoder does not comprise the target network layer;
randomly initializing training parameters of the target network layer to obtain initialization parameters;
and taking the initialization parameter and a second pre-training parameter of the bi-directional self-encoder with the pre-training as initial values of the second training parameter of the decoding layer.
With respect to the model training apparatus 300 in the above embodiment, the specific manner in which the respective modules perform the operations has been described in detail in the embodiment regarding the method, and will not be described in detail here.
Based on the same inventive concept, the present disclosure also provides a formula recognition apparatus, referring to fig. 5, and fig. 5 is a block diagram of a formula recognition apparatus shown in an exemplary embodiment of the present disclosure. As shown in fig. 5, the formula recognition apparatus 400 includes:
a second obtaining module 401, configured to obtain an image to be identified, where the image to be identified includes a formula to be identified;
and a third input module 402, configured to input the image to be identified into a formula identification model, and obtain a formula identification result output by the formula identification model, where the formula identification model is obtained by training by using the model training method described above.
Since, during training, the initial values of the first training parameters of the encoding layer are obtained by pre-training a mask self-encoder and the initial values of the second training parameters of the decoding layer are obtained by pre-training a bidirectional self-encoder, prior knowledge is added to the formula recognition model, allowing it to perform supervised learning on the basis of a preliminary formula recognition capability and to be fine-tuned with a small amount of labeled data, which reduces labeling costs and improves the accuracy of the trained model. The formula recognition apparatus 400 therefore offers better accuracy and robustness in formula recognition.
With respect to the formula recognition apparatus 400 in the above embodiment, the specific manner in which the respective modules perform the operations has been described in detail in the embodiment regarding the method, and will not be described in detail herein.
Based on the same inventive concept, the embodiments of the present disclosure further provide an electronic device, including:
a memory having a computer program stored thereon;
and a processor for executing the computer program in the memory to implement the steps of the model training method or the formula identification method.
Fig. 6 is a block diagram of an electronic device, as shown in an exemplary embodiment of the present disclosure. As shown in fig. 6, the electronic device 500 may include: a processor 501, a memory 502. The electronic device 500 may also include one or more of a multimedia component 503, an input/output interface 504, and a communication component 505.
Wherein the processor 501 is configured to control the overall operation of the electronic device 500 to perform all or part of the steps of the model training method or the formula recognition method described above. The memory 502 is used to store various types of data to support operation on the electronic device 500; such data may include, for example, instructions for any application or method operating on the electronic device 500, as well as application-related data such as contact data, sent and received messages, pictures, audio, video, and so forth. The memory 502 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 503 may include a screen and an audio component, where the screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals; a received audio signal may be further stored in the memory 502 or transmitted through the communication component 505. The audio component further comprises at least one speaker for outputting audio signals. The input/output interface 504 provides an interface between the processor 501 and other interface modules, such as a keyboard, a mouse, or buttons, where the buttons may be virtual or physical. The communication component 505 is used for wired or wireless communication between the electronic device 500 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G or 5G, NB-IoT (Narrow Band Internet of Things), or a combination of one or more of them; accordingly, the communication component 505 may include a Wi-Fi module, a Bluetooth module, an NFC module, and the like.
In an exemplary embodiment, the electronic device 500 may be implemented by one or more application specific integrated circuits (Application Specific Integrated Circuit, abbreviated as ASIC), digital signal processors (Digital Signal Processor, abbreviated as DSP), digital signal processing devices (Digital Signal Processing Device, abbreviated as DSPD), programmable logic devices (Programmable Logic Device, abbreviated as PLD), field programmable gate arrays (Field Programmable Gate Array, abbreviated as FPGA), controllers, microcontrollers, microprocessors, or other electronic components for performing the model training method or the formula identification method described above.
In another exemplary embodiment, a computer readable storage medium is also provided, comprising program instructions which, when executed by a processor, implement the steps of the model training method or the formula identification method described above. For example, the computer readable storage medium may be the memory 502 described above including program instructions executable by the processor 501 of the electronic device 500 to perform the model training method or the formula identification method described above.
With respect to the computer-readable storage medium in the above-described embodiments, the steps for implementing the model training method or the formula recognition method when the computer program stored thereon is executed have been described in detail in the embodiments related to the method, and will not be described in detail herein.
In another exemplary embodiment, a computer program product is also provided, the computer program product comprising a computer program executable by a programmable apparatus, the computer program having code portions for performing the model training method or the formula identification method described above when executed by the programmable apparatus.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. The present disclosure is, however, not limited to the specific details of the above embodiments; various simple modifications may be made to the technical solutions of the present disclosure within the scope of its technical concept, and all such simple modifications fall within the protection scope of the present disclosure.

In addition, the specific features described in the foregoing embodiments may be combined in any suitable manner; to avoid unnecessary repetition, the various possible combinations are not separately described in the present disclosure.

Moreover, the various embodiments of the present disclosure may also be combined in any manner that does not depart from the idea of the present disclosure, and such combinations should likewise be regarded as content disclosed by the present disclosure.

Claims (10)

1. A method of model training for training a formula recognition model, the formula recognition model including an encoding layer and a decoding layer, the method comprising:
acquiring a sample image, wherein the sample image comprises a sample formula, and the sample formula is marked with a sample formula label;
inputting the sample image into the encoding layer provided with first training parameters to obtain sample formula image features output by the encoding layer, wherein initial values of the first training parameters are obtained by pre-training based on a masked autoencoder;
inputting the sample formula image features into the decoding layer provided with second training parameters to obtain a sample formula recognition result output by the decoding layer, wherein initial values of the second training parameters are obtained by pre-training based on a bidirectional autoencoder;
calculating a target loss value according to the sample formula recognition result and the sample formula label;
and adjusting the first training parameters and the second training parameters of the formula recognition model according to the target loss value (the fine-tuning loop is sketched in code after the claims).
2. The method of claim 1, wherein the masked autoencoder comprises a visual Transformer encoder and a Transformer decoder, and the pre-training process of the masked autoencoder comprises:
acquiring a first sample image, the first sample image comprising a first sample formula;
performing random masking on the first sample image to obtain masked sub-image blocks and unmasked sub-image blocks;
inputting the unmasked sub-image blocks into the visual Transformer encoder to obtain initial vector features of the unmasked sub-image blocks;
inputting mask vector features of the masked sub-image blocks and the initial vector features into the Transformer decoder to obtain a predicted sample image;
and calculating a first loss value according to the predicted sample image and the first sample image, and adjusting first pre-training parameters of the masked autoencoder according to the first loss value (see the masked-autoencoder pre-training sketch after the claims).
3. The method of claim 2, wherein the first pre-training parameters comprise encoding training parameters of the visual Transformer encoder, and values of the encoding training parameters of the visual Transformer encoder in the pre-trained masked autoencoder serve as initial values of the first training parameters of the encoding layer.
4. The method of claim 2, wherein the pre-training process of the bidirectional autoencoder comprises:
acquiring a sample formula sequence;
performing random masking on the sample formula sequence to obtain a randomly masked sample formula sequence;
determining a character vector, a segment vector and a position vector of the randomly masked sample formula sequence, and calculating an addition vector of the character vector, the segment vector and the position vector;
inputting the addition vector into the bidirectional autoencoder to obtain a predicted sample formula sequence;
and calculating a second loss value according to the predicted sample formula sequence and the sample formula sequence, and adjusting second pre-training parameters of the bidirectional autoencoder according to the second loss value.
5. The method of claim 4, further comprising:
determining a target network layer according to the network architecture of the Transformer decoder and the network architecture of the bidirectional autoencoder, wherein the network architecture of the Transformer decoder comprises the target network layer, and the network architecture of the bidirectional autoencoder does not comprise the target network layer;
randomly initializing training parameters of the target network layer to obtain initialization parameters;
and taking the initialization parameters and the second pre-training parameters of the pre-trained bidirectional autoencoder as initial values of the second training parameters of the decoding layer (see the sequence pre-training and initialization sketch after the claims).
6. A method of formula recognition, the method comprising:
acquiring an image to be recognized, wherein the image to be recognized comprises a formula to be recognized;
and inputting the image to be recognized into a formula recognition model to obtain a formula recognition result output by the formula recognition model, wherein the formula recognition model is obtained by training with the model training method according to any one of claims 1-5 (see the decoding sketch after the claims).
7. A model training apparatus for training a formula recognition model, the formula recognition model including an encoding layer and a decoding layer, the model training apparatus comprising:
a first acquisition module configured to acquire a sample image, wherein the sample image comprises a sample formula, and the sample formula is marked with a sample formula label;
a first input module configured to input the sample image into the encoding layer provided with first training parameters to obtain sample formula image features output by the encoding layer, wherein initial values of the first training parameters are obtained by pre-training based on a masked autoencoder;
a second input module configured to input the sample formula image features into the decoding layer provided with second training parameters to obtain a sample formula recognition result output by the decoding layer, wherein initial values of the second training parameters are obtained by pre-training based on a bidirectional autoencoder;
a calculation module configured to calculate a target loss value according to the sample formula recognition result and the sample formula label;
and an adjustment module configured to adjust the first training parameters and the second training parameters of the formula recognition model according to the target loss value.
8. A formula recognition apparatus, comprising:
a second acquisition module configured to acquire an image to be recognized, wherein the image to be recognized comprises a formula to be recognized;
and a third input module configured to input the image to be recognized into a formula recognition model to obtain a formula recognition result output by the formula recognition model, wherein the formula recognition model is obtained by training with the model training method according to any one of claims 1-5.
9. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1-6.
10. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of any one of claims 1-6.
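To make the claimed pipeline concrete, the four sketches below walk through claims 2-3, 4-5, 1, and 6 in pipeline order. They are minimal PyTorch illustrations under stated assumptions, not the patent's implementation: every model size, mask ratio, token id, and identifier (TinyMAE, TinyFormulaBERT, FormulaRecognizer, recognize) is assumed. First, the masked-autoencoder pre-training of claims 2 and 3: the visual Transformer encoder encodes only the unmasked sub-image blocks, the Transformer decoder fills the masked positions with a learned mask token and reconstructs the image, and the pre-trained encoder weights then seed the encoding layer. Claim 2 compares the full predicted image with the original, so the loss below covers every patch (MAE proper restricts it to the masked ones).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMAE(nn.Module):
    """Toy masked autoencoder (claims 2-3): encode unmasked patches only,
    decode a full token grid in which masked slots hold a mask token."""

    def __init__(self, img=64, patch=8, dim=128, dec_dim=64, mask_ratio=0.75):
        super().__init__()
        self.patch, self.mask_ratio = patch, mask_ratio
        n = (img // patch) ** 2                         # number of sub-image blocks
        self.embed = nn.Linear(patch * patch, dim)      # grayscale patch -> token
        self.pos = nn.Parameter(torch.zeros(1, n, dim))
        self.encoder = nn.TransformerEncoder(           # "visual Transformer encoder"
            nn.TransformerEncoderLayer(dim, 4, batch_first=True), num_layers=4)
        self.enc2dec = nn.Linear(dim, dec_dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dec_dim))  # mask vector feature
        self.dec_pos = nn.Parameter(torch.zeros(1, n, dec_dim))
        self.decoder = nn.TransformerEncoder(           # lightweight reconstruction decoder
            nn.TransformerEncoderLayer(dec_dim, 4, batch_first=True), num_layers=2)
        self.head = nn.Linear(dec_dim, patch * patch)   # token -> reconstructed pixels

    def patchify(self, x):                              # (B,1,H,W) -> (B,N,p*p)
        p = self.patch
        return x.unfold(2, p, p).unfold(3, p, p).reshape(x.size(0), -1, p * p)

    def forward(self, x):
        patches = self.patchify(x)
        B, N, _ = patches.shape
        keep = int(N * (1 - self.mask_ratio))           # random masking of sub-image blocks
        idx = torch.rand(B, N, device=x.device).argsort(1)[:, :keep]
        tokens = self.embed(patches) + self.pos
        visible = torch.gather(tokens, 1, idx[..., None].expand(-1, -1, tokens.size(-1)))
        latent = self.enc2dec(self.encoder(visible))    # initial vector features
        full = self.mask_token.expand(B, N, -1).clone() # mask tokens everywhere ...
        full.scatter_(1, idx[..., None].expand(-1, -1, latent.size(-1)), latent)  # ... then restore encoded slots
        pred = self.head(self.decoder(full + self.dec_pos))  # predicted sample image (as patches)
        return F.mse_loss(pred, patches)                # first loss value

mae = TinyMAE()
loss = mae(torch.rand(2, 1, 64, 64))   # a toy batch of formula images
loss.backward()                        # adjust the first pre-training parameters
# Claim 3: the pre-trained encoder weights become the initial values of the
# recognition model's encoding layer (state-dict copy shown for illustration).
encoder_init = {k: v.clone() for k, v in mae.encoder.state_dict().items()}
```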
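Next, the sequence pre-training and decoder initialization of claims 4 and 5. A BERT-style masked-language model is one natural reading of the "bidirectional autoencoder"; the 15% mask rate, vocabulary size, and the layer-name mapping in the claim-5 portion are assumptions. As in claim 4, the loss compares the predicted sequence with the whole original sequence.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFormulaBERT(nn.Module):
    """Toy bidirectional autoencoder over formula token sequences: the input
    is the sum of character, segment, and position embeddings (the
    "addition vector" of claim 4)."""

    def __init__(self, vocab=1000, dim=128, max_len=256, mask_id=1):
        super().__init__()
        self.mask_id = mask_id
        self.tok = nn.Embedding(vocab, dim)     # character vector
        self.seg = nn.Embedding(2, dim)         # segment vector
        self.pos = nn.Embedding(max_len, dim)   # position vector
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, 4, batch_first=True), num_layers=4)
        self.head = nn.Linear(dim, vocab)

    def forward(self, ids, seg_ids):
        masked = ids.clone()                    # random masking of the sequence
        masked[torch.rand(ids.shape, device=ids.device) < 0.15] = self.mask_id
        pos_ids = torch.arange(ids.size(1), device=ids.device).expand_as(ids)
        h = self.tok(masked) + self.seg(seg_ids) + self.pos(pos_ids)  # addition vector
        logits = self.head(self.encoder(h))     # predicted sample formula sequence
        return F.cross_entropy(logits.transpose(1, 2), ids)           # second loss value

bert = TinyFormulaBERT()
ids = torch.randint(2, 1000, (4, 32))           # toy formula token ids
loss = bert(ids, torch.zeros(4, 32, dtype=torch.long))
loss.backward()                                 # adjust the second pre-training parameters

# Claim 5: a Transformer decoder layer additionally contains cross-attention,
# which has no counterpart in the bidirectional autoencoder (the "target
# network layer"); it stays randomly initialized, while self-attention and
# feed-forward weights are copied from the pre-trained encoder.
dec_layer = nn.TransformerDecoderLayer(128, 4, batch_first=True)
src_layer = bert.encoder.layers[0]
dec_layer.self_attn.load_state_dict(src_layer.self_attn.state_dict())
dec_layer.linear1.load_state_dict(src_layer.linear1.state_dict())
dec_layer.linear2.load_state_dict(src_layer.linear2.state_dict())
```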
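Claim 1's fine-tuning then couples the two pre-trained halves into an image-to-sequence recognizer and updates both parameter sets from a single target loss. Teacher forcing, AdamW, and the assumption that label sequences begin with a start token are illustrative choices, not claimed details; in the claimed scheme the encoder and decoder would be loaded from the two pre-training sketches above rather than freshly constructed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FormulaRecognizer(nn.Module):
    """Encoding layer (image -> feature tokens) plus decoding layer
    (feature tokens -> formula tokens), shapes matching the sketches above."""

    def __init__(self, vocab=1000, dim=128, patch=8, img=64):
        super().__init__()
        self.patch = patch
        n = (img // patch) ** 2
        self.embed = nn.Linear(patch * patch, dim)
        self.pos = nn.Parameter(torch.zeros(1, n, dim))
        self.encoder = nn.TransformerEncoder(    # first training parameters (from the MAE)
            nn.TransformerEncoderLayer(dim, 4, batch_first=True), num_layers=4)
        self.tok = nn.Embedding(vocab, dim)
        self.decoder = nn.TransformerDecoder(    # second training parameters (BERT + random cross-attn)
            nn.TransformerDecoderLayer(dim, 4, batch_first=True), num_layers=4)
        self.head = nn.Linear(dim, vocab)

    def forward(self, images, tgt_ids):
        p = self.patch
        patches = images.unfold(2, p, p).unfold(3, p, p).reshape(images.size(0), -1, p * p)
        memory = self.encoder(self.embed(patches) + self.pos)  # sample formula image features
        L = tgt_ids.size(1)                      # causal mask: each step sees only its prefix
        causal = torch.triu(torch.full((L, L), float('-inf'), device=images.device), 1)
        out = self.decoder(self.tok(tgt_ids), memory, tgt_mask=causal)
        return self.head(out)                    # sample formula recognition result (logits)

model = FormulaRecognizer()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
images = torch.rand(2, 1, 64, 64)                # sample images containing formulas
labels = torch.randint(0, 1000, (2, 16))         # sample formula labels (token ids)
logits = model(images, labels[:, :-1])           # teacher forcing: predict the next token
loss = F.cross_entropy(logits.transpose(1, 2), labels[:, 1:])   # target loss value
opt.zero_grad(); loss.backward(); opt.step()     # adjust first and second training parameters
```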
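Finally, for the recognition method of claim 6, greedy autoregressive decoding over the trained model (reusing the model instance from the previous sketch) is the simplest reading; beam search would be a natural substitute. The BOS/EOS token ids and the length cap are assumptions.

```python
import torch

@torch.no_grad()
def recognize(model, image, bos=0, eos=2, max_len=64):
    """Greedy decoding: feed the tokens decoded so far, append the argmax."""
    ids = torch.full((1, 1), bos, dtype=torch.long)
    for _ in range(max_len):
        logits = model(image, ids)               # reuse the training forward pass
        nxt = logits[:, -1].argmax(-1, keepdim=True)
        ids = torch.cat([ids, nxt], dim=1)
        if nxt.item() == eos:                    # stop at the end-of-formula token
            break
    return ids[0, 1:]                            # formula recognition result (token ids)

tokens = recognize(model, torch.rand(1, 1, 64, 64))   # image to be recognized
```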
CN202310216179.3A 2023-03-01 2023-03-01 Model training method, formula identification method, device, medium and equipment Pending CN116206314A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310216179.3A CN116206314A (en) 2023-03-01 2023-03-01 Model training method, formula identification method, device, medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310216179.3A CN116206314A (en) 2023-03-01 2023-03-01 Model training method, formula identification method, device, medium and equipment

Publications (1)

Publication Number Publication Date
CN116206314A true CN116206314A (en) 2023-06-02

Family

ID=86519045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310216179.3A Pending CN116206314A (en) 2023-03-01 2023-03-01 Model training method, formula identification method, device, medium and equipment

Country Status (1)

Country Link
CN (1) CN116206314A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117352120A (en) * 2023-06-05 2024-01-05 北京长木谷医疗科技股份有限公司 GPT-based intelligent self-generation method, device and equipment for knee joint lesion diagnosis
CN116911384A (en) * 2023-06-13 2023-10-20 电子科技大学 Zero-suppression incremental knowledge optimization method and device and electronic equipment
CN116911384B (en) * 2023-06-13 2024-01-26 电子科技大学 Zero-suppression incremental knowledge optimization method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN110119757B (en) Model training method, video category detection method, device, electronic equipment and computer readable medium
CN107391646B (en) Semantic information extraction method and device for video image
CN116206314A (en) Model training method, formula identification method, device, medium and equipment
CN111916067A (en) Training method and device of voice recognition model, electronic equipment and storage medium
CN111368118B (en) Image description generation method, system, device and storage medium
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN110472255B (en) Neural network machine translation method, model, electronic terminal, and storage medium
CN107463928A (en) Word sequence error correction algorithm, system and its equipment based on OCR and two-way LSTM
CN114926835A (en) Text generation method and device, and model training method and device
CN112804558B (en) Video splitting method, device and equipment
CN116884391B (en) Multimode fusion audio generation method and device based on diffusion model
CN116050496A (en) Determination method and device, medium and equipment of picture description information generation model
CN116205820A (en) Image enhancement method, target identification method, device and medium
CN115731552A (en) Stamp character recognition method and device, processor and electronic equipment
CN116306603A (en) Training method of title generation model, title generation method, device and medium
Uddin et al. A perceptually inspired new blind image denoising method using $L_1$ and perceptual loss
CN115565177A (en) Character recognition model training method, character recognition device, character recognition equipment and medium
CN117034951A (en) Digital person with specific language style based on large language model
CN116975347A (en) Image generation model training method and related device
CN115496134A (en) Traffic scene video description generation method and device based on multi-modal feature fusion
CN115905613A (en) Audio and video multitask learning and evaluation method, computer equipment and medium
CN115204366A (en) Model generation method and device, computer equipment and storage medium
CN115115972A (en) Video processing method, video processing apparatus, computer device, medium, and program product
CN114792388A (en) Image description character generation method and device and computer readable storage medium
CN110321802B (en) Face image generation method and apparatus, storage device and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination