CN113919294A - Formula recognition model training method and device for model training - Google Patents

Formula recognition model training method and device for model training

Info

Publication number
CN113919294A
Authority
CN
China
Prior art keywords
formula
feature map
recognition model
identification
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111156277.XA
Other languages
Chinese (zh)
Inventor
Qin Bo
Xin Xiaozhe
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd filed Critical Beijing Sogou Technology Development Co Ltd
Priority to CN202111156277.XA priority Critical patent/CN113919294A/en
Publication of CN113919294A publication Critical patent/CN113919294A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/12: Use of codes for handling textual entities
    • G06F40/126: Character encoding
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the invention provide a method and device for training a formula recognition model, and a device for model training. The method comprises the following steps: obtaining a formula recognition sample and the target recognition result corresponding to the sample; iteratively training a formula recognition model on the formula recognition samples, the model comprising an encoder and a decoder; in each round of training, calculating a loss value for the formula recognition model from the formula recognition result output by the decoder and the target recognition result, and adjusting the model parameters of the formula recognition model according to the loss value, wherein the decoder outputs the formula recognition result based on the current encoding feature of the formula recognition sample and the decoding result of the previous encoding feature; and when the loss value meets a convergence condition, obtaining the trained formula recognition model. Embodiments of the invention can improve the recognition accuracy of the formula recognition model.

Description

Formula recognition model training method and device for model training
Technical Field
The invention relates to the field of intelligent control technology, and in particular to a method and device for training a formula recognition model, and a device for model training.
Background
Formula recognition technology is widely needed in education scenarios such as automatic paper grading, photo-based question search, and test-question digitization. However, recognition accuracy remains low because formulas may have complex two-dimensional structures, a large number of symbols, visually similar symbols, and symbols whose meanings vary with context.
In recent years, with the marked growth of computing power and the arrival of the big-data era, neural-network-based formula recognition algorithms have been widely applied to all kinds of formula recognition scenarios. Two types of formula recognition algorithm are currently in common use: staged recognition algorithms built on a "segmentation, recognition, post-processing" pipeline, and end-to-end formula recognition algorithms.
However, in a staged recognition algorithm, each stage introduces errors, and the errors of the individual stages accumulate, leading to poor accuracy. End-to-end formula recognition algorithms mainly rely on an attention mechanism in the decoding stage, which is prone to recognition error drift: the longer the recognized sequence, the worse the accuracy.
Disclosure of Invention
Embodiments of the invention provide a training method and device for a formula recognition model, and a device for model training, which can improve the recognition accuracy of the formula recognition model.
To solve the above problem, an embodiment of the present invention discloses a method for training a formula recognition model, the method comprising:
obtaining a formula recognition sample and the target recognition result corresponding to the formula recognition sample;
iteratively training a formula recognition model on the formula recognition samples, the formula recognition model comprising an encoder and a decoder;
in each round of training, calculating a loss value for the formula recognition model from the formula recognition result output by the decoder and the target recognition result, and adjusting the model parameters of the formula recognition model according to the loss value; wherein the decoder outputs the formula recognition result based on the current encoding feature of the formula recognition sample and the decoding result of the previous encoding feature;
and when the loss value meets a convergence condition, obtaining the trained formula recognition model.
Optionally, calculating the loss value of the formula recognition model from the formula recognition result output by the decoder and the target recognition result comprises:
inputting the formula recognition sample into the encoder of the formula recognition model for encoding, to obtain a first feature map of the formula recognition sample;
splitting the first feature map to obtain M second feature maps;
ordering the M second feature maps according to a preset rule;
inputting the decoding result corresponding to the (N-1)-th second feature map together with the N-th second feature map into the decoder of the formula recognition model for decoding, until the M-th second feature map has been input into the decoder, to obtain the formula recognition result corresponding to the formula recognition sample; where M and N are positive integers and N is less than or equal to M;
and calculating the loss value of the formula recognition model from the formula recognition result and the target recognition result.
Optionally, splitting the first feature map to obtain M second feature maps comprises:
determining the relative position code corresponding to the first feature map;
adding the relative position code to the first feature map to obtain a third feature map;
and splitting the third feature map to obtain the M second feature maps.
Optionally, splitting the first feature map to obtain M second feature maps comprises:
determining the width value of the first feature map;
and splitting the first feature map according to its width value to obtain the M second feature maps.
Optionally, if N is 1, the decoding result corresponding to the (N-1)-th second feature map is a preset value.
Optionally, the method further comprises:
acquiring a target image, the target image containing a formula to be recognized;
and inputting the target image into the trained formula recognition model for formula recognition, to obtain the formula recognition result corresponding to the target image.
In another aspect, an embodiment of the invention discloses a training device for a formula recognition model, comprising:
a training data acquisition module for obtaining a formula recognition sample and the target recognition result corresponding to the formula recognition sample;
an iterative training module for iteratively training a formula recognition model on the formula recognition sample, the formula recognition model comprising an encoder and a decoder;
a loss value calculation module for, in each round of training, calculating the loss value of the formula recognition model from the formula recognition result output by the decoder and the target recognition result, and adjusting the model parameters of the formula recognition model according to the loss value; wherein the decoder outputs the formula recognition result based on the current encoding feature of the formula recognition sample and the decoding result of the previous encoding feature;
and a training completion determination module for obtaining the trained formula recognition model when the loss value meets the convergence condition.
Optionally, the encoding features comprise feature maps, and the loss value calculation module comprises:
an encoding submodule for inputting the formula recognition sample into the encoder of the formula recognition model for encoding, to obtain a first feature map of the formula recognition sample;
a splitting submodule for splitting the first feature map to obtain M second feature maps;
a feature map ordering submodule for ordering the M second feature maps according to a preset rule;
a decoding submodule for inputting the decoding result corresponding to the (N-1)-th second feature map together with the N-th second feature map into the decoder of the formula recognition model for decoding, until the M-th second feature map has been input into the decoder, to obtain the formula recognition result corresponding to the formula recognition sample; where M and N are positive integers and N is less than or equal to M;
and a loss value calculation submodule for calculating the loss value of the formula recognition model from the formula recognition result and the target recognition result.
Optionally, the splitting submodule comprises:
a relative position code determination unit for determining the relative position code corresponding to the first feature map;
a relative position code addition unit for adding the relative position code to the first feature map to obtain a third feature map;
and a first splitting unit for splitting the third feature map to obtain the M second feature maps.
Optionally, the splitting submodule comprises:
a width value determination unit for determining the width value of the first feature map;
and a second splitting unit for splitting the first feature map according to its width value to obtain the M second feature maps.
Optionally, if N is 1, the decoding result corresponding to the (N-1)-th second feature map is a preset value.
Optionally, the device further comprises:
a target image acquisition module for acquiring a target image containing a formula to be recognized;
and a formula recognition module for inputting the target image into the trained formula recognition model for formula recognition, to obtain the formula recognition result corresponding to the target image.
In yet another aspect, an embodiment of the invention discloses a device for model training, comprising a memory and one or more programs, where the one or more programs are stored in the memory, are configured to be executed by one or more processors, and comprise instructions for performing the training method of a formula recognition model according to one or more of the foregoing embodiments.
In yet another aspect, embodiments of the invention disclose a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause a device to perform the training method of a formula recognition model described in one or more of the foregoing embodiments.
The embodiment of the invention has the following advantages:
according to the embodiment of the invention, a formula identification sample and a target identification result corresponding to the formula identification sample are obtained; then, iteratively training a formula recognition model based on the formula recognition sample, wherein the formula recognition model comprises an encoder and a decoder; in each round of training, calculating a loss value of the formula recognition model according to a formula recognition result output by the decoder and the target recognition result, and adjusting model parameters of the formula recognition model according to the loss value; wherein the decoder outputs a formula identification result based on the decoding results of the current encoding characteristic and the last encoding characteristic of the formula identification sample; and when the joint loss value meets a convergence condition, obtaining a trained formula recognition model. Compared with the prior art that the decoder is constructed by adopting an attention mechanism, so that the decoder only focuses on limited key information, in the decoding process, the decoding result of the current coding feature is predicted by introducing the decoding result corresponding to the last coding feature of the formula identification sample, the formula identification result is determined based on the decoding result of each coding feature, the correlation among the coding features is fully considered, and the identification accuracy is favorably improved.
Drawings
To illustrate the technical solutions of the embodiments more clearly, the drawings needed for their description are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention; those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a computer system according to the present invention;
FIG. 2 is a flow chart of the steps of an embodiment of a method for training a formula recognition model of the present invention;
FIG. 3 is a schematic diagram of a formula recognition model according to the present invention;
FIG. 4 is a block diagram of an embodiment of an apparatus for training a formula recognition model according to the present invention;
FIG. 5 is a block diagram of an apparatus 800 for model training of the present invention;
fig. 6 is a schematic diagram of a server in some embodiments of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Method embodiment
An embodiment of the invention provides a method for training a formula recognition model, which can be applied to computer equipment with data-processing capability. In an optional embodiment, the training method may run on a personal computer, workstation, or server; that is, formula recognition and formula recognition model training may both be carried out by a personal computer, workstation, or server.
The trained formula recognition model may be shipped as part of an application program installed on a terminal, so that the terminal outputs a formula recognition result when it receives content to be recognized; alternatively, the trained model may be deployed on the application's backend server, so that a terminal running the application provides the formula recognition function via that server.
Referring to fig. 1, a schematic structural diagram of a computer system provided by an embodiment of the present invention is shown, where the computer system includes a terminal 110 and a server 120. The terminal 110 and the server 120 perform data communication via a communication network. Optionally, the communication network may be a wired network or a wireless network, and the communication network may be at least one of a local area network, a metropolitan area network, and a wide area network.
The terminal 110 runs an application program supporting a formula recognition function, such as an e-book reading application, a web browser, a paper-grading application, a photo-based question-search application, or a social application; the embodiment of the invention is not limited in this respect.
Optionally, the terminal 110 may be a mobile terminal such as a smartphone, smart watch, tablet computer, laptop computer, or intelligent robot, or a terminal such as a desktop computer or projection computer; the embodiment of the invention does not limit the terminal type.
The server 120 may be an independent physical server, a server cluster or distributed system formed from multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, cloud communication, network services, middleware services, a Content Delivery Network (CDN), or a big-data and artificial intelligence platform. In an alternative embodiment, server 120 may be the backend server for the application in terminal 110.
In some embodiments, terminal 110 includes a camera assembly therein; the terminal 110 acquires picture content including a formula to be recognized through the camera assembly, and the terminal 110 transmits the picture to the server 120. The server 120 includes a formula recognition module, which includes a trained formula recognition model; the server 120 receives the picture sent by the terminal 110, identifies the formula in the picture through the formula identification model, and sends the identification result to the terminal 110 for display.
Alternatively, the terminal 110 includes a formula recognition module, and the formula recognition module includes a trained formula recognition model. After the terminal acquires the picture, the formula in the picture is identified through the formula identification model to obtain an identification result, and the identification result is displayed.
It should be noted that, in the above embodiments, the terminal may display the recognition result as an image or as text.
For convenience of description, the following embodiments are described as examples in which the training method of the formula recognition model is performed by a server.
Referring to fig. 2, a flowchart illustrating steps of an embodiment of a method for training a formula recognition model according to the present invention is shown, where the method specifically includes the following steps:
step 201, obtaining a formula identification sample and a target identification result corresponding to the formula identification sample.
Step 202, iteratively training a formula recognition model on the formula recognition sample, the formula recognition model comprising an encoder and a decoder.
Step 203, in each round of training, calculating a loss value for the formula recognition model from the formula recognition result output by the decoder and the target recognition result, and adjusting the model parameters of the formula recognition model according to the loss value; wherein the decoder outputs the formula recognition result based on the current encoding feature of the formula recognition sample and the decoding result of the previous encoding feature.
Step 204, when the loss value meets a convergence condition, obtaining the trained formula recognition model.
The formula recognition sample is an image used for training that contains a formula to be recognized. The target recognition result corresponding to the sample is its ground-truth recognition result, used to check the accuracy of the model's output.
Formula recognition samples and their corresponding target labels can be obtained in many ways; for example, formula pictures and recognition results available on the network, or stored on computer equipment, may serve as the formula recognition samples of the embodiment of the invention.
It should be noted that the embodiment of the invention does not specifically limit the size of the formula recognition samples. To speed up training, the samples can be preprocessed in advance. For example, the image height of every sample may be adjusted to a uniform preset height, such as 64 pixels; the image width is then scaled by the same factor. For instance, for a sample A whose image is 32 pixels high and 128 pixels wide before preprocessing, adjusting the height to 64 pixels stretches the image vertically by a factor of 2, so the width is also stretched by a factor of 2, to 256 pixels. To avoid overly large size differences between samples during training, which would increase training difficulty, the scaled samples can be sorted by image width in ascending or descending order, with every m adjacent samples forming a sample group whose members are processed together.
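The preprocessing described above can be sketched as follows. The target height of 64 pixels comes from the example in the text; the function names and the representation of sizes as (height, width) pairs are illustrative.

```python
TARGET_HEIGHT = 64  # preset height from the example above

def rescale_size(height, width, target_height=TARGET_HEIGHT):
    """Scale the width by the same factor used to bring the height to the target."""
    scale = target_height / height
    return target_height, int(round(width * scale))

def group_samples(sizes, group_size):
    """Sort sample indices by width, then batch adjacent ones into groups of
    similar width, as described for the sample groups above."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i][1])
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]
```

For the sample A above, `rescale_size(32, 128)` yields `(64, 256)`, matching the 2x stretch in both dimensions.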
The formula recognition model recognizes the formula content in input picture data. In the embodiment of the invention, the model may be built with an end-to-end network structure; as an example, it may adopt an Encoder-Decoder structure comprising an encoder and a decoder.
The encoder may adopt a CNN (Convolutional Neural Network) structure. Illustratively, Neural Architecture Search (NAS) can be used to determine an optimal CNN structure to serve as the encoder of the codec module, with minimal impact on accuracy. The decoder may adopt a Transformer network structure.
Referring to fig. 3, a schematic structural diagram of a formula recognition model according to an embodiment of the invention is shown. As shown in fig. 3, the model comprises an encoder and a decoder with a Transformer network structure. The decoder may contain several decoding modules; Nx in fig. 3 denotes their number, and in the embodiment of the invention Nx may be 3, i.e. the decoder may contain 3 decoding modules. Every decoding module has the same network structure, mainly comprising a Masked Multi-Head Attention layer (MMHA), a Multi-Head Attention layer (MHA), and a Feed-Forward Network layer (FFN). The masked multi-head attention layer applies a mask to its input data to cover some parameter values so that they take no effect when the parameters are updated. The input to the multi-head attention layer comprises the output of the encoder and the output of the masked multi-head attention layer, i.e. the current encoding feature produced by the encoder and the masked decoding result of the previous encoding feature. The multi-head attention layer can extract the correlation between the current encoding feature and the previous decoding result, accounting for the influence of the previous decoding result on the decoding of the current encoding feature. The feed-forward network layer applies a nonlinear transformation to the output of the multi-head attention layer.
Besides the decoding modules, the decoder also contains an output module consisting mainly of a linear transformation layer (Linear) and a fully connected network layer, where the fully connected layer may use a softmax activation function. The output module first applies a linear transformation to the decoding modules' output, then obtains the probability distribution over prediction results via the fully connected layer, and outputs the prediction with the highest probability as the formula recognition result.
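A minimal numpy sketch of the output module just described: a linear transformation followed by a softmax over the symbol vocabulary, taking the highest-probability symbol at each step as the prediction. The shapes and names are illustrative, not taken from the patent.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating, for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def output_module(decoder_out, W, b):
    """Linear layer + softmax; returns the per-step predicted symbol indices."""
    logits = decoder_out @ W + b          # (seq_len, vocab_size)
    probs = softmax(logits, axis=-1)      # probability distribution per step
    return probs.argmax(axis=-1)          # symbol with the highest probability
```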
In each round of training, the formula recognition sample to be processed is input to the encoder of the formula recognition model, which encodes it and outputs its encoding features. The decoder outputs the formula recognition result based on the current encoding feature of the sample and the decoding result of the previous encoding feature. A loss value for the model is then calculated from the formula recognition result and the sample's target recognition result, and the model parameters are adjusted according to the loss value until a convergence condition is met, yielding the trained formula recognition model. The loss value can be determined from the cross-entropy between the recognition result output by the model and the target recognition result. The convergence condition may be that the loss values over several rounds of training all fall below a preset threshold, or that the difference between successive loss values falls below a preset value.
Compared with the prior art, in which the decoder is built around an attention mechanism and therefore attends only to limited key information, the decoding process here predicts the decoding result of the current encoding feature by introducing the decoding result corresponding to the previous encoding feature of the formula recognition sample, and determines the formula recognition result from the decoding results of all encoding features. This fully accounts for the correlation between encoding features and helps improve recognition accuracy.
In addition, the embodiment of the invention may use a decoder built on a Transformer network structure, whose multi-head attention layers attend to information from different representation subspaces at different positions. This increases the diversity of the reference feature information, helps avoid recognition error drift, and further improves recognition accuracy.
In an optional embodiment of the invention, the encoding features comprise feature maps, and calculating the loss value of the formula recognition model in step 203 from the formula recognition result output by the decoder and the target recognition result includes:
step S11, inputting the formula recognition sample into the encoder of the formula recognition model for encoding, to obtain a first feature map of the sample;
step S12, splitting the first feature map to obtain M second feature maps;
step S13, ordering the M second feature maps according to a preset rule;
step S14, inputting the decoding result corresponding to the (N-1)-th second feature map together with the N-th second feature map into the decoder of the formula recognition model for decoding, until the M-th second feature map has been input into the decoder, to obtain the formula recognition result corresponding to the sample; where M and N are positive integers and N is less than or equal to M;
step S15, calculating the loss value of the formula recognition model from the formula recognition result and the target recognition result.
Encoding the input formula recognition sample is essentially feature extraction. Taking a CNN encoder as an example, the encoder convolves the formula recognition sample, condensing many local features into extracted features that together form the sample's feature map.
The first feature map is a four-dimensional feature map, and the second feature maps are three-dimensional feature maps.
It should be noted that because the formula recognition sample is a picture containing a formula, with height, length, width, and RGB (RGB color mode) channel characteristics, the encoder's output is usually a four-dimensional feature over height, length, width, and RGB channels, whereas the Transformer network structure requires the decoder's input to be three-dimensional. Therefore, to meet the decoder's input requirement, the first feature map may be split into M second feature maps. The M second feature maps are ordered; the (N-1)-th second feature map serves as the previous encoding feature of the sample and the N-th as the current encoding feature, so that the decoding result of the current encoding feature is predicted from the decoding result of the previous one. M and N are positive integers with N less than or equal to M; if N is 1, the decoding result corresponding to the (N-1)-th second feature map is a preset value.
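The sequential decoding of step S14 can be sketched as a loop in which each second feature map is decoded together with the previous step's decoding result; `decode_step` and the start value standing in for the "preset value" are illustrative placeholders.

```python
import numpy as np

START = 0  # preset value used as the "previous result" when N is 1

def decode_sequence(second_feature_maps, decode_step, start=START):
    """Decode M ordered second feature maps, feeding each step (N) the
    decoding result of the previous step (N-1)."""
    prev = start
    outputs = []
    for fmap in second_feature_maps:    # N = 1 .. M
        prev = decode_step(prev, fmap)  # uses (N-1)-th result and N-th map
        outputs.append(prev)
    return outputs
```

With a toy `decode_step` that just accumulates feature sums, the dependence of each output on the previous one is easy to see.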
In an optional embodiment of the present invention, the splitting processing performed on the first feature map in step S12 to obtain M second feature maps includes:
a11, determining a relative position code corresponding to the first feature map;
a12, adding the relative position code to the first feature map to obtain a third feature map;
and A13, splitting the third feature map to obtain M second feature maps.
The spatial structure of a formula is generally complex and involves various positional relationships, such as front-back, top-bottom, and containment relationships. Therefore, in the embodiment of the present invention, after the encoder performs feature extraction on the formula recognition sample, a relative position code may be further added to the extracted first feature map, and the formula recognition result is determined jointly from the extracted first feature map and the relative position code, so that the relative positional relationships among the recognized characters are determined more accurately, improving the recognition accuracy of the formula recognition model.
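As an illustration of steps A11-A13, a position code can be generated and added to the first feature map. The exact relative position encoding used by the model is not specified in the text, so this sketch assumes a standard sinusoidal encoding and hypothetical tensor shapes:

```python
import numpy as np

def position_encoding(seq_len, dim):
    """Sinusoidal position code of shape (seq_len, dim); assumed form,
    not necessarily the encoding used by the patent's model."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(dim)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def add_position_code(first_map):
    """A11-A13: add the position code to the first feature map to get
    the third feature map (broadcast over batch and width axes)."""
    b, l, c, w = first_map.shape
    code = position_encoding(l, c)            # (L, C)
    return first_map + code[None, :, :, None]  # (B, L, C, W)

third = add_position_code(np.zeros((2, 684, 4, 5)))
```

The third feature map keeps the shape of the first feature map, so the subsequent splitting step applies unchanged.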
In an optional embodiment of the present invention, the splitting processing performed on the first feature map in step S12 to obtain M second feature maps includes:
a21, determining the width value of the first feature map;
a22, splitting the first feature map according to the width value of the first feature map to obtain M second feature maps.
Assume the obtained first feature map can be expressed as

F ∈ R^(B×684×4×W)

wherein B is the number of formula identification samples contained in a training sample group (batch); 684 denotes the sequence length of the first feature map (of course, the sequence length of the first feature map may take other values); 4 denotes the number of extracted features: in the embodiment of the present invention, the extracted features may include the height feature, length feature, width feature, and RGB feature of the formula identification sample, so the number of extracted features is 4; and W denotes the width value of the first feature map. F represents the coding features corresponding to the B formula identification samples, the coding features of each formula identification sample comprising coding matrices corresponding to the 4 picture features, each column of a coding matrix having length 684 and each coding matrix having width W.

When the first feature map is split, the first feature map F may be split into M second feature maps according to the width value of the first feature map; that is, the first feature map F is divided into M = W second feature maps

F_d ∈ R^(B×684×4), d = 1, 2, …, W

wherein each F_d represents the coding features corresponding to the B formula identification samples at width position d, the coding features of each formula identification sample comprising 4 coding sequences, and the length of each coding sequence is 684.
Of course, the first feature map may also be split in other ways; the embodiment of the present invention is not specifically limited in this regard.
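One way the width-based split of steps A21-A22 could look, as a non-authoritative sketch assuming a first feature map of shape (B, L, C, W):

```python
import numpy as np

def split_by_width(first_map):
    """Split the four-dimensional first feature map of shape
    (B, L, C, W) along its width value W into M = W three-dimensional
    second feature maps of shape (B, L, C)."""
    return [first_map[..., d] for d in range(first_map.shape[-1])]

# B=2 samples, sequence length 684, 4 extracted features, width W=5
first = np.arange(2 * 684 * 4 * 5, dtype=float).reshape(2, 684, 4, 5)
seconds = split_by_width(first)
```

Each second feature map is three-dimensional, matching the decoder's input requirement.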
In an optional embodiment of the invention, the method further comprises:
step S21, obtaining a target image, wherein the target image comprises a formula to be identified;
and S22, inputting the target image into the trained formula recognition model for formula recognition processing to obtain a formula recognition result corresponding to the target image.
In the embodiment of the invention, after the trained formula recognition model is obtained, the trained formula recognition model can be used for carrying out formula recognition processing on the target image to obtain the formula recognition result corresponding to the target image. Compared with the existing formula recognition model, the formula recognition result obtained by the trained formula recognition model in the embodiment of the invention has higher accuracy.
It should be noted that the decoding result of the last coding feature of the formula identification sample is extracted only during model training; when formula recognition is performed after training is completed, the decoding result of the last coding feature does not need to be extracted, and the formula recognition result corresponding to the formula identification sample is output directly.
In summary, in the decoding process, the decoding result of the current coding feature is predicted by introducing the decoding result corresponding to the last coding feature of the formula identification sample, and the formula identification result is determined based on the decoding results of the coding features, so that the correlation among the coding features is fully considered, which is beneficial to improving the identification accuracy. In addition, in the embodiment of the present invention, a decoder constructed with a Transformer network structure may be adopted, so that information from different representation subspaces at different positions is attended to through the multi-head attention layer, which increases the diversity of the referenced feature information, helps avoid recognition error drift, and further improves the identification accuracy.
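The multi-head attention layer mentioned above can be illustrated with a minimal sketch. This omits the learned projection matrices of a real Transformer layer and is not the patent's implementation; each head simply attends within its own slice of the feature dimension.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(q, k, v, num_heads):
    """Minimal multi-head scaled dot-product attention: each head
    attends to a different representation subspace of the features."""
    seq_len, dim = q.shape
    d = dim // num_heads
    heads = []
    for h in range(num_heads):
        qs, ks, vs = (m[:, h * d:(h + 1) * d] for m in (q, k, v))
        attn = softmax(qs @ ks.T / np.sqrt(d))   # (seq_len, seq_len)
        heads.append(attn @ vs)                  # (seq_len, d)
    return np.concatenate(heads, axis=-1)

x = np.random.rand(10, 8)
y = multi_head_attention(x, x, x, num_heads=2)
```

Because each head works on a separate subspace, the concatenated output draws on more diverse feature information than a single attention head would.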
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Device embodiment
Referring to fig. 4, a block diagram of an embodiment of the present invention of an apparatus for training a formula recognition model is shown, where the apparatus may include:
a training data obtaining module 401, configured to obtain a formula recognition sample and a target recognition result corresponding to the formula recognition sample;
an iterative training module 402, configured to iteratively train a formula recognition model based on the formula recognition sample, where the formula recognition model includes an encoder and a decoder;
a loss value calculation module 403, configured to calculate a loss value of the formula recognition model according to the formula recognition result output by the decoder and the target recognition result in each round of training, and adjust a model parameter of the formula recognition model according to the loss value; wherein the decoder outputs a formula identification result based on the decoding results of the current encoding characteristic and the last encoding characteristic of the formula identification sample;
and a training completion condition determining module 404, configured to obtain a trained formula recognition model when the joint loss value satisfies a convergence condition.
Optionally, the encoding feature includes a feature map, and the loss value calculation module includes:
the coding processing submodule is used for inputting the formula identification sample into a coder of the formula identification model for coding processing to obtain a first characteristic diagram of the formula identification sample;
the splitting processing submodule is used for splitting the first characteristic diagram to obtain M second characteristic diagrams;
the characteristic diagram sorting submodule is used for sorting the M second characteristic diagrams according to a preset rule;
the decoding processing sub-module is used for inputting the decoding result corresponding to the (N-1) th second feature map and the Nth second feature map into a decoder of the formula identification model for decoding processing until the Mth second feature map is input into the decoder to obtain a formula identification result corresponding to the formula identification sample; wherein M and N are positive integers, and N is less than or equal to M;
and the loss value operator module is used for calculating the loss value of the formula identification model according to the formula identification result and the target identification result.
Optionally, the splitting processing sub-module includes:
a relative position code determining unit, configured to determine a relative position code corresponding to the first feature map;
a relative position code adding unit, configured to add the relative position code to the first feature map to obtain a third feature map;
and the first splitting processing unit is used for splitting the third characteristic diagram to obtain M second characteristic diagrams.
Optionally, the splitting processing sub-module includes:
a width value determination unit, configured to determine a width value of the first feature map;
and the second splitting processing unit is used for splitting the first feature map according to the width value of the first feature map to obtain M second feature maps.
Optionally, if the value of N is 1, the decoding result corresponding to the N-1 th second feature map is a preset value.
Optionally, the apparatus further comprises:
the target image acquisition module is used for acquiring a target image, and the target image comprises a formula to be identified;
and the formula recognition processing module is used for inputting the target image into the trained formula recognition model to perform formula recognition processing so as to obtain a formula recognition result corresponding to the target image.
In summary, in the decoding process, the decoding result of the current coding feature is predicted by introducing the decoding result corresponding to the last coding feature of the formula identification sample, and the formula identification result is determined based on the decoding results of the coding features, so that the correlation among the coding features is fully considered, which is beneficial to improving the identification accuracy. In addition, in the embodiment of the present invention, a decoder constructed with a Transformer network structure may be adopted, so that information from different representation subspaces at different positions is attended to through the multi-head attention layer, which increases the diversity of the referenced feature information, helps avoid recognition error drift, and further improves the identification accuracy.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
An embodiment of the present invention provides an apparatus for formula recognition model training, the apparatus comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for:
p11, acquiring a formula identification sample and a target identification result corresponding to the formula identification sample;
p12, iteratively training a formula recognition model based on the formula recognition samples, wherein the formula recognition model comprises an encoder and a decoder;
p13, calculating the loss value of the formula recognition model according to the formula recognition result output by the decoder and the target recognition result in each training cycle, and adjusting the model parameters of the formula recognition model according to the loss value; wherein the decoder outputs a formula identification result based on the decoding results of the current encoding characteristic and the last encoding characteristic of the formula identification sample;
and P14, when the joint loss value meets the convergence condition, obtaining a trained formula recognition model.
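Steps P11-P14 amount to a standard train-until-convergence loop. The sketch below uses a hypothetical one-parameter toy model in place of the encoder-decoder formula recognition model; all names are illustrative.

```python
class ToyModel:
    """Stand-in for the formula recognition model: a single weight w."""
    def __init__(self):
        self.w = 0.0

    def forward(self, x):
        return self.w * x

    def update(self, samples, targets, lr=0.1):
        # gradient of the mean squared error with respect to w
        g = sum(2 * (self.w * x - t) * x
                for x, t in zip(samples, targets)) / len(samples)
        self.w -= lr * g

samples, targets = [1.0, 2.0], [2.0, 4.0]   # true relation: t = 2 * x
model = ToyModel()
loss = float("inf")
for _ in range(200):                        # P12: iterative training
    preds = [model.forward(x) for x in samples]
    # P13: loss between recognition result and target result
    loss = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(samples)
    if loss < 1e-6:                         # P14: convergence condition
        break
    model.update(samples, targets)          # adjust model parameters
```

The loop structure is the same whatever model sits behind `forward` and `update`: compute the loss each round, adjust the parameters, and stop once the convergence condition is met.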
Optionally, the calculating the loss value of the formula recognition model according to the formula recognition result and the target recognition result output by the decoder includes:
inputting the formula identification sample into an encoder of the formula identification model for encoding to obtain a first characteristic diagram of the formula identification sample;
splitting the first characteristic diagram to obtain M second characteristic diagrams;
sequencing the M second feature graphs according to a preset rule;
inputting a decoding result corresponding to the (N-1) th second feature map and the Nth second feature map into a decoder of the formula identification model for decoding processing until the Mth second feature map is input into the decoder, and obtaining a formula identification result corresponding to the formula identification sample; wherein M and N are positive integers, and N is less than or equal to M;
and calculating the loss value of the formula recognition model according to the formula recognition result and the target recognition result.
Optionally, the splitting the first feature map to obtain M second feature maps includes:
determining a relative position code corresponding to the first feature map;
adding the relative position code to the first characteristic diagram to obtain a third characteristic diagram;
and splitting the third characteristic diagram to obtain M second characteristic diagrams.
Optionally, the splitting the first feature map to obtain M second feature maps includes:
determining a width value of the first feature map;
and splitting the first characteristic diagram according to the width value of the first characteristic diagram to obtain M second characteristic diagrams.
Optionally, if the value of N is 1, the decoding result corresponding to the N-1 th second feature map is a preset value.
Optionally, the apparatus is also configured such that the one or more programs are executed by the one or more processors and include instructions for:
acquiring a target image, wherein the target image comprises a formula to be identified;
and inputting the target image into the trained formula recognition model for formula recognition processing to obtain a formula recognition result corresponding to the target image.
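The inference flow just described can be sketched as follows; the model and file name are purely hypothetical, and at inference time no previous decoding result needs to be supplied externally.

```python
def recognize_formula(model, target_image):
    """Apply the trained formula recognition model to a target image
    containing a formula; the recognition result is returned directly."""
    return model(target_image)

# hypothetical trained model: maps an "image" to a LaTeX-like string
trained_model = lambda img: "a^2 + b^2 = c^2" if img == "pythagoras.png" else ""
result = recognize_formula(trained_model, "pythagoras.png")
```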
FIG. 5 is a block diagram illustrating an apparatus 800 for model training in accordance with an exemplary embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 5, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing elements 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed status of the device 800, the relative positioning of components, such as a display and keypad of the apparatus 800, the change in position of the device 800 or a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and the change in temperature of the device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 6 is a schematic diagram of a server in some embodiments of the invention. The server 1900 may vary widely by configuration or performance, and may include one or more central processing units (CPUs) 1922 (e.g., one or more processors), memory 1932, and one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. The memory 1932 and the storage medium 1930 may be transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations stored in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input-output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
A non-transitory computer readable storage medium in which instructions, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform a training method of a formula recognition model shown in fig. 2.
A non-transitory computer readable storage medium in which instructions, when executed by a processor of an apparatus (server or terminal), enable the apparatus to perform a method of training a formula recognition model, the method comprising: obtaining a formula identification sample and a target identification result corresponding to the formula identification sample; iteratively training a formula recognition model based on the formula recognition samples, the formula recognition model comprising an encoder and a decoder; in each round of training, calculating a loss value of the formula recognition model according to a formula recognition result output by the decoder and the target recognition result, and adjusting model parameters of the formula recognition model according to the loss value; wherein the decoder outputs a formula identification result based on the decoding results of the current encoding characteristic and the last encoding characteristic of the formula identification sample; and when the joint loss value meets a convergence condition, obtaining a trained formula recognition model.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
The above detailed description is provided for a formula recognition model training method, a formula recognition model training device and a model training device, and the specific examples are applied in this document to explain the principle and implementation of the present invention, and the description of the above embodiments is only used to help understand the method and core ideas of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (14)

1. A method for training a formula recognition model, the method comprising:
obtaining a formula identification sample and a target identification result corresponding to the formula identification sample;
iteratively training a formula recognition model based on the formula recognition samples, the formula recognition model comprising an encoder and a decoder;
in each round of training, calculating a loss value of the formula recognition model according to a formula recognition result output by the decoder and the target recognition result, and adjusting model parameters of the formula recognition model according to the loss value; wherein the decoder outputs a formula identification result based on the decoding results of the current encoding characteristic and the last encoding characteristic of the formula identification sample;
and when the joint loss value meets a convergence condition, obtaining a trained formula recognition model.
2. The method of claim 1, wherein the encoding features comprise a feature map, and wherein calculating the loss value of the formula recognition model according to the formula recognition result and the target recognition result output by the decoder comprises:
inputting the formula identification sample into an encoder of the formula identification model for encoding to obtain a first characteristic diagram of the formula identification sample;
splitting the first characteristic diagram to obtain M second characteristic diagrams;
sequencing the M second feature graphs according to a preset rule;
inputting a decoding result corresponding to the (N-1) th second feature map and the Nth second feature map into a decoder of the formula identification model for decoding processing until the Mth second feature map is input into the decoder, and obtaining a formula identification result corresponding to the formula identification sample; wherein M and N are positive integers, and N is less than or equal to M;
and calculating the loss value of the formula recognition model according to the formula recognition result and the target recognition result.
3. The method according to claim 2, wherein the splitting the first feature map to obtain M second feature maps comprises:
determining a relative position code corresponding to the first feature map;
adding the relative position code to the first characteristic diagram to obtain a third characteristic diagram;
and splitting the third characteristic diagram to obtain M second characteristic diagrams.
4. The method according to claim 2, wherein the splitting the first feature map to obtain M second feature maps comprises:
determining a width value of the first feature map;
and splitting the first characteristic diagram according to the width value of the first characteristic diagram to obtain M second characteristic diagrams.
5. The method of claim 2, wherein if the value of N is 1, the decoding result corresponding to the N-1 th second feature map is a preset value.
6. The method of any of claims 1 to 5, further comprising:
acquiring a target image, wherein the target image comprises a formula to be identified;
and inputting the target image into the trained formula recognition model for formula recognition processing to obtain a formula recognition result corresponding to the target image.
7. An apparatus for training a formula recognition model, the apparatus comprising:
the training data acquisition module is used for acquiring a formula identification sample and a target identification result corresponding to the formula identification sample;
the iterative training module is used for iteratively training a formula recognition model based on the formula recognition sample, and the formula recognition model comprises an encoder and a decoder;
the loss value calculation module is used for calculating the loss value of the formula recognition model according to the formula recognition result output by the decoder and the target recognition result in each round of training and adjusting the model parameters of the formula recognition model according to the loss value; wherein the decoder outputs a formula identification result based on the decoding results of the current encoding characteristic and the last encoding characteristic of the formula identification sample;
and the training completion condition determining module is used for obtaining a trained formula recognition model when the joint loss value meets the convergence condition.
8. The apparatus of claim 7, wherein the coding features comprise a feature map, and wherein the loss value calculation module comprises:
the coding processing submodule is used for inputting the formula identification sample into a coder of the formula identification model for coding processing to obtain a first characteristic diagram of the formula identification sample;
the splitting processing submodule is used for splitting the first characteristic diagram to obtain M second characteristic diagrams;
the characteristic diagram sorting submodule is used for sorting the M second characteristic diagrams according to a preset rule;
the decoding processing sub-module is used for inputting the decoding result corresponding to the (N-1) th second feature map and the Nth second feature map into a decoder of the formula identification model for decoding processing until the Mth second feature map is input into the decoder to obtain a formula identification result corresponding to the formula identification sample; wherein M and N are positive integers, and N is less than or equal to M;
and the loss value operator module is used for calculating the loss value of the formula identification model according to the formula identification result and the target identification result.
9. The apparatus of claim 8, wherein the splitting processing submodule comprises:
a relative position code determining unit, configured to determine a relative position code corresponding to the first feature map;
a relative position code adding unit, configured to add the relative position code to the first feature map to obtain a third feature map;
and a first splitting processing unit, configured to split the third feature map to obtain the M second feature maps.
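The position-code variant of claim 9 can be sketched as follows. The sinusoidal form used here is an illustrative assumption; the claim specifies only that a relative position code is determined for, and added to, the first feature map before splitting.

```python
import numpy as np

def add_position_encoding(first_map):
    """Add a position code over the width axis of the first feature map,
    yielding the third feature map (claim 9). Sinusoidal form is assumed."""
    c, h, w = first_map.shape
    pos = np.arange(w)[None, None, :]      # width positions, broadcast over C and H
    chan = np.arange(c)[:, None, None]     # channel indices
    angle = pos / (10000.0 ** (2 * (chan // 2) / c))
    # Even channels get sine, odd channels get cosine
    code = np.where(chan % 2 == 0, np.sin(angle), np.cos(angle))
    return first_map + code

third_map = add_position_encoding(np.zeros((4, 1, 6)))
second_maps = np.split(third_map, 3, axis=2)  # then split into M second feature maps
```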
10. The apparatus of claim 8, wherein the splitting processing submodule comprises:
a width value determination unit, configured to determine a width value of the first feature map;
and a second splitting processing unit, configured to split the first feature map according to the width value of the first feature map to obtain M second feature maps.
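The width-based variant of claim 10 can be sketched as follows. The `slice_width` parameter and the ceiling rule for deriving M are illustrative assumptions; the claim requires only that the width value of the first feature map governs the split.

```python
import numpy as np

def split_by_width(first_map, slice_width):
    """Determine the width of the first feature map and split it into M
    second feature maps of at most slice_width columns each (claim 10)."""
    _, _, w = first_map.shape
    m = -(-w // slice_width)  # ceiling division: number of slices M
    # array_split tolerates widths that do not divide evenly
    return np.array_split(first_map, m, axis=2)

parts = split_by_width(np.ones((2, 3, 10)), slice_width=4)  # yields M = 3 slices
```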
11. The apparatus of claim 8, wherein, when N is 1, the decoding result corresponding to the (N-1)-th second feature map is a preset value.
12. The apparatus of any of claims 7 to 11, further comprising:
a target image acquisition module, configured to acquire a target image, the target image comprising a formula to be recognized; and
a formula recognition processing module, configured to input the target image into the trained formula recognition model for formula recognition, to obtain a formula recognition result corresponding to the target image.
13. An apparatus for model training, the apparatus comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs comprising instructions for performing the method of training a formula recognition model according to any one of claims 1 to 6.
14. A machine-readable medium having stored thereon instructions, which when executed by one or more processors, cause an apparatus to perform a method of training a formula recognition model according to any one of claims 1 to 6.
CN202111156277.XA 2021-09-29 2021-09-29 Formula recognition model training method and device for model training Pending CN113919294A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111156277.XA CN113919294A (en) 2021-09-29 2021-09-29 Formula recognition model training method and device for model training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111156277.XA CN113919294A (en) 2021-09-29 2021-09-29 Formula recognition model training method and device for model training

Publications (1)

Publication Number Publication Date
CN113919294A 2022-01-11

Family

ID=79237192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111156277.XA Pending CN113919294A (en) 2021-09-29 2021-09-29 Formula recognition model training method and device for model training

Country Status (1)

Country Link
CN (1) CN113919294A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705459A (en) * 2019-09-29 2020-01-17 北京爱学习博乐教育科技有限公司 Automatic identification method and device for mathematical and chemical formulas and model training method and device
US20200026951A1 (en) * 2018-07-19 2020-01-23 Tata Consultancy Services Limited Systems and methods for end-to-end handwritten text recognition using neural networks
CN111046751A (en) * 2019-11-22 2020-04-21 华中师范大学 Formula identification method and device
CN112699882A (en) * 2021-01-07 2021-04-23 北京三快在线科技有限公司 Image character recognition method and device and electronic equipment
CN113408417A (en) * 2021-06-18 2021-09-17 北京搜狗科技发展有限公司 Formula identification method and device for formula identification

Similar Documents

Publication Publication Date Title
WO2020199730A1 (en) Text recognition method and apparatus, electronic device and storage medium
CN110210535B (en) Neural network training method and device and image processing method and device
CN112001321B (en) Network training method, pedestrian re-identification method, device, electronic equipment and storage medium
CN107221330B (en) Punctuation adding method and device and punctuation adding device
CN109871843B (en) Character recognition method and device for character recognition
RU2640632C2 (en) Method and device for delivery of information
CN113538519A (en) Target tracking method and device, electronic equipment and storage medium
CN111242303B (en) Network training method and device, and image processing method and device
CN109615006B (en) Character recognition method and device, electronic equipment and storage medium
CN109255128B (en) Multi-level label generation method, device and storage medium
US20210326649A1 (en) Configuration method and apparatus for detector, storage medium
CN110781813A (en) Image recognition method and device, electronic equipment and storage medium
CN111160047A (en) Data processing method and device and data processing device
CN110941727A (en) Resource recommendation method and device, electronic equipment and storage medium
CN113538310A (en) Image processing method and device, electronic equipment and storage medium
CN114693905A (en) Text recognition model construction method, text recognition method and device
CN111797746A (en) Face recognition method and device and computer readable storage medium
CN115512116B (en) Image segmentation model optimization method and device, electronic equipment and readable storage medium
CN112308588A (en) Advertisement putting method and device and storage medium
CN107480773B (en) Method and device for training convolutional neural network model and storage medium
CN113919294A (en) Formula recognition model training method and device for model training
CN114154395A (en) Model processing method and device for model processing
CN110929771B (en) Image sample classification method and device, electronic equipment and readable storage medium
CN113807540A (en) Data processing method and device
CN108345590B (en) Translation method, translation device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination