CN113903034A - Formula-based data processing method and device - Google Patents

Formula-based data processing method and device

Info

Publication number
CN113903034A
Authority
CN
China
Prior art keywords
character string
formula
training
string
grammar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111155055.6A
Other languages
Chinese (zh)
Inventor
秦波
辛晓哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN202111155055.6A
Publication of CN113903034A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/111Mathematical or scientific formatting; Subscripts; Superscripts
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Optimization (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Algebra (AREA)
  • Mathematical Analysis (AREA)
  • Computing Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The application discloses a formula-based data processing method. A formula character string to be processed, which contains grammar errors, is obtained and input into a machine learning model to obtain a target formula character string. The machine learning model corrects a formula character string with grammar errors into a formula character string that meets the grammar requirements, so the target formula character string it outputs has no grammar errors. The target formula can then be obtained from the target formula character string. With the scheme of the embodiments of the application, the grammar errors in the formula character string to be processed can thus be corrected, so that the target formula is obtained.

Description

Formula-based data processing method and device
Technical Field
The present application relates to the field of data processing, and in particular, to a formula-based data processing method and apparatus.
Background
In some scenarios, it is desirable to recognize formulas in a picture. A recognized formula may be described using a formula description language, such as LaTeX. However, a formula described in a formula description language must meet the grammar standard of that language; otherwise, the recognized formula cannot be correctly output. Therefore, if misrecognition during formula recognition introduces a grammar error, the recognized formula cannot be output.
Therefore, there is a need for a scheme that can correctly output a corresponding formula even when a syntax error exists in the identified formula.
Disclosure of Invention
The technical problem to be solved by this application is that, when the identified formula has grammar errors, the identified formula cannot be output. To this end, a formula-based data processing method and device are provided.
In a first aspect, an embodiment of the present application provides a formula-based data processing method, where the method includes:
acquiring a formula character string to be processed, wherein the formula character string to be processed is a character string with grammar errors;
inputting the formula character string to be processed into a machine learning model to obtain a target formula character string; the machine learning model is to: correcting the formula character string with grammar error into a formula character string according with grammar requirements;
and obtaining a target formula according to the target formula character string.
Optionally, the method further includes:
acquiring a training formula character string with grammar errors and a label of the training formula character string, wherein the label of the training character string is used for indicating the grammar errors in the training formula character string;
and training to obtain the machine learning model according to the training formula character string and the label of the training formula character string.
Optionally, the training formula character strings include a character string that affects grammatical validity and a character string that does not affect grammatical validity, where characters included in the character string that does not affect grammatical validity are all the same character.
Optionally, the obtaining of the training formula character string includes:
and obtaining the formula character string without grammar error, and processing the formula character string without grammar error to obtain the training formula character string.
Optionally, the machine learning model is a BERT model, and the algorithm used for training the machine learning model is a masked language model (MLM) algorithm.
Optionally, the formula string to be processed is a LaTeX string.
Optionally, before obtaining the character string to be processed, the method further includes:
and determining that the character string to be processed has grammar errors.
In a second aspect, an embodiment of the present application provides a formula-based data processing apparatus, where the apparatus includes:
the first acquisition unit is used for acquiring a formula character string to be processed, wherein the formula character string to be processed is a character string with a grammar error;
the first processing unit is used for inputting the formula character string to be processed into a machine learning model to obtain a target formula character string; the machine learning model is to: correcting the formula character string with grammar error into a formula character string according with grammar requirements;
and the second processing unit is used for obtaining a target formula according to the target formula character string.
Optionally, the apparatus further comprises:
the second obtaining unit is used for obtaining a training formula character string with grammar errors and a label of the training formula character string, wherein the label of the training character string is used for indicating the grammar errors in the training formula character string;
and the training unit is used for training to obtain the machine learning model according to the training formula character string and the label of the training formula character string.
Optionally, the training formula character strings include a character string that affects grammatical validity and a character string that does not affect grammatical validity, where characters included in the character string that does not affect grammatical validity are all the same character.
Optionally, the obtaining of the training formula character string includes:
and obtaining the formula character string without grammar error, and processing the formula character string without grammar error to obtain the training formula character string.
Optionally, the machine learning model is a BERT model, and the algorithm used for training the machine learning model is a masked language model (MLM) algorithm.
Optionally, the formula string to be processed is a LaTeX string.
Optionally, the apparatus further comprises:
and the determining unit is used for determining that the character string to be processed has grammar errors before the character string to be processed is obtained.
In a third aspect, an embodiment of the present application provides a formula-based data processing apparatus, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs including instructions for:
acquiring a formula character string to be processed, wherein the formula character string to be processed is a character string with grammar errors;
inputting the formula character string to be processed into a machine learning model to obtain a target formula character string; the machine learning model is to: correcting the formula character string with grammar error into a formula character string according with grammar requirements;
and obtaining a target formula according to the target formula character string.
Optionally, the operations further include:
acquiring a training formula character string with grammar errors and a label of the training formula character string, wherein the label of the training character string is used for indicating the grammar errors in the training formula character string;
and training to obtain the machine learning model according to the training formula character string and the label of the training formula character string.
Optionally, the training formula character strings include a character string that affects grammatical validity and a character string that does not affect grammatical validity, where characters included in the character string that does not affect grammatical validity are all the same character.
Optionally, the obtaining of the training formula character string includes:
and obtaining the formula character string without grammar error, and processing the formula character string without grammar error to obtain the training formula character string.
Optionally, the machine learning model is a BERT model, and the algorithm used for training the machine learning model is a masked language model (MLM) algorithm.
Optionally, the formula string to be processed is a LaTeX string.
Optionally, before obtaining the character string to be processed, the operations further include:
and determining that the character string to be processed has grammar errors.
In a fourth aspect, embodiments of the present application provide a computer-readable medium having stored thereon instructions which, when executed by one or more processors, cause an apparatus to perform the method of any implementation of the first aspect above.
Compared with the prior art, the embodiment of the application has the following advantages:
The embodiment of the application provides a formula-based data processing method in which a formula character string to be processed, which contains grammar errors, is obtained and input into a machine learning model to obtain a target formula character string. The machine learning model corrects a formula character string with grammar errors into a formula character string that meets the grammar requirements, so the target formula character string it outputs has no grammar errors. The target formula can then be obtained from the target formula character string. With the scheme of the embodiment of the application, the grammar errors in the formula character string to be processed can thus be corrected, so that the target formula is obtained.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a formula-based data processing method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a formula-based data processing apparatus according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a client according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The inventors of the present application have found through research that a recognized formula can be described using a formula description language such as LaTeX. However, a formula described in a formula description language must comply with the grammar standard of that language. When the recognized formula, as described in the formula description language, contains a syntax error, it cannot be correctly output. For example, if a single required character is missing from the recognized formula described in LaTeX (such as a "{" with no matching "}"), the recognized formula cannot be output. Therefore, if syntax error correction can be performed on the formula described in the formula description language, the above problem can be solved.
In view of this, the present application provides a data processing method and apparatus based on a formula.
Various non-limiting embodiments of the present application are described in detail below with reference to the accompanying drawings.
Exemplary method
Referring to fig. 1, the figure is a schematic flowchart of a formula-based data processing method according to an embodiment of the present application.
The method provided by the embodiment of the present application may be performed, for example, by a first device, where the first device mentioned herein includes, but is not limited to, a terminal device and a server. The terminal device mentioned here may be a mobile terminal such as a smart phone or a tablet computer, or may be a terminal device such as a desktop computer.
The formula-based data processing method provided by the embodiment of the application can be applied to the post-processing stage of formula recognition. Specifically, after a formula character string is obtained through image recognition, a grammar discriminator judges whether the recognized formula character string has a grammar error. If no grammar error exists, the formula can be output directly. If a grammar error exists, the recognized formula character string is used as the formula character string to be processed in the embodiment of the application, and the formula-based data processing method provided by the embodiment of the application is executed to correct the grammar error, so that the formula included in the image is correctly output.
The method shown in fig. 1 can be implemented, for example, by the following S101 to S103.
S101: acquire a formula character string to be processed, where the formula character string to be processed is a character string with grammar errors.
In the embodiment of the present application, the formula string to be processed is a string described by using a formula description language. In one example, the formula string to be processed is a LaTeX string.
If the formula character string to be processed has a grammar error, it does not conform to the grammar standard of its formula description language. For example, the string to be processed includes "{" but not "}"; as another example, the string to be processed includes "\begin" but not "\end".
In an example, before S101 is executed, the recognized formula character string may first be checked for syntax errors, and S101 is executed only after a syntax error has been found. In one example, a grammar discriminator may be used to determine whether the formula string contains a grammar error; the grammar discriminator is not described in detail here.
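The source does not describe the grammar discriminator, so the following is only a minimal sketch of what such a check could look like for LaTeX, assuming that "grammar error" is limited to the two cases named above: unbalanced braces and unmatched "\begin"/"\end" pairs. The function name `has_grammar_error` is hypothetical, and a real discriminator would likely rely on a full LaTeX parser.

```python
import re

def has_grammar_error(latex: str) -> bool:
    """Hypothetical discriminator: True if braces or \\begin/\\end are unbalanced."""
    depth = 0
    i = 0
    while i < len(latex):
        ch = latex[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # closing brace with no opener
                return True
        i += 1
    if depth != 0:  # at least one opener was never closed
        return True

    # Check that every \begin{env} is closed by a matching \end{env}.
    stack = []
    for kind, name in re.findall(r"\\(begin|end)\s*\{([^{}]*)\}", latex):
        if kind == "begin":
            stack.append(name)
        elif not stack or stack.pop() != name:
            return True
    return bool(stack)
```

Only strings for which this check returns True would be passed on to S101 as formula character strings to be processed.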
S102: input the formula character string to be processed into a machine learning model to obtain a target formula character string, where the machine learning model is used to correct a formula character string with a grammar error into a formula character string that meets the grammar requirement.
In the embodiment of the present application, a formula string that meets the grammar requirement is a formula string that has no grammar error.
In the embodiment of the application, after the character string of the formula to be processed is obtained, a machine learning model can be used for syntax error correction of the character string to be processed. Wherein the machine learning model is capable of outputting a formula string without a grammatical error according to a formula string with a grammatical error. In other words, after the to-be-processed formula character string is input into the machine learning model, the machine learning model may correct the to-be-processed formula character string and output a target formula character string without a grammatical error.
In the embodiment of the present application, the machine learning model may be obtained by training in advance, and as to a specific manner of obtaining the machine learning model by training, reference may be made to the following relevant description section, which is not described in detail herein.
The embodiments of the present application do not specifically limit the machine learning model, which in one example may be a BERT model. In addition, the algorithm used for training the machine learning model is not specifically limited in the embodiment of the present application. In an example, when the machine learning model is a BERT model, the algorithm used for training it is a Masked Language Model (MLM) algorithm.
S103: obtain a target formula according to the target formula character string.
It can be understood that there is no syntax error in the target formula string, and therefore the target formula can be obtained from it. In one example, the target formula character string may be recognized by a formula recognizer that supports the description language of the target formula character string, so as to obtain the target formula. For example, a formula recognizer supporting LaTeX may be used to recognize the target formula string, thereby obtaining the target formula.
As can be seen from the above description, with the scheme of the embodiment of the present application, for a formula character string to be processed having a syntax error, the syntax error of the formula character string to be processed can be corrected, so as to obtain a target formula.
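The flow of S101 to S103, together with the optional pre-check, can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the application: `process_formula`, `toy_has_error` and `toy_correct` are hypothetical names, and `toy_correct` is a trivial rule-based stand-in for the trained machine learning model, used only so that the sketch runs on its own.

```python
from typing import Callable

def process_formula(
    recognized: str,
    has_error: Callable[[str], bool],
    correct: Callable[[str], str],
) -> str:
    """Post-process a recognized formula string and return the string to output."""
    if not has_error(recognized):
        # No grammar error: the recognized string can be output directly.
        return recognized
    # S101: the recognized string becomes the string to be processed.
    # S102: the model (here any callable) returns the corrected target string.
    target = correct(recognized)
    # S103 would hand `target` to a formula recognizer to obtain the formula.
    return target

# Toy stand-ins (assumptions, not the method of the application).
def toy_has_error(s: str) -> bool:
    return s.count("{") != s.count("}")

def toy_correct(s: str) -> str:
    # Append any missing closing braces; a real model predicts the edits.
    return s + "}" * (s.count("{") - s.count("}"))
```

For instance, `process_formula(r"\frac{a}{b", toy_has_error, toy_correct)` returns the string `\frac{a}{b}`, while a string that already passes the check is returned unchanged.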
Next, a method of training the machine learning model will be described.
In the embodiment of the application, the machine learning model can be obtained by training through the following steps A and B.
In the following description, the formula string to be processed is taken to be a LaTeX string as an example.
Step A: acquire a training formula character string with grammar errors and a label of the training formula character string, where the label of the training formula character string is used for indicating the grammar errors in the training formula character string.
In the embodiment of the present application, when the training formula character string is obtained, there may be a plurality of implementation manners. In one example, a formula string with grammatical errors obtained from image recognition may be collected as the training formula string.
In yet another example, a formula string without a syntax error may be obtained (e.g., generated) first, and then processed to obtain a formula string with a syntax error. For example, several characters in the string without the syntax error may be deleted at random; as another example, the positions of several characters in the string without the syntax error may be changed at random; and so on.
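The two corruption operations mentioned above (random deletion and random repositioning) can be sketched as follows. The function names and the token-list representation are assumptions made for illustration. Note that a random edit does not always produce a syntax error (deleting "a" from "\frac{a}{b}" leaves balanced braces), so a real pipeline would presumably keep only the corrupted strings that a grammar check rejects.

```python
import random

def delete_random_chars(tokens, n, rng):
    """Return a copy of `tokens` with `n` randomly chosen tokens removed."""
    drop = set(rng.sample(range(len(tokens)), n))
    return [t for i, t in enumerate(tokens) if i not in drop]

def move_random_char(tokens, rng):
    """Return a copy of `tokens` with one random token moved to a random position."""
    out = list(tokens)
    tok = out.pop(rng.randrange(len(out)))
    out.insert(rng.randrange(len(out) + 1), tok)
    return out

# Usage: corrupt the well-formed string \frac{a}{b}.
clean = ["\\frac", "{", "a", "}", "{", "b", "}"]
rng = random.Random(0)  # seeded so the corruption is reproducible
corrupted = delete_random_chars(clean, 1, rng)
```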
In the embodiment of the present application, the label of the training formula character string may be manually annotated in advance. In one example, a label may be added between any two adjacent characters of the training formula string to indicate whether a character needs to be added or deleted at that position. When a character needs to be added at the position, the label also indicates which character is to be added. When a character needs to be deleted at the position, the character to be deleted may be the character before the position or the character after the position, and the embodiment of the present application is not particularly limited. For example, a label 1 between character A and character B may indicate that the character "{" needs to be added at that position. As another example, a label 2 between character A and character B may indicate that character A needs to be deleted. It is understood that the set of labels between every two adjacent characters of the training formula character string constitutes the label of the training formula character string.
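To make the labelling scheme concrete, the sketch below applies a set of position labels to a token list. The encoding is an assumption introduced for illustration: gap index k denotes the position just before `tokens[k]` (with k equal to the length of the list denoting the end of the string, which the source does not explicitly mention), and the operation names "add", "delete_prev" and "delete_next" are hypothetical.

```python
def apply_labels(tokens, labels):
    """Apply gap labels to `tokens` and return the corrected token list.

    `labels` maps a gap index k to a pair (op, arg), where op is
    "add" (insert `arg` at the gap), "delete_prev" (delete tokens[k - 1])
    or "delete_next" (delete tokens[k]). Unlabelled gaps mean "no change".
    """
    inserts = {}
    delete = set()
    for k, (op, arg) in labels.items():
        if op == "add":
            inserts[k] = arg
        elif op == "delete_prev":
            delete.add(k - 1)
        elif op == "delete_next":
            delete.add(k)
    out = []
    for k in range(len(tokens) + 1):
        if k in inserts:
            out.append(inserts[k])
        if k < len(tokens) and k not in delete:
            out.append(tokens[k])
    return out

# "\frac{a{b}" is missing a "}" after "a": label the gap before the second "{".
broken = ["\\frac", "{", "a", "{", "b", "}"]
fixed = apply_labels(broken, {3: ("add", "}")})
```

Here `fixed` is the token list of "\frac{a}{b}". A surplus character is handled the same way with a "delete_prev" or "delete_next" label on an adjacent gap.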
It will be appreciated that not every character in a training formula string affects the legitimacy of that string. In other words, the training formula string includes a character string that affects grammatical validity and a character string that does not. The legitimacy of the training formula string referred to here means whether the training formula string conforms to the syntax of the formula description language corresponding to it. For example, for the formula $\frac{a}{b}$, the corresponding LaTeX string is "\frac{a}{b}". The characters "\frac", "{" and "}" affect grammatical validity, and the characters "a" and "b" do not. In other words, the character string affecting grammatical validity includes five characters (the character "\frac", two characters "{" and two characters "}"), and the character string not affecting grammatical validity includes two characters (the character "a" and the character "b").
It will be appreciated that, in a training formula string, each character that does not affect legitimacy could take any of many possible values. However, because these characters do not affect the legitimacy of the training formula string, their values do not affect the recognition effect of the trained machine learning model. Therefore, in one example, to facilitate management of the training formula strings and to prevent the characters that do not affect legitimacy from interfering with those that do, all characters in the character string that does not affect grammatical validity are the same character. For example, they may all be the character "a", or all be the character "b"; the embodiments of the present application are not particularly limited in this respect.
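A minimal sketch of this normalization follows. It assumes, for illustration only, that the characters affecting grammatical validity are LaTeX commands (tokens starting with a backslash) and the braces "{" and "}", as in the "\frac" example above; the source does not give a complete list. The names `TOKEN_RE`, `GRAMMAR_TOKENS` and `normalize` are hypothetical.

```python
import re

# A token is a command such as \frac, an escaped character, or any
# single non-whitespace character; whitespace is skipped.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\\.|\S")
GRAMMAR_TOKENS = {"{", "}"}  # assumption: extend as the grammar requires

def normalize(latex: str, placeholder: str = "a") -> list:
    """Tokenize `latex`, replacing every grammar-neutral token by `placeholder`."""
    tokens = TOKEN_RE.findall(latex)
    return [
        t if t.startswith("\\") or t in GRAMMAR_TOKENS else placeholder
        for t in tokens
    ]
```

With this sketch, "\frac{x}{y}" and "\frac{a}{b}" both map to the token sequence \frac, {, a, }, {, a, }, so the model only ever sees one value for the grammar-neutral characters.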
Step B: train the machine learning model according to the training formula character string and the label of the training formula character string.
After the training formula character string and its label are obtained, the machine learning model can be obtained by training with them. In one example, a BERT model may be trained using an MLM algorithm, the training formula string, and the label of the training formula string.
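The source does not specify how the MLM objective is combined with the labels, so the sketch below shows only the generic data-preparation step of masked language modelling: some tokens are replaced by a mask token, and the model is trained to predict the originals. The function name `mask_tokens` and the "[MASK]" string are illustrative assumptions, and the full BERT recipe (which sometimes keeps or randomly replaces a selected token instead of masking it) is omitted.

```python
import random

MASK = "[MASK]"  # assumed mask token

def mask_tokens(tokens, mask_prob, rng):
    """Return (masked, targets) for one MLM training example.

    masked[i] is MASK where position i was selected, else tokens[i];
    targets[i] is the original token at selected positions, else None.
    """
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets.append(tok)
        else:
            masked.append(tok)
            targets.append(None)
    return masked, targets

# Usage: mask about 15% of tokens, the rate commonly used for BERT pre-training.
tokens = ["\\frac", "{", "a", "}", "{", "a", "}"]
masked, targets = mask_tokens(tokens, 0.15, random.Random(0))
```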
Exemplary device
Based on the method provided by the above embodiment, the embodiment of the present application further provides an apparatus, which is described below with reference to the accompanying drawings.
Referring to fig. 2, the figure is a schematic structural diagram of a formula-based data processing apparatus according to an embodiment of the present application. The apparatus 200 may specifically include, for example: a first acquisition unit 201, a first processing unit 202 and a second processing unit 203.
A first acquisition unit 201, configured to acquire a formula character string to be processed, where the formula character string to be processed is a character string with a syntax error;
the first processing unit 202 is configured to input the formula character string to be processed into a machine learning model, so as to obtain a target formula character string; the machine learning model is to: correcting the formula character string with grammar error into a formula character string according with grammar requirements;
and the second processing unit 203 is configured to obtain a target formula according to the target formula character string.
Optionally, the apparatus further comprises:
the second obtaining unit is used for obtaining a training formula character string with grammar errors and a label of the training formula character string, wherein the label of the training character string is used for indicating the grammar errors in the training formula character string;
and the training unit is used for training to obtain the machine learning model according to the training formula character string and the label of the training formula character string.
Optionally, the training formula character strings include a character string that affects grammatical validity and a character string that does not affect grammatical validity, where characters included in the character string that does not affect grammatical validity are all the same character.
Optionally, the obtaining of the training formula character string includes:
and obtaining the formula character string without grammar error, and processing the formula character string without grammar error to obtain the training formula character string.
Optionally, the machine learning model is a BERT model, and the algorithm used for training the machine learning model is a masked language model (MLM) algorithm.
Optionally, the formula string to be processed is a LaTeX string.
Optionally, the apparatus further comprises:
and the determining unit is used for determining that the character string to be processed has grammar errors before the character string to be processed is obtained.
Since the apparatus 200 is an apparatus corresponding to the method provided in the above method embodiment, and the specific implementation of each unit of the apparatus 200 is the same as that of the above method embodiment, for the specific implementation of each unit of the apparatus 200, reference may be made to the description part of the above method embodiment, and details are not repeated here.
The method provided by the embodiment of the present application may be executed by a client or a server, and the client and the server that execute the method are described below separately.
Fig. 3 shows a block diagram of a client 300. For example, the client 300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
Referring to Fig. 3, client 300 may include one or more of the following components: processing component 302, memory 304, power component 306, multimedia component 308, audio component 310, input/output (I/O) interface 312, sensor component 314, and communication component 316.
The processing component 302 generally controls overall operation of the client 300, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 302 may include one or more processors 320 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 302 can include one or more modules that facilitate interaction between the processing component 302 and other components. For example, the processing component 302 can include a multimedia module to facilitate interaction between the multimedia component 308 and the processing component 302.
The memory 304 is configured to store various types of data to support operations at the client 300. Examples of such data include instructions for any application or method operating on the client 300, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 304 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 306 provides power to the various components of the client 300. The power component 306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the client 300.
The multimedia component 308 comprises a screen providing an output interface between the client 300 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 308 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the client 300 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 310 is configured to output and/or input audio signals. For example, the audio component 310 includes a Microphone (MIC) configured to receive external audio signals when the client 300 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 304 or transmitted via the communication component 316. In some embodiments, audio component 310 also includes a speaker for outputting audio signals.
The I/O interface provides an interface between the processing component 302 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 314 includes one or more sensors that provide status assessments of various aspects of the client 300. For example, the sensor component 314 may detect an open/closed state of the client 300 and the relative positioning of components, such as the display and keypad of the client 300. The sensor component 314 may also detect a change in the position of the client 300 or of a component of the client 300, the presence or absence of user contact with the client 300, the orientation or acceleration/deceleration of the client 300, and a change in the temperature of the client 300. The sensor component 314 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 316 is configured to facilitate wired or wireless communication between the client 300 and other devices. The client 300 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 316 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 316 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the client 300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the following methods:
acquiring a formula character string to be processed, wherein the formula character string to be processed is a character string with grammar errors;
inputting the formula character string to be processed into a machine learning model to obtain a target formula character string, wherein the machine learning model is used to correct a formula character string with grammar errors into a formula character string that conforms to grammar requirements;
and obtaining a target formula according to the target formula character string.
Optionally, the method further includes:
acquiring a training formula character string with grammar errors and a label of the training formula character string, wherein the label of the training formula character string indicates the grammar errors in the training formula character string;
and training the machine learning model according to the training formula character string and the label of the training formula character string.
Optionally, the training formula character string includes a character string that affects grammatical validity and a character string that does not affect grammatical validity, wherein the characters in the character string that does not affect grammatical validity are all the same character.
Optionally, acquiring the training formula character string includes:
acquiring a formula character string without grammar errors, and processing the formula character string without grammar errors to obtain the training formula character string.
Optionally, the machine learning model is a BERT model, and the algorithm used to train the machine learning model is a masked language model (MLM) algorithm.
Optionally, the formula character string to be processed is a LaTeX character string.
Optionally, before acquiring the formula character string to be processed, the method further includes:
determining that the formula character string to be processed has grammar errors.
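The flow of the three main steps (detect a grammar error, pass the string to a correcting model, use the corrected string) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `find_brace_error` checks only brace balance as a stand-in for a real LaTeX grammar check, and `toy_corrector` is a rule-based placeholder that occupies the slot where the trained machine learning model would sit. It is not the model described in this application.

```python
from typing import Callable, Optional


def find_brace_error(latex: str) -> Optional[int]:
    """Return the index of the first unbalanced brace, or None if balanced.

    Stand-in grammar check (assumption): only '{' and '}' are examined.
    """
    open_positions = []
    i = 0
    while i < len(latex):
        ch = latex[i]
        if ch == "\\" and i + 1 < len(latex):
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "{":
            open_positions.append(i)
        elif ch == "}":
            if not open_positions:
                return i  # closing brace without a matching opener
            open_positions.pop()
        i += 1
    return open_positions[0] if open_positions else None


def toy_corrector(latex: str) -> str:
    """Placeholder for the trained model: appends missing closing braces."""
    depth = 0
    i = 0
    while i < len(latex):
        if latex[i] == "\\":
            i += 2  # skip escaped characters
            continue
        if latex[i] == "{":
            depth += 1
        elif latex[i] == "}" and depth > 0:
            depth -= 1
        i += 1
    return latex + "}" * depth


def process_formula(latex: str, corrector: Callable[[str], str]) -> str:
    """Return the target formula string, correcting only when an error is found."""
    if find_brace_error(latex) is None:
        return latex
    return corrector(latex)


if __name__ == "__main__":
    print(process_formula(r"\frac{a}{b", toy_corrector))
```

Passing the corrector as a parameter keeps the error check and the correction step separate, so a trained model could replace `toy_corrector` without changing the surrounding flow. Rendering the target formula from the corrected string is left out of the sketch.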
Fig. 4 is a schematic structural diagram of a server in an embodiment of the present application. The server 400 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 422 (e.g., one or more processors), a memory 432, and one or more storage media 430 (e.g., one or more mass storage devices) storing applications 442 or data 444. The memory 432 and the storage medium 430 may be transient or persistent storage. The program stored on the storage medium 430 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 422 may be configured to communicate with the storage medium 430 and to execute, on the server 400, the series of instruction operations in the storage medium 430.
Still further, the central processor 422 may perform the following method:
acquiring a formula character string to be processed, wherein the formula character string to be processed is a character string with grammar errors;
inputting the formula character string to be processed into a machine learning model to obtain a target formula character string, wherein the machine learning model is used to correct a formula character string with grammar errors into a formula character string that conforms to grammar requirements;
and obtaining a target formula according to the target formula character string.
Optionally, the method further includes:
acquiring a training formula character string with grammar errors and a label of the training formula character string, wherein the label of the training formula character string indicates the grammar errors in the training formula character string;
and training the machine learning model according to the training formula character string and the label of the training formula character string.
Optionally, the training formula character string includes a character string that affects grammatical validity and a character string that does not affect grammatical validity, wherein the characters in the character string that does not affect grammatical validity are all the same character.
Optionally, acquiring the training formula character string includes:
acquiring a formula character string without grammar errors, and processing the formula character string without grammar errors to obtain the training formula character string.
Optionally, the machine learning model is a BERT model, and the algorithm used to train the machine learning model is a masked language model (MLM) algorithm.
Optionally, the formula character string to be processed is a LaTeX character string.
Optionally, before acquiring the formula character string to be processed, the method further includes:
determining that the formula character string to be processed has grammar errors.
The server 400 may also include one or more power supplies 426, one or more wired or wireless network interfaces 450, one or more input-output interfaces 456, one or more keyboards 456, and/or one or more operating systems 441, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.
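The optional training steps performed by the server (taking a formula character string without grammar errors and processing it into a training formula character string with a label) might be sketched as follows. This is only an illustration under stated assumptions: the corruption used here is deleting a single brace, and the label is taken to be the deletion position plus the deleted character. The source says only that the label indicates the grammar errors, so the label format, the function name `make_training_pair`, and the choice of corruption are all hypothetical.

```python
import random
from typing import Tuple

# Characters treated as grammar-relevant in this sketch (assumption).
GRAMMAR_CHARS = ("{", "}")


def make_training_pair(correct: str, rng: random.Random) -> Tuple[str, int, str]:
    """Corrupt a grammatically correct string by deleting one brace.

    Returns (corrupted_string, position, deleted_char). The last two values
    form the assumed label: where the error is and which character is missing.
    """
    positions = [i for i, ch in enumerate(correct) if ch in GRAMMAR_CHARS]
    if not positions:
        raise ValueError("string contains no brace to delete")
    pos = rng.choice(positions)
    corrupted = correct[:pos] + correct[pos + 1:]
    return corrupted, pos, correct[pos]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    corrupted, pos, missing = make_training_pair(r"\frac{a}{b}", rng)
    print(corrupted, pos, missing)
```

Generating errors from correct strings means every training example comes with a known correction, so no manual labelling is required for this kind of error.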
Embodiments of the present application also provide a computer-readable medium having stored thereon instructions that, when executed by one or more processors, cause an apparatus to perform a method of:
acquiring a formula character string to be processed, wherein the formula character string to be processed is a character string with grammar errors;
inputting the formula character string to be processed into a machine learning model to obtain a target formula character string, wherein the machine learning model is used to correct a formula character string with grammar errors into a formula character string that conforms to grammar requirements;
and obtaining a target formula according to the target formula character string.
Optionally, the method further includes:
acquiring a training formula character string with grammar errors and a label of the training formula character string, wherein the label of the training formula character string indicates the grammar errors in the training formula character string;
and training the machine learning model according to the training formula character string and the label of the training formula character string.
Optionally, the training formula character string includes a character string that affects grammatical validity and a character string that does not affect grammatical validity, wherein the characters in the character string that does not affect grammatical validity are all the same character.
Optionally, acquiring the training formula character string includes:
acquiring a formula character string without grammar errors, and processing the formula character string without grammar errors to obtain the training formula character string.
Optionally, the machine learning model is a BERT model, and the algorithm used to train the machine learning model is a masked language model (MLM) algorithm.
Optionally, the formula character string to be processed is a LaTeX character string.
Optionally, before acquiring the formula character string to be processed, the method further includes:
determining that the formula character string to be processed has grammar errors.
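One possible reading of the optional feature that the characters not affecting grammatical validity are all the same character is a normalization pass applied to training strings. The sketch below keeps LaTeX commands and structural symbols and replaces every letter or digit with one placeholder. The placeholder `x`, the name `normalize_content`, and the rule for which characters count as grammar-neutral are assumptions made for illustration and are not taken from this application.

```python
import re

# First alternative: a LaTeX command or escaped character, kept unchanged.
# Second alternative: a single letter or digit, treated as grammar-neutral.
_TOKEN = re.compile(r"(\\[A-Za-z]+|\\.)|[A-Za-z0-9]")

PLACEHOLDER = "x"  # assumed placeholder; the source does not name one


def normalize_content(latex: str) -> str:
    """Replace letters and digits outside commands with one placeholder."""
    return _TOKEN.sub(lambda m: m.group(1) or PLACEHOLDER, latex)


if __name__ == "__main__":
    print(normalize_content(r"\frac{ab}{12}"))
```

Under this reading, strings that differ only in their operands collapse to the same training input, which would let a model concentrate on structural symbols such as braces and commands.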
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. A formula-based data processing method, the method comprising:
acquiring a formula character string to be processed, wherein the formula character string to be processed is a character string with grammar errors;
inputting the formula character string to be processed into a machine learning model to obtain a target formula character string, wherein the machine learning model is used to correct a formula character string with grammar errors into a formula character string that conforms to grammar requirements;
and obtaining a target formula according to the target formula character string.
2. The method of claim 1, further comprising:
acquiring a training formula character string with grammar errors and a label of the training formula character string, wherein the label of the training formula character string indicates the grammar errors in the training formula character string;
and training the machine learning model according to the training formula character string and the label of the training formula character string.
3. The method of claim 2, wherein the training formula strings include strings that affect grammatical validity and strings that do not affect grammatical validity, and wherein the characters included in the strings that do not affect grammatical validity are all the same character.
4. The method of claim 2, wherein obtaining the training formula string comprises:
acquiring a formula character string without grammar errors, and processing the formula character string without grammar errors to obtain the training formula character string.
5. The method according to any one of claims 1-4, wherein the machine learning model is a BERT model, and the algorithm used to train the machine learning model is a masked language model (MLM) algorithm.
6. The method of claim 1, wherein the formula character string to be processed is a LaTeX character string.
7. The method of claim 1, wherein before acquiring the formula character string to be processed, the method further comprises:
determining that the formula character string to be processed has grammar errors.
8. A formula-based data processing apparatus, the apparatus comprising:
a first acquisition unit, configured to acquire a formula character string to be processed, wherein the formula character string to be processed is a character string with grammar errors;
a first processing unit, configured to input the formula character string to be processed into a machine learning model to obtain a target formula character string, wherein the machine learning model is used to correct a formula character string with grammar errors into a formula character string that conforms to grammar requirements;
and a second processing unit, configured to obtain a target formula according to the target formula character string.
9. The apparatus of claim 8, further comprising:
a second acquisition unit, configured to acquire a training formula character string with grammar errors and a label of the training formula character string, wherein the label of the training formula character string indicates the grammar errors in the training formula character string;
and a training unit, configured to train the machine learning model according to the training formula character string and the label of the training formula character string.
10. The apparatus of claim 9, wherein the training formula string comprises a string that affects grammatical validity and a string that does not affect grammatical validity, and wherein the characters included in the string that does not affect grammatical validity are all the same character.
11. The apparatus of claim 9, wherein obtaining the training formula string comprises:
acquiring a formula character string without grammar errors, and processing the formula character string without grammar errors to obtain the training formula character string.
12. The apparatus according to any one of claims 8-11, wherein the machine learning model is a BERT model, and the algorithm used to train the machine learning model is a masked language model (MLM) algorithm.
13. The apparatus of claim 8, wherein the formula character string to be processed is a LaTeX character string.
14. A formula-based data processing apparatus comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs comprise instructions for:
acquiring a formula character string to be processed, wherein the formula character string to be processed is a character string with grammar errors;
inputting the formula character string to be processed into a machine learning model to obtain a target formula character string, wherein the machine learning model is used to correct a formula character string with grammar errors into a formula character string that conforms to grammar requirements;
and obtaining a target formula according to the target formula character string.
15. A computer-readable medium having stored thereon instructions, which when executed by one or more processors, cause an apparatus to perform the method of any one of claims 1 to 7.
CN202111155055.6A 2021-09-29 2021-09-29 Formula-based data processing method and device Pending CN113903034A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111155055.6A CN113903034A (en) 2021-09-29 2021-09-29 Formula-based data processing method and device

Publications (1)

Publication Number Publication Date
CN113903034A true CN113903034A (en) 2022-01-07

Family

ID=79189529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111155055.6A Pending CN113903034A (en) 2021-09-29 2021-09-29 Formula-based data processing method and device

Country Status (1)

Country Link
CN (1) CN113903034A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination