CN108228568B - Mathematical problem semantic understanding method - Google Patents

Publication number
CN108228568B
Authority
CN
China
Legal status
Active
Application number
CN201810067659.7A
Other languages
Chinese (zh)
Other versions
CN108228568A (en)
Inventor
谢德刚
李巧艳
Current Assignee
Shanghai Mutual Education Intelligent Technology Co.,Ltd.
Original Assignee
Shanghai Hujiao Education Technology Co ltd
Application filed by Shanghai Hujiao Education Technology Co ltd
Priority to CN201810067659.7A
Publication of CN108228568A
Application granted
Publication of CN108228568B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

A mathematical problem semantic understanding method comprises: S1, preprocessing the mathematical problem text to standardize it; S2, identifying the entity types of the mathematical symbols and formulas in the mathematical problem text, and converting them into referring pronouns; S3, dividing the long text in the mathematical problem text into short texts that are semantically complete and independent; S4, taking the labeled short texts as samples, building a multi-class neural network model, and training the model; and S5, based on the classification result of the mathematical knowledge type expressed in the first-order logic language, performing entity filling to obtain complete mathematical knowledge expressed in the first-order logic language, thereby completing the semantic understanding of the mathematical problem.

Description

Mathematical problem semantic understanding method
Technical Field
The invention belongs to the technical field of intelligent teaching, and particularly relates to a mathematical problem semantic understanding method.
Background
With the continuous development of artificial intelligence technology, the combination of deep learning and natural language processing has led to breakthroughs in natural language understanding. Research on AI in education is also receiving increasing attention, and automatic problem solving is a particularly active research topic. The precondition for automatic problem solving is that the computer understands the meaning of the problem. At present, semantic understanding of mathematical problems based on traditional natural language processing techniques requires a large amount of work, and the resulting extraction of problem information is unsatisfactory.
Disclosure of Invention
An embodiment of the invention provides a mathematical problem semantic understanding method, which aims to solve the problem that existing semantic understanding of mathematical problems relies only on traditional natural language processing techniques.
To solve the above technical problem, an embodiment of the present invention provides a mathematical problem semantic understanding method comprising the following steps:
S1: preprocessing the mathematical text and normalizing it;
S2: performing entity type recognition on the mathematical symbols and formulas in the mathematical text, and converting them into referring pronouns;
S3: dividing the long text of the mathematical problem into short texts that are semantically complete and independent;
S4: taking the labeled short texts as samples, building a multi-class neural network model, and training it;
S5: performing entity filling based on the classification result of the mathematical knowledge type expressed in the first-order logic language, to obtain complete mathematical knowledge expressed in the first-order logic language, thereby completing the semantic understanding of the mathematical problem.
The terms "reference resolution" and "first-order logic language" used in this disclosure are explained as follows.
Reference resolution determines which noun a pronoun refers to. It is divided into anaphora and cataphora: in anaphora the antecedent of the pronoun comes before the pronoun, and in cataphora the antecedent comes after it. In this method, the goal of reference resolution is to replace the pronouns in the mathematical text with specific entities, so that the problem statement is complete.
The first-order logic language is a formal language: first-order predicate logic is a symbolic tool for abstract reasoning. With logic predicates at the center and basic mathematical elements as constituents, a mathematical first-order logic language is formed.
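One way to picture such a predicate-centered representation is as a small nested data structure. The sketch below is illustrative only: the class name `Predicate` and the type alias `Term` are hypothetical, and the predicate names are taken from the FIG. 2 example later in this description.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# A term is either a plain symbol/formula string or a nested predicate.
Term = Union[str, "Predicate"]


@dataclass(frozen=True)
class Predicate:
    """A logic predicate with an ordered tuple of arguments (hypothetical helper)."""
    name: str
    args: Tuple[Term, ...]

    def __str__(self) -> str:
        # Render as Name(arg1,arg2,...), recursing into nested predicates.
        return f"{self.name}({','.join(str(a) for a in self.args)})"


circle_m = Predicate("Circle", ("M", "x^2+y^2-2ay=0(a>0)"))
line_l0 = Predicate("Line", ("l_0", "x+y=0"))
fact = Predicate("CircleSecantLength", (circle_m, line_l0))
print(fact)  # CircleSecantLength(Circle(M,x^2+y^2-2ay=0(a>0)),Line(l_0,x+y=0))
```

The predicate sits at the center and the mathematical elements (a circle, a line, their equations) appear as its arguments, which mirrors the description above.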
The advantages of the invention are as follows: deep learning is applied to the semantic understanding of mathematical problems; information extraction is decomposed into separate task steps; and the knowledge representation extracted from a problem is recast as a multi-class classification task over short mathematical texts. This reduces the complexity of machine understanding of mathematical language, improves the accuracy of information extraction, addresses the difficulty that semantic understanding poses for intelligent problem solving, and promotes the application of deep learning in intelligent mathematical problem solving.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 is a flowchart of a mathematical problem semantic understanding method according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating an exemplary flow of the mathematical problem semantic understanding method in an embodiment of the present invention.
Detailed Description
As shown in FIG. 1, an embodiment of the present invention provides a mathematical problem semantic understanding method comprising the following steps:
S1: preprocessing the mathematical text and normalizing it;
S2: performing entity type recognition on the mathematical symbols and formulas in the mathematical text, and converting them into referring pronouns;
S3: dividing the long text of the mathematical problem into short texts that are semantically complete and independent;
S4: taking the labeled short texts as samples, building a multi-class neural network model, and training it;
S5: performing entity filling based on the classification result of the mathematical knowledge type expressed in the first-order logic language.
In another embodiment of the present invention, a mathematical problem semantic understanding method comprises the following steps:
S1, preprocessing the mathematical problem text to standardize it;
S2, identifying the entity types of the mathematical symbols and formulas in the mathematical problem text, and converting them into referring pronouns;
S3, dividing the long text in the mathematical problem text into short texts that are semantically complete and independent;
S4, taking the labeled short texts as samples, building a multi-class neural network model, and training the model;
S5, based on the classification result of the mathematical knowledge type expressed in the first-order logic language, performing entity filling to obtain complete mathematical knowledge expressed in the first-order logic language, thereby completing the semantic understanding of the mathematical problem.
Step S1 specifically includes the following:
standardizing the mathematical problem text, which includes cleaning the text to remove meaningless symbols or words.
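The cleaning in step S1 could be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `normalize` and the specific rules (unwrapping LaTeX powers written as `{{x}^{2}}`, mapping a few full-width punctuation marks to ASCII, and deleting whitespace) are assumptions chosen to resemble the normalization described for the FIG. 2 example.

```python
import re

# Matches LaTeX powers wrapped as {{base}^{exponent}}, e.g. {{x}^{2}}.
_WRAPPED_POWER = re.compile(r"\{\{([^{}]+)\}\^\{([^{}]+)\}\}")

# Full-width punctuation often found in Chinese problem text (assumed rule set).
_FULLWIDTH = str.maketrans({":": ":", ",": ",", "(": "(", ")": ")"})


def normalize(text: str) -> str:
    """Return a cleaned copy of a math problem string."""
    text = text.translate(_FULLWIDTH)          # unify punctuation
    text = _WRAPPED_POWER.sub(r"\1^\2", text)  # {{x}^{2}} -> x^2
    text = re.sub(r"\s+", "", text)            # remove all whitespace
    return text


print(normalize("{{x}^{2}} + {{y}^{2}} - 2ay = 0 (a > 0)"))  # x^2+y^2-2ay=0(a>0)
```

Deleting every whitespace character suits Chinese text, where spaces carry no meaning; for space-delimited languages this rule would need to be narrowed.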
Step S2 specifically includes the following steps:
S21, preparing manually labeled samples for model training, based on the mathematical symbols and mathematical formulas in the mathematical problem text;
S22, performing named entity recognition with an LSTM+CRF model, so that the entities in new problems can be annotated;
S23, performing pronoun reference resolution on the mathematical problem with an improved fragment-pair model.
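To make the goal of S23 concrete, the sketch below replaces each pronoun with the nearest preceding entity of a compatible type. It is a rule-based stand-in, not the improved fragment-pair model named above, and it covers only the case where the antecedent comes before the pronoun. The function name, the token format, and the pronoun table are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Each token is (surface text, entity type or None for non-entities).
Token = Tuple[str, Optional[str]]


def resolve_anaphora(tokens: List[Token], pronouns: Dict[str, str]) -> List[str]:
    """Replace pronouns with the most recent earlier entity of the required type.

    `pronouns` maps a pronoun's surface text to the entity type it may refer to.
    A pronoun with no earlier candidate is left unchanged.
    """
    resolved: List[str] = []
    last_seen: Dict[str, str] = {}  # entity type -> most recent entity text
    for text, entity_type in tokens:
        if text in pronouns:
            antecedent = last_seen.get(pronouns[text])
            resolved.append(antecedent if antecedent is not None else text)
        else:
            resolved.append(text)
            if entity_type is not None:
                last_seen[entity_type] = text
    return resolved


tokens = [
    ("circle M", "Circle"), ("cuts", None), ("line l_0", "Line"),
    ("and", None), ("this circle", None), ("touches", None), ("circle N", "Circle"),
]
pronouns = {"this circle": "Circle", "this line": "Line"}
print(resolve_anaphora(tokens, pronouns))
```

Here "this circle" is replaced by "circle M", the closest earlier entity of type Circle, which is the kind of completion the reference resolution step aims for.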
Step S3 specifically includes the following steps:
S31, labeling the mathematical text to be segmented with a 2-tag scheme, in which the letter S denotes a segmentation symbol and the letter N denotes a non-segmentation symbol;
S32, training a CRF model to segment the long mathematical text.
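Once a tagger has produced an S/N label for every symbol, cutting the text is mechanical. The sketch below shows only that final step and assumes the tags are already available (in the method they would come from the trained CRF). The function name `split_by_tags` and the choice to drop the symbol tagged S are assumptions.

```python
from typing import List


def split_by_tags(tokens: List[str], tags: List[str]) -> List[str]:
    """Cut a token sequence into short texts at every token tagged 'S'.

    Tokens tagged 'S' are treated as separators and dropped; tokens tagged
    'N' are kept inside the current segment.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    segments: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "S":
            if current:
                segments.append("".join(current))
            current = []
        elif tag == "N":
            current.append(token)
        else:
            raise ValueError(f"unknown tag: {tag!r}")
    if current:
        segments.append("".join(current))
    return segments


text = "A=[2,8],B={1}"
# Only the comma after "]" is a segmentation symbol; the comma inside the
# interval [2,8] is tagged N so the interval stays intact.
tags = ["N"] * 7 + ["S"] + ["N"] * 5
print(split_by_tags(list(text), tags))  # ['A=[2,8]', 'B={1}']
```

Tagging per symbol rather than splitting on every comma is what lets the same punctuation mark act as a separator in one place and as part of a formula in another.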
Step S4 specifically includes the following steps:
S41, manually labeling the short texts produced by steps S1-S3 with first-order logic language categories to prepare training samples;
S42, building a multi-class deep learning model based on the labeled training samples, and training the model.
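The classification task in S42 can be illustrated with a deliberately small model. The sketch below swaps the neural network for a bag-of-words softmax regression written in plain Python, so that it runs without any deep learning library; it is not the model of the invention. The class name, the training sentences, and the hyperparameters are all made up for illustration; only the two label names come from the FIG. 2 example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[str], str]  # (tokens of a short text, category label)


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class BagOfWordsSoftmax:
    """Multi-class softmax regression over word counts (stand-in classifier)."""

    def __init__(self, samples: List[Sample], epochs: int = 300, lr: float = 0.1):
        self.vocab = sorted({w for words, _ in samples for w in words})
        self.labels = sorted({label for _, label in samples})
        self.index = {w: i for i, w in enumerate(self.vocab)}
        self.weights = [[0.0] * len(self.vocab) for _ in self.labels]
        self.bias = [0.0] * len(self.labels)
        for _ in range(epochs):  # plain stochastic gradient descent
            for words, label in samples:
                x = self._features(words)
                probs = softmax(self._scores(x))
                for k, candidate in enumerate(self.labels):
                    grad = probs[k] - (1.0 if candidate == label else 0.0)
                    self.bias[k] -= lr * grad
                    for j, count in x.items():
                        self.weights[k][j] -= lr * grad * count

    def _features(self, words: Sequence[str]) -> Dict[int, float]:
        x: Dict[int, float] = {}
        for w in words:
            j = self.index.get(w)  # words outside the vocabulary are ignored
            if j is not None:
                x[j] = x.get(j, 0.0) + 1.0
        return x

    def _scores(self, x: Dict[int, float]) -> List[float]:
        return [b + sum(row[j] * v for j, v in x.items())
                for row, b in zip(self.weights, self.bias)]

    def predict(self, words: Sequence[str]) -> str:
        scores = self._scores(self._features(words))
        return self.labels[scores.index(max(scores))]


samples: List[Sample] = [
    (["circle", "cuts", "line", "segment", "length"], "CircleSecantLength"),
    (["circle", "circle", "position", "relation"], "PositionRelationOfCircle"),
    (["line", "cut", "by", "circle", "chord", "length"], "CircleSecantLength"),
    (["position", "relation", "between", "circle", "and", "circle"], "PositionRelationOfCircle"),
]
model = BagOfWordsSoftmax(samples)
print(model.predict(["length", "of", "segment", "cut", "from", "line"]))
```

The point of the sketch is the task framing: each short text maps to exactly one first-order logic category, so knowledge extraction reduces to choosing a label.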
Step S5 specifically includes:
performing entity filling with the first-order logic language category obtained for the short text and the extracted entities, to obtain a complete formal representation, thereby completing information extraction for the short text.
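Entity filling can be sketched as slotting typed entities into a predicate template and rejecting mismatches. In the sketch below, the `SCHEMA` table (which entity types each predicate expects, and in what order) and the function name `fill_entities` are hypothetical; the predicate and entity names follow the FIG. 2 example.

```python
from typing import Dict, List, Tuple

Entity = Tuple[str, str, str]  # (entity type, name, expression)

# Hypothetical schema: predicate name -> ordered list of expected entity types.
SCHEMA: Dict[str, List[str]] = {
    "CircleSecantLength": ["Circle", "Line"],
    "PositionRelationOfCircle": ["Circle", "Circle"],
}


def fill_entities(predicate: str, entities: List[Entity]) -> str:
    """Fill extracted entities into the slots of the classified predicate.

    Raises ValueError when the entities do not match the predicate's slots,
    which signals an information extraction error.
    """
    expected = SCHEMA[predicate]
    if len(entities) != len(expected):
        raise ValueError(
            f"{predicate} expects {len(expected)} entities, got {len(entities)}")
    pool = list(entities)
    filled: List[str] = []
    for slot_type in expected:
        for i, (entity_type, name, expression) in enumerate(pool):
            if entity_type == slot_type:
                filled.append(f"{entity_type}({name},{expression})")
                del pool[i]  # each entity fills at most one slot
                break
        else:
            raise ValueError(f"no entity of type {slot_type} for {predicate}")
    return f"{predicate}({','.join(filled)})"


entities = [("Circle", "M", "x^2+y^2-2ay=0(a>0)"), ("Line", "l_0", "x+y=0")]
print(fill_entities("CircleSecantLength", entities))
```

Checking the entity count against the predicate gives a cheap consistency test between the classifier's output and the entity recognizer's output.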
FIG. 2 shows a practical example of the mathematical problem semantic understanding method. The example is processed as follows:
S1: preprocessing the mathematical text. For example, the problem "Given that the circle M: {{x}^{2}}+{{y}^{2}}-2ay=0 (a>0) cuts the line x+y=0 in a segment of length 2\sqrt{2}, the positional relationship between circle M and circle N: (x-1)^2+(y-1)^2=1 is ( )" is normalized, through LaTeX cleanup, whitespace removal and similar operations, to "Given that the circle M: x^2+y^2-2ay=0 (a>0) cuts the line x+y=0 in a segment of length 2\sqrt{2}, the positional relationship between circle M and circle N: (x-1)^2+(y-1)^2=1 is ( )";
S2: performing entity type recognition and reference resolution on the mathematical symbols and formulas in the mathematical text. For example, the normalized problem text in FIG. 2 is further processed into: "Given that the circle M##Circle x^2+y^2-2ay=0(a>0)##express cuts the line l_0##Line x+y=0##express in a segment of length 2\sqrt{2}##express, the positional relationship between the circle M##Circle x^2+y^2-2ay=0(a>0)##express and the circle N##Circle (x-1)^2+(y-1)^2=1##express is ( )";
S3: segmenting the long text of the mathematical problem into short texts that are semantically complete and independent, and re-checking the segmented texts against rules, for example to ensure that an interval such as [2, 8] or a set such as {x | x^2<9, x∈R} is kept intact. In FIG. 2, the long problem text is segmented into two semantically complete short texts: (1) the circle M##Circle x^2+y^2-2ay=0(a>0)##express cuts the line l_0##Line x+y=0##express in a segment of length 2\sqrt{2}##express; (2) the positional relationship between the circle M##Circle x^2+y^2-2ay=0(a>0)##express and the circle N##Circle (x-1)^2+(y-1)^2=1##express is ( );
S4: taking the short texts with entity labels as samples, building a multi-class neural network model, feeding in the short-text sequences as word vectors trained with word2vec, and training the model;
S5: based on the classification result of the mathematical knowledge type expressed in the first-order logic language (in FIG. 2, the first-order logic types are (1) CircleSecantLength() and (2) PositionRelationOfCircle()), filling the entities extracted from the short text into the logic predicates. If the number of entities does not match the logic predicate, the information extraction is in error. Finally, complete mathematical knowledge expressed in the first-order logic language is obtained, completing the semantic understanding of the mathematical problem. FIG. 2 shows the final result extracted by the method:
(1) CircleSecantLength(Circle(M,x^2+y^2-2ay=0(a>0)),Line(l_0,x+y=0));
(2) PositionRelationOfCircle(Circle(M,x^2+y^2-2ay=0(a>0)),Circle(N,(x-1)^2+(y-1)^2=1),position(null)).
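A downstream solver would need to read these result strings back into a structure. The sketch below is one hedged way to do that, with hypothetical function names `split_top_level` and `parse`. It assumes predicate names are plain identifiers, so a formula such as `x^2+y^2-2ay=0(a>0)` is kept as a leaf string even though it contains parentheses.

```python
import re
from typing import List, Tuple, Union

Node = Union[str, Tuple[str, list]]  # leaf string or (predicate name, arguments)

# A predicate call is an identifier followed by one parenthesized argument list.
_CALL = re.compile(r"^([A-Za-z_]\w*)\((.*)\)$")


def split_top_level(body: str) -> List[str]:
    """Split on commas that are not nested inside parentheses."""
    parts: List[str] = []
    depth = 0
    start = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(body[start:i])
            start = i + 1
    parts.append(body[start:])
    return parts


def parse(term: str) -> Node:
    """Parse Name(arg,...) recursively; anything else is returned as a string.

    Limitation: a formula that itself starts with an identifier and a
    parenthesis, such as sin(x)=cos(x), is not handled by this sketch.
    """
    match = _CALL.match(term)
    if match is None:
        return term
    name, body = match.groups()
    return (name, [parse(part) for part in split_top_level(body)])


result = parse("CircleSecantLength(Circle(M,x^2+y^2-2ay=0(a>0)),Line(l_0,x+y=0))")
print(result)
```

Tracking parenthesis depth is what keeps the comma inside `Circle(M,...)` from being mistaken for the comma that separates the circle from the line.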
It should be noted that, while the spirit and principles of the invention have been described with reference to several specific embodiments, the invention is not limited to the disclosed embodiments. The division into aspects is for convenience of presentation only and does not imply that features in these aspects cannot be combined. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (6)

1. A mathematical problem semantic understanding method, characterized by comprising the following steps:
S1, preprocessing the mathematical problem text to standardize it;
S2, identifying the entity types of the mathematical symbols and formulas in the mathematical problem text, and converting them into referring pronouns;
S3, dividing the long text in the mathematical problem text into short texts that are semantically complete and independent;
S4, taking the labeled short texts as samples, building a multi-class neural network model, and training the model;
S5, based on the classification result of the mathematical knowledge type expressed in the first-order logic language, performing entity filling to obtain complete mathematical knowledge expressed in the first-order logic language, thereby completing the semantic understanding of the mathematical problem.
2. The mathematical problem semantic understanding method according to claim 1, characterized in that: step S1 specifically includes the following:
standardizing the mathematical problem text, which includes cleaning the text to remove meaningless symbols or words.
3. The mathematical problem semantic understanding method according to claim 1, characterized in that: step S2 specifically includes the following steps:
S21, preparing manually labeled samples for model training, based on the mathematical symbols and mathematical formulas in the mathematical problem text;
S22, performing named entity recognition with an LSTM+CRF model, so that the entities in new problems can be annotated;
S23, performing pronoun reference resolution on the mathematical problem with an improved fragment-pair model.
4. The mathematical problem semantic understanding method according to claim 1, characterized in that: step S3 specifically includes the following steps:
S31, labeling the mathematical text to be segmented with a 2-tag scheme, in which the letter S denotes a segmentation symbol and the letter N denotes a non-segmentation symbol;
S32, training a CRF model to segment the long mathematical text.
5. The mathematical problem semantic understanding method according to claim 1, characterized in that: step S4 specifically includes the following steps:
S41, manually labeling the short texts produced by steps S1-S3 with first-order logic language categories to prepare training samples;
S42, building a multi-class deep learning model based on the labeled training samples, and training the model.
6. The mathematical problem semantic understanding method according to claim 1, characterized in that: step S5 specifically includes:
performing entity filling with the first-order logic language category obtained for the short text and the extracted entities, to obtain a complete formal representation, thereby completing information extraction for the short text.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810067659.7A CN108228568B (en) 2018-01-24 2018-01-24 Mathematical problem semantic understanding method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810067659.7A CN108228568B (en) 2018-01-24 2018-01-24 Mathematical problem semantic understanding method

Publications (2)

Publication Number Publication Date
CN108228568A CN108228568A (en) 2018-06-29
CN108228568B CN108228568B (en) 2021-06-04

Family

ID=62668740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810067659.7A Active CN108228568B (en) 2018-01-24 2018-01-24 Mathematical problem semantic understanding method

Country Status (1)

Country Link
CN (1) CN108228568B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109062904B (en) * 2018-08-23 2022-05-20 上海互教教育科技有限公司 Logic predicate extraction method and device
CN109190099B (en) * 2018-08-23 2022-12-13 上海互教教育科技有限公司 Sentence pattern extraction method and device
KR101986721B1 (en) * 2019-03-27 2019-06-10 월드버텍 주식회사 Method for providing mathematical principle prediction serivce for math word problem using neural machine translation and math corpus
CN111209738B (en) * 2019-12-31 2021-03-26 浙江大学 Multi-task named entity recognition method combining text classification
CN111931020B (en) * 2020-10-12 2021-01-29 北京世纪好未来教育科技有限公司 Formula labeling method, device, equipment and storage medium
CN115438624B (en) * 2022-11-07 2023-03-24 江西风向标智能科技有限公司 Identification method, system, storage medium and equipment for question setting intention of mathematical question
CN117252202B (en) * 2023-11-20 2024-03-19 江西风向标智能科技有限公司 Construction method, identification method and system for named entities in high school mathematics topics

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107301163A (en) * 2016-04-14 2017-10-27 科大讯飞股份有限公司 Text semantic analysis method and device comprising formula
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN106886516A (en) * 2017-02-27 2017-06-23 竹间智能科技(上海)有限公司 The method and device of automatic identification statement relationship and entity
CN107423286A (en) * 2017-07-05 2017-12-01 华中师范大学 The method and system that elementary mathematics algebraically type topic is answered automatically
CN107463553A (en) * 2017-09-12 2017-12-12 复旦大学 For the text semantic extraction, expression and modeling method and system of elementary mathematics topic

Also Published As

Publication number Publication date
CN108228568A (en) 2018-06-29

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Building 10, Lane 2277, Zuchongzhi Road, Pudong New Area Free Trade Pilot Zone, Shanghai, 200000
Patentee after: Shanghai Mutual Education Intelligent Technology Co.,Ltd.
Address before: Room a684-05, building 2, 351 GuoShouJing Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai, 201203
Patentee before: SHANGHAI HUJIAO EDUCATION TECHNOLOGY Co.,Ltd.