CN117633643A - Automatic middle school geometric problem solving method based on contrast learning - Google Patents


Info

Publication number: CN117633643A (granted as CN117633643B)
Application number: CN202410109877.8A
Authority: CN
Other languages: Chinese (zh)
Prior art keywords: geometric, feature, middle school, questions, cnn
Inventors: 罗文兵, 吴督邦, 黄琪, 王明文, 罗凯威, 陈奥, 刘祥棋
Applicant and current assignee: Jiangxi Normal University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Legal status: Granted, Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)

Classifications

    • G06F18/2411: Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06F18/15: Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
    • G06F18/213: Feature extraction, e.g. by transforming the feature space; summarisation; mappings, e.g. subspace methods
    • G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/25: Fusion techniques
    • G06N3/0442: Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N3/0455: Auto-encoder networks; encoder-decoder networks
    • G06Q50/205: Education administration or guidance
    • Y02T10/40: Engine management systems


Abstract

The invention discloses a method for automatically solving middle school geometry problems based on contrastive learning, comprising the following steps: collecting a number of middle school geometry problems and answers to obtain the required middle school geometry dataset; dividing the problem data into a figure-free problem dataset and a with-figure problem dataset; inputting the figure-free problems into a geometric image generator to obtain highly accurate geometric figures; inputting the multimodal feature vectors into a diagram-based problem solver to obtain the final multimodal feature vectors; using the program decoder in the diagram-based solver to obtain highly accurate answers; and finally testing the geometric image generator and the diagram-based solver together to form a unified large model that solves both draw-it-yourself and with-figure geometry problem types. The beneficial effects of the invention are as follows: from a brand-new viewpoint, middle school geometry problems are divided into two problem types that are solved separately, and the two components are fused together to form a model capable of solving middle school geometry problems.

Description

An automatic method for solving middle school geometry problems based on contrastive learning

Technical Field

The invention relates to the field of machine learning systems, and specifically to an automatic method for solving middle school geometry problems based on contrastive learning.

Background

In recent years, machine learning systems developed to automatically solve math word problems (MWPs) have attracted increasing attention because of their high academic value and great application potential in smart education. Most existing approaches, including traditional machine learning methods and neural-network-based models, focus on arithmetic and algebra, while geometry problem solving has rarely been studied. As a classic mathematical subject, geometry occupies a large part of secondary education. Because of its challenging nature and data characteristics, geometry problem solving can also serve as a multimodal numerical reasoning benchmark that requires joint reasoning over diagrams and text.

A typical geometry problem generally consists of text and geometric figures. Compared with math word problems, which involve only problem text, geometry problems pose the following new challenges. First, the accompanying diagram provides essential information missing from the problem text, such as the relative positions of lines and points; the solver therefore needs the ability to parse the diagram. Second, solving a geometry problem requires understanding and aligning the semantics of the text and the diagram simultaneously. However, problem texts often contain ambiguous references to primitives and implicit relations, which makes joint reasoning over text and primitives harder. Third, many geometry problems require additional theorem knowledge during solving. Although some previous methods have attempted to address these issues, the performance of their geometry problem solving systems is far from satisfactory. They rely heavily on a limited set of handcrafted rules and are validated only on small-scale datasets, which makes them hard to generalize to more complex, real-world situations. Furthermore, the solving process is complex, which makes it hard for humans to understand and verify its reliability.

Recently, many works have proposed unified models for various vision-language reasoning and generation tasks, since the underlying visual/linguistic understanding and reasoning abilities are largely shared. Inspired by these mainstream advances, we argue that a unified geometry problem solving model is also necessary. First, geometry problems fall into two types: those that come with their own geometric figure, commonly called with-figure geometry problems, and those whose original text has no figure, so that a figure must be drawn to assist solving, commonly called figure-free geometry problems. Whatever the type, the solving process shares the same basic skills and knowledge of geometric reasoning. Exploring the general understanding and reasoning ability of a unified neural network in mathematics is therefore a meaningful topic. Moreover, a unified model needs no auxiliary model to decide whether a problem is a with-figure or a figure-free one. This greatly improves solving efficiency and reduces errors introduced by problem-type classification, allowing the model to complete the solving task better. For these reasons, a framework that handles geometry problems uniformly at both the data layer and the model layer is valuable and worth pursuing.

Summary of the Invention

To address the above technical problems, the present invention proposes a contrastive-learning-based method for automatically solving middle school geometry problems. From the perspective of problem types, geometry problems are divided into two kinds: with-figure problems and figure-free problems. A corresponding solution is proposed for each of the two types, and they are finally combined into a unified large model for middle school geometry that can solve both.

The technical scheme adopted by the present invention is as follows: a contrastive-learning-based method for automatically solving middle school geometry problems, with the following steps:

Step S1, dataset construction: collect a number of middle school geometry problems and their answers, and divide them into a training set, a validation set, and a test set to obtain the required middle school geometry dataset;

Step S2, formal task definition: given a middle school geometry dataset containing N items, a problem-type classifier divides it into a figure-free problem dataset B and a with-figure problem dataset C;

Step S3: the y figure-free problems b_y in dataset B, which require drawing a figure, and the z with-figure problems c_z in dataset C, which come with their own figure, are input into the BERT feature encoder of the automatic geometry problem solving model, to obtain the embedding feature vector of every character in each problem stem;

Step S4: the character embedding feature vectors of the figure-free problem stems obtained by the BERT feature encoder are input into the geometric image generator. The generator is obtained by training and fine-tuning a contrastive learning model and is used to generate the geometric figures the problems require. The contrastive learning model is trained under manual supervision; the figure generation loss L_prior is computed with a mean-squared-error loss function, and the parameters of the BERT feature encoder and the geometric image generator are optimized and updated to obtain the geometric figures;

Step S5: the character embedding feature vectors of the with-figure problem stems obtained by the BERT feature encoder, together with the figure-free problems and the corresponding figures produced by the geometric image generator, are input into the diagram-based solver. The image encoder inside the solver encodes the geometric figures and extracts their features, which are aligned with the BERT character embedding feature vectors of the problem stems to obtain the final multimodal feature vectors;

Step S6: guided by the multimodal feature vectors, the program decoder in the diagram-based solver generates the solution program sequentially; the generation loss L_g of solving errors is computed with a negative log-likelihood loss function, yielding highly accurate answers;

Step S7: the geometric image generator and the diagram-based solver are tested together, forming a unified large model that can solve both figure-free problems, which require drawing a figure, and with-figure problems, which come with their own figure.

Further, the dataset construction in step S1 collects a number of middle school geometry problems and answers and performs the following tasks; specifically:

Step S11, remove duplicate problems and answers;

Step S12, classify the problems and answers into two types: those with figures are automatically classified as with-figure problems, and those with only a problem stem and no geometric figure are classified as figure-free problems;

Step S13, check the classification, i.e., manually inspect and verify the classification results for each problem type;

Step S14, after manual inspection and verification, divide the problems and answers according to the ratio training set : validation set : test set = 8 : 1 : 1;

Step S15, after the division, manually annotate the solutions of the problems in the training and validation sets: distill the solution steps from the answers and manually annotate them in a program language that a computer can recognize.
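As an illustration (not part of the patent), the cleaning, classification, and 8:1:1 split of steps S11-S14 can be sketched as follows; the dict fields `text` and `figure` and the rule of deduplicating by problem stem are assumptions:

```python
import random

def build_dataset(problems, seed=42):
    """Deduplicate collected geometry problems and split them 8:1:1 into
    train/validation/test, as in steps S11-S14. Each problem is a dict;
    'text' is the dedup key and a 'figure' field (None for figure-free
    problems) drives the classification of step S12."""
    # S11: remove exact duplicates by problem text
    seen, unique = set(), []
    for p in problems:
        if p["text"] not in seen:
            seen.add(p["text"])
            unique.append(p)
    # S12: split into figure-free (B) and with-figure (C) subsets
    no_fig = [p for p in unique if p.get("figure") is None]
    with_fig = [p for p in unique if p.get("figure") is not None]
    # S14: 8:1:1 split of the whole cleaned set
    rng = random.Random(seed)
    rng.shuffle(unique)
    n = len(unique)
    n_train, n_val = int(n * 0.8), int(n * 0.1)
    train = unique[:n_train]
    val = unique[n_train:n_train + n_val]
    test = unique[n_train + n_val:]
    return {"B": no_fig, "C": with_fig,
            "train": train, "val": val, "test": test}
```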

Further, the BERT feature encoder in step S3 uses the encoder module of the Transformer architecture and consists of multiple layers of bidirectional encoders; the computation is shown in formula (1):

$e_i^{w} = \mathrm{BERT}(w_i)$ (1);

where $e_i^{w}$ is the character embedding feature vector obtained by passing the i-th character token $w_i$ through the BERT feature encoder.

Further, the geometric image generator in step S4 specifically includes:

Step S41, input data into the geometric image generator; the input data are the character embedding feature vectors of the figure-free problem stems;

Step S42, the geometric image generator is an adapted contrastive learning model; the contrastive learning model is a text-image pair classification model, which is trained and fine-tuned to achieve the downstream task of generating geometric images;

Step S43, train the contrastive learning model:

collect and organize a dataset of with-figure geometry problems to obtain middle school geometry problem stems and the corresponding geometric figures;

extract features from the problem stems and the corresponding geometric figures respectively to obtain the character embedding feature vectors $e_i^{w}$ and the figure features $h_{CNN}$, forming text-image pairs;

feed the features of the text-image pairs into the text-image pair classification model for contrastive learning; under manual supervision, matching text-image pairs are labeled as positive samples and non-matching pairs as negative samples;

through the positive and negative samples, the text-image pair classification model learns the correspondence between problem stems and geometric figures, i.e., given a character embedding feature vector $e_i^{w}$ it finds the corresponding figure feature $h_{CNN}$;
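The text-image contrastive training of step S43 can be sketched with a CLIP-style symmetric InfoNCE objective over a batch. This is a minimal plain-Python illustration; the exact loss form (InfoNCE with a temperature) and the cosine-similarity retrieval are assumptions about how the text-image pair classification model operates:

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_loss(text_feats, image_feats, temperature=0.1):
    """Symmetric InfoNCE loss: pair i (text_feats[i], image_feats[i])
    is the positive sample, all cross pairs are negatives."""
    n = len(text_feats)
    logits = [[cosine(t, im) / temperature for im in image_feats]
              for t in text_feats]
    loss = 0.0
    for i in range(n):
        # text -> image direction: positive sits on the diagonal
        row = logits[i]
        loss += -row[i] + math.log(sum(math.exp(s) for s in row))
        # image -> text direction
        col = [logits[j][i] for j in range(n)]
        loss += -col[i] + math.log(sum(math.exp(s) for s in col))
    return loss / (2 * n)

def retrieve_figure(text_feat, image_feats):
    """Given a text feature e_i^w, return the index of the closest
    figure feature h_CNN (the retrieval described at the end of S43)."""
    sims = [cosine(text_feat, im) for im in image_feats]
    return max(range(len(sims)), key=sims.__getitem__)
```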

Step S44, fine-tune the contrastive learning model:

Define the text as $x$ and the geometric figure as $y$, and introduce a Prior for the generated figure encoding, $P(h_{CNN} \mid x)$; the computation is shown in formula (2), where the figure encoding produced by the Prior is trained with the figure encoding of the contrastive learning model as the ground truth:

$P(y \mid x) = P(y, h_{CNN} \mid x) = P(y \mid h_{CNN}, x)\, P(h_{CNN} \mid x)$ (2);

where $P(y \mid x)$ denotes generating the geometric figure $y$ from the text $x$; $P(h_{CNN} \mid x)$ denotes that, after contrastive training, the figure feature $h_{CNN}$ can be generated from the character embedding feature vectors $e_i^{w}$ of the text; $P(y \mid h_{CNN}, x)$ denotes decoding the figure feature $h_{CNN}$ found from the text $x$ to generate the figure $y$; and $P(y, h_{CNN} \mid x)$ denotes jointly finding the figure feature $h_{CNN}$ from the text $x$ and the corresponding decoded figure $y$;

Step S45, figure generation loss $L_{prior}$: following the Prior obtained by fine-tuning the contrastive learning model, the mean-squared-error loss function is used to predict the geometric figures; the computation is shown in formula (3):

$L_{prior} = \sum_{i=1}^{T} \left\| h^{(i)}_{CNN}(x;\theta) - h_{CNN} \right\|^{2}$ (3);

where $L_{prior}$ is the figure generation loss; the summation accumulates the losses of the first $i$ figure generations and $T$ is the number of generations; $h^{(i)}_{CNN}$ is the figure feature generated at the i-th step; $h^{(i)}_{CNN}(x;\theta)$ is the figure feature generated at the i-th step from the text $x$; the term inside the norm is the difference between the i-th generated feature and the ground-truth feature $h_{CNN}$; and $\theta$ is a tunable set of parameters.
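A minimal sketch of the Prior loss of formula (3), scoring a fixed list of predicted figure features against the ground-truth feature; in the patent the predictions come from a learned prior network with parameters θ, which is abstracted away here:

```python
def prior_loss(predicted_feats, target_feat):
    """Sum over T generation steps of the squared distance between the
    i-th predicted figure feature h_CNN^(i) and the ground-truth
    feature h_CNN from the contrastive model (formula (3))."""
    loss = 0.0
    for h_i in predicted_feats:  # i = 1..T
        loss += sum((a - b) ** 2 for a, b in zip(h_i, target_feat))
    return loss
```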

Further, the diagram-based solver in step S5 contains four modules: a bidirectional LSTM layer, an image encoder, a joint reasoning module, and a program decoder; the details are as follows:

Step S51, input data into the diagram-based solver; the input data include the character embedding feature vectors of the with-figure problem stems obtained by the BERT feature encoder, together with the figure-free problems and the corresponding geometric figures generated by the geometric image generator;

Step S52, bidirectional LSTM layer: the character embedding feature vectors $e_i^{w}$ of the with-figure problem stems obtained by the BERT feature encoder are input into the bidirectional LSTM layer, which is used to obtain the contextual semantic feature vector of the i-th character of the geometry text; that is, each $e_i^{w}$ is fed into the forward LSTM layer and the backward LSTM layer respectively, as shown in formula (4):

$h_i^{LSTM} = \mathrm{LSTM}_f(e_i^{w}) \oplus \mathrm{LSTM}_b(e_i^{w})$ (4);

where $h_i^{LSTM}$ is the contextual semantic feature vector of the i-th character of the geometry text, $\mathrm{LSTM}_f$ and $\mathrm{LSTM}_b$ denote the output vectors of the forward and backward LSTM layers respectively, and $\oplus$ denotes concatenation;

Step S53, image encoder: a CNN is used to extract the figure features $h_{CNN}$. The CNN consists of convolutional layers, nonlinear activation functions, and pooling layers. In the convolutional layers, a sliding kernel convolves the geometric image to capture local features; nonlinear activation functions increase the CNN's expressive power; the pooling layers reduce the dimension of the feature maps while retaining the key features of the figure; stacking multiple convolutional layers lets the CNN gradually extract higher-level geometric features; finally, a fully connected layer yields the figure feature $h_{CNN}$;
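The convolution, activation, and pooling components of step S53 can be illustrated with plain-Python operators; this is a toy sketch (single channel, stride 1, no learned weights), not the patent's actual CNN:

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in CNN libraries):
    slide the kernel over the image, summing elementwise products to
    capture local features."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            out[r][c] = sum(image[r + i][c + j] * kernel[i][j]
                            for i in range(kh) for j in range(kw))
    return out

def relu(feature_map):
    """Nonlinear activation of step S53."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool2(feature_map):
    """2x2 max pooling: reduces feature-map dimension while keeping
    the strongest local response."""
    out = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            row.append(max(feature_map[r][c], feature_map[r][c + 1],
                           feature_map[r + 1][c], feature_map[r + 1][c + 1]))
        out.append(row)
    return out
```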

Step S54, joint reasoning module: an attention mechanism fuses the contextual semantic feature vector $h_i^{LSTM}$ of the i-th character with the figure feature $h_{CNN}$ to achieve cross-modal semantic fusion and alignment, yielding the multimodal feature vector $M_i$ of the i-th character, which carries the attention-weighted information of both $h_i^{LSTM}$ and $h_{CNN}$; the computation is shown in formulas (5) and (6):

$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\!\left( \frac{Q K^{D}}{\sqrt{dd}} \right) V$ (5);

$M_i = \mathrm{Attention}(Q_i, K_i, V_i)$ (6);

where Attention denotes the attention mechanism; $Q$, $K$, $V$ are the query, key, and value vectors respectively; Softmax is the normalized exponential function; $dd$ is the size of the second dimension of the query vector $Q$ and the key vector $K$; $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$ are the projection parameter matrices of the query, key, and value vectors for the i-th character; let $Q_i = h_i^{LSTM} W_i^{Q}$, $K_i = h_{CNN} W_i^{K}$, $V_i = h_{CNN} W_i^{V}$, where the $W$ matrices are parameter matrices learned by linear layers and $D$ denotes transpose;
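The scaled dot-product attention of formulas (5)-(6) can be sketched as follows; the learned projection matrices $W^{Q}$, $W^{K}$, $W^{V}$ are omitted, so the inputs are assumed to be already-projected query/key/value lists:

```python
import math

def softmax(xs):
    """Numerically stable normalized exponential."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Scaled dot-product attention: Softmax(Q K^T / sqrt(d)) V,
    with Q the text-side queries and K, V from the figure features."""
    d = len(K[0])  # second dimension of Q and K
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```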

Step S55, program decoder: the multimodal feature vector $M_i$ is fed into a linear layer to obtain the initial state $s_0$; the hidden state $s_t$ of the bidirectional LSTM layer at time step $t$ is concatenated with the attention result and fed through a linear layer with a Softmax function to predict the distribution of the next program token $P_t$.

Further, the generation loss $L_g$ in step S6 uses the negative log-likelihood of the target program, computed as shown in formula (7):

$L_g(\theta) = -\sum_{t} \log P(y_t \mid y_{t-1}, M_i; \theta)$ (7);

where $\theta$ denotes the parameters of the loss function, $P_t$ is the program token, $y_t$ is the target program to be generated at time $t$, $y_{t-1}$ is the target program generated at time $t-1$, and $M_i$ is the multimodal feature vector.
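The negative log-likelihood of formula (7) can be sketched as follows, assuming `step_probs[t]` is the probability the decoder assigned to the correct program token $y_t$ at step $t$:

```python
import math

def generation_loss(step_probs):
    """Negative log-likelihood of the target program: the sum over
    time steps t of -log P(y_t | y_{t-1}, M_i; theta)."""
    return -sum(math.log(p) for p in step_probs)
```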

Further, the automatic middle school geometry problem solving model of the present invention is divided into four modules: the BERT feature encoder, the geometric image generator, the diagram-based solver, and the unified large model. The BERT feature encoder is connected in series with the geometric image generator and with the diagram-based solver; the geometric image generator and the diagram-based solver form a parallel structure, followed in series by the unified large model.

The beneficial effects of the present invention are as follows: (1) First, a dataset is collected from the People's Education Press junior high school mathematics textbooks and test papers; the problems and answers are cleaned and normalized to construct the middle school geometry dataset. The dataset is then classified into two problem types by a program, followed by manual supervision and inspection to verify the soundness of the classification. Next, the two classified problem types are handled separately. With-figure problems, which come with their own figure, are solved by a diagram-based solver model. Figure-free problems, which have no figure and require one to be drawn, are handled with the help of a contrastive learning model, which is trained and fine-tuned so that it better generates the geometric figures required by the problem text. The figures generated by the geometric image generator are then input into the diagram-based solver together with the figure-free problem texts for model testing. Finally, the two trained modules are fused into one unified large model that can solve both geometry problem types at the same time.

(2) For figure-free geometry problems that must be solved by drawing, the present invention fine-tunes and pre-trains a contrastive learning model to generate the geometric figures corresponding to the problems, realizing the step from "no figure" to "figure" and laying the groundwork for the subsequent solving.

(3)针对自带图形解决的有图几何题,本发明首先分别对文本和几何图形用编码器进行特征提取,然后用采用协同注意机制来进行跨界语义融合和对齐,最后解决了跨模态联合推理的问题。(3) For graph geometry problems solved with built-in graphics, the present invention first uses encoders to extract features from text and geometric figures respectively, and then uses a collaborative attention mechanism to perform cross-border semantic fusion and alignment, and finally solves the cross-modal problem. The problem of joint reasoning.

(4)本发明从全新的视角,将中学几何问题分成两种题型来分别对应解决,并将它们融合到一起形成一个能解中学几何问题的统一大模型。(4) From a new perspective, this invention divides middle school geometry problems into two types of questions to solve respectively, and integrates them to form a unified large model that can solve middle school geometry problems.

Description of the Drawings

Figure 1 is a flow chart of the overall model structure of the present invention.

Detailed Description

The present invention, an automatic solving method for middle school geometry problems based on contrastive learning, works and is implemented through the following steps:

Step S1, dataset construction: collect a number of middle school geometry questions and their answers, and divide them into a training set, a validation set, and a test set to obtain the required middle school geometry dataset.

Step S2, task formalization: given a middle school geometry dataset containing N items, a question-type classifier divides it into a figure-free geometry dataset B and a figured geometry dataset C.

Step S3: the y figure-free questions b_y in dataset B, which require a figure to be drawn, and the z figured questions c_z in dataset C, which come with their own figure, are input into the BERT feature encoder of the automatic solving model to obtain the word-embedding feature vectors of all words in each question stem.

Step S4: the word-embedding feature vectors of the figure-free question stems produced by the BERT feature encoder are input into the geometric image generator. The generator is obtained by training and fine-tuning a contrastive learning model and is used to generate the geometric figure a question requires. The contrastive model is trained under manual supervision; the figure-generation loss L_prior is computed with a mean-squared-error loss function, and the parameters of the BERT feature encoder and the geometric image generator are optimized and updated to obtain the geometric figure.

Step S5: the word-embedding feature vectors of the figured question stems obtained by the BERT feature encoder, together with the figure-free questions and the figures generated for them by the geometric image generator, are input into the figure-based solver. The figure encoder inside the solver encodes the geometric figures and extracts their features, which are then aligned with the word-embedding feature vectors from the BERT feature encoder to obtain the final multimodal feature vectors.

Step S6: guided by the multimodal feature vectors, the program decoder in the figure-based solver generates the solution program sequentially; the generation loss L_g of solution errors is computed with a negative log-likelihood loss function, yielding highly accurate answers.

Step S7: the geometric image generator and the figure-based solver are tested together, forming a unified large model that solves both figure-free geometry questions, which require a figure to be drawn, and geometry questions that come with their own figure.

Further, in step S1, 16,201 geometry questions and answers are collected manually from New People's Education Press middle school textbooks, examination syllabi, and lesson-plan materials, and the following tasks are performed:

Step S11: remove duplicate questions and answers.

Step S12: classify the questions into two types; questions with a figure are automatically classified as figured geometry questions, and questions with only a stem and no figure as figure-free geometry questions.

Step S13: check the classification, i.e., manually verify the classification results to ensure they are reasonable.

Step S14: after manual inspection and verification, 14,334 questions and answers are retained, of which 9,922 are automatically classified as figured geometry questions and 4,412 as figure-free geometry questions; the questions and answers are divided into training, validation, and test sets at a ratio of 8:1:1.

Step S15: after the split, the solutions for the training- and validation-set questions are annotated manually; the solution steps are extracted from the answers and then hand-labeled as a program language the computer can recognize.
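
The classification and 8:1:1 split of steps S12-S14 can be sketched as follows. The dict schema (`text`, `answer`, `figure`) is a hypothetical stand-in, since the patent does not specify a dataset format:

```python
import random

def classify_and_split(problems, seed=0):
    """Classify questions by whether they carry a figure, then split each
    class into train/validation/test at the 8:1:1 ratio of step S14."""
    figured = [p for p in problems if p.get("figure") is not None]
    figure_free = [p for p in problems if p.get("figure") is None]

    def split_811(items):
        items = list(items)
        random.Random(seed).shuffle(items)  # deterministic shuffle for reproducibility
        n_train = int(len(items) * 0.8)
        n_val = int(len(items) * 0.1)
        return (items[:n_train],
                items[n_train:n_train + n_val],
                items[n_train + n_val:])

    return split_811(figured), split_811(figure_free)
```

Splitting each class separately keeps the figured/figure-free ratio consistent across the three sets.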

Further, the BERT feature encoder in step S3 uses the encoder module of the Transformer architecture and consists of multiple layers of bidirectional encoders; the computation is shown in formula (1):

e_i^w = BERT(w_i)  (1);

where e_i^w is the word-embedding feature vector obtained for the i-th word token w_i by the BERT feature encoder.

Further, the geometric image generator in step S4 works as follows:

Step S41: the input to the geometric image generator is the word-embedding feature vectors of the figure-free question stems;

Step S42: the geometric image generator is an adapted contrastive learning model; the contrastive learning model is a text-image pair classification model that is trained and fine-tuned for the downstream task of generating geometric images;

Step S43: train the contrastive learning model:

collect and organize the figured-question dataset, obtaining 9,922 question stems and their corresponding geometric figures;

extract features from the 9,922 stems and the corresponding figures, obtaining word-embedding feature vectors e_i^w and figure features h_CNN, i.e., 9,922 text-image pairs;

feed the features of the 9,922 text-image pairs into the text-image classification model for contrastive learning; under manual supervision, matching text-image pairs are labeled as positive samples and mismatched pairs as negative samples;

through the positive and negative samples, the classification model learns the correspondence between stems and figures, i.e., given a word-embedding feature vector e_i^w it finds the corresponding figure feature h_CNN;
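
The positive/negative pair training above can be sketched with a generic CLIP-style symmetric contrastive loss. This is a common implementation of text-image matching, not necessarily the patent's exact objective, which is only described as supervised pair labeling:

```python
import numpy as np

def contrastive_loss(text_feats, image_feats, temperature=0.07):
    """Symmetric InfoNCE-style loss: row i of each matrix is a matched
    (positive) text-image pair; every other combination is a negative."""
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    v = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    logits = t @ v.T / temperature              # pairwise cosine similarities
    labels = np.arange(len(logits))

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_p[labels, labels].mean()    # pick the diagonal (positive) entries

    # average the text->image and image->text directions
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2
```

Minimizing this loss pulls each stem's feature e_i^w toward its figure feature h_CNN and pushes it away from the other figures in the batch.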

Step S44: fine-tune the contrastive learning model:

define the text as x and the geometric figure as y, and introduce a Prior P(h_CNN | x) over the generated figure codes; the computation is shown in formula (2), where the figure codes produced by the Prior are trained against the contrastive learning model's figure codes as ground truth;

P(y | x) = P(y, h_CNN | x) = P(y | h_CNN, x) · P(h_CNN | x)  (2);

where P(y | x) denotes generating the geometric figure y from the text x; P(h_CNN | x) denotes that, after contrastive training, the figure feature h_CNN can be generated from the word-embedding feature vectors e_i^w; P(y | h_CNN, x) denotes finding the figure feature h_CNN from the text x and then decoding h_CNN into the figure y; and P(y, h_CNN | x) denotes jointly finding the figure feature h_CNN and its decoded figure y from the text x;

Step S45, figure-generation loss L_prior: based on the Prior from the fine-tuning stage, the figure is predicted with a mean-squared-error loss; the computation is shown in formula (3):

L_prior = Σ_{i=1}^{T} || f_θ(h_CNN^(i), x) − h_CNN ||²  (3);

where L_prior is the figure-generation loss, the sum runs over the T generation steps, h_CNN^(i) is the figure feature generated at step i, f_θ(h_CNN^(i), x) is the figure feature predicted from the text x at step i, the squared term measures the difference between that prediction and the true figure feature h_CNN, and θ is an adjustable parameter.
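
The mean-squared-error prior loss of step S45 reduces to a few lines. This sketch assumes the predicted figure features for the T generation steps are stacked into a (T, d) array; the sum-of-squared-distances form mirrors the reconstructed formula (3):

```python
import numpy as np

def prior_loss(predicted_feats, target_feat):
    """Sum of squared distances between each of the T predicted figure
    features h_CNN^(i) and the true figure feature h_CNN."""
    predicted_feats = np.asarray(predicted_feats, dtype=float)   # shape (T, d)
    diff = predicted_feats - np.asarray(target_feat, dtype=float)  # broadcast target over T
    return float((diff ** 2).sum())
```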

Further, the figure-based solver of step S5 comprises four modules: a bidirectional LSTM layer, a figure encoder, a joint reasoning module, and a program decoder; specifically:

Step S51: the input to the figure-based solver consists of the word-embedding feature vectors of the figured question stems obtained by the BERT feature encoder, together with the figure-free questions and the figures generated for them by the geometric image generator;

Step S52, bidirectional LSTM layer: the word-embedding feature vectors e_i^w of the figured question stems are input into the bidirectional LSTM layer, which produces the contextual semantic feature vector of the i-th word of the geometry text; i.e., e_i^w is fed into the forward and backward LSTM layers respectively, as shown in formula (4):

h_i^LSTM = LSTM_f(e_i^w) ⊕ LSTM_b(e_i^w)  (4);

where h_i^LSTM is the contextual semantic feature vector of the i-th word, LSTM_f and LSTM_b denote the outputs of the forward and backward LSTM layers respectively, and ⊕ denotes the concatenation operation;
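
The bidirectional structure of formula (4) can be sketched as follows. A plain tanh RNN cell stands in for the LSTM cell to keep the example short; the point is the forward pass, backward pass, and per-position concatenation:

```python
import numpy as np

def bidirectional_context(embeddings, W_f, W_b):
    """Run a recurrent cell forward and backward over the word embeddings
    e_i^w and concatenate the two hidden states at each position."""
    def run(seq, W):
        h = np.zeros(W.shape[0])
        states = []
        for e in seq:
            h = np.tanh(W @ np.concatenate([h, e]))  # recurrent update
            states.append(h)
        return states

    forward = run(embeddings, W_f)
    backward = run(embeddings[::-1], W_b)[::-1]      # reverse pass, realigned
    return [np.concatenate([f, b]) for f, b in zip(forward, backward)]
```

Each output vector thus sees both the left context (forward state) and the right context (backward state) of its word.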

Step S53, figure encoder: a CNN extracts the image features h_CNN; concretely, feature extraction is implemented with convolutional layers, nonlinear activation functions, and pooling layers. In the convolutional layers, sliding kernels convolve the geometric image to capture local features such as edge lines and points. Nonlinear activation functions increase the expressive power of the network, and the pooling layers reduce the dimensionality of the feature maps while retaining the key features of the figure. Stacking multiple convolutional layers lets the network progressively extract higher-level geometric features, such as the shapes of the figures and the positional relations of points and lines. Finally, a fully connected layer produces the figure features;

Step S54, joint reasoning module: an attention mechanism fuses the contextual semantic feature vector h_i^LSTM of the i-th word with the figure features h_CNN, achieving cross-modal semantic fusion and alignment and yielding the multimodal feature vector M_i of the i-th word, which encodes both h_i^LSTM and the figure-feature information h_CNN; the computation is shown in formulas (5) and (6):

Attention(Q, K, V) = Softmax(Q K^D / √d) V  (5);

M_i = Attention(h_i^LSTM W_i^Q, h_CNN W_i^K, h_CNN W_i^V)  (6);

where Attention denotes the attention mechanism; Q, K, and V denote the query, key, and value vectors respectively; Softmax is the normalized exponential function; d is the second dimension of the query vector Q and the key vector K; W_i^Q, W_i^K, and W_i^V denote the projection parameter matrices of the query, key, and value vectors of the i-th word in the self-attention mechanism; let Q = h_i^LSTM W_i^Q, K = h_CNN W_i^K, V = h_CNN W_i^V, where each W is a parameter matrix learned by a linear layer and D denotes the transpose;
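
Formulas (5) and (6) can be sketched as scaled dot-product attention in which the text feature queries the figure features. The exact shapes of the projection matrices are an assumption of this sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention of formula (5): Softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def fuse(h_lstm_i, h_cnn, W_q, W_k, W_v):
    """Co-attention fusion for formula (6): the text feature of word i
    queries the figure features to yield the multimodal vector M_i."""
    Q = (h_lstm_i @ W_q)[None, :]   # one query from the text side
    K = h_cnn @ W_k                 # keys and values from the figure side
    V = h_cnn @ W_v
    return attention(Q, K, V)[0]
```

The attention weights form a distribution over figure regions, so M_i is a text-conditioned summary of the figure features.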

Step S55, program decoder: the multimodal feature vector M_i is fed into a linear layer to obtain the initial state s_0; at time step t, the hidden state s_t of the bidirectional LSTM layer is concatenated with the attention result and fed into a linear layer with a Softmax function to predict the distribution of the next program token P_t.

Further, the generation loss L_g in step S6 is the negative log-likelihood of the target program, computed as shown in formula (7):

L_g(θ) = − Σ_t log P(y_t | y_{t−1}, M_i; θ)  (7);

where θ are the parameters of the loss function, P_t is the program token, y_t is the target program token to be generated at time t, y_{t−1} is the target program token at time t−1, and M_i is the multimodal feature vector.
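
The negative log-likelihood of formula (7), as reconstructed, reduces to summing the log-probabilities the decoder assigns to the correct program tokens:

```python
import numpy as np

def generation_loss(step_probs):
    """Negative log-likelihood of the target program: step_probs[t] is the
    probability the decoder assigned to the correct token y_t at step t."""
    return float(-np.sum(np.log(step_probs)))
```

A perfectly confident decoder (probability 1 at every step) incurs zero loss; low probabilities on correct tokens are penalized logarithmically.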

Further, the automatic solving model for middle school geometry problems of the present invention consists of four modules: the BERT feature encoder, the geometric image generator, the figure-based solver, and the unified large model. The BERT feature encoder feeds the geometric image generator and the figure-based solver in series; the generator and the solver are arranged in parallel, followed in series by the unified large model.

Further, the unified large model is tested after the geometric image generator and the figure-based solver have been trained. Given any middle school geometry question, with or without a figure: a figured question passes through the feature encoder directly into the figure-based solver to be solved, whereas for a figure-free question the geometric image generator first produces a figure, turning it into a figured question that is then solved by the figure-based solver. In summary, this yields a unified large model that handles both geometry questions with built-in figures and figure-free geometry questions.
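
The dispatch logic of the unified model can be sketched as a small routing function; the two callables are stand-ins for the trained modules:

```python
def solve(problem, figure_generator, figure_solver):
    """Route a question through the unified model: a question that already
    carries a figure goes straight to the figure-based solver, while a
    figure-free question first passes through the geometric image generator."""
    figure = problem.get("figure")
    if figure is None:                          # figure-free branch
        figure = figure_generator(problem["text"])
    return figure_solver(problem["text"], figure)
```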

As shown in Figure 1, the flow of model prediction is as follows: the N-item middle school geometry dataset is input into the automatic solving model and classified into figure-free and figured questions. A figure-free question first passes through the BERT feature encoder and then into the geometric image generator, which, through contrastive learning, training, and fine-tuning, generates the geometric figure described by the question; the figure prediction uses a mean-squared-error loss L_prior to optimize and update the generator's parameters and improve the accuracy of the generated figures; the generated figure and the figure-free question are finally input into the figure-based solver for solving and testing. A figured question first passes through the BERT feature encoder and then directly into the figure-based solver. During training, the generation loss L_g is the negative log-likelihood of the target program, which improves the solver's accuracy, and the model computes a joint total loss L = L_prior + L_g to optimize the parameters of both modules simultaneously and strengthen the information exchange between them. In the test phase, the geometric image generator and the figure-based solver are merged into a unified large model that solves both question types.

Claims (7)

1. An automatic solving method for middle school geometry problems based on contrastive learning, characterized by comprising the following steps:
step S1, dataset construction: collecting a number of middle school geometry questions and answers, and dividing them into a training set, a validation set, and a test set to obtain the required middle school geometry dataset;
step S2, task formalization: given a middle school geometry dataset containing N items, dividing it by a question-type classifier into a figure-free geometry dataset B and a figured geometry dataset C;
step S3, inputting the y figure-free questions b_y in dataset B, which require a figure to be drawn, and the z figured questions c_z in dataset C, which come with their own figure, into the BERT feature encoder of the automatic solving model, and obtaining the word-embedding feature vectors of all words in the question stems;
step S4, inputting the word-embedding feature vectors of the figure-free question stems obtained by the BERT feature encoder into a geometric image generator, the generator being obtained by training and fine-tuning a contrastive learning model and being used to generate the geometric figures the questions require; the contrastive model is trained under manual supervision, the figure-generation loss L_prior is computed with a mean-squared-error loss function, and the parameters of the BERT feature encoder and the geometric image generator are optimized and updated to obtain the geometric figures;
step S5, inputting the word-embedding feature vectors of the figured question stems obtained by the BERT feature encoder, together with the figure-free questions and the figures generated for them by the geometric image generator, into the figure-based solver; the figure encoder contained in the solver encodes the figures and extracts their features, which are aligned with the word-embedding feature vectors of the BERT feature encoder to obtain the final multimodal feature vectors;
step S6, the program decoder in the figure-based solver sequentially generating the solution program under the guidance of the multimodal feature vectors, and computing the generation loss L_g of solution errors with a negative log-likelihood loss function, obtaining highly accurate answers;
step S7, testing the geometric image generator and the figure-based solver together to form a unified large model that solves both the figure-free geometry questions requiring a figure to be drawn and the geometry questions that come with their own figure.
2. The automatic solving method for middle school geometry problems based on contrastive learning according to claim 1, characterized in that the dataset construction of step S1 collects a number of geometry questions and answers and performs the following tasks:
step S11, removing duplicate questions and answers;
step S12, classifying the questions into two types, questions with a figure being automatically classified as figured geometry questions and questions without a figure as figure-free geometry questions;
step S13, checking the classification, i.e., manually verifying the classification results of the questions;
step S14, after manual inspection and verification, dividing the questions and answers into training, validation, and test sets at a ratio of 8:1:1;
step S15, after the division, manually annotating the training- and validation-set questions: extracting the solution steps from the answers and hand-labeling them as a program language the computer can recognize.
3. The automatic solving method for middle school geometry problems based on contrastive learning according to claim 2, characterized in that the BERT feature encoder of step S3 uses the encoder module of the Transformer architecture and consists of multiple layers of bidirectional encoders, the computation being shown in formula (1);
e_i^w = BERT(w_i)  (1);
wherein e_i^w is the word-embedding feature vector obtained for the i-th word token w_i by the BERT feature encoder.
4. The automatic solving method for middle school geometry problems based on contrastive learning according to claim 3, characterized in that the geometric image generator of step S4 comprises:
step S41, inputting data into the geometric image generator, the input being the word-embedding feature vectors of the figure-free question stems;
step S42, the geometric image generator being an adapted contrastive learning model, the contrastive learning model being a text-image pair classification model that is trained and fine-tuned for the downstream task of generating geometric images;
step S43, training the contrastive learning model:
collecting and organizing the figured-question dataset to obtain the question stems and their corresponding geometric figures;
extracting features from the stems and figures to obtain the word-embedding feature vectors e_i^w and the figure features h_CNN, forming text-image pairs;
inputting the features of the text-image pairs into the text-image classification model for contrastive learning, matching pairs being labeled as positive samples and mismatched pairs as negative samples under manual supervision;
the classification model obtaining, through the positive and negative samples, the correspondence between stems and figures, i.e., given a word-embedding feature vector e_i^w it finds the corresponding figure feature h_CNN;
step S44, fine-tuning the contrastive learning model:
defining the text as x and the geometric figure as y, and introducing a Prior P(h_CNN | x) over the generated figure codes, the computation being shown in formula (2), wherein the figure codes produced by the Prior are trained against the contrastive model's figure codes as ground truth;
P(y | x) = P(y, h_CNN | x) = P(y | h_CNN, x) · P(h_CNN | x)  (2);
wherein P(y | x) denotes generating the figure y from the text x; P(h_CNN | x) denotes that, after contrastive training, the figure feature h_CNN can be generated from the word-embedding feature vectors e_i^w; P(y | h_CNN, x) denotes finding the figure feature h_CNN from the text x and decoding it into the figure y; and P(y, h_CNN | x) denotes jointly finding the figure feature h_CNN and its decoded figure y from the text x;
step S45, figure-generation loss L_prior: based on the Prior from the fine-tuning stage, predicting the figure with a mean-squared-error loss, the computation being shown in formula (3);
L_prior = Σ_{i=1}^{T} || f_θ(h_CNN^(i), x) − h_CNN ||²  (3);
wherein L_prior is the figure-generation loss, the sum runs over the T generation steps, h_CNN^(i) is the figure feature generated at step i, f_θ(h_CNN^(i), x) is the figure feature predicted from the text x at step i, the squared term measures the difference between that prediction and the true figure feature h_CNN, and θ is an adjustable parameter.
5. The automatic middle school geometric problem solving method based on contrast learning according to claim 4, wherein the method is characterized in that: the step S5 is provided with a graphic problem device which comprises a bidirectional LSTM layer, a graphic encoder, a joint reasoning module and a program decoder; the specific contents include:
step S51, inputting data into a graphic problem solving device, wherein the input data comprises word embedded feature vectors in graphic geometric problem stems obtained by a BERT feature encoder and corresponding geometric figures generated by a non-graphic geometric problem and a geometric image generator;
step S52, bi-directional LSTM layer: word embedded feature vector e in the stem of the geometric question in the diagram obtained by the BERT feature encoder i w Inputting the text into a bidirectional LSTM layer, and acquiring a context semantic feature vector corresponding to the ith word in the mathematical geometry text by utilizing the bidirectional LSTM layer, namely embedding the word into the feature vector e i w Respectively and correspondingly inputting the data into a forward LSTM layer and a backward LSTM layer, as shown in a formula (4);
(4);
wherein h is i LSTM Context semantic feature vector corresponding to the i-th word in the mathematical geometry text, LSTM f 、LSTM b Representing the output vector of the forward LSTM layer and the output vector of the backward LSTM layer respectively,representing a cascading operation;
step S53, the graphics encoder: adopting CNN convolutional neural network mode to realizeExtracting geometric image features h CNN The CNN convolutional neural network comprises a convolutional layer, a nonlinear activation function and a pooling layer component; performing convolution operation on the geometric image through sliding convolution check in the convolution layer, and capturing local features; meanwhile, a nonlinear activation function is introduced, the expression capacity of the CNN convolutional neural network is increased, the pooling layer assembly reduces the dimension of the feature map, and key features of geometric figures are reserved; the multi-layer stacked convolution layers enable the CNN convolution neural network to gradually extract geometric features of higher levels; obtaining geometric figure characteristics h through full connection layer CNN
Step S54, joint reasoning module: an attention mechanism fuses the context semantic feature vector h_i^LSTM corresponding to the i-th word with the geometric figure feature h^CNN, achieving cross-modal semantic fusion and alignment, and yields the multimodal feature vector M_i corresponding to the i-th word, which combines the information of h_i^LSTM and h^CNN through the attention mechanism; the calculation process is given by formula (5) and formula (6);

Attention(Q, K, V) = softmax(Q K^⊤ / √d) V    (5);

M_i = Attention(Q_i, K_i, V_i), where Q_i = h_i^LSTM W_i^Q, K_i = h^CNN W_i^K, V_i = h^CNN W_i^V    (6);

where Attention represents the attention mechanism; Q, K and V represent the query vector, key vector and value vector respectively; softmax is the normalized exponential function; d is the second dimension of the query vector Q and the key vector K; W_i^Q, W_i^K and W_i^V respectively represent the projection parameter matrices of the query vector Q, key vector K and value vector V corresponding to the i-th word in the self-attention mechanism, each being a parameter matrix learned by a linear layer; and ^⊤ denotes the transpose;
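Formulas (5) and (6) amount to standard scaled dot-product attention, with the text feature supplying the query and the image features supplying keys and values; a minimal numpy sketch, with illustrative region count and dimensions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Formula (5): softmax(Q K^T / sqrt(d)) V.
    d = Q.shape[-1]                      # second dimension of Q and K
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(2)
d = 16
h_lstm_i = rng.standard_normal(d)        # text feature for the i-th word
h_cnn = rng.standard_normal((4, d))      # image features (4 regions, illustrative)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

# Formula (6): project the text feature to Q and the image features to K and V,
# then fuse them into the multimodal feature vector M_i.
Q = (h_lstm_i @ Wq)[None, :]
K, V = h_cnn @ Wk, h_cnn @ Wv
M_i = attention(Q, K, V)[0]
```

The softmax weights sum to one over the image regions, so M_i is a convex combination of the projected image features, aligned to the i-th word.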
step S55, program decoder: the multimodal feature vector M_i is fed into a linear layer to obtain the initial state s_0; at each time step t, the hidden state s_t of the bidirectional LSTM layer is concatenated with the attention result and fed into a linear layer with a Softmax function to predict the distribution of the next program token P_t.
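A single decoding step of step S55 can be sketched as follows; the vocabulary size and dimensions are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(s_t, attn_ctx, W_out, b_out):
    # Concatenate the decoder hidden state with the attended context and
    # project through a linear layer + Softmax to get a distribution over
    # program tokens (step S55).
    z = np.concatenate([s_t, attn_ctx])
    return softmax(W_out @ z + b_out)

rng = np.random.default_rng(3)
d_hid, d_ctx, vocab = 8, 16, 20           # illustrative sizes
s_t = rng.standard_normal(d_hid)          # decoder hidden state at step t
ctx = rng.standard_normal(d_ctx)          # attention result over M_i
W_out = rng.standard_normal((vocab, d_hid + d_ctx))
p_next = decode_step(s_t, ctx, W_out, np.zeros(vocab))  # distribution over P_t
```

Because of the Softmax, p_next is a proper probability distribution over the program-token vocabulary; the decoder would either sample from it or take its argmax at inference time.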
6. The automatic middle school geometric problem solving method based on contrast learning according to claim 4, characterized in that: the generation loss L_g in step S6 adopts the negative log-likelihood of the target program, with the calculation formula shown as (7);

L_g(θ) = − Σ_{t=1}^{T} log p(y_t | y_{t−1}, M_i; θ)    (7);

where θ denotes the parameters of the loss function, y_t is the target program token generated at time t, y_{t−1} is the target program token generated at time t−1, and M_i is the multimodal feature vector.
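The negative log-likelihood of formula (7) sums, over time steps, the negative log-probability that the decoder's Softmax output assigns to the correct target token; a toy numerical sketch with assumed distributions:

```python
import numpy as np

def generation_loss(step_distributions, target_ids):
    # Formula (7): negative log-likelihood of the target program, summed over
    # time steps; each distribution is the decoder's Softmax output at step t.
    return -sum(np.log(dist[y_t]) for dist, y_t in zip(step_distributions, target_ids))

# Toy example: 3 time steps over a vocabulary of 4 program tokens.
dists = [np.array([0.7, 0.1, 0.1, 0.1]),
         np.array([0.25, 0.25, 0.25, 0.25]),
         np.array([0.1, 0.1, 0.1, 0.7])]
target = [0, 2, 3]
loss = generation_loss(dists, target)  # -(log 0.7 + log 0.25 + log 0.7)
```

Confident, correct predictions (probability near 1 on the target token) drive the loss toward zero, while uncertain or wrong predictions increase it.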
7. The automatic middle school geometric problem solving method based on contrast learning according to claim 6, characterized in that: the automatic middle school geometric problem solving model is divided into four modules: a BERT feature encoder, a geometric image generator, a graphic question generator, and a unified large model; the BERT feature encoder is connected in series with the geometric image generator and with the graphic question generator, the geometric image generator and the graphic question generator form a parallel structure, and both are then connected in series with the unified large model.
CN202410109877.8A 2024-01-26 2024-01-26 An automatic solution method for middle school geometry problems based on contrastive learning Active CN117633643B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410109877.8A CN117633643B (en) 2024-01-26 2024-01-26 An automatic solution method for middle school geometry problems based on contrastive learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410109877.8A CN117633643B (en) 2024-01-26 2024-01-26 An automatic solution method for middle school geometry problems based on contrastive learning

Publications (2)

Publication Number Publication Date
CN117633643A true CN117633643A (en) 2024-03-01
CN117633643B CN117633643B (en) 2024-05-14

Family

ID=90037971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410109877.8A Active CN117633643B (en) 2024-01-26 2024-01-26 An automatic solution method for middle school geometry problems based on contrastive learning

Country Status (1)

Country Link
CN (1) CN117633643B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118898722A (en) * 2024-10-10 2024-11-05 厦门理工学院 Automatic problem-solving method for plane geometry based on spatial perception of geometric primitives
CN118898722B (zh) * 2024-10-10 2025-02-11 厦门理工学院 Automatic problem-solving method for plane geometry based on spatial perception of geometric primitives

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423287A (en) * 2017-07-05 2017-12-01 华中师范大学 The automatic answer method and system of Proving Plane Geometry topic
CN107967318A (en) * 2017-11-23 2018-04-27 北京师范大学 A kind of Chinese short text subjective item automatic scoring method and system using LSTM neutral nets
CN113672716A (en) * 2021-08-25 2021-11-19 中山大学·深圳 Geometry problem solving method and model based on deep learning and multimodal numerical reasoning
KR20220075489A (en) * 2020-11-30 2022-06-08 정재훈 Training system for auto generating and providing question
CN115841156A (en) * 2022-11-16 2023-03-24 科大讯飞股份有限公司 Method, device, storage medium and equipment for solving plane geometry problem
CN116028888A (en) * 2023-01-09 2023-04-28 西交利物浦大学 Automatic problem solving method for plane geometry mathematics problem
CN116778518A (en) * 2022-03-10 2023-09-19 暗物智能科技(广州)有限公司 Intelligent solving method and device for geometric topics, electronic equipment and storage medium
CN116955419A (en) * 2022-11-18 2023-10-27 暗物智能科技(广州)有限公司 Geometric question answering method, system and electronic equipment

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423287A (en) * 2017-07-05 2017-12-01 华中师范大学 The automatic answer method and system of Proving Plane Geometry topic
CN107967318A (en) * 2017-11-23 2018-04-27 北京师范大学 A kind of Chinese short text subjective item automatic scoring method and system using LSTM neutral nets
KR20220075489A (en) * 2020-11-30 2022-06-08 정재훈 Training system for auto generating and providing question
CN113672716A (en) * 2021-08-25 2021-11-19 中山大学·深圳 Geometry problem solving method and model based on deep learning and multimodal numerical reasoning
CN116778518A (en) * 2022-03-10 2023-09-19 暗物智能科技(广州)有限公司 Intelligent solving method and device for geometric topics, electronic equipment and storage medium
CN115841156A (en) * 2022-11-16 2023-03-24 科大讯飞股份有限公司 Method, device, storage medium and equipment for solving plane geometry problem
CN116955419A (en) * 2022-11-18 2023-10-27 暗物智能科技(广州)有限公司 Geometric question answering method, system and electronic equipment
CN116028888A (en) * 2023-01-09 2023-04-28 西交利物浦大学 Automatic problem solving method for plane geometry mathematics problem

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WENBIN GAN ET AL: "Automatic understanding and formalization of natural language geometry problems using syntax-semantics models", INTERNATIONAL JOURNAL OF INNOVATIVE, vol. 14, no. 1, 28 February 2018 (2018-02-28), pages 83 - 98 *
WANG YIRAN: "Design and Implementation of an Automatic Problem-Solving System Based on Self-Learning", China Master's Theses Full-text Database (Electronic Journal), vol. 2023, no. 01, 15 January 2023 (2023-01-15) *


Also Published As

Publication number Publication date
CN117633643B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
Ding et al. Open-vocabulary universal image segmentation with maskclip
Altwaijry et al. Arabic handwriting recognition system using convolutional neural network
CN113535904B (en) Aspect level emotion analysis method based on graph neural network
Sharma et al. A survey of methods, datasets and evaluation metrics for visual question answering
CN108121702B (en) Method and system for evaluating and reading mathematical subjective questions
CN111008293A (en) Visual question-answering method based on structured semantic representation
CN114419642A (en) Method, device and system for extracting key value pair information in document image
CN111428513A (en) A Fake Review Analysis Method Based on Convolutional Neural Network
CN113536798B (en) Multi-instance document key information extraction method and system
CN115935969A (en) Heterogeneous data feature extraction method based on multi-mode information fusion
CN117173450A (en) Traffic scene generation type image description method
CN114756681A (en) Evaluation text fine-grained suggestion mining method based on multi-attention fusion
CN112069825A (en) Entity relation joint extraction method for alert condition record data
Das et al. Determining attention mechanism for visual sentiment analysis of an image using svm classifier in deep learning based architecture
CN113010662B (en) A hierarchical conversational machine reading comprehension system and method
CN117633643B (en) An automatic solution method for middle school geometry problems based on contrastive learning
Kuang et al. Multi-label image classification with multi-layered multi-perspective dynamic semantic representation
CN114417044A (en) Image question answering method and device
CN114818739A (en) Visual question-answering method optimized by using position information
CN114358579A (en) Evaluation method, evaluation device, electronic device, and computer-readable storage medium
CN114372128A (en) An automatic solution method and system for the volume problem of rotationally symmetric geometry
CN113723367A (en) Answer determining method, question judging method and device and electronic equipment
Lee et al. Optical character recognition for handwritten mathematical expressions in educational humanoid robots
Seman et al. Classification of handwriting impairment using CNN for potential dyslexia symptom
CN116595992B (en) A single-step extraction method of terms and types of binary pairs and its model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant