CN117633643A - Automatic middle school geometric problem solving method based on contrast learning - Google Patents
- Publication number
- CN117633643A (application CN202410109877.8A)
- Authority
- CN
- China
- Prior art keywords
- geometric
- feature
- middle school
- questions
- cnn
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
- G06F18/15—Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
Description
Technical field
The invention relates to the field of machine learning systems, and specifically to a method for automatically solving middle school geometry problems based on contrastive learning.
Background
In recent years, machine learning systems that automatically solve math word problems (MWPs) have attracted growing attention because of their academic value and their application potential in smart education. Most existing methods, including traditional machine learning methods and neural-network-based models, focus on arithmetic and algebra problems, while geometry problems have received little study. Geometry is a classic branch of mathematics and makes up a large part of middle school education. Because geometry problems are challenging and combine different kinds of data, they can also serve as a multimodal numerical reasoning benchmark that requires joint reasoning over diagrams and text.
Generally, a typical geometry problem consists of text and a geometric figure. Compared with math word problems, which involve only text, geometry problems pose the following new challenges. First, the accompanying diagram provides essential information missing from the text, such as the relative positions of lines and points, so the solver must be able to parse diagrams. Second, solving a geometry problem requires understanding and aligning the semantics of the text and the diagram at the same time. However, problem text often contains ambiguous references to diagram primitives and implicit relationships among them, which makes joint reasoning over text and primitives harder. Third, many geometry problems require additional theorem knowledge. Although earlier methods have tried to address these issues, the performance of their geometry problem solving systems is far from satisfactory. They depend heavily on a limited set of handcrafted rules and are validated only on small data sets, which makes them hard to generalize to more complex, real-world cases. In addition, their solving process is complicated, so it is difficult for humans to understand it and verify its reliability.
Recently, many works have proposed unified models for a variety of vision-language reasoning and generation tasks, because the underlying visual and language understanding and reasoning capabilities are largely shared. Inspired by this progress, we consider a unified geometry problem solving model necessary as well. Geometry problems fall into two types: problems that come with a geometric figure, commonly called geometry problems with diagrams, and problems whose original text has no figure and requires the solver to draw one, commonly called geometry problems without diagrams. Whatever the type, solving them relies on shared basic skills and knowledge of geometric reasoning. Exploring the general understanding and reasoning capabilities of a unified neural network in mathematics is therefore a meaningful topic. Moreover, a unified model does not need an auxiliary model to decide whether a problem has a diagram. This improves solving efficiency and reduces errors introduced by problem-type classification, so the model can complete the solving task better. Establishing a framework that handles geometry problems uniformly at both the data level and the model level is therefore valuable.
Summary of the invention
To solve the above technical problems, the invention proposes a method for automatically solving middle school geometry problems based on contrastive learning. From the perspective of problem type, geometry problems are divided into two types: geometry problems with diagrams and geometry problems without diagrams. A corresponding solution is proposed for each type, and the two are finally combined into a unified large model for middle school geometry that can solve both types.
The technical solution adopted by the invention is a method for automatically solving middle school geometry problems based on contrastive learning, with the following steps:
Step S1, data set construction: collect a number of middle school geometry problems and their answers, and divide them into a training set, a validation set, and a test set to obtain the required middle school geometry data set;
Step S2, formal task definition: given a middle school geometry data set containing N items, a problem-type classifier divides it into data set B of geometry problems without diagrams and data set C of geometry problems with diagrams;
Step S3: the y geometry problems $b_y$ in data set B, which require a figure to be drawn, and the z geometry problems $c_z$ in data set C, which come with a diagram, are input to the BERT feature encoder of the automatic solving model, which produces the character embedding feature vectors of all characters in the problem text;
Step S4: the character embedding feature vectors of the problems without diagrams, obtained by the BERT feature encoder, are input to the geometric image generator. The generator is obtained by training and fine-tuning a contrastive learning model and produces the geometric figure each problem requires. The contrastive learning model is trained under manual supervision, the figure generation loss $L_{prior}$ is computed with a mean squared error loss function, and the parameters of the BERT feature encoder and the geometric image generator are updated to obtain the geometric figure;
Step S5: the character embedding feature vectors of the problems with diagrams, together with the problems without diagrams and the corresponding figures produced by the geometric image generator, are input to the diagram-based solver. The diagram encoder inside the solver encodes each figure and extracts its features, which are aligned with the character embedding feature vectors from the BERT feature encoder to yield the final multimodal feature vector;
Step S6: the program decoder in the diagram-based solver generates the solution program token by token under the guidance of the multimodal feature vector; the generation loss $L_g$ is computed with a negative log-likelihood loss function to obtain accurate answers;
Step S7: the geometric image generator and the diagram-based solver are combined and tested together, forming a unified large model that solves both geometry problems without diagrams, which require a figure to be drawn, and geometry problems that come with a diagram.
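The control flow of steps S3 to S7 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Problem`, `solve`, `encode_text`, `generate_diagram`, and `solve_with_diagram` are hypothetical, and the three components are passed in as plain callables so the routing logic can be read on its own.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Problem:
    """One geometry problem; diagram is None for a problem without a diagram."""
    text: str
    diagram: Optional[Any] = None


def solve(
    problem: Problem,
    encode_text: Callable[[str], Any],           # stands in for the BERT feature encoder (S3)
    generate_diagram: Callable[[Any], Any],      # stands in for the geometric image generator (S4)
    solve_with_diagram: Callable[[Any, Any], Any],  # stands in for the diagram-based solver (S5, S6)
) -> Any:
    text_features = encode_text(problem.text)
    diagram = problem.diagram
    if diagram is None:
        # Problem without a diagram: generate the figure from the text features first.
        diagram = generate_diagram(text_features)
    # Both problem types end up in the same diagram-based solver (S7).
    return solve_with_diagram(text_features, diagram)
```

With stub components, a problem without a diagram is routed through the generator, while a problem with a diagram goes straight to the solver.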
Further, the data set construction in step S1 collects a number of middle school geometry problems and answers and performs the following tasks:
Step S11: remove duplicate middle school geometry problems and answers;
Step S12: classify the problems and answers into two types automatically; a problem that includes a figure is classified as a geometry problem with a diagram, and a problem that has only text and no figure is classified as a geometry problem without a diagram;
Step S13: check the classification, that is, manually inspect and verify the classification result of each problem;
Step S14: after manual inspection and verification, divide the problems and answers into a training set, a validation set, and a test set in the ratio 8:1:1;
Step S15: after the division, manually annotate the solutions of the problems in the training set and the validation set; the solution steps are extracted from the answers and manually annotated as a program language that a computer can recognize.
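Steps S11, S12, and S14 can be sketched in a few lines. This is an illustration under stated assumptions: the record fields `question`, `answer`, and `diagram`, the function name `build_dataset`, and the seeded shuffle before splitting are all hypothetical choices, and the manual checks of S13 and S15 are not automated here.

```python
import random


def build_dataset(records, seed=0):
    """Deduplicate, label by problem type, and split 8:1:1."""
    # S11: drop exact duplicates of (question, answer).
    seen, unique = set(), []
    for rec in records:
        key = (rec["question"], rec["answer"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    # S12: a record with a figure is "with_diagram", otherwise "without_diagram".
    labeled = [
        dict(rec, type="with_diagram" if rec.get("diagram") is not None else "without_diagram")
        for rec in unique
    ]

    # S14: shuffle (an assumption, the source does not say how items are ordered)
    # and split into train / validation / test in the ratio 8:1:1.
    random.Random(seed).shuffle(labeled)
    n = len(labeled)
    n_train, n_val = n * 8 // 10, n // 10
    return {
        "train": labeled[:n_train],
        "val": labeled[n_train:n_train + n_val],
        "test": labeled[n_train + n_val:],
    }
```

Any remainder from the integer division falls into the test split, so every record is assigned to exactly one split.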
Further, the BERT feature encoder in step S3 uses the encoder module of the Transformer architecture and consists of multiple bidirectional encoder layers; the computation is given by formula (1):
$$e_i^{w} = \mathrm{BERT}(w_i) \quad (1)$$
where $e_i^{w}$ is the character embedding feature vector obtained by passing the i-th character token $w_i$ through the BERT feature encoder.
Further, the geometric image generator in step S4 works as follows:
Step S41: data is input to the geometric image generator; the input is the character embedding feature vectors of the text of the geometry problems without diagrams;
Step S42: the geometric image generator is an adapted contrastive learning model. The contrastive learning model is a classification model over text-image pairs, and it is trained and fine-tuned to perform the downstream task of generating geometric images;
Step S43, training the contrastive learning model:
Collect and organize the data set of geometry problems with diagrams to obtain problem texts and their corresponding geometric figures;
Extract features from each problem text and its corresponding figure to obtain the character embedding feature vector $e_i^{w}$ and the geometric figure feature $h_{CNN}$, which together form a text-image pair;
Input the features of the text-image pairs to the text-image pair classification model for contrastive learning; under manual supervision, text-image pairs that match are labeled as positive samples and pairs that do not match are labeled as negative samples;
From the positive and negative samples, the text-image pair classification model learns to associate each problem text with its geometric figure, that is, given a character embedding feature vector $e_i^{w}$ it finds the corresponding geometric figure feature $h_{CNN}$;
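The idea of treating matched text-image pairs as positives and all other pairings as negatives can be illustrated with a symmetric contrastive loss. The source does not state which loss the contrastive learning model uses; the CLIP-style cross-entropy over cosine similarities below, the `temperature` value, and the function names are assumptions made for this sketch. Row i of `text_feats` and row i of `image_feats` are taken to be a matched pair.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(text_feats, image_feats, temperature=0.07):
    """Symmetric cross-entropy over the text-image similarity matrix."""
    t = [normalize(v) for v in text_feats]
    g = [normalize(v) for v in image_feats]
    n = len(t)
    # logits[i][j] is the scaled cosine similarity between text i and image j.
    logits = [[sum(a * b for a, b in zip(t[i], g[j])) / temperature for j in range(n)]
              for i in range(n)]
    # Text-to-image direction: the positive for text i is image i (the diagonal).
    loss_t2i = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Image-to-text direction: same computation on the columns.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_i2t = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_t2i + loss_i2t) / 2
```

The loss is small when each text feature is most similar to its own figure feature and large when the pairs are mismatched, which is the behaviour the positive and negative labels are meant to enforce.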
Step S44, fine-tuning the contrastive learning model:
Let the text be x and the geometric figure be y. A prior $P(h_{CNN} \mid x)$ is introduced to produce the figure encoding, and the computation is given by formula (2); the figure encoding produced by the prior is trained using the figure encoding of the contrastive learning model as the ground truth;
$$P(y \mid x) = P(y, h_{CNN} \mid x) = P(y \mid h_{CNN}, x)\,P(h_{CNN} \mid x) \quad (2)$$
where $P(y \mid x)$ denotes generating the geometric figure y from the text x; the prior $P(h_{CNN} \mid x)$ denotes that, after the contrastive learning model has been trained, the geometric figure feature $h_{CNN}$ can be produced from the character embedding feature vector $e_i^{w}$, that is, $h_{CNN}$ is found from the text x; $P(y \mid h_{CNN}, x)$ denotes finding $h_{CNN}$ from the text x and then decoding $h_{CNN}$ to generate the figure y; and $P(y, h_{CNN} \mid x)$ denotes finding from the text x both $h_{CNN}$ and the corresponding decoded figure y;
Step S45, figure generation loss $L_{prior}$: based on the prior used in fine-tuning the contrastive learning model, the geometric figure is predicted with a mean squared error loss function; the computation is given by formula (3):
$$L_{prior} = \sum_{i=1}^{T} \left\| f_{\theta}\big(h_{CNN}^{(i)}, x\big) - h_{CNN} \right\|^{2} \quad (3)$$
where $L_{prior}$ is the figure generation loss; the sum accumulates the generation losses over the generation rounds; T is the number of rounds; $h_{CNN}^{(i)}$ is the geometric figure feature generated in the i-th round; $f_{\theta}(h_{CNN}^{(i)}, x)$ denotes the i-th geometric figure feature generated from the text x; the norm term is the difference between this i-th generated feature and the geometric figure feature $h_{CNN}$; and $\theta$ denotes the adjustable parameters.
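The mean squared error objective of step S45 can be illustrated numerically. This is a sketch under assumptions: the function name `prior_mse_loss` is hypothetical, each predicted feature is a plain list of floats, and the loss is averaged over both the feature dimension and the T predictions, which is one common normalization and is not confirmed by the source.

```python
def prior_mse_loss(predicted_feats, target_feat):
    """Mean squared error between T predicted figure features and the target feature."""
    total = 0.0
    for pred in predicted_feats:
        # Squared difference per dimension, averaged over the feature dimension.
        total += sum((p - t) ** 2 for p, t in zip(pred, target_feat)) / len(target_feat)
    # Average over the T predictions.
    return total / len(predicted_feats)
```

A prediction that equals the target contributes zero, so minimizing this quantity pulls every generated feature toward the figure feature supplied by the contrastive learning model.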
Further, the diagram-based solver in step S5 contains four modules: a bidirectional LSTM layer, a diagram encoder, a joint reasoning module, and a program decoder. Specifically:
Step S51: data is input to the diagram-based solver; the input includes the character embedding feature vectors of the problems with diagrams obtained by the BERT feature encoder, as well as the problems without diagrams and the corresponding figures produced by the geometric image generator;
Step S52, bidirectional LSTM layer: the character embedding feature vectors $e_i^{w}$ obtained by the BERT feature encoder are input to the bidirectional LSTM layer, which produces the contextual semantic feature vector of the i-th character in the geometry text; that is, $e_i^{w}$ is fed to both the forward LSTM layer and the backward LSTM layer, as shown in formula (4):
$$h_i^{LSTM} = \mathrm{LSTM}_f\big(e_i^{w}\big) \oplus \mathrm{LSTM}_b\big(e_i^{w}\big) \quad (4)$$
where $h_i^{LSTM}$ is the contextual semantic feature vector of the i-th character in the geometry text, $\mathrm{LSTM}_f$ and $\mathrm{LSTM}_b$ denote the output vectors of the forward and backward LSTM layers respectively, and $\oplus$ denotes concatenation;
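The concatenation of a forward pass and a backward pass in step S52 can be shown with a toy recurrence. To stay short and dependency-free, this sketch replaces the LSTM cell with a scalar tanh recurrence; that substitution, the weights `w_in` and `w_rec`, and the function names are assumptions for illustration only.

```python
import math


def run_rnn(inputs, w_in=0.5, w_rec=0.5):
    """Simplified recurrent pass (a scalar tanh cell standing in for an LSTM)."""
    h, outputs = 0.0, []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        outputs.append(h)
    return outputs


def bidirectional(inputs):
    """Concatenate forward and backward states position by position."""
    forward = run_rnn(inputs)
    # Run over the reversed sequence, then reverse back so positions line up.
    backward = run_rnn(inputs[::-1])[::-1]
    return [[f, b] for f, b in zip(forward, backward)]
```

At each position the first component summarizes the characters to the left and the second summarizes the characters to the right, which is what gives the concatenated vector its two-sided context.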
Step S53, diagram encoder: a convolutional neural network (CNN) extracts the geometric image feature $h_{CNN}$. The CNN consists of convolutional layers, nonlinear activation functions, and pooling layers. A convolutional layer slides a convolution kernel over the geometric image to capture local features; the nonlinear activation function increases the expressive power of the network; the pooling layer reduces the dimensions of the feature map while retaining the key features of the figure. Stacking several convolutional layers lets the network extract progressively higher-level geometric features, and a fully connected layer produces the geometric figure feature $h_{CNN}$;
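The three building blocks named in step S53 can be sketched on nested lists. This is a single-channel illustration with hypothetical function names; as in most deep learning frameworks, the "convolution" is computed without flipping the kernel, and the final fully connected layer is omitted.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (valid padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(feature_map):
    """Nonlinear activation: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the strongest response in each non-overlapping size x size window."""
    return [[max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]
```

For an image whose rows are `[0, 0, 1, 1, 1]`, the kernel `[[-1, 1]]` responds only at the vertical edge, and pooling keeps that response while halving the map in each direction.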
Step S54, joint reasoning module: an attention mechanism fuses the contextual semantic feature vector $h_i^{LSTM}$ of the i-th character with the geometric figure feature $h_{CNN}$, achieving cross-modal semantic fusion and alignment. The result is the multimodal feature vector $M_i$ of the i-th character, which carries information from both $h_i^{LSTM}$ and $h_{CNN}$; the computation is given by formulas (5) and (6):
$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\!\left(\frac{QK^{D}}{\sqrt{d_d}}\right)V \quad (5)$$
$$M_i = \mathrm{Attention}\big(QW_i^{Q},\; KW_i^{K},\; VW_i^{V}\big) \quad (6)$$
where Attention denotes the attention mechanism; Q, K, and V denote the query, key, and value vectors; Softmax is the normalized exponential function; $d_d$ is the size of the second dimension of Q and K; $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the projection parameter matrices of the query, key, and value vectors for the i-th character; $Q = h_i^{LSTM}$ and $K = V = W h_{CNN}$, where $W$ is a parameter matrix learned by a linear layer; and the superscript D denotes the transpose;
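Scaled dot-product attention, the operation underlying step S54, can be written directly on nested lists. This sketch omits the learned projection matrices and uses hypothetical helper names; in the setting of S54 the queries would come from the text features and the keys and values from the figure features.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def attention(Q, K, V):
    """Softmax(Q K^T / sqrt(d)) V, with d the key dimension."""
    d = len(K[0])
    scores = matmul(Q, transpose(K))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, V)
```

Each output row is a weighted average of the value rows, with weights determined by how well the query matches each key; equal matches give equal weights, and a strongly matching key dominates the average.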
Step S55, program decoder: the multimodal feature vector $M_i$ is fed into a linear layer to obtain the initial state $s_0$; at time step t, the hidden state $s_t$ of the bidirectional LSTM layer is concatenated with the attention result and fed to a linear layer followed by a Softmax function to predict the distribution $P_t$ of the next program token.
Further, the generation loss $L_g$ in step S6 is the negative log-likelihood of the target program, computed as in formula (7):
$$L_g(\theta) = -\sum_{t} \log P_t\big(y_t \mid y_{t-1}, M_i; \theta\big) \quad (7)$$
where $\theta$ denotes the parameters of the loss function, $P_t$ is the program token distribution, $y_t$ is the target program token to be generated at time t, $y_{t-1}$ is the target program token at time t-1, and $M_i$ is the multimodal feature vector.
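The negative log-likelihood of a target program can be computed from the per-step token distributions produced by a decoder. In this sketch each distribution is a dictionary from token to probability; the function name `program_nll` and the operator tokens in the usage note are hypothetical.

```python
import math


def program_nll(step_distributions, target_tokens):
    """Sum over time steps of -log P_t(target token at step t)."""
    return -sum(math.log(dist[token])
                for dist, token in zip(step_distributions, target_tokens))
```

For example, if the decoder assigns probability 0.5 to the first target token and 0.75 to the second, `program_nll([{"g_minus": 0.5, "g_half": 0.5}, {"g_minus": 0.25, "g_half": 0.75}], ["g_minus", "g_half"])` equals -(log 0.5 + log 0.75). The loss reaches zero only when every target token receives probability 1.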
Further, the automatic solving model of the invention is divided into four modules: the BERT feature encoder, the geometric image generator, the diagram-based solver, and the unified large model. The BERT feature encoder is connected in series to both the geometric image generator and the diagram-based solver; the generator and the solver are arranged in parallel with each other and are then connected in series to the unified large model.
The beneficial effects of the invention are as follows. (1) First, a data set is collected from the People's Education Press junior high school mathematics textbooks and test papers, and the problems and answers are cleaned and normalized to build the middle school geometry data set. The data set is then classified by a program into two problem types, and the classification is checked manually to verify that it is reasonable. Next, the two types are handled separately. Geometry problems that come with a diagram are solved by the diagram-based solver. Geometry problems without a diagram, which require a figure to be drawn, rely on a contrastive learning model that is trained and fine-tuned so that it generates the geometric figure the problem text requires. The figures produced by the geometric image generator are then input to the diagram-based solver together with the text of the problems without diagrams for model testing. Finally, the two trained modules are merged into a unified large model that can solve both types of geometry problem.
(2) For geometry problems without diagrams, which must be solved by drawing a figure, the invention pre-trains and fine-tunes a contrastive learning model to generate the corresponding geometric figure, turning a problem without a diagram into a problem with one and preparing it for the subsequent solving step.
(3) For geometry problems that come with a diagram, the invention first extracts features from the text and the geometric figure with separate encoders, then uses a co-attention mechanism for cross-modal semantic fusion and alignment, and thereby addresses the problem of cross-modal joint reasoning.
(4) From a new perspective, the invention divides middle school geometry problems into two types, solves each type with a corresponding method, and merges the two into a unified large model that can solve middle school geometry problems.
Description of drawings
Figure 1 is a flow chart of the overall model structure of the present invention.
Detailed description
The present invention is implemented as an automatic solving method for middle school geometry problems based on contrastive learning, with the following steps:
Step S1, dataset construction: collect a number of middle school geometry questions and their answers, and divide them into a training set, a validation set and a test set to obtain the middle school geometry dataset;
Step S2, formal task definition: given a middle school geometry dataset of N items, a question-type classifier divides it into a figure-free geometry question dataset B and a with-figure geometry question dataset C;
Step S3: the y questions $b_y$ in dataset B, which require the solver to draw a figure, and the z questions $c_z$ in dataset C, which come with a figure, are input into the BERT feature encoder of the automatic solving model to obtain the token embedding feature vectors of all tokens in the question stems;
Step S4: the token embedding feature vectors of the figure-free question stems, obtained from the BERT feature encoder, are input into the geometric image generator. The generator is obtained by training and fine-tuning a contrastive learning model and produces the geometric figure that the question requires. The contrastive learning model is trained under manual supervision; the figure generation loss $L_{prior}$ is computed with a mean squared error loss function, and the parameters of the BERT feature encoder and the geometric image generator are updated to obtain the geometric figure;
Step S5: the with-figure solver receives two kinds of input: the token embedding feature vectors of the with-figure question stems from the BERT feature encoder, and the figure-free questions together with the figures generated for them by the geometric image generator. The figure encoder inside the solver encodes each geometric figure and extracts its features, which are aligned with the token embedding feature vectors of the question stem to obtain the final multimodal feature vector;
Step S6: guided by the multimodal feature vector, the program decoder in the with-figure solver generates the solution program token by token; the generation loss $L_g$ for incorrect solutions is computed with a negative log-likelihood loss function, so that highly accurate answers are obtained;
Step S7: the geometric image generator and the with-figure solver are tested together, forming a unified large model that solves both figure-free questions that require drawing and questions that come with a figure.
Further, in the dataset construction of step S1, 16,201 geometry questions and answers are collected manually from the new People's Education Press middle school textbooks, examination papers and syllabi, and lesson plans, and the following tasks are performed:
Step S11: remove duplicate geometry questions and answers;
Step S12: classify the questions automatically into two types: questions with a figure are classified as with-figure geometry questions, and questions with only a stem and no figure are classified as figure-free geometry questions;
Step S13: check the classification, i.e. verify the classification result of each question type by manual inspection to ensure that the classification is reasonable;
Step S14: after manual inspection and verification, 14,334 questions and answers are retained, of which 9,922 are with-figure geometry questions and 4,412 are figure-free geometry questions; they are divided into training, validation and test sets at a ratio of 8:1:1;
Step S15: after the split, the questions in the training and validation sets are annotated manually: the solution steps are extracted from each answer and then hand-annotated in a program language that a computer can recognize.
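Steps S11, S12 and S14 can be sketched in a few lines. This is a minimal illustration rather than the patent's code: the record fields (`stem`, `answer`, `figure`), the function names and the shuffle seed are all hypothetical, and the source does not say how a remainder is distributed when the dataset size is not a multiple of ten.

```python
import random

def deduplicate(records):
    """Step S11: drop records whose (stem, answer) pair was already seen."""
    seen, unique = set(), []
    for rec in records:
        key = (rec["stem"], rec["answer"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def classify(records):
    """Step S12: a record with a figure is 'with-figure', otherwise 'figure-free'."""
    with_figure = [r for r in records if r.get("figure") is not None]
    figure_free = [r for r in records if r.get("figure") is None]
    return with_figure, figure_free

def split_8_1_1(records, seed=0):
    """Step S14: shuffle, then split into train/validation/test at 8:1:1."""
    items = list(records)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

# Toy data: 20 unique questions plus one duplicate; every other one has a figure.
records = [
    {"stem": f"q{i}", "answer": f"a{i}", "figure": "fig" if i % 2 == 0 else None}
    for i in range(20)
]
records.append(dict(records[0]))

unique = deduplicate(records)
with_figure, figure_free = classify(unique)
train, val, test = split_8_1_1(unique)
```

Applied to the 14,334 retained questions, the same integer arithmetic would give 11,467 training, 1,433 validation and 1,434 test items.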
Further, the BERT feature encoder in step S3 uses the encoder module of the Transformer architecture and consists of multiple bidirectional encoder layers; the calculation is shown in formula (1):
$$e_i^w = \mathrm{BERT}(w_i) \quad (1)$$
where $e_i^w$ is the token embedding feature vector obtained by passing the i-th token $w_i$ through the BERT feature encoder.
Further, the geometric image generator in step S4 works as follows:
Step S41: input data into the geometric image generator; the input is the token embedding feature vectors of the figure-free question stem;
Step S42: the geometric image generator is an adapted contrastive learning model. The contrastive learning model is a classification model over text-image pairs, and it is trained and fine-tuned for the downstream task of generating geometric images;
Step S43: train the contrastive learning model:
Collect and organize the with-figure geometry question dataset, obtaining 9,922 question stems and their corresponding geometric figures;
Extract features from the 9,922 question stems and their figures to obtain the token embedding feature vectors $e_i^w$ and the geometric figure features $h_{CNN}$, forming 9,922 text-image pairs;
Input the features of the 9,922 text-image pairs into the text-image pair classification model for contrastive learning; under manual supervision, matching text-image pairs are labeled as positive samples and non-matching pairs as negative samples;
From the positive and negative samples, the classification model learns to associate a question stem with its geometric figure, i.e. given a token embedding feature vector $e_i^w$ it finds the corresponding geometric figure feature $h_{CNN}$;
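The matching behavior described in step S43 resembles CLIP-style contrastive training. The source does not state the training objective, so the sketch below assumes a cosine-similarity score and a softmax cross-entropy (InfoNCE) loss over a batch in which the i-th text matches the i-th image. The temperature value, the toy feature vectors and all function names are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(text_feats, image_feats, temperature=0.1):
    """Text-to-image contrastive loss: pair (i, i) is positive, (i, j != i) negative."""
    total = 0.0
    for i, text in enumerate(text_feats):
        logits = [cosine(text, img) / temperature for img in image_feats]
        peak = max(logits)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_z - logits[i]
    return total / len(text_feats)

def retrieve(text_feat, image_feats):
    """Index of the image feature most similar to the given text feature."""
    sims = [cosine(text_feat, img) for img in image_feats]
    return max(range(len(sims)), key=sims.__getitem__)

# Toy features: the i-th image feature points roughly the same way as the i-th text.
text_feats = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
shuffled = [aligned[1], aligned[2], aligned[0]]  # every pair mismatched

loss_aligned = info_nce(text_feats, aligned)
loss_shuffled = info_nce(text_feats, shuffled)
best = retrieve(text_feats[1], aligned)
```

The loss is low when matching pairs score higher than mismatched ones, which is the property that lets a trained model look up a figure feature from a text feature.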
Step S44: fine-tune the contrastive learning model:
Let the text be x and the geometric figure be y. A prior $P(h_{CNN} \mid x)$ is introduced to produce the figure encoding; the calculation is shown in formula (2), and the figure encoding produced by the prior is trained using the figure encoding of the contrastive learning model as ground truth:
$$P(y \mid x) = P(y, h_{CNN} \mid x) = P(y \mid h_{CNN}, x)\,P(h_{CNN} \mid x) \quad (2)$$
where $P(y \mid x)$ denotes generating the geometric figure y from the text x; $P(y, h_{CNN} \mid x)$ denotes finding, from the text x, the figure feature $h_{CNN}$ together with the figure y it decodes to; $P(y \mid h_{CNN}, x)$ denotes decoding the figure feature $h_{CNN}$ found from the text x into the figure y; and the prior $P(h_{CNN} \mid x)$ denotes finding the figure feature $h_{CNN}$ from the text x, which after contrastive training can be generated from the token embedding feature vectors $e_i^w$;
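The two-stage factorization of step S44 follows from the chain rule of probability. The derivation below assumes, as is usual for this kind of prior-plus-decoder design but not stated in the source, that the figure feature $h_{CNN}$ is a deterministic function of the figure y, so that conditioning on y fixes $h_{CNN}$:

```latex
\begin{aligned}
P(y \mid x) &= P(y, h_{CNN} \mid x)
  && \text{($h_{CNN}$ is determined by $y$)} \\
            &= P(y \mid h_{CNN}, x)\, P(h_{CNN} \mid x)
  && \text{(chain rule)}
\end{aligned}
```

The second factor is the prior, which proposes a figure feature from the text; the first factor is the decoder, which turns that feature into a figure.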
Step S45, figure generation loss $L_{prior}$: based on the prior introduced when fine-tuning the contrastive learning model, the geometric figure is predicted with a mean squared error loss function; the calculation is shown in formula (3):
$$L_{prior} = \sum_{i=1}^{T} \left\| f_\theta\!\left(h^{(i)}_{CNN}, x\right) - h_{CNN} \right\|^2 \quad (3)$$
where $L_{prior}$ is the figure generation loss; $\sum_{i=1}^{T}$ sums the generation losses over the T generation steps; $h^{(i)}_{CNN}$ is the figure feature generated at the i-th step; $f_\theta(h^{(i)}_{CNN}, x)$ is the figure feature generated from the text x at the i-th step; $f_\theta(h^{(i)}_{CNN}, x) - h_{CNN}$ is its difference from the target figure feature $h_{CNN}$; and $\theta$ denotes the adjustable parameters.
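A minimal sketch of a squared-error loss of the kind step S45 describes. It assumes the loss sums the squared distance between each of T predicted figure features and a single target feature; the exact normalization is not recoverable from the source, and the function name and toy vectors are hypothetical.

```python
def prior_loss(predicted_feats, target_feat):
    """Sum over T predictions of the squared L2 distance to the target feature."""
    loss = 0.0
    for pred in predicted_feats:
        loss += sum((p - t) ** 2 for p, t in zip(pred, target_feat))
    return loss

# Target figure feature and three predictions (T = 3).
target = [1.0, 2.0, 3.0]
predictions = [
    [1.0, 2.0, 3.0],  # exact match contributes 0
    [0.0, 2.0, 3.0],  # one coordinate off by 1 contributes 1
    [1.0, 0.0, 5.0],  # two coordinates off by 2 contribute 4 + 4
]
loss = prior_loss(predictions, target)
```

Dividing by T (or by T times the feature dimension) would turn the sum into a mean without changing where the minimum lies.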
Further, the with-figure solver in step S5 contains four modules: a bidirectional LSTM layer, a figure encoder, a joint reasoning module and a program decoder, as follows:
Step S51: input data into the with-figure solver; the input comprises the token embedding feature vectors of the with-figure question stems from the BERT feature encoder, and the figure-free questions together with the figures generated for them by the geometric image generator;
Step S52, bidirectional LSTM layer: the token embedding feature vectors $e_i^w$ of the question stem are input into the bidirectional LSTM layer, which yields the contextual semantic feature vector of the i-th token of the geometry text; that is, $e_i^w$ is fed into both the forward LSTM layer and the backward LSTM layer, as shown in formula (4):
$$h_i^{LSTM} = \mathrm{LSTM}_f(e_i^w) \oplus \mathrm{LSTM}_b(e_i^w) \quad (4)$$
where $h_i^{LSTM}$ is the contextual semantic feature vector of the i-th token of the geometry text, $\mathrm{LSTM}_f$ and $\mathrm{LSTM}_b$ denote the output vectors of the forward and backward LSTM layers respectively, and $\oplus$ denotes concatenation;
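The forward/backward concatenation of step S52 can be illustrated with a small pure-Python LSTM. This is a sketch under stated assumptions: the weights are random rather than trained, the sizes are toy values, and the class and function names are hypothetical rather than taken from the patent.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCell:
    """One LSTM cell with input (i), forget (f), output (o) gates and candidate (g)."""
    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        width = input_size + hidden_size
        self.W = {g: [[rng.uniform(-0.5, 0.5) for _ in range(width)]
                      for _ in range(hidden_size)] for g in "ifog"}
        self.b = {g: [0.0] * hidden_size for g in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation of the input and the previous hidden state
        def gate(name, act):
            return [act(sum(w * v for w, v in zip(row, z)) + bias)
                    for row, bias in zip(self.W[name], self.b[name])]
        i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
        g = gate("g", math.tanh)
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

def run(cell, sequence):
    """Run a cell over a sequence and return the hidden state at every position."""
    h, c = [0.0] * cell.hidden_size, [0.0] * cell.hidden_size
    outputs = []
    for x in sequence:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs

def bilstm(embeddings, fwd, bwd):
    """Concatenate forward and backward hidden states position by position."""
    forward = run(fwd, embeddings)
    backward = run(bwd, embeddings[::-1])[::-1]  # re-align to the original order
    return [f + b for f, b in zip(forward, backward)]

rng = random.Random(0)
fwd, bwd = LSTMCell(3, 4, rng), LSTMCell(3, 4, rng)
embeddings = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5], [0.4, 0.4, -0.2]]
context = bilstm(embeddings, fwd, bwd)
```

Each position ends up with a vector of twice the hidden size: the first half summarizes the tokens to its left, the second half the tokens to its right.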
Step S53, figure encoder: a convolutional neural network (CNN) extracts the image feature $h_{CNN}$ through convolutional layers, nonlinear activation functions and pooling layers. In a convolutional layer, convolution kernels slide over the geometric image to capture local features such as edge lines and points. Nonlinear activation functions increase the expressive power of the network. Pooling layers reduce the dimensionality of the feature maps while retaining the key features of the figure. Stacking convolutional layers lets the network extract progressively higher-level geometric features, such as the shape of the figure and the positional relationships among points and lines. Finally, a fully connected layer outputs the feature of the geometric figure;
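The convolution, activation and pooling stages of step S53 can be shown on a tiny image. The sketch below is illustrative only: it uses a single hand-written kernel instead of learned ones, omits the fully connected layer, and its function names are hypothetical.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)] for r in range(out_h)]

def relu(fmap):
    """Nonlinear activation: negative responses are clipped to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]

# 5x5 image containing a vertical line in column 2.
image = [[0, 0, 1, 0, 0] for _ in range(5)]
# Kernel that responds to a dark-to-bright step from left to right.
kernel = [[-1.0, 1.0], [-1.0, 1.0]]

features = max_pool(relu(conv2d(image, kernel)))
```

The pooled map keeps a strong response only in the left half, where the kernel meets the left edge of the line, which is the kind of local edge feature the step describes.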
Step S54, joint reasoning module: an attention mechanism fuses the contextual semantic feature vector $h_i^{LSTM}$ of the i-th token with the figure feature $h_{CNN}$, achieving cross-modal semantic fusion and alignment and yielding the multimodal feature vector $M_i$ of the i-th token, which combines the attention-weighted information of $h_i^{LSTM}$ and $h_{CNN}$; the calculation is shown in formulas (5) and (6):
$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\!\left(\frac{Q K^{D}}{\sqrt{\mathit{dd}}}\right) V \quad (5)$$
$$M_i = \mathrm{Attention}\!\left(Q W_i^Q,\; K W_i^K,\; V W_i^V\right) \quad (6)$$
where Attention denotes the attention mechanism; Q, K and V are the query, key and value vectors; Softmax is the normalized exponential function; $\mathit{dd}$ is the size of the second dimension of the query vector Q and the key vector K; $W_i^Q$, $W_i^K$ and $W_i^V$ are the projection parameter matrices of Q, K and V for the i-th token in the self-attention mechanism; Q, K and V are obtained from $h_i^{LSTM}$ and $h_{CNN}$ through a linear layer with learned parameter matrix W; and the superscript D denotes the transpose;
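Scaled dot-product attention, the operation behind the joint reasoning module of step S54, can be sketched as follows. The source does not clearly state which modality supplies the queries, so this example assumes that text features act as queries and figure region features as keys and values; the learned projection matrices are omitted for brevity, and the toy vectors are illustrative.

```python
import math

def softmax(xs):
    """Numerically stable normalized exponential."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Softmax(Q K^T / sqrt(d)) V, computed row by row on nested lists."""
    d = len(K[0])
    outputs, weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V))
                        for c in range(len(V[0]))])
    return outputs, weights

# Two text token features (queries) and three figure region features (keys/values).
text = [[1.0, 0.0], [0.0, 1.0]]
regions = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]

fused, weights = attention(text, regions, regions)
```

Each token's attention weights form a distribution over figure regions, so the fused vector for a token is dominated by the region that best matches it.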
Step S55, program decoder: the multimodal feature vector $M_i$ is fed into a linear layer to obtain the initial state $s_0$; at time step t, the hidden state $s_t$ of the bidirectional LSTM layer is concatenated with the attention result and passed through a linear layer followed by the Softmax function to predict the distribution $P_t$ of the next program token.
Further, the generation loss $L_g$ in step S6 is the negative log-likelihood of the target program, computed as in formula (7):
$$L_g(\theta) = -\sum_{t} \log P_t\!\left(y_t \mid y_{t-1}, M_i; \theta\right) \quad (7)$$
where $\theta$ denotes the parameters of the loss function, $P_t$ is the predicted distribution of the program token, $y_t$ is the target program token to be generated at time t, $y_{t-1}$ is the target program token at time t-1, and $M_i$ is the multimodal feature vector.
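The negative log-likelihood used for the generation loss can be sketched directly from decoder logits. In this illustration the loss is summed over time steps without normalization, which is an assumption; the vocabulary entries are hypothetical program tokens, not the patent's program language.

```python
import math

def softmax(logits):
    """Numerically stable normalized exponential."""
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generation_loss(step_logits, target_ids):
    """Negative log-likelihood of the target program, summed over time steps."""
    loss = 0.0
    for logits, target in zip(step_logits, target_ids):
        probs = softmax(logits)
        loss -= math.log(probs[target])
    return loss

# Hypothetical program vocabulary; index = token id.
vocab = ["<eos>", "op_add", "num_0", "const_2"]
# Decoder logits for three time steps, each favoring one token.
step_logits = [
    [0.0, 3.0, 0.0, 0.0],  # favors "op_add"
    [0.0, 0.0, 3.0, 0.0],  # favors "num_0"
    [3.0, 0.0, 0.0, 0.0],  # favors "<eos>"
]

good = generation_loss(step_logits, [1, 2, 0])  # target matches the favored tokens
bad = generation_loss(step_logits, [3, 3, 3])   # target never favored
uniform = generation_loss([[0.0, 0.0, 0.0, 0.0]], [0])
```

A target program that the decoder already favors yields a small loss, while a program it considers unlikely yields a large one; with uniform logits over four tokens, one step costs log 4.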
Further, the automatic solving model of the invention has four modules: the BERT feature encoder, the geometric image generator, the with-figure solver and the unified large model. The BERT feature encoder is connected in series to both the geometric image generator and the with-figure solver; the geometric image generator and the with-figure solver are arranged in parallel and are followed in series by the unified large model.
Further, the unified large model is tested after the geometric image generator and the with-figure solver have been trained. Given a middle school geometry question, with or without a figure: if it has a figure, it passes through the feature encoder and goes directly to the with-figure solver; if it has no figure, the geometric image generator first generates a geometric image, turning it into a with-figure question, which is then input into the with-figure solver. In summary, the result is a unified large model that handles both geometry questions that come with a figure and figure-free geometry questions.
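The routing performed by the unified model can be sketched as a single dispatch function. The record fields and the two callables below are hypothetical stand-ins for the trained geometric image generator and with-figure solver; only the control flow reflects the description above.

```python
def solve(question, generate_figure, solve_with_figure):
    """Route a question: generate a figure first if none is attached."""
    figure = question.get("figure")
    if figure is None:
        figure = generate_figure(question["stem"])
    return solve_with_figure(question["stem"], figure)

# Stand-in modules that only record what they were given.
generator_calls = []

def fake_generator(stem):
    generator_calls.append(stem)
    return f"generated-figure-for:{stem}"

def fake_solver(stem, figure):
    return {"stem": stem, "figure": figure}

with_fig = solve({"stem": "In triangle ABC ...", "figure": "abc.png"},
                 fake_generator, fake_solver)
without_fig = solve({"stem": "Draw a circle ...", "figure": None},
                    fake_generator, fake_solver)
```

Only the figure-free question triggers the generator; both questions end in the same solver, which is what makes the combined system a single entry point for both question types.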
As shown in Figure 1, which is the flow chart of model prediction, the process is as follows. The N-item middle school geometry dataset is input into the automatic solving model and classified into figure-free and with-figure geometry questions. A figure-free question first passes through the BERT feature encoder for feature extraction and is then input into the geometric image generator, which, through contrastive learning with training and fine-tuning, generates the geometric figure described in the question. The figure prediction uses a mean squared error loss function to obtain the prediction loss $L_{prior}$, which is used to update the parameters of the geometric image generator and improve the accuracy of the generated figures. The generated figure and the figure-free question are then input into the with-figure solver for solving and testing. A with-figure question first passes through the BERT feature encoder for feature extraction and is then input into the with-figure solver; during training, the generation loss $L_g$ is the negative log-likelihood of the target program, which improves the accuracy of the with-figure solver. In the training phase, the model computes the joint total loss $L = L_{prior} + L_g$ to optimize the parameters of the figure-free solver and the with-figure solver simultaneously and to strengthen the information exchange between the two modules. In the testing phase, the geometric image generator and the with-figure solver are merged into a unified large model that solves both types of geometry questions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410109877.8A CN117633643B (en) | 2024-01-26 | 2024-01-26 | An automatic solution method for middle school geometry problems based on contrastive learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117633643A | 2024-03-01 |
CN117633643B | 2024-05-14 |
Family
ID=90037971
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410109877.8A Active CN117633643B (en) | 2024-01-26 | 2024-01-26 | An automatic solution method for middle school geometry problems based on contrastive learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117633643B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118898722A (en) * | 2024-10-10 | 2024-11-05 | 厦门理工学院 | Automatic problem-solving method for plane geometry based on spatial perception of geometric primitives |
CN118898722B (en) * | 2024-10-10 | 2025-02-11 | 厦门理工学院 | Automatic problem-solving method for plane geometry based on spatial perception of geometric primitives |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107423287A (en) * | 2017-07-05 | 2017-12-01 | 华中师范大学 | The automatic answer method and system of Proving Plane Geometry topic |
CN107967318A (en) * | 2017-11-23 | 2018-04-27 | 北京师范大学 | A kind of Chinese short text subjective item automatic scoring method and system using LSTM neutral nets |
CN113672716A (en) * | 2021-08-25 | 2021-11-19 | 中山大学·深圳 | Geometry problem solving method and model based on deep learning and multimodal numerical reasoning |
KR20220075489A (en) * | 2020-11-30 | 2022-06-08 | 정재훈 | Training system for auto generating and providing question |
CN115841156A (en) * | 2022-11-16 | 2023-03-24 | 科大讯飞股份有限公司 | Method, device, storage medium and equipment for solving plane geometry problem |
CN116028888A (en) * | 2023-01-09 | 2023-04-28 | 西交利物浦大学 | Automatic problem solving method for plane geometry mathematics problem |
CN116778518A (en) * | 2022-03-10 | 2023-09-19 | 暗物智能科技(广州)有限公司 | Intelligent solving method and device for geometric topics, electronic equipment and storage medium |
CN116955419A (en) * | 2022-11-18 | 2023-10-27 | 暗物智能科技(广州)有限公司 | Geometric question answering method, system and electronic equipment |
Non-Patent Citations (2)
Title |
---|
VENBIN GAN ET AL: "Automatic understanding and formalization of natural language geometry problems using syntax-semantics models", 《INTERNATIONAL JOURNAL OF INNOVATIVE》, vol. 14, no. 1, 28 February 2018 (2018-02-28), pages 83 - 98 * |
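王奕然 (Wang Yiran): "Design and implementation of an automatic problem-solving system based on self-learning", 《China Master's Theses Full-text Database (Electronic Journal)》, vol. 2023, no. 01, 15 January 2023 (2023-01-15) * |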
Also Published As
Publication number | Publication date |
---|---|
CN117633643B (en) | 2024-05-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||