WO2021164284A1 - Method, apparatus and device for generating reading comprehension question, and storage medium - Google Patents


Info

Publication number
WO2021164284A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
phrase
text
vector
preset
Prior art date
Application number
PCT/CN2020/121523
Other languages
French (fr)
Chinese (zh)
Inventor
王燕蒙
许开河
王烨
王少军
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021164284A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Educational Administration (AREA)
  • Economics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Educational Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

A method, apparatus and device for generating a reading comprehension question, and a storage medium. The method comprises: obtaining a reading comprehension source text to be processed (S10); performing word segmentation on the source text by phrase type, so that the source text has feature phrases of multiple different phrase types (S20); determining a target phrase type from the phrase types, and obtaining a preset target answer vector corresponding to the target phrase type from a preset storage area (S30); selecting a target feature phrase corresponding to the target phrase type from the feature phrases, and generating a target word vector corresponding to the target feature phrase (S40); obtaining position information of the target feature phrase in the source text, and generating a position vector corresponding to the position information (S50); and feeding the target word vector, the position vector and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence model, thereby automatically generating a question text that better fits the original meaning of the source text (S60).

Description

Method, apparatus, device and storage medium for generating reading comprehension questions
This application claims priority to Chinese patent application No. CN202010103758.3, filed with the Chinese Patent Office on February 19, 2020 and entitled "Method, Apparatus, Device and Storage Medium for Generating Reading Comprehension Questions", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of big data analysis, and in particular to a method, apparatus, device and storage medium for generating reading comprehension questions.
Background
From school education to vocational training, whether in language learning or in the study of specific subjects and technologies, the ability to read and understand texts is indispensable. To improve their reading ability, students need to read extensively and answer questions about what they have read, which deepens their comprehension. More importantly, teachers need a reliable way to check whether students have read the assigned passages, to track their learning progress, and to adjust the study plan according to the results. The traditional approach is to write questions manually and see whether students can answer them correctly. As new textbooks and articles keep appearing, manual question writing is time-consuming and labor-intensive, and the checking process cannot be automated.
At present, more and more neural networks are being applied successfully to question answering systems and other reading comprehension tasks, and in some respects they have even surpassed humans. Reaching that level, however, requires large amounts of supporting data, and labeling all of that data manually demands too much labor. Question generation technology emerged in response: it generates questions corresponding to a given passage of text, can be used for data augmentation and dialogue systems, and is of great help for reading comprehension.
The inventors realized that, in the prior art, questions are usually generated from a reading comprehension passage by expanding and checking templates based on seed words. Questions generated this way tend not to reflect the original meaning of the text, and several different answers to the same question may be found in the passage. In other words, the generated sentence patterns are too uniform and the generated questions too simple to replace manually written questions effectively, so the results are unsatisfactory.
Summary of the invention
This application provides a method for generating reading comprehension questions, the method comprising the following steps:
obtaining a reading comprehension source text to be processed;
performing word segmentation on the reading comprehension source text by phrase type, so that the reading comprehension source text has feature phrases of multiple different phrase types;
determining a target phrase type from the phrase types, and obtaining a preset target answer vector corresponding to the target phrase type from a preset storage area, wherein a preset mapping relationship exists between the target phrase type and the preset target answer vector;
selecting a target feature phrase corresponding to the target phrase type from the feature phrases, and generating a target word vector corresponding to the target feature phrase;
obtaining position information of the target feature phrase in the reading comprehension source text, and generating a position vector corresponding to the position information;
feeding the target word vector, the position vector and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model, and generating a question text corresponding to the target phrase type.
This application further provides an apparatus for generating reading comprehension questions, the apparatus comprising:
an acquisition module, configured to obtain a reading comprehension source text to be processed;
a word segmentation module, configured to perform word segmentation on the reading comprehension source text by phrase type, so that the reading comprehension source text has feature phrases of multiple different phrase types;
a determining module, configured to determine a target phrase type from the phrase types, and to obtain a preset target answer vector corresponding to the target phrase type from a preset storage area, wherein a preset mapping relationship exists between the target phrase type and the preset target answer vector;
a selection module, configured to select a target feature phrase corresponding to the target phrase type from the feature phrases, and to generate a target word vector corresponding to the target feature phrase;
a recording module, configured to obtain position information of the target feature phrase in the reading comprehension source text, and to generate a position vector corresponding to the position information;
a generating module, configured to feed the target word vector, the position vector and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model, and to generate a question text corresponding to the target phrase type.
This application further provides a device for generating reading comprehension questions, the device comprising a memory, a processor, and a program for generating reading comprehension questions that is stored in the memory and executable on the processor, the program being configured to implement the following steps:
obtaining a reading comprehension source text to be processed;
performing word segmentation on the reading comprehension source text by phrase type, so that the reading comprehension source text has feature phrases of multiple different phrase types;
determining a target phrase type from the phrase types, and obtaining a preset target answer vector corresponding to the target phrase type from a preset storage area, wherein a preset mapping relationship exists between the target phrase type and the preset target answer vector;
selecting a target feature phrase corresponding to the target phrase type from the feature phrases, and generating a target word vector corresponding to the target feature phrase;
obtaining position information of the target feature phrase in the reading comprehension source text, and generating a position vector corresponding to the position information;
feeding the target word vector, the position vector and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model, and generating a question text corresponding to the target phrase type.
This application further provides a storage medium, the storage medium being a computer-readable storage medium that stores a program for generating reading comprehension questions, the program being configured to implement the following steps:
obtaining a reading comprehension source text to be processed;
performing word segmentation on the reading comprehension source text by phrase type, so that the reading comprehension source text has feature phrases of multiple different phrase types;
determining a target phrase type from the phrase types, and obtaining a preset target answer vector corresponding to the target phrase type from a preset storage area, wherein a preset mapping relationship exists between the target phrase type and the preset target answer vector;
selecting a target feature phrase corresponding to the target phrase type from the feature phrases, and generating a target word vector corresponding to the target feature phrase;
obtaining position information of the target feature phrase in the reading comprehension source text, and generating a position vector corresponding to the position information;
feeding the target word vector, the position vector and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model, and generating a question text corresponding to the target phrase type.
Description of the drawings
FIG. 1 is a schematic structural diagram of a device for generating reading comprehension questions in the hardware operating environment involved in the embodiments of this application;
FIG. 2 is a schematic flowchart of an embodiment of the method for generating reading comprehension questions of this application;
FIG. 3 is a schematic flowchart of a second embodiment of the method for generating reading comprehension questions of this application;
FIG. 4 is a schematic flowchart of a third embodiment of the method for generating reading comprehension questions of this application;
FIG. 5 is a structural block diagram of an apparatus for generating reading comprehension questions of this application.
The realization of the objectives, the functional features and the advantages of this application are further described below in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed description
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a device for generating reading comprehension questions in the hardware operating environment involved in the embodiments of this application.
As shown in FIG. 1, the device may include a processor 1001 (for example a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory such as a magnetic disk memory. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.
Those skilled in the art will understand that the structure shown in FIG. 1 does not limit the device, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components. The device for generating reading comprehension questions may be a desktop computer.
As shown in FIG. 1, the memory 1005, as a computer storage medium, may include a computer operating system, a network communication module, a user receiving module, and a program for generating reading comprehension questions.
In the device shown in FIG. 1, the processor 1001 calls the program for generating reading comprehension questions stored in the memory 1005 and executes the steps of the method for generating reading comprehension questions.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a first embodiment of the method for generating reading comprehension questions of this application.
In this embodiment, the method for generating reading comprehension questions includes the following steps:
Step S10: obtain a reading comprehension source text to be processed;
It should be noted that this embodiment is executed by the device for generating reading comprehension questions described above (referred to in this embodiment as the computer system), on which the program for generating reading comprehension questions is loaded. As an example scenario, a teacher wants to generate several reading comprehension questions for an English article; the reading comprehension source text is then that English article.
Step S20: perform word segmentation on the reading comprehension source text by phrase type, so that the reading comprehension source text has feature phrases of multiple different phrase types;
It should be noted that the phrase types in this embodiment include at least one of a person phrase type, a time phrase type, and a location phrase type;
It will be understood that the person phrase type may correspond to person answer words, the time phrase type to date answer words, and the location phrase type to location answer words. The phrase types may further include non-answer phrase types, an organization answer phrase type, a number answer phrase type, and so on.
In a specific implementation, a dedicated word segmentation tool is applied to the reading comprehension source text to segment it by phrase type. The segmentation result labels the proper nouns that appear in the source text, such as person names, place names, organization names, times, quantities, and dates.
Specifically, the dedicated word segmentation tool used in this embodiment may be NLTK (Natural Language Toolkit). NLTK is a natural language toolkit implemented in Python that provides comprehensive, easy-to-use interfaces over a large collection of public datasets and models, covering NLP functions such as word segmentation, part-of-speech tagging (POS tagging), named entity recognition (NER), and syntactic parsing. NLTK is used to segment the reading comprehension source text by phrase type, to identify the proper nouns that appear in it, such as person names, place names, organization names, times, quantities, and dates, and to label those proper nouns.
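As a rough illustration of step S20, the sketch below groups a source text into typed feature phrases and records where each one starts. It deliberately does not call NLTK, whose tokenizer and named-entity chunker depend on separately downloaded models; the `FeaturePhrase` record, the toy `GAZETTEER`, the type labels and the year pattern are all hypothetical stand-ins for a trained named entity recognizer, not part of the described embodiment.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class FeaturePhrase:
    text: str         # surface form of the phrase
    phrase_type: str  # "PERSON", "TIME", "LOCATION", or "O" for non-answer phrases
    start: int        # index of the phrase's first token in the source text

# Hypothetical toy gazetteer; a real system would use a trained NER model instead.
GAZETTEER = {
    "Marie Curie": "PERSON",
    "Paris": "LOCATION",
}

# Assumption for illustration: a bare four-digit year counts as a time phrase.
YEAR = re.compile(r"^(1[0-9]{3}|20[0-9]{2})$")

def segment(text, gazetteer=GAZETTEER):
    """Split text into feature phrases, each labeled with a phrase type."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # Try longer gazetteer entries first so multi-word names win.
    entries = sorted(gazetteer.items(), key=lambda kv: -len(kv[0].split()))
    phrases = []
    i = 0
    while i < len(tokens):
        for entry, ptype in entries:
            parts = entry.split()
            if tokens[i:i + len(parts)] == parts:
                phrases.append(FeaturePhrase(entry, ptype, i))
                i += len(parts)
                break
        else:
            ptype = "TIME" if YEAR.match(tokens[i]) else "O"
            phrases.append(FeaturePhrase(tokens[i], ptype, i))
            i += 1
    return phrases
```

Calling `segment("Marie Curie moved to Paris in 1891.")` yields one phrase per token or multi-token name, with `Marie Curie`, `Paris` and `1891` carrying answer-bearing types and the remaining tokens labeled `O`.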
Step S30: determine a target phrase type from the phrase types, and obtain a preset target answer vector corresponding to the target phrase type from a preset storage area, wherein a preset mapping relationship exists between the target phrase type and the preset target answer vector;
In a specific implementation, each phrase type corresponds to some standard answers. For example, the times (time phrase type), locations (location phrase type) and persons (person phrase type) that appear in the reading comprehension source text each correspond to standard answer texts. These standard answer texts are prepared in advance by the question setter and stored in the preset storage area, which may be a database loaded on the device for generating reading comprehension questions.
It should be noted that, in this embodiment, the standard answers corresponding to the different phrase types are stored in the database in advance as vectors that match the seq2seq model. A preset mapping relationship exists between the target phrase type and the preset target answer vector.
Specifically, in this embodiment each question type may correspond to one phrase type, and one phrase type may correspond to four standard answer texts, each of which must have a preset mapping relationship established with that phrase type;
Accordingly, this embodiment uses NLTK in advance to convert each standard answer text into a text vector, yielding an answer vector (answer type embedding). Because a preset mapping relationship exists between the answer text and the phrase type, the same preset mapping relationship also exists between the phrase type and the preset target answer vector.
It will be understood that, since the teacher wants several questions for the reading comprehension source text, the computer system traverses the phrase types found in the source text, takes each traversed phrase type in turn as the target phrase type, and obtains the preset target answer vector corresponding to that target phrase type from the preset storage area, the target phrase type and the preset target answer vector having a preset mapping relationship;
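The lookup and traversal in step S30 can be pictured with a minimal sketch in which the preset storage area is an in-memory dictionary. The names `PRESET_ANSWER_VECTORS` and `iter_target_types`, the type labels and the vector values are hypothetical; the embodiment describes a database and does not specify vector contents.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical preset storage area: phrase type -> preset answer vector.
# The embodiment stores these in a database prepared in advance.
PRESET_ANSWER_VECTORS: Dict[str, List[float]] = {
    "PERSON": [1.0, 0.0, 0.0, 0.0],
    "TIME": [0.0, 1.0, 0.0, 0.0],
    "LOCATION": [0.0, 0.0, 1.0, 0.0],
}

def iter_target_types(
    phrase_types: Iterable[str],
    store: Dict[str, List[float]] = PRESET_ANSWER_VECTORS,
) -> Iterator[Tuple[str, List[float]]]:
    """Traverse the phrase types found in the text, in order of first appearance,
    yielding each type that has a preset answer vector exactly once."""
    seen = set()
    for ptype in phrase_types:
        if ptype in seen or ptype not in store:
            continue  # skip repeats and non-answer types such as "O"
        seen.add(ptype)
        yield ptype, store[ptype]
```

Each yielded pair corresponds to one target phrase type and therefore, under this reading, to one question to be generated.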
Step S40: select a target feature phrase corresponding to the target phrase type from the feature phrases, and generate a target word vector corresponding to the target feature phrase;
It will be understood that, after word segmentation, the computer system selects the target feature phrase corresponding to the target phrase type from the feature phrases of the reading comprehension source text, and then uses NLTK to convert the target feature phrase into vector form, that is, generates the target word vector (word embedding) corresponding to the target feature phrase.
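One simple way to turn a multi-word feature phrase into a single word vector is to look up each token in an embedding table and average the results. The sketch below assumes this averaging scheme and a tiny hand-written table (`EMBEDDINGS`, `UNKNOWN` and `phrase_vector` are hypothetical names); the embodiment does not state how the embedding is computed.

```python
from typing import Dict, List

# Hypothetical 4-dimensional embedding table; real embeddings would be learned.
EMBEDDINGS: Dict[str, List[float]] = {
    "marie": [0.2, 0.4, 0.0, 0.6],
    "curie": [0.4, 0.0, 0.2, 0.2],
    "paris": [0.9, 0.1, 0.3, 0.5],
}
UNKNOWN: List[float] = [0.0, 0.0, 0.0, 0.0]  # fallback for out-of-vocabulary tokens

def phrase_vector(phrase: str, table: Dict[str, List[float]] = EMBEDDINGS) -> List[float]:
    """Average the token embeddings of a feature phrase (an assumed pooling choice)."""
    tokens = phrase.lower().split()
    if not tokens:
        return list(UNKNOWN)
    rows = [table.get(token, UNKNOWN) for token in tokens]
    return [sum(column) / len(rows) for column in zip(*rows)]
```

Averaging keeps the vector dimension fixed regardless of phrase length, which makes it easy to combine with the position and answer vectors later.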
Step S50: obtain position information of the target feature phrase in the reading comprehension source text, and generate a position vector corresponding to the position information;
It will be understood that the computer system determines where the target feature phrase appears in the reading comprehension source text and converts that position information into vector form, that is, generates the position vector (positional embedding) corresponding to the position information. By introducing the position vector, this embodiment allows the generated reading comprehension questions to reflect the original meaning of the text more closely.
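The embodiment does not say how a position is mapped to a vector. A common choice in Transformer-style models is the fixed sinusoidal encoding, which the sketch below uses purely as an assumption; the function name `position_vector` and the default dimension are illustrative.

```python
import math
from typing import List

def position_vector(position: int, dim: int = 4) -> List[float]:
    """Sinusoidal positional encoding (assumed scheme, not stated in the source).

    Even index 2k holds sin(position / 10000^(2k/dim));
    odd index 2k+1 holds cos(position / 10000^(2k/dim)).
    """
    vector = []
    for i in range(dim):
        # (i - i % 2) equals 2k for both members of the pair (2k, 2k+1).
        angle = position / (10000 ** ((i - i % 2) / dim))
        vector.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vector
```

With this scheme, `position` would be the token index of the target feature phrase in the source text, and a learned position table would be an equally valid alternative.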
Step S60: feed the target word vector, the position vector and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model, and generate a question text corresponding to the target phrase type.
It will be understood that a sequence-to-sequence (seq2seq) model is used when the length of the output is not known in advance; its structure is an encoder-decoder model. Encoding converts the input sequence into a fixed-length vector; decoding converts that fixed-length vector back into an output sequence.
In a specific implementation, this embodiment feeds the target word vector, the position vector and the preset target answer vector corresponding to the target phrase type into the preset seq2seq model. The encoder compresses the input sequence into a vector of a specified length, which can be regarded as the semantics of the sequence; this process is called encoding. The decoder converts the previously generated fixed-length vector into an output sequence, so the decoding stage can be regarded as the inverse of encoding. That is, the target word vector, the position vector and the answer vector are first taken as the input feature sequence and treated as the semantics of that sequence; the computer system then predicts the text likely to follow from these semantic vectors and emits the predicted text as the output sequence.
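The source does not specify how the three vectors are merged before entering the encoder. The sketch below assumes element-wise summation, as is common for token, position and type embeddings in Transformer-style models; concatenation would be an equally plausible reading. The function names `combine` and `build_input_sequence` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]

def combine(word_vec: Vector, pos_vec: Vector, answer_vec: Vector) -> Vector:
    """Sum the word, position and answer vectors element-wise (assumed merge rule)."""
    if not (len(word_vec) == len(pos_vec) == len(answer_vec)):
        raise ValueError("all three vectors must share the same dimension")
    return [w + p + a for w, p, a in zip(word_vec, pos_vec, answer_vec)]

def build_input_sequence(items: Sequence[Tuple[Vector, Vector, Vector]]) -> List[Vector]:
    """Build the encoder's input feature sequence, one combined vector per item."""
    return [combine(word, pos, answer) for word, pos, answer in items]
```

Summation requires all three embeddings to share one dimension, which is why the check is explicit; with concatenation the encoder input width would instead be the sum of the three dimensions.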
Specifically, the computer system first feeds the input feature sequence into the multi-head self-attention layer of the seq2seq model and then applies a residual connection and layer normalization. The processed sequence is then fed into the position-wise feed-forward network layer of the seq2seq model, followed again by a residual connection and layer normalization, which produces the input processing sequence;
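The encoder path just described (self-attention, then residual connection and layer normalization, then a position-wise feed-forward network with the same wrapping) can be sketched in plain Python. This is a heavily simplified illustration under stated assumptions: a single attention head with no learned projections (queries, keys and values are the input itself), a degenerate feed-forward network that reduces to ReLU, and no learned gain or bias in the normalization. All function names are illustrative.

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Single-head scaled dot-product self-attention with Q = K = V = x."""
    dim = len(x[0])
    out = []
    for query in x:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in x]
        weights = softmax(scores)
        out.append([sum(w * value[i] for w, value in zip(weights, x)) for i in range(dim)])
    return out

def feed_forward(x):
    """Degenerate position-wise FFN: identity weights and zero biases, i.e. ReLU per position."""
    return [[max(0.0, v) for v in row] for row in x]

def layer_norm(row, eps=1e-6):
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]

def add_and_norm(x, sublayer):
    """Residual connection followed by layer normalization: LayerNorm(x + sublayer(x))."""
    y = sublayer(x)
    return [layer_norm([a + b for a, b in zip(xr, yr)]) for xr, yr in zip(x, y)]

def encoder_layer(x):
    x = add_and_norm(x, self_attention)
    return add_and_norm(x, feed_forward)
```

A multi-head version would run several such attention computations on different learned projections of the input and concatenate their outputs; the residual and normalization wrapping stays the same.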
Further, the sentence containing the target feature phrase is segmented and the segmentation result is used as the output feature sequence. The input processing sequence is then fed into the multi-head self-attention layer, followed by a residual connection and layer normalization, which produces the output processing sequence;
The input processing sequence and the output processing sequence are fed together into the multi-head context-attention layer, followed by a residual connection and layer normalization;
Finally, the result is fed into the position-wise feed-forward network, followed by a residual connection and layer normalization, and after a linear transformation the question text corresponding to the target phrase type is output.
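The context-attention and final linear steps of the decoder can be illustrated as follows. In this sketch the queries come from the decoder-side states and the keys and values from the encoder-side states; a single head is used, learned projections, residual connections and layer normalization are omitted for brevity, and the output weights and vocabulary are hypothetical toy values.

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_attention(decoder_states, encoder_states):
    """Each decoder state attends over all encoder states (single head, no projections)."""
    dim = len(encoder_states[0])
    out = []
    for query in decoder_states:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in encoder_states
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * value[i] for w, value in zip(weights, encoder_states)) for i in range(dim)]
        )
    return out

def pick_token(state, output_weights, vocab):
    """Final linear transformation followed by a greedy argmax over the vocabulary."""
    logits = [sum(s * w for s, w in zip(state, row)) for row in output_weights]
    best = max(range(len(logits)), key=logits.__getitem__)
    return vocab[best]
```

Because the attention weights sum to one, every output row is a weighted average of the encoder states, which is how the decoder draws on the encoded source text at each generation step. Greedy argmax is only one decoding strategy; beam search is a common alternative.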
可理解的是,multi-head self attention层的机制能够用于进行自动特征交叉学习以提升CTR预测任务的精度,其CTR预测任务模型结构包括输入、嵌入、特征提取、以及输出;而引入多头注意力机制(Multi-head attention),能够使得seq2seq模型从不同向量所表征的空间上获取关于句子更多层面的信息,提高模型的特征表达能力;同时在现有的词向量和位置向量作为网络输入的基础上,进一步引入依存句法特征和相对核心谓词依赖特征,其中依存句法特征包括当前词的依存关系值和所依赖的父节点位置,从而使模型进一步准确地获取更多的文本句法信息。It is understandable that the mechanism of the multi-head self attention layer can be used to perform automatic feature cross learning to improve the accuracy of the CTR prediction task. Its CTR prediction task model structure includes input, embedding, feature extraction, and output; and the introduction of multi-head attention The force mechanism (Multi-head attention) enables the seq2seq model to obtain more information about the sentence from the space represented by different vectors, improving the feature expression ability of the model; at the same time, the existing word vector and position vector are used as network input On the basis of, further introduce the dependency syntax feature and the relative core predicate dependency feature, where the dependency syntax feature includes the dependency relationship value of the current word and the position of the dependent parent node, so that the model can further accurately obtain more text syntax information.
This embodiment first obtains the reading comprehension source text to be processed and segments it by phrase type, so that the source text has multiple feature phrases of different phrase types; determines a target phrase type from the phrase types and obtains, from a preset storage area, the preset target answer vector corresponding to the target phrase type; selects from the feature phrases the target feature phrase corresponding to the target phrase type and generates the corresponding target word vector; obtains the position information of the target feature phrase in the source text and generates the corresponding position vector; and feeds the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence model to generate the question text corresponding to the target phrase type. By combining the position vector with the manually preset answer text and using a sequence-to-sequence model, this embodiment can automatically generate questions that better fit the intended meaning of the source text, and the answers corresponding to the generated questions are also more uniquely determined.
Further, referring to FIG. 3, FIG. 3 is a schematic flowchart of a second embodiment of the method for generating reading comprehension questions of the present application. The second embodiment is proposed on the basis of the first embodiment of the method described above.
In this embodiment, before step S60, the method further includes:
Step S031: obtain, from the preset storage area, the target sample text corresponding to the target phrase type.
It should be understood that in this embodiment multiple sample texts related to different phrase types (for example, person names, place names, organization names, times, quantities, and dates) are stored in advance in a database (that is, the preset storage area) as training corpora (that is, the target sample texts), and mapping relationships are established between the different training corpora and the target phrase types. These corpora are then used to train a seq2seq model, which yields the question generation model. The question generation model is generated by the following steps S032 to S035:
Step S032: segment the target sample text so that the target sample text has sample text phrases;
Step S033: generate sample word vectors corresponding to the sample text phrases;
Step S034: add the preset target answer vector corresponding to the target phrase type to the sample word vector, and use the sum as the feature vector of the target sample text;
Step S035: feed the feature vector as the input sequence into the preset sequence-to-sequence (seq2seq) model for training, and use the trained model as the question generation model.
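Step S034 above amounts to an element-wise sum of the answer vector with each sample word vector. The following is a minimal sketch, assuming that both vectors share the same dimension; the function name and the three-dimensional toy embeddings are hypothetical and chosen only for illustration.

```python
def add_answer_vector(word_vectors, answer_vector):
    """Add the preset answer vector to every sample word vector (step S034)."""
    features = []
    for vec in word_vectors:
        if len(vec) != len(answer_vector):
            raise ValueError("word vector and answer vector must have the same dimension")
        features.append([w + a for w, a in zip(vec, answer_vector)])
    return features

# Hypothetical 3-dimensional embeddings for a sample text with two phrases.
sample_word_vectors = [[0.5, 0.25, 0.0], [1.0, -1.0, 2.0]]
person_answer_vector = [0.5, 0.5, 0.5]  # assumed vector for a "person name" phrase type
feature_vectors = add_answer_vector(sample_word_vectors, person_answer_vector)
```

The resulting `feature_vectors` would then serve as the input sequence for training in step S035.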
Further, after step S50, the method further includes:
Step S51: determine, according to the position information, the target sentence text corresponding to the target feature phrase;
Step S52: segment the target sentence text so that the target sentence text has multiple part-of-speech feature words with different parts of speech;
It should be understood that this embodiment segments the sentence containing the target feature word, and the segmentation result is that the target sentence text has multiple part-of-speech feature words with different parts of speech;
Step S53: convert each part-of-speech feature word of the target sentence text into a part-of-speech feature word vector;
Step S54: obtain the order of the positions at which the part-of-speech feature words appear in the target sentence text;
It should be understood that the position order here is the left-to-right order of the words within a sentence of an article.
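The left-to-right position order from step S54 can be illustrated with a short sketch. Whitespace splitting is used here only as a stand-in for a real word segmenter (Chinese text would need a dedicated one), and the function name and example sentence are assumptions.

```python
def ordered_tokens(sentence):
    """Return (position, token) pairs in left-to-right order, starting at 1."""
    return list(enumerate(sentence.split(), start=1))

positions = ordered_tokens("Marie Curie was born in Warsaw")
```

Each token's position `t` is what later indexes the part-of-speech feature word vector `y_t`.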
Correspondingly, step S60 is specifically: "feed the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the question generation model to generate the question text corresponding to the target phrase type";
In addition, step S60 further includes:
Step S601: use the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type as the input feature sequence of the question generation model;
It should be understood that this embodiment denotes the input feature sequence by x. The computer system first feeds the input feature sequence x into the multi-head self-attention layer of the seq2seq model, followed by a residual connection and normalization; the processed input feature sequence is then fed into the position-wise feed-forward network layer of the seq2seq model, again followed by a residual connection and normalization, to generate the input processing sequence;
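The "residual connection and normalization" applied after each sub-layer above can be sketched for a single position as follows. This is an illustrative sketch, not the application's implementation: layer normalization is assumed as the normalization, its learned gain and bias are omitted, the attention output is a fixed stand-in, and the identity weights are hypothetical.

```python
import math

def layer_norm(vec, eps=1e-6):
    """Normalize one position's vector to zero mean and near-unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def add_and_norm(x, sublayer_out):
    """Residual connection followed by layer normalization."""
    return layer_norm([a + b for a, b in zip(x, sublayer_out)])

def feed_forward(vec, w1, b1, w2, b2):
    """Position-wise feed-forward network: linear, ReLU, linear."""
    hidden = [max(0.0, sum(v * w for v, w in zip(vec, row)) + b) for row, b in zip(w1, b1)]
    return [sum(h * w for h, w in zip(hidden, row)) + b for row, b in zip(w2, b2)]

# Hypothetical weights: identity matrices and zero biases keep the example easy to follow.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zeros = [0.0, 0.0, 0.0]

x = [1.0, 2.0, 3.0]              # one position of the input feature sequence
attention_out = [0.5, 0.5, 0.5]  # stand-in for the self-attention output
h = add_and_norm(x, attention_out)
encoded = add_and_norm(h, feed_forward(h, identity, zeros, identity, zeros))
```

Here `encoded` plays the role of one position of the input processing sequence; a real encoder would apply the same two sub-layers to every position with trained weights.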
Step S602: traverse the part-of-speech feature word vectors in the position order, and use each traversed part-of-speech feature word vector as the output feature sequence of the question generation model;
It should be understood that y denotes the part-of-speech feature word vectors. The position order t in which the part-of-speech feature words appear in the target sentence text has already been obtained in the steps above, so the computer system traverses every part-of-speech feature word vector y that appears in the target sentence text, denotes the t-th traversed vector by y_t, and uses y_t as the output feature sequence of the question generation model;
Step S603: feed the input feature sequence and the output feature sequence into the question generation model for calculation until the traversal is complete, and use the calculation result as the target vector data;
In this embodiment, the question generation model is characterized by the following formula:
Figure PCTCN2020121523-appb-000001
where x denotes the input feature sequence, y_t denotes the part-of-speech feature word vector corresponding to the t-th part-of-speech feature word in the target sentence text, n_y denotes the number of part-of-speech feature words in the target sentence text, and P(y|x) denotes the target vector data;
The above formula can be understood as follows: each part-of-speech feature word vector y_t (for t up to n_y) is fed, together with the input feature sequence x, into the question generation model to generate new vector data, and the n_y pieces of new vector data are added together to obtain the target vector data P(y|x).
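The formula itself is available in the source only as a figure. For orientation, a standard autoregressive seq2seq objective that matches the symbols defined above and the described summation over n_y terms can be written as follows. This is an assumed reading, not a transcription of the figure; in particular, the conditioning on the preceding outputs y_{<t} and the use of logarithms are assumptions.

```latex
\log P(y \mid x) \;=\; \sum_{t=1}^{n_y} \log P\left(y_t \mid y_{<t},\, x\right)
```

Under this reading, each term is the contribution of the t-th part-of-speech feature word given the input feature sequence x, and the sum over all n_y positions gives the score of the whole output sequence.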
Step S604: convert the target vector data into the question text corresponding to the target phrase type.
Specifically, this embodiment can use the NLTK tool to convert the target vector data from vector form into text form, ultimately generating questions that better fit the intended meaning of the reading comprehension passage and are of higher quality; the answers corresponding to the generated questions are also more uniquely determined.
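The application does not spell out how the vector data is mapped back to text. One common approach, shown here purely as an assumption, is greedy decoding: at each step, pick the vocabulary entry with the highest score and stop at an end marker. The vocabulary, scores, and function name below are hypothetical, and the sketch does not use any NLTK API.

```python
def greedy_decode(step_scores, vocabulary, end_token="<eos>"):
    """Map per-step score vectors to text by taking the best-scoring token each step."""
    tokens = []
    for scores in step_scores:
        best = max(range(len(scores)), key=scores.__getitem__)
        token = vocabulary[best]
        if token == end_token:
            break
        tokens.append(token)
    return " ".join(tokens)

# Hypothetical vocabulary and model scores for five decoding steps.
vocabulary = ["<eos>", "who", "discovered", "radium", "?"]
step_scores = [
    [0.1, 0.7, 0.1, 0.05, 0.05],
    [0.0, 0.1, 0.8, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.9, 0.0],
    [0.1, 0.0, 0.0, 0.1, 0.8],
    [0.9, 0.0, 0.0, 0.0, 0.1],
]
question = greedy_decode(step_scores, vocabulary)
```

Beam search is a frequent alternative when the single best token at each step does not give the best overall sentence.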
Further, referring to FIG. 4, FIG. 4 is a schematic flowchart of a third embodiment of the method for generating reading comprehension questions of the present application. The third embodiment is proposed on the basis of the first or second embodiment of the method described above.
In this embodiment, step S20 specifically includes:
Step S201: segment the reading comprehension source text into paragraphs according to semantic rules to obtain multiple paragraph texts;
In a specific implementation, this embodiment can use the NLTK tool to split the reading comprehension source text, according to semantic rules, into multiple semantically complete paragraphs, each of which is guaranteed to have a subject.
Step S202: segment each paragraph text by phrase type, so that each paragraph text has multiple feature phrases of different phrase types;
Step S50 specifically includes:
Step S500: obtain the position information of the target feature phrase in the paragraph text, and generate a position vector corresponding to the position information.
This embodiment splits a reading comprehension text into several semantic paragraphs; the subtopics described by the paragraphs differ from one another and are independent. Parts of the text that describe similar content are grouped together, so that each semantic paragraph has maximal internal semantic consistency. Analysis of the text can thus be narrowed from the whole passage to individual semantic paragraphs. This form of segmentation is similar to dividing an article into natural paragraphs and aims to obtain the required information quickly and accurately from a large amount of text.
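The paragraph-level position lookup of step S500 can be sketched as follows. Splitting on blank lines is only a stand-in for the semantic segmentation described above, and the function name, the example text, and the choice of (paragraph index, character offset) as the position information are assumptions.

```python
def locate_phrase(paragraphs, phrase):
    """Return (paragraph_index, character_offset) of the first occurrence, or None."""
    for index, paragraph in enumerate(paragraphs):
        offset = paragraph.find(phrase)
        if offset != -1:
            return index, offset
    return None

source_text = "Marie Curie was born in Warsaw.\n\nShe discovered radium in 1898."
# Blank-line splitting stands in for semantic paragraph segmentation.
paragraphs = [p.strip() for p in source_text.split("\n\n") if p.strip()]
position = locate_phrase(paragraphs, "radium")
```

The resulting pair could then be turned into a position vector by whatever encoding the model expects.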
Further, in an embodiment, after step S60, the method further includes:
Step: obtain the preset target answer corresponding to the preset target answer vector;
Step: establish a mapping relationship between the preset target answer and the question text, and store the mapping relationship and the question text in the preset storage area.
It should be understood that this embodiment stores the generated question text, together with the mapping relationship between the preset target answer and the question text, in the database so that they can be used directly the next time questions are set.
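Storing the answer-to-question mapping for later reuse can be sketched with Python's built-in `sqlite3` module. The table name, column names, and helper functions are hypothetical; the application only states that the mapping and the question text are stored in a database.

```python
import sqlite3

def store_question(conn, answer, question):
    """Persist one generated question together with the answer it maps to."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS generated_questions ("
        "id INTEGER PRIMARY KEY, answer TEXT NOT NULL, question TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO generated_questions (answer, question) VALUES (?, ?)",
        (answer, question),
    )
    conn.commit()

def questions_for(conn, answer):
    """Return all stored questions mapped to the given answer, oldest first."""
    rows = conn.execute(
        "SELECT question FROM generated_questions WHERE answer = ? ORDER BY id",
        (answer,),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")  # in-memory database for the example
store_question(conn, "Marie Curie", "Who discovered radium?")
stored = questions_for(conn, "Marie Curie")
```

A file path in place of `":memory:"` would keep the stored questions available across runs.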
In addition, referring to FIG. 5, the present application further proposes an apparatus for generating reading comprehension questions, the apparatus including:
an obtaining module 10, configured to obtain the reading comprehension source text to be processed;
a word segmentation module 20, configured to segment the reading comprehension source text by phrase type, so that the reading comprehension source text has multiple feature phrases of different phrase types;
a determining module 30, configured to determine a target phrase type from the phrase types and obtain, from a preset storage area, the preset target answer vector corresponding to the target phrase type, a preset mapping relationship existing between the target phrase type and the preset target answer vector;
a selecting module 40, configured to select, from the feature phrases, the target feature phrase corresponding to the target phrase type and generate the target word vector corresponding to the target feature phrase;
a recording module 50, configured to obtain the position information of the target feature phrase in the reading comprehension source text and generate the position vector corresponding to the position information;
a generating module 60, configured to feed the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the preset sequence-to-sequence (seq2seq) model to generate the question text corresponding to the target phrase type.
It should be understood that the apparatus for generating reading comprehension questions of this embodiment may be a computer application program loaded in the device for generating reading comprehension questions of the above embodiments, and that device may be a host computer used by the person setting the questions. For the specific implementation of the apparatus, refer to the method embodiments described above; details are not repeated here.
In addition, the present application further provides a computer storage medium, which may be volatile or non-volatile. A program for generating reading comprehension questions is stored on the computer storage medium, and when the program is executed by a processor, the steps of the method for generating reading comprehension questions described above are implemented.
It should be noted that, in this document, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to the process, method, article, or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or system that includes the element.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present application.

Claims (20)

  1. A method for generating reading comprehension questions, wherein the method comprises:
    obtaining a reading comprehension source text to be processed;
    performing word segmentation on the reading comprehension source text by phrase type, so that the reading comprehension source text has multiple feature phrases of different phrase types;
    determining a target phrase type from the phrase types, and obtaining, from a preset storage area, a preset target answer vector corresponding to the target phrase type, wherein a preset mapping relationship exists between the target phrase type and the preset target answer vector;
    selecting, from the feature phrases, a target feature phrase corresponding to the target phrase type, and generating a target word vector corresponding to the target feature phrase;
    obtaining position information of the target feature phrase in the reading comprehension source text, and generating a position vector corresponding to the position information;
    feeding the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model to generate a question text corresponding to the target phrase type.
  2. The method according to claim 1, wherein before the step of feeding the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the preset sequence-to-sequence (seq2seq) model to generate the question text corresponding to the target phrase type, the method further comprises:
    obtaining, from the preset storage area, a target sample text corresponding to the target phrase type;
    performing word segmentation on the target sample text so that the target sample text has sample text phrases;
    generating sample word vectors corresponding to the sample text phrases;
    adding the preset target answer vector corresponding to the target phrase type to the sample word vector, and using the sum as a feature vector of the target sample text;
    feeding the feature vector as an input sequence into the preset sequence-to-sequence (seq2seq) model for training, and using the trained model as a question generation model;
    the step of feeding the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the preset sequence-to-sequence (seq2seq) model to generate the question text corresponding to the target phrase type specifically comprises:
    feeding the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the question generation model to generate the question text corresponding to the target phrase type.
  3. The method according to claim 2, wherein after the step of obtaining the position information of the target feature phrase in the reading comprehension source text and generating the position vector corresponding to the position information, the method further comprises:
    determining, according to the position information, a target sentence text corresponding to the target feature phrase;
    performing word segmentation on the target sentence text, so that the target sentence text has multiple part-of-speech feature words with different parts of speech;
    converting each part-of-speech feature word of the target sentence text into a part-of-speech feature word vector;
    obtaining the order of the positions at which the part-of-speech feature words appear in the target sentence text;
    the step of feeding the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the question generation model to generate the question text corresponding to the target phrase type specifically comprises:
    using the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type as an input feature sequence of the question generation model;
    traversing the part-of-speech feature word vectors in the position order, and using each traversed part-of-speech feature word vector as an output feature sequence of the question generation model;
    feeding the input feature sequence and the output feature sequence into the question generation model for calculation until the traversal is complete, and using the calculation result as target vector data;
    converting the target vector data into the question text corresponding to the target phrase type.
  4. The method according to claim 3, wherein the question generation model is characterized by the following formula:
    Figure PCTCN2020121523-appb-100001
    where x denotes the input feature sequence, y_t denotes the part-of-speech feature word vector corresponding to the t-th part-of-speech feature word in the target sentence text, n_y denotes the number of part-of-speech feature words in the target sentence text, and P(y|x) denotes the target vector data.
  5. The method according to any one of claims 1 to 4, wherein the step of performing word segmentation on the reading comprehension source text by phrase type, so that the reading comprehension source text has multiple feature phrases of different phrase types, comprises:
    segmenting the reading comprehension source text into paragraphs according to semantic rules to obtain multiple paragraph texts;
    performing word segmentation on each paragraph text by phrase type, so that each paragraph text has multiple feature phrases of different phrase types;
    the step of obtaining the position information of the target feature phrase in the reading comprehension source text and generating the position vector corresponding to the position information specifically comprises:
    obtaining the position information of the target feature phrase in the paragraph text, and generating a position vector corresponding to the position information.
  6. The method according to claim 5, wherein the phrase types include at least one of a person phrase type, a time phrase type, and a place phrase type.
  7. The method according to any one of claims 1 to 4, wherein after the step of feeding the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the preset sequence-to-sequence (seq2seq) model to generate the question text corresponding to the target phrase type, the method further comprises:
    obtaining a preset target answer corresponding to the preset target answer vector;
    establishing a mapping relationship between the preset target answer and the question text, and storing the mapping relationship and the question text in the preset storage area.
  8. An apparatus for generating reading comprehension questions, wherein the apparatus comprises:
    an obtaining module, configured to obtain a reading comprehension source text to be processed;
    a word segmentation module, configured to perform word segmentation on the reading comprehension source text by phrase type, so that the reading comprehension source text has multiple feature phrases of different phrase types;
    a determining module, configured to determine a target phrase type from the phrase types and obtain, from a preset storage area, a preset target answer vector corresponding to the target phrase type, wherein a preset mapping relationship exists between the target phrase type and the preset target answer vector;
    a selecting module, configured to select, from the feature phrases, a target feature phrase corresponding to the target phrase type and generate a target word vector corresponding to the target feature phrase;
    a recording module, configured to obtain position information of the target feature phrase in the reading comprehension source text and generate a position vector corresponding to the position information;
    a generating module, configured to feed the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model to generate a question text corresponding to the target phrase type.
  9. A device for generating reading comprehension questions, wherein the device comprises: a memory, a processor, and a program for generating reading comprehension questions that is stored in the memory and executable on the processor, the program being configured to implement the following steps:
    obtaining a reading comprehension source text to be processed;
    performing word segmentation on the reading comprehension source text by phrase type, so that the reading comprehension source text has multiple feature phrases of different phrase types;
    determining a target phrase type from the phrase types, and obtaining, from a preset storage area, a preset target answer vector corresponding to the target phrase type, wherein a preset mapping relationship exists between the target phrase type and the preset target answer vector;
    selecting, from the feature phrases, a target feature phrase corresponding to the target phrase type, and generating a target word vector corresponding to the target feature phrase;
    obtaining position information of the target feature phrase in the reading comprehension source text, and generating a position vector corresponding to the position information;
    feeding the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model to generate a question text corresponding to the target phrase type.
  10. The device according to claim 9, wherein, before the step of feeding the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the preset sequence-to-sequence (seq2seq) model to generate the question text corresponding to the target phrase type, the program for generating reading comprehension questions, when executed by the processor, further implements the following steps:
    obtaining, from the preset storage area, target sample text corresponding to the target phrase type;
    performing word segmentation on the target sample text, so that the target sample text has sample text phrases;
    generating sample word vectors corresponding to the sample text phrases;
    adding the preset target answer vector corresponding to the target phrase type to the sample word vectors, and using the sum as a feature vector of the target sample text;
    feeding the feature vector as an input sequence into the preset sequence-to-sequence (seq2seq) model for training, and using the training result as a question generation model;
    wherein the step of feeding the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the preset sequence-to-sequence (seq2seq) model to generate the question text corresponding to the target phrase type specifically comprises:
    feeding the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the question generation model to generate the question text corresponding to the target phrase type.
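The addition step in this claim can be illustrated with a short sketch. It assumes, hypothetically, that the answer vector and each sample word vector share the same dimension and that the addition is element-wise; the function names are illustrative and the training loop itself is not shown.

```python
from typing import List


def add_vectors(a: List[float], b: List[float]) -> List[float]:
    """Element-wise sum of two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return [x + y for x, y in zip(a, b)]


def build_feature_sequence(
    sample_word_vectors: List[List[float]], answer_vector: List[float]
) -> List[List[float]]:
    """Add the preset target answer vector to every sample word vector."""
    return [add_vectors(w, answer_vector) for w in sample_word_vectors]
```

The resulting sequence is what would be passed to the seq2seq model as its training input, so every token representation carries the answer-type signal.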
  11. The device according to claim 10, wherein, after the step of obtaining position information of the target feature phrase in the reading comprehension source text and generating a position vector corresponding to the position information, the program for generating reading comprehension questions, when executed by the processor, further implements the following steps:
    determining, according to the position information, target sentence text corresponding to the target feature phrase;
    performing word segmentation on the target sentence text, so that the target sentence text has a plurality of part-of-speech feature words of different parts of speech;
    converting each part-of-speech feature word of the target sentence text into a part-of-speech feature word vector;
    and obtaining the positional order in which the part-of-speech feature words appear in the target sentence text;
    wherein the step of feeding the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the question generation model to generate the question text corresponding to the target phrase type specifically comprises:
    using the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type as an input feature sequence of the question generation model;
    traversing the part-of-speech feature word vectors in the positional order, and using each traversed part-of-speech feature word vector as an output feature sequence of the question generation model;
    feeding the input feature sequence and the output feature sequence into the question generation model for calculation until the traversal is complete, and using the calculation result as target vector data;
    converting the target vector data into the question text corresponding to the target phrase type.
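The ordered traversal in this claim can be sketched as a loop that visits the output vectors by position and accumulates a score at each step. This is an illustration only: `step_prob` is a hypothetical stand-in for one calculation step of the question generation model, and accumulating log-probabilities is an assumption about what the "calculation result" might be.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def traverse_targets(
    x: Sequence[Vector],
    positioned_targets: List[Tuple[int, Vector]],
    step_prob: Callable[[Sequence[Vector], List[Vector], Vector], float],
) -> float:
    """Visit target vectors in positional order and sum per-step log-probabilities."""
    total_log_prob = 0.0
    prefix: List[Vector] = []  # vectors already traversed
    for _, y_t in sorted(positioned_targets, key=lambda item: item[0]):
        # Pass a copy so the callback cannot mutate the running prefix.
        total_log_prob += math.log(step_prob(x, list(prefix), y_t))
        prefix.append(y_t)
    return total_log_prob
```

Sorting by position before the loop means the caller may supply the part-of-speech feature word vectors in any order.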
  12. The device according to claim 11, wherein the question generation model is characterized by the following formula:
    Figure PCTCN2020121523-appb-100002
    where x denotes the input feature sequence, y_t denotes the part-of-speech feature word vector corresponding to the t-th part-of-speech feature word in the target sentence text, n_y denotes the number of part-of-speech feature words in the target sentence text, and P(y|x) denotes the target vector data.
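The formula survives here only as an image reference. As an assumption (not recovered from the figure), the standard autoregressive factorization used by sequence-to-sequence models is consistent with the symbol definitions above:

```latex
P(y \mid x) = \prod_{t=1}^{n_y} P\left(y_t \mid y_1, \ldots, y_{t-1}, x\right)
```

Under this reading, each factor is the model's probability of the t-th part-of-speech feature word vector given the input feature sequence x and the vectors already traversed, and the product runs over all n_y words of the target sentence text.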
  13. The device according to any one of claims 9 to 12, wherein the step of performing word segmentation on the reading comprehension source text according to phrase types, so that the reading comprehension source text has a plurality of feature phrases of different phrase types, comprises:
    dividing the reading comprehension source text into paragraphs according to semantic rules to obtain a plurality of paragraph texts;
    performing word segmentation on each paragraph text according to phrase types, so that each paragraph text has a plurality of feature phrases of different phrase types;
    wherein the step of obtaining position information of the target feature phrase in the reading comprehension source text and generating a position vector corresponding to the position information specifically comprises:
    obtaining position information of the target feature phrase in the paragraph text, and generating a position vector corresponding to the position information.
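A paragraph-relative position lookup of this kind might look like the sketch below. The claim does not define the "semantic rules", so splitting on blank lines is purely an assumption made for illustration, as are the whitespace tokenizer and the function names.

```python
from typing import List, Tuple


def split_paragraphs(text: str) -> List[str]:
    """Split on blank lines (a stand-in for the unspecified semantic rules)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def locate_phrase(text: str, phrase: str) -> List[Tuple[int, int]]:
    """Return (paragraph_index, token_index) for each occurrence of the phrase."""
    hits: List[Tuple[int, int]] = []
    for p_idx, paragraph in enumerate(split_paragraphs(text)):
        for t_idx, token in enumerate(paragraph.split()):
            # Ignore trailing punctuation when comparing tokens.
            if token.strip(".,;!?") == phrase:
                hits.append((p_idx, t_idx))
    return hits
```

Each `(paragraph_index, token_index)` pair could then be encoded as the position vector, so that positions stay small and comparable across paragraphs of a long source text.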
  14. The device according to claim 13, wherein the phrase types include at least one of a person phrase type, a time phrase type, and a place phrase type.
  15. The device according to any one of claims 9 to 12, wherein, after the step of feeding the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the preset sequence-to-sequence (seq2seq) model to generate the question text corresponding to the target phrase type, the program for generating reading comprehension questions, when executed by the processor, further implements the following steps:
    obtaining a preset target answer corresponding to the preset target answer vector;
    establishing a mapping relationship between the preset target answer and the question text, and storing the mapping relationship and the question text in the preset storage area.
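The storage step can be pictured with an in-memory stand-in for the preset storage area. The `QuestionStore` class and its methods are hypothetical; a real deployment would presumably use persistent storage.

```python
from typing import Dict, List


class QuestionStore:
    """In-memory stand-in for the preset storage area."""

    def __init__(self) -> None:
        self._by_answer: Dict[str, List[str]] = {}

    def save(self, answer: str, question_text: str) -> None:
        """Record the mapping from a preset target answer to a generated question."""
        self._by_answer.setdefault(answer, []).append(question_text)

    def questions_for(self, answer: str) -> List[str]:
        """Return a copy of the questions mapped to the given answer."""
        return list(self._by_answer.get(answer, []))
```

Keeping the answer as the key lets a later step retrieve every generated question that a given preset answer is meant to satisfy.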
  16. A storage medium, wherein the storage medium is a computer-readable storage medium; the computer-readable storage medium stores a program for generating reading comprehension questions, and the program for generating reading comprehension questions is configured to implement the following steps:
    obtaining reading comprehension source text to be processed;
    performing word segmentation on the reading comprehension source text according to phrase types, so that the reading comprehension source text has a plurality of feature phrases of different phrase types;
    determining a target phrase type from the phrase types, and obtaining, from a preset storage area, a preset target answer vector corresponding to the target phrase type, wherein a preset mapping relationship exists between the target phrase type and the preset target answer vector;
    selecting, from the feature phrases, a target feature phrase corresponding to the target phrase type, and generating a target word vector corresponding to the target feature phrase;
    obtaining position information of the target feature phrase in the reading comprehension source text, and generating a position vector corresponding to the position information;
    feeding the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model to generate question text corresponding to the target phrase type.
  17. The storage medium according to claim 16, wherein, before the step of feeding the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the preset sequence-to-sequence (seq2seq) model to generate the question text corresponding to the target phrase type, the program for generating reading comprehension questions, when executed by a processor, further implements the following steps:
    obtaining, from the preset storage area, target sample text corresponding to the target phrase type;
    performing word segmentation on the target sample text, so that the target sample text has sample text phrases;
    generating sample word vectors corresponding to the sample text phrases;
    adding the preset target answer vector corresponding to the target phrase type to the sample word vectors, and using the sum as a feature vector of the target sample text;
    feeding the feature vector as an input sequence into the preset sequence-to-sequence (seq2seq) model for training, and using the training result as a question generation model;
    wherein the step of feeding the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the preset sequence-to-sequence (seq2seq) model to generate the question text corresponding to the target phrase type specifically comprises:
    feeding the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the question generation model to generate the question text corresponding to the target phrase type.
  18. The storage medium according to claim 17, wherein, after the step of obtaining position information of the target feature phrase in the reading comprehension source text and generating a position vector corresponding to the position information, the program for generating reading comprehension questions, when executed by the processor, further implements the following steps:
    determining, according to the position information, target sentence text corresponding to the target feature phrase;
    performing word segmentation on the target sentence text, so that the target sentence text has a plurality of part-of-speech feature words of different parts of speech;
    converting each part-of-speech feature word of the target sentence text into a part-of-speech feature word vector;
    and obtaining the positional order in which the part-of-speech feature words appear in the target sentence text;
    wherein the step of feeding the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the question generation model to generate the question text corresponding to the target phrase type specifically comprises:
    using the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type as an input feature sequence of the question generation model;
    traversing the part-of-speech feature word vectors in the positional order, and using each traversed part-of-speech feature word vector as an output feature sequence of the question generation model;
    feeding the input feature sequence and the output feature sequence into the question generation model for calculation until the traversal is complete, and using the calculation result as target vector data;
    converting the target vector data into the question text corresponding to the target phrase type.
  19. The storage medium according to claim 18, wherein the question generation model is characterized by the following formula:
    Figure PCTCN2020121523-appb-100003
    where x denotes the input feature sequence, y_t denotes the part-of-speech feature word vector corresponding to the t-th part-of-speech feature word in the target sentence text, n_y denotes the number of part-of-speech feature words in the target sentence text, and P(y|x) denotes the target vector data.
  20. The storage medium according to any one of claims 16 to 19, wherein the step of performing word segmentation on the reading comprehension source text according to phrase types, so that the reading comprehension source text has a plurality of feature phrases of different phrase types, comprises:
    dividing the reading comprehension source text into paragraphs according to semantic rules to obtain a plurality of paragraph texts;
    performing word segmentation on each paragraph text according to phrase types, so that each paragraph text has a plurality of feature phrases of different phrase types;
    wherein the step of obtaining position information of the target feature phrase in the reading comprehension source text and generating a position vector corresponding to the position information specifically comprises:
    obtaining position information of the target feature phrase in the paragraph text, and generating a position vector corresponding to the position information.
PCT/CN2020/121523 2020-02-19 2020-10-16 Method, apparatus and device for generating reading comprehension question, and storage medium WO2021164284A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010103758.3A CN111428467B (en) 2020-02-19 2020-02-19 Method, device, equipment and storage medium for generating problem questions for reading and understanding
CN202010103758.3 2020-02-19

Publications (1)

Publication Number Publication Date
WO2021164284A1 (en) 2021-08-26

Family

ID=71551596

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/121523 WO2021164284A1 (en) 2020-02-19 2020-10-16 Method, apparatus and device for generating reading comprehension question, and storage medium

Country Status (2)

Country Link
CN (1) CN111428467B (en)
WO (1) WO2021164284A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115713065A (en) * 2022-11-08 2023-02-24 贝壳找房(北京)科技有限公司 Method for generating question, electronic equipment and computer readable storage medium

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111428467B (en) * 2020-02-19 2024-05-07 平安科技(深圳)有限公司 Method, device, equipment and storage medium for generating problem questions for reading and understanding
CN112487139B (en) * 2020-11-27 2023-07-14 平安科技(深圳)有限公司 Text-based automatic question setting method and device and computer equipment
CN112489652A (en) * 2020-12-10 2021-03-12 北京有竹居网络技术有限公司 Text acquisition method and device for voice information and storage medium
CN113065332B (en) * 2021-04-22 2023-05-12 深圳壹账通智能科技有限公司 Text processing method, device, equipment and storage medium based on reading model
CN113220854B (en) * 2021-05-24 2023-11-07 中国平安人寿保险股份有限公司 Intelligent dialogue method and device for machine reading and understanding
CN113255351B (en) * 2021-06-22 2023-02-03 中国平安财产保险股份有限公司 Sentence intention recognition method and device, computer equipment and storage medium
CN113657089A (en) * 2021-08-20 2021-11-16 西安电子科技大学 English reading understanding auxiliary question setting method and system
CN113627137A (en) * 2021-10-11 2021-11-09 江西软云科技股份有限公司 Question generation method, question generation system, storage medium and equipment
CN115600587B (en) * 2022-12-16 2023-04-07 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Mathematics application question generation system and method, intelligent terminal and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107329995A (en) * 2017-06-08 2017-11-07 北京神州泰岳软件股份有限公司 A kind of controlled answer generation method of semanteme, apparatus and system
CN107463699A (en) * 2017-08-15 2017-12-12 济南浪潮高新科技投资发展有限公司 A kind of method for realizing question and answer robot based on seq2seq models
CN108363743A (en) * 2018-01-24 2018-08-03 清华大学深圳研究生院 A kind of intelligence questions generation method, device and computer readable storage medium
US20190384810A1 (en) * 2018-06-15 2019-12-19 Beijing Baidu Netcom Science And Technology Co., Ltd. Method of training a descriptive text generating model, and method and apparatus for generating descriptive text
CN111428467A (en) * 2020-02-19 2020-07-17 平安科技(深圳)有限公司 Method, device, equipment and storage medium for generating reading comprehension question topic

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108717406B (en) * 2018-05-10 2021-08-24 平安科技(深圳)有限公司 Text emotion analysis method and device and storage medium
CN109086303B (en) * 2018-06-21 2021-09-28 深圳壹账通智能科技有限公司 Intelligent conversation method, device and terminal based on machine reading understanding
CN110210021B (en) * 2019-05-22 2021-05-28 北京百度网讯科技有限公司 Reading understanding method and device
CN111414464B (en) * 2019-05-27 2023-04-07 腾讯科技(深圳)有限公司 Question generation method, device, equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115713065A (en) * 2022-11-08 2023-02-24 贝壳找房(北京)科技有限公司 Method for generating question, electronic equipment and computer readable storage medium
CN115713065B (en) * 2022-11-08 2023-09-15 贝壳找房(北京)科技有限公司 Method for generating problem, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN111428467B (en) 2024-05-07
CN111428467A (en) 2020-07-17

Similar Documents

Publication Publication Date Title
WO2021164284A1 (en) Method, apparatus and device for generating reading comprehension question, and storage medium
KR102401942B1 (en) Method and apparatus for evaluating translation quality
US9104287B2 (en) System and method for data collection interface creation and data collection administration
US8510412B2 (en) Web-based speech recognition with scripting and semantic objects
CN110795552A (en) Training sample generation method and device, electronic equipment and storage medium
JPH02302876A (en) Conversational language analyzer
US11461613B2 (en) Method and apparatus for multi-document question answering
US11907665B2 (en) Method and system for processing user inputs using natural language processing
CN111177351A (en) Method, device and system for acquiring natural language expression intention based on rule
WO2021169485A1 (en) Dialogue generation method and apparatus, and computer device
JP5230927B2 (en) Problem automatic creation apparatus, problem automatic creation method, and computer program
US7962324B2 (en) Method for globalizing support operations
CN108932225A (en) For natural language demand to be converted into the method and system of semantic modeling language statement
CN110674633A (en) Document review proofreading method and device, storage medium and electronic equipment
CN114428788A (en) Natural language processing method, device, equipment and storage medium
CN112131378A (en) Method and device for identifying categories of civil problems and electronic equipment
CN108932326B (en) Instance extension method, device, equipment and medium
EP4303716A1 (en) Method for generating data input, data input system and computer program
US20230342553A1 (en) Attribute and rating co-extraction
Ali Mousa et al. Developing a web application for collecting conversations in lab rooms
CN115599898A (en) Intelligent question and answer implementation method and device, electronic equipment and storage medium
CN116089601A (en) Dialogue abstract generation method, device, equipment and medium
CN113744737A (en) Training of speech recognition model, man-machine interaction method, equipment and storage medium
CN116991965A (en) Method and device for generating online test paper, computer readable medium and electronic equipment
CN115881108A (en) Voice recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20919713; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 20919713; Country of ref document: EP; Kind code of ref document: A1)