WO2021164284A1

WO2021164284A1 - Method, apparatus and device for generating reading comprehension question, and storage medium

Info

Publication number: WO2021164284A1
Application number: PCT/CN2020/121523
Authority: WO
Inventors: 王燕蒙; 许开河; 王烨; 王少军
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-02-19
Filing date: 2020-10-16
Publication date: 2021-08-26
Also published as: CN111428467A; CN111428467B

Abstract

A method, apparatus and device for generating a reading comprehension question, and a storage medium. The method comprises: obtaining a reading comprehension source text to be processed (S10); performing word segmentation on the reading comprehension source text according to phrase type, so that the reading comprehension source text has characteristic phrases of multiple different phrase types (S20); determining a target phrase type from the phrase types, and obtaining a preset target answer vector corresponding to the target phrase type from a preset storage area (S30); selecting a target characteristic phrase corresponding to the target phrase type from the characteristic phrases, and generating a target word vector corresponding to the target characteristic phrase (S40); obtaining position information of the target characteristic phrase in the reading comprehension source text, and generating a position vector corresponding to the position information (S50); and sending the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence model, thereby automatically generating a question text that better fits the original meaning of the reading comprehension text (S60).

Description

Method, device, equipment and storage medium for generating reading comprehension questions

This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on February 19, 2020, with the application number CN202010103758.3, titled "Method, Apparatus, Equipment, and Storage Medium for Generating Reading Comprehension Questions", the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the field of big data analysis, and in particular to a method, device, equipment and storage medium for generating reading comprehension questions.

Background technique

From school education to vocational training, whether in language learning or in the learning of specific subjects and technologies, the ability to read and understand texts is indispensable. To improve reading ability, students need to read a lot and answer questions about the relevant content to improve their comprehension of an article. More importantly, the teacher needs a reliable method to check whether a student has read the assigned chapter, to track the student's learning progress, and to adjust the study plan according to the results. The traditional method is to create questions manually and see whether the students can answer them correctly. As new textbooks and articles keep emerging, manual question writing is time-consuming and labor-intensive, and the checking process cannot be automated.

At present, neural networks are increasingly applied with success to question answering systems and other reading comprehension tasks, and have even surpassed humans in some respects, but they need a large amount of data to reach that level. Labeling all of this data manually would require too much manpower. Question generation technology emerged to address this: its task is to generate corresponding questions from a passage of text. It can be used for data augmentation and dialogue systems, and it is very helpful for reading comprehension.

The inventor realizes that in the prior art, question generation for reading comprehension of an article usually relies on seed words that are expanded and checked against templates. This approach tends to ignore the meaning of the original text. As a result, multiple answers to a generated question may be found in the article; that is, the generated sentences are too simplistic and the generated questions too simple to effectively replace manually written questions, so the effect is not ideal.

Summary of the invention

This application provides a method for generating reading comprehension questions. The method includes the following steps:

Obtain the source text for reading comprehension to be processed;

Perform word segmentation processing on the reading comprehension source text according to phrase types, so that the reading comprehension source text has multiple characteristic phrases of different phrase types;

Determine the target phrase type from the phrase types, and obtain a preset target answer vector corresponding to the target phrase type from a preset storage area, where there is a preset mapping relationship between the target phrase type and the preset target answer vector;

Selecting a target feature phrase corresponding to the target phrase type from each feature phrase, and generating a target word vector corresponding to the target feature phrase;

Acquiring position information of the target characteristic phrase in the reading comprehension source text, and generating a position vector corresponding to the position information;

Send the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model to generate the question title text corresponding to the target phrase type.

This application also proposes a device for generating reading comprehension questions, and the device includes:

The acquisition module is used to acquire the source text for reading comprehension to be processed;

The word segmentation module is used to segment the reading comprehension source text according to phrase types, so that the reading comprehension source text has multiple characteristic phrases of different phrase types;

The determining module is configured to determine the target phrase type from the phrase types, and obtain a preset target answer vector corresponding to the target phrase type from a preset storage area, where there is a preset mapping relationship between the target phrase type and the preset target answer vector;

The selection module is used to select the target feature phrase corresponding to the target phrase type from each feature phrase, and generate a target word vector corresponding to the target feature phrase;

A recording module, configured to obtain position information of the target characteristic phrase in the reading comprehension source text, and generate a position vector corresponding to the position information;

A generating module, configured to send the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the preset sequence-to-sequence (seq2seq) model to generate the question title text corresponding to the target phrase type.

The present application also proposes a device for generating reading comprehension questions, the device comprising: a memory, a processor, and a program for generating reading comprehension questions that is stored on the memory and can run on the processor, the program being configured to implement the following steps:

Obtain the source text for reading comprehension to be processed;

This application also proposes a storage medium, which is a computer-readable storage medium; the computer-readable storage medium stores a program for generating reading comprehension questions, and the program is configured to implement the following steps:

Obtain the source text for reading comprehension to be processed;

Description of the drawings

FIG. 1 is a schematic structural diagram of a device for generating reading comprehension questions in a hardware operating environment involved in a solution of an embodiment of the application;

FIG. 2 is a schematic flowchart of a first embodiment of a method for generating reading comprehension questions according to this application;

FIG. 3 is a schematic flowchart of a second embodiment of a method for generating reading comprehension questions according to this application;

FIG. 4 is a schematic flowchart of a third embodiment of a method for generating reading comprehension questions according to this application;

Fig. 5 is a structural block diagram of a device for generating reading comprehension questions in this application.

The realization, functional characteristics, and advantages of the purpose of this application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed description

Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a device for generating reading comprehension questions in the hardware operating environment involved in the solution of the embodiment of the application.

As shown in FIG. 1, the device may include: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. Among them, the communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory, or a non-volatile memory (non-volatile memory), such as a magnetic disk memory. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.

Those skilled in the art can understand that the structure shown in FIG. 1 does not constitute a limitation on the device, which may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently. The device for generating reading comprehension questions may be a desktop computer host.

As shown in FIG. 1, the memory 1005 as a computer storage medium may include a computer operating system, a network communication module, a user receiving module, and a program for generating reading comprehension questions.

In the device shown in FIG. 1, the device for generating reading comprehension questions of the present application calls, through the processor 1001, the program for generating reading comprehension questions stored in the memory 1005, and executes the steps of the method for generating reading comprehension questions.

Referring to FIG. 2, FIG. 2 is a schematic flowchart of a first embodiment of a method for generating reading comprehension questions according to the present application.

In this embodiment, the method for generating reading comprehension questions includes the following steps:

Step S10: Obtain the source text for reading comprehension to be processed;

It should be noted that the execution subject of this embodiment is the above-mentioned device for generating reading comprehension questions (referred to as a computer system in this embodiment), on which a program for generating reading comprehension questions is loaded. The implementation scenario of this embodiment takes as an example a teacher who wants to generate several reading comprehension questions for a certain English article. The reading comprehension source text is that English article.

Step S20: Perform word segmentation processing on the reading comprehension source text according to phrase types, so that the reading comprehension source text has multiple characteristic phrases of different phrase types;

It should be noted that the phrase type in this embodiment includes at least one of a person phrase type, a time phrase type, and a location phrase type;

It is understandable that the person phrase type can correspond to person answer words, the time phrase type can correspond to date answer words, and the location phrase type can correspond to location answer words; in addition, the phrase types also include some non-answer phrase types, institution answer phrase types, number answer phrase types, and so on.

In specific implementation, a dedicated word segmentation tool is applied to the reading comprehension source text, which is segmented according to phrase type. The word segmentation result includes the labeled proper nouns that appear in the reading comprehension source text, such as person names, place names, organization names, times, quantities, and dates.

Specifically, the dedicated word segmentation tool used in this embodiment may be the NLTK tool (Natural Language Toolkit). NLTK is a natural language toolkit implemented in the Python language. It provides a comprehensive, easy-to-use interface to data sets and models, covering word segmentation, part-of-speech tagging (POS tagging), named entity recognition (NER), syntactic parsing, and other functions in the NLP field. The NLTK tool is used to segment the reading comprehension source text according to phrase type, identify the proper nouns such as persons, places, organizations, times, numbers, and dates that appear in the reading comprehension source text, and label these proper nouns.
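The labeling described above can be illustrated with a deliberately small sketch. It does not call NLTK; instead it assumes hypothetical hand-written gazetteers (`PERSON_NAMES`, `LOCATIONS`) and a year regex as stand-ins for a trained named entity recognizer, and only shows the shape of the output: each characteristic phrase with its phrase type and its offset in the source text.

```python
import re

# Hypothetical mini-gazetteers; a real system would rely on a trained NER model.
PERSON_NAMES = {"Marie Curie", "Pierre Curie"}
LOCATIONS = {"Paris", "Warsaw"}
# Four-digit years stand in for the "time" phrase type.
YEAR_PATTERN = re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b")


def tag_phrases(text):
    """Return (phrase, phrase_type, char_offset) tuples sorted by offset."""
    found = []
    for names, phrase_type in ((PERSON_NAMES, "person"), (LOCATIONS, "location")):
        for name in names:
            for match in re.finditer(re.escape(name), text):
                found.append((name, phrase_type, match.start()))
    for match in YEAR_PATTERN.finditer(text):
        found.append((match.group(), "time", match.start()))
    return sorted(found, key=lambda item: item[2])


source_text = "Marie Curie was born in Warsaw in 1867 and later moved to Paris."
feature_phrases = tag_phrases(source_text)
```

Keeping the character offset alongside each phrase is a design choice of this sketch; it makes the later position lookup (step S50) a simple read of the stored value.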

Step S30: Determine the target phrase type from the phrase types, and obtain a preset target answer vector corresponding to the target phrase type from a preset storage area, where there is a preset mapping relationship between the target phrase type and the preset target answer vector;

In specific implementation, each phrase type corresponds to some standard answers. For example, the times (time phrase type), locations (location phrase type), and persons (person phrase type) that appear in the reading comprehension source text all correspond to standard answer texts. These standard answer texts are prepared in advance by the person setting the questions and are stored in a preset storage area. The preset storage area may be a database, which may be loaded on the device for generating reading comprehension questions.

It should be noted that the standard answers corresponding to these different phrase types in this embodiment will be pre-stored in the database in the form of vectors that can match the seq2seq model. There is a preset mapping relationship between the target phrase type and the preset target answer vector.

Specifically, in this embodiment, each question type can correspond to one type of phrase, and one type of phrase can correspond to four standard answer texts, and the four standard answer texts need to establish a preset mapping relationship with the phrase type;

Correspondingly, in this embodiment, each standard answer text is converted in advance into a text vector through the NLTK tool to obtain an answer type embedding. In this way, based on the preset mapping relationship between the answer text and the phrase type, the phrase type and the preset target answer vector also have the preset mapping relationship.

It is understandable that since the teacher needs to create several question types for the reading comprehension source text, the computer system traverses the phrase types in the reading comprehension source text, uses the traversed phrase type as the target phrase type, and obtains from the storage area the preset target answer vector corresponding to the target phrase type, there being a preset mapping relationship between the target phrase type and the preset target answer vector;
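A minimal sketch of this lookup follows. It assumes a plain dictionary as a stand-in for the database; the name `PRESET_ANSWER_VECTORS`, the vector values, and the helper `iter_target_types` are hypothetical, and fewer answer vectors are listed per type than the four standard answers mentioned in the text.

```python
# Hypothetical in-memory stand-in for the "preset storage area" (a database in the text).
# Each phrase type maps to its preset answer vectors; the values are made up.
PRESET_ANSWER_VECTORS = {
    "person": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]],
    "time": [[0.0, 0.9, 0.1]],
    "location": [[0.1, 0.0, 0.9]],
}


def iter_target_types(phrase_types_in_text):
    """Yield (target_phrase_type, answer_vectors) for each type with a preset mapping."""
    for phrase_type in phrase_types_in_text:
        vectors = PRESET_ANSWER_VECTORS.get(phrase_type)
        if vectors is not None:
            yield phrase_type, vectors


# "organization" has no preset mapping in this toy store, so it is skipped.
targets = list(iter_target_types(["person", "organization", "time"]))
```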

Step S40: Select a target feature phrase corresponding to the target phrase type from each feature phrase, and generate a target word vector corresponding to the target feature phrase;

It is understandable that after the word segmentation process, the computer system selects the target characteristic phrase corresponding to the target phrase type from the multiple characteristic phrases in the reading comprehension source text, and then uses the NLTK tool to convert the target characteristic phrase into vector form, that is, generates a target word vector (word embedding) corresponding to the target characteristic phrase.

Step S50: Obtain the position information of the target characteristic phrase in the reading comprehension source text, and generate a position vector corresponding to the position information;

It is understandable that the computer system determines the position where the target characteristic phrase appears in the reading comprehension source text and converts the position information into vector form, that is, generates a positional embedding corresponding to the position information. This embodiment introduces the position vector so that the generated reading comprehension question better reflects the meaning of the original text.
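The text does not state how the position is turned into a vector. One common choice, shown here purely as an assumption, is the sinusoidal positional encoding used in Transformer models; the function name `position_vector` and the dimension of 8 are illustrative.

```python
import math


def position_vector(position, dim=8):
    """Sinusoidal encoding of a token position (dim must be even)."""
    vector = []
    for i in range(0, dim, 2):
        # Each sine/cosine pair uses a different wavelength.
        angle = position / (10000 ** (i / dim))
        vector.extend([math.sin(angle), math.cos(angle)])
    return vector


# Encode a phrase that starts at token index 7 of the source text.
pos_vec = position_vector(7)
```

A learned embedding table indexed by position would be an equally plausible reading of "positional embedding"; the sinusoidal form is used here only because it needs no training.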

Step S60: Send the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model to generate the question title text corresponding to the target phrase type.

It is understandable that the sequence-to-sequence (seq2seq) model is a model used when the output length is uncertain, and its structure is an encoder-decoder model. Encoding converts the input sequence into a fixed-length vector; decoding converts the previously generated fixed-length vector into an output sequence.

In specific implementation, this embodiment sends the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the preset seq2seq model. The encoder is responsible for compressing the input sequence into a vector of a specified length, which can be regarded as the semantics of the sequence; this process is called encoding. The decoder converts the previously generated fixed-length vector into an output sequence, so the decoding stage can be regarded as the inverse of encoding. First, the target word vector, position vector, and answer vector are used as the input feature sequence, and these vectors are regarded as the semantics of the input sequence; the computer system then predicts possible texts based on these semantic vectors and outputs the predicted texts as the output sequence.

Specifically, the computer system first inputs the above input feature sequence into the multi-head self-attention layer of the seq2seq model and then performs residual connection processing and layer normalization; the processed input feature sequence is then input into the position-wise feed-forward network layer of the seq2seq model, followed again by residual connection processing and layer normalization, to generate the input processing sequence;

Further, word segmentation is performed on the sentence where the target feature word is located, and the word segmentation result is used as the output feature sequence; the input processing sequence is then input to the multi-head self-attention layer, followed by residual connection processing and normalization processing, to generate the output processing sequence;

The input processing sequence and the output processing sequence are input to the multi-head context-attention layer (multi-head attention mechanism), followed by residual connection processing and normalization processing;

Finally, the result is input to the position-wise feed-forward network, followed by residual connection processing and normalization processing, and the question title text corresponding to the target phrase type is output after linear transformation processing.
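The "sublayer, then residual connection, then normalization" pattern that recurs in the steps above can be sketched on a single vector. This is a simplified illustration, not the model of the application: `toy_feed_forward` is a hypothetical stand-in for the position-wise feed-forward network, and the layer normalization omits the learned scale and shift parameters.

```python
import math


def layer_norm(vector, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(vector) / len(vector)
    variance = sum((v - mean) ** 2 for v in vector) / len(vector)
    return [(v - mean) / math.sqrt(variance + eps) for v in vector]


def add_and_norm(inputs, sublayer):
    """Apply a sublayer, add the residual connection, then layer-normalize."""
    outputs = sublayer(inputs)
    residual = [x + y for x, y in zip(inputs, outputs)]
    return layer_norm(residual)


def toy_feed_forward(vector):
    """Hypothetical stand-in for the feed-forward network: ReLU(2x - 1)."""
    return [max(0.0, 2.0 * v - 1.0) for v in vector]


hidden = add_and_norm([0.5, 1.0, -1.0, 2.0], toy_feed_forward)
```

The same `add_and_norm` wrapper would be applied around each attention layer and each feed-forward layer in the sequence of steps described above.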

It is understandable that the multi-head self-attention mechanism can be used for automatic feature-cross learning to improve the accuracy of CTR prediction tasks, where the CTR prediction model structure includes input, embedding, feature extraction, and output. Introducing the multi-head attention mechanism enables the seq2seq model to obtain more information about the sentence from different vector representation subspaces, improving the feature expression ability of the model. At the same time, on the basis of using the existing word vector and position vector as network input, dependency syntax features and relative core-predicate dependency features are further introduced, where the dependency syntax features include the dependency relation value of the current word and the position of the parent node on which it depends, so that the model can obtain more syntactic information about the text more accurately.
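To make the idea of attending in several representation subspaces concrete, the following sketch implements scaled dot-product self-attention per head in plain Python. It is a simplification under stated assumptions: the learned query, key, value, and output projections of a real multi-head layer are omitted, and each head simply works on a contiguous slice of the input vectors. All function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


def multi_head_self_attention(sequence, num_heads):
    """Split each vector into num_heads slices, attend per slice, concatenate."""
    head_dim = len(sequence[0]) // num_heads
    heads = []
    for h in range(num_heads):
        part = [vec[h * head_dim:(h + 1) * head_dim] for vec in sequence]
        heads.append(attention(part, part, part))
    return [sum((head[i] for head in heads), []) for i in range(len(sequence))]


sequence = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
attended = multi_head_self_attention(sequence, num_heads=2)
```

Each output vector is a weighted average of the input slices, so every head can weight the other positions differently.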

This embodiment first obtains the reading comprehension source text to be processed and performs word segmentation on it according to phrase type, so that the reading comprehension source text has multiple characteristic phrases of different phrase types; determines the target phrase type from the phrase types and obtains the preset target answer vector corresponding to the target phrase type from the preset storage area; selects the target characteristic phrase corresponding to the target phrase type from the characteristic phrases and generates the target word vector corresponding to the target characteristic phrase; obtains the position information of the target characteristic phrase in the reading comprehension source text and generates the position vector corresponding to the position information; and sends the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the preset sequence-to-sequence model to generate the question title text corresponding to the target phrase type. In this embodiment, the position vector is combined with the manually preset answer text and with the sequence-to-sequence model to automatically generate a question that better fits the original meaning of the reading comprehension source text, and the answer corresponding to the question is also more unique.

Further, referring to FIG. 3, FIG. 3 is a schematic flowchart of a second embodiment of the method for generating reading comprehension questions of this application; the second embodiment is proposed on the basis of the first embodiment of the method described above.

In this embodiment, before the step S60, the method further includes:

Step S031: Obtain the target sample text corresponding to the target phrase type from the preset storage area.

It is understandable that in this embodiment, a plurality of sample texts related to different phrase types (such as person names, place names, organization names, time, number, date, etc.) are pre-stored in the database (i.e., the preset storage area) as training corpora (i.e., target sample texts), and a mapping relationship is established between the different training corpora and the target phrase types. At the same time, the seq2seq model is trained on these corpora to produce a question generation model. The question generation model is generated through the following steps S032 to S035:

Step S032: Perform word segmentation on the target sample text so that the target sample text has sample text phrases;

Step S033: Generate a sample word vector corresponding to the sample text phrase;

Step S034: Add a preset target answer vector corresponding to the target phrase type and the sample word vector, and use the addition result as a feature vector of the target sample text;

Step S035: Send the feature vector as the input sequence into the preset sequence-to-sequence (seq2seq) model for training, and use the training result as the question generation model.
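Step S034 can be sketched as follows, reading "add" as element-wise addition of the answer vector to each sample word vector. That reading is an assumption, as are the helper names and the made-up 3-dimensional vectors; the text does not specify the dimensions or whether the addition is applied per word.

```python
def add_vectors(a, b):
    """Element-wise sum of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return [x + y for x, y in zip(a, b)]


def build_feature_sequence(sample_word_vectors, answer_vector):
    """Add the preset answer vector to every sample word vector (step S034)."""
    return [add_vectors(word_vec, answer_vector) for word_vec in sample_word_vectors]


# Made-up vectors for two sample text phrases and one answer type.
sample_word_vectors = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
answer_vector = [0.5, 0.5, 0.5]
features = build_feature_sequence(sample_word_vectors, answer_vector)
```

The resulting `features` list is what would be passed to the seq2seq model as the training input sequence in step S035.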

Further, after the step S50, it further includes:

Step S51: Determine the target sentence text corresponding to the target feature phrase according to the location information;

Step S52: Perform word segmentation on the target sentence text, so that the target sentence text has multiple parts of speech feature words with different parts of speech;

It is understandable that this embodiment will segment the sentence in which the target feature word is located, and the segmentation result is that the target sentence text has multiple parts of speech feature words with different parts of speech;

Step S53: Convert each part-of-speech feature word of the target sentence text into a part-of-speech feature word vector;

Step S54: Obtain the order of the positions of each part-of-speech feature word in the target sentence text;

It is understandable that the order of positions here is the order of words from left to right in a sentence of an article.

Correspondingly, the step S60 is specifically "sending the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the question generation model, and generating the question title text corresponding to the target phrase type";

In addition, the step S60 further includes:

Step S601: Use the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type as the input feature sequence of the question generation model;

It is understandable that, in this embodiment, x is used to characterize the aforementioned input feature sequence, and the computer system first inputs the aforementioned input feature sequence x into the multi-head self-attention layer of the seq2seq model, and then performs residual connection processing and normalization processing; Then input the processed input feature sequence into the position-wise feed-forward network layer of the seq2seq model, and then perform residual connection processing and normalization processing to generate an input processing sequence;

Step S602: traverse each part-of-speech feature word vector according to the position sequence, and use the traversed part-of-speech feature word vector as the output feature sequence of the question generation model;

It is understandable that y is used to represent the part-of-speech feature word vectors. In the above steps, the position sequence t of each part-of-speech feature word in the target sentence text has been obtained. The computer system then traverses the part-of-speech feature word vectors y in the order in which the corresponding words occur in the target sentence text, records the t-th traversed part-of-speech feature word vector as y_t, and uses y_t as the output feature sequence of the question generation model;

Step S603: Send the input feature sequence and the output feature sequence to the question generation model for calculation until the traversal is completed, and the calculation result is used as target vector data;

In this embodiment, the question generation model is characterized by the following formula:

Where x represents the input feature sequence, y_t represents the part-of-speech feature word vector corresponding to the t-th part-of-speech feature word in the target sentence text, n_y represents the number of part-of-speech feature words in the target sentence text, and P(y|x) characterizes the target vector data;

The above formula can be understood as: each part-of-speech feature word vector y_t (up to t = n_y vectors) and the input feature sequence x are sent into the question generation model to generate new vector data, and the n_y new vector data are added together to finally obtain the target vector data P(y|x).
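The formula itself is not reproduced in this text. One form consistent with the symbol definitions above, assuming the standard autoregressive factorization used by sequence-to-sequence models (an assumption, not confirmed by the source), is:

```latex
P(y \mid x) = \prod_{t=1}^{n_y} P\left(y_t \mid y_{<t},\, x\right)
\qquad\Longleftrightarrow\qquad
\log P(y \mid x) = \sum_{t=1}^{n_y} \log P\left(y_t \mid y_{<t},\, x\right)
```

Here y_{<t} denotes the part-of-speech feature word vectors that precede position t. Under this reading, the description of adding n_y terms together would correspond to the logarithmic form on the right.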

Step S604: Convert the target vector data into question title text corresponding to the target phrase type.

Specifically, in this embodiment, the target vector data can be converted from vector form to text form through the NLTK tool, so that a higher-level question that better fits the original meaning of the reading comprehension article is finally generated, and the answer corresponding to the generated question is also more unique.

Further, referring to FIG. 4, FIG. 4 is a schematic flowchart of a third embodiment of the method for generating reading comprehension questions of this application; the third embodiment is proposed on the basis of the first embodiment or the second embodiment of the method described above.

In this embodiment, the step S20 specifically includes:

Step S201: Perform segmentation processing on the reading comprehension source text according to semantic rules to obtain multiple paragraph texts;

In a specific implementation, this embodiment can use the NLTK tool to divide the reading comprehension source text into multiple semantically complete paragraphs according to semantic rules, and each paragraph is guaranteed to have a subject.

Step S202: perform word segmentation processing on each paragraph text according to the phrase type, so that each paragraph text has a plurality of characteristic phrases of different phrase types;

The step S50 specifically includes:

Step S500: Obtain the position information of the target characteristic phrase in the paragraph text, and generate a position vector corresponding to the position information.

In this embodiment, a reading comprehension text is divided into several semantic paragraphs, and the subtopics described in each paragraph are different, and they are all independent. The parts describing similar content in the text are grouped together so that the semantic paragraph has the greatest semantic consistency. The analysis of the text can be reduced from the original study of the text to the study of the semantic paragraph; the form of this segmentation is similar to the division of natural paragraphs of the article, and it aims to quickly and accurately obtain the required information from a large amount of text.
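The paragraph-level position lookup of steps S201 and S500 can be sketched as below. Splitting on blank lines is only a crude stand-in for the semantic segmentation described above, and both helper names are hypothetical.

```python
def split_paragraphs(text):
    """Split on blank lines; a crude stand-in for semantic segmentation."""
    return [block.strip() for block in text.split("\n\n") if block.strip()]


def locate_phrase(paragraphs, phrase):
    """Return (paragraph_index, token_index) of the first occurrence, or None."""
    phrase_tokens = phrase.split()
    for p_idx, paragraph in enumerate(paragraphs):
        tokens = paragraph.split()
        for t_idx in range(len(tokens) - len(phrase_tokens) + 1):
            window = tokens[t_idx:t_idx + len(phrase_tokens)]
            # Ignore trailing punctuation when comparing tokens.
            if [tok.strip(".,;:!?") for tok in window] == phrase_tokens:
                return p_idx, t_idx
    return None


article = "The museum opened in 1950.\n\nMarie Curie studied in Paris."
paragraphs = split_paragraphs(article)
location = locate_phrase(paragraphs, "Paris")
```

Recording the position relative to the paragraph rather than the whole article keeps position values small and comparable across paragraphs of different documents.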

Further, in an embodiment, after the step S60, the method further includes:

Step: Obtain a preset target answer corresponding to the preset target answer vector;

Step: establishing a mapping relationship between the preset target answer and the question title text, and storing the mapping relationship and the question title text in the preset storage area.

It is understandable that, in this embodiment, the generated question title text and the mapping relationship between the preset target answer and the question title text are stored in the database, so that they can be reused directly the next time questions are set.
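As an illustration of this storage step, the sketch below keeps the answer-to-question mapping in an in-memory SQLite database. The table name `question_bank`, its columns, and the helper functions are assumptions made for the example; the text only says that the mapping and the question text are stored in the preset storage area.

```python
import sqlite3


def open_store():
    """Create an in-memory table linking each preset answer to a generated question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE question_bank (answer TEXT NOT NULL, question TEXT NOT NULL)"
    )
    return conn


def save_question(conn, answer, question):
    """Persist one answer-to-question mapping."""
    conn.execute(
        "INSERT INTO question_bank (answer, question) VALUES (?, ?)",
        (answer, question),
    )
    conn.commit()


def questions_for(conn, answer):
    """Look up previously generated questions for a given answer."""
    rows = conn.execute(
        "SELECT question FROM question_bank WHERE answer = ? ORDER BY rowid",
        (answer,),
    )
    return [row[0] for row in rows]


store = open_store()
save_question(store, "Warsaw", "Where was Marie Curie born?")
stored = questions_for(store, "Warsaw")
```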

In addition, referring to FIG. 5, this application also proposes a device for generating reading comprehension questions, which includes:

The obtaining module 10 is used to obtain the source text for reading comprehension to be processed;

The word segmentation module 20 is configured to perform word segmentation processing on the reading comprehension source text according to phrase types, so that the reading comprehension source text has multiple characteristic phrases of different phrase types;

The determining module 30 is configured to determine a target phrase type from the phrase types and obtain, from a preset storage area, a preset target answer vector corresponding to the target phrase type, wherein there is a preset mapping relationship between the target phrase type and the preset target answer vector;

The selecting module 40 is configured to select the target feature phrase corresponding to the target phrase type from each feature phrase, and generate a target word vector corresponding to the target feature phrase;

The recording module 50 is configured to obtain position information of the target characteristic phrase in the reading comprehension source text, and generate a position vector corresponding to the position information;

The generating module 60 is configured to send the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the preset sequence-to-sequence (seq2seq) model, and generate a question title text corresponding to the target phrase type.
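The way the modules hand data to one another can be sketched as a single function. Everything named here is an assumption made for illustration: `ANSWER_VECTORS`, `embed`, `generate_question` and `stub_model` are hypothetical, the word vector is a toy two-number encoding rather than a trained embedding, and the seq2seq model is replaced by a stub that only inspects the answer vector.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Phrase = Tuple[str, str]  # (phrase text, phrase type)

# Hypothetical preset storage: one answer vector per phrase type.
ANSWER_VECTORS: Dict[str, Vector] = {"person": [1.0, 0.0], "time": [0.0, 1.0]}


def embed(phrase: str) -> Vector:
    """Toy word vector: [length, vowel count]; a real system would use trained embeddings."""
    return [float(len(phrase)), float(sum(c in "aeiou" for c in phrase.lower()))]


def generate_question(
    source: str,
    phrases: Sequence[Phrase],   # output of the word segmentation module
    target_type: str,            # chosen by the determining module
    model: Callable[[Vector, Vector, Vector], str],
) -> str:
    answer_vec = ANSWER_VECTORS[target_type]                   # determining module
    target = next(p for p, t in phrases if t == target_type)   # selecting module
    word_vec = embed(target)
    start = source.index(target)                               # recording module
    position_vec = [float(start), float(start + len(target))]
    return model(word_vec, position_vec, answer_vec)           # generating module


def stub_model(word_vec: Vector, position_vec: Vector, answer_vec: Vector) -> str:
    """Placeholder for the trained seq2seq model; it only looks at the answer vector."""
    return "When ...?" if answer_vec == ANSWER_VECTORS["time"] else "Who ...?"


source = "Li Bai was born in 701."
phrases = [("Li Bai", "person"), ("701", "time")]
question = generate_question(source, phrases, "time", stub_model)
```

Passing the model in as a callable keeps the sketch independent of any particular seq2seq implementation; only the three vectors named in this embodiment cross that boundary.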

It is understandable that the device for generating reading comprehension questions in this embodiment may be a computer application program loaded in the equipment for generating reading comprehension questions of the above embodiment, and that equipment may be the host computer used by the person setting the questions. For the specific implementation of the device for generating reading comprehension questions in this application, refer to the foregoing embodiments of the method for generating reading comprehension questions, which are not repeated here.

In addition, the present application also provides a computer storage medium, which may be volatile or non-volatile. The computer storage medium stores a program for generating reading comprehension questions, and when the program is executed by a processor, the steps of the method for generating reading comprehension questions described above are implemented.

It should be noted that, in this document, the terms "include", "comprise" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or system that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to the process, method, article or system. Without further restrictions, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, method, article or system that includes the element.

The serial numbers of the foregoing embodiments of the present application are for description only, and do not represent the superiority or inferiority of the embodiments.

Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform. They can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, magnetic disk or optical disk) and includes several instructions that cause a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the various embodiments of this application.

The above are only preferred embodiments of this application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.

Claims

1. A method for generating reading comprehension questions, wherein the method includes:

Obtain the source text for reading comprehension to be processed;

Perform word segmentation processing on the reading comprehension source text according to phrase types, so that the reading comprehension source text has multiple characteristic phrases of different phrase types;

Determine a target phrase type from the phrase types, and obtain a preset target answer vector corresponding to the target phrase type from a preset storage area, wherein there is a preset mapping relationship between the target phrase type and the preset target answer vector;

Selecting a target feature phrase corresponding to the target phrase type from each feature phrase, and generating a target word vector corresponding to the target feature phrase;

Acquiring position information of the target characteristic phrase in the reading comprehension source text, and generating a position vector corresponding to the position information;

Send the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model to generate a question title text corresponding to the target phrase type.
2. The method of claim 1, wherein before the step of sending the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model to generate a question title text corresponding to the target phrase type, the method further includes:

Obtaining the target sample text corresponding to the target phrase type from the preset storage area;

Segmenting the target sample text so that the target sample text has sample text phrases;

Generating a sample word vector corresponding to the sample text phrase;

Adding a preset target answer vector corresponding to the target phrase type and the sample word vector, and using the addition result as a feature vector of the target sample text;

Sending the feature vector as an input sequence into the preset sequence-to-sequence (seq2seq) model for training, and using the training result as a question generation model;

The step of sending the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model to generate a question title text corresponding to the target phrase type includes:

The target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type are sent into the question generation model to generate a question title text corresponding to the target phrase type.
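The vector addition recited in claim 2 can be illustrated as follows. This is only a sketch under assumptions: the claim does not state the vector dimensions or whether the addition is element-wise, so the code assumes equal dimensions and element-wise addition, and the function name `add_answer_vector` is hypothetical.

```python
from typing import List

Vector = List[float]


def add_answer_vector(sample_vectors: List[Vector], answer_vector: Vector) -> List[Vector]:
    """Add the answer vector element-wise to every sample word vector."""
    features = []
    for vec in sample_vectors:
        if len(vec) != len(answer_vector):
            raise ValueError("word and answer vectors must share a dimension")
        features.append([w + a for w, a in zip(vec, answer_vector)])
    return features


sample_vectors = [[0.5, 1.0, -1.0], [2.0, 0.0, 0.25]]
answer_vector = [1.0, 1.0, 0.5]
features = add_answer_vector(sample_vectors, answer_vector)
```

Under this reading, the resulting list of feature vectors is what would be fed to the seq2seq model as the input sequence during training.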
3. The method according to claim 2, wherein after the step of obtaining the position information of the target characteristic phrase in the source text for reading comprehension, and generating a position vector corresponding to the position information, the method further comprises:

Determining the target sentence text corresponding to the target feature phrase according to the position information;

Performing word segmentation on the target sentence text, so that the target sentence text has multiple part-of-speech feature words with different parts of speech;

Converting each part-of-speech feature word of the target sentence text into a part-of-speech feature word vector;

Obtaining the position sequence of the part-of-speech feature words in the target sentence text;

The step of sending the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the question generation model to generate a question title text corresponding to the target phrase type includes:

Using the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type as the input feature sequence of the question generation model;

Traverse each part-of-speech feature word vector according to the position sequence, and use the traversed part-of-speech feature word vector as the output feature sequence of the question generation model;

Sending the input feature sequence and the output feature sequence to the question generation model for calculation until the traversal is completed, and using the calculation result as target vector data;

The target vector data is converted into question title text corresponding to the target phrase type.
4. The method according to claim 3, wherein the question generation model is characterized by the following formula:

where x represents the input feature sequence, y_t represents the part-of-speech feature word vector corresponding to the t-th part-of-speech feature word in the target sentence text, n_y represents the number of part-of-speech feature words in the target sentence text, and P(y|x) represents the target vector data.
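The formula referenced in this claim is not reproduced in this text. A form consistent with the symbol definitions given here, and with the usual factorization of a sequence-to-sequence model, would be the following. This is an assumption rather than a reproduction of the original formula, and the notation y_{<t} (the part-of-speech feature word vectors before position t) is introduced here for illustration.

```latex
P(y \mid x) = \prod_{t=1}^{n_y} P\left(y_t \mid y_{<t},\, x\right)
```

Read this way, the traversal in claim 3 corresponds to evaluating one factor per part-of-speech feature word, in position order, until all n_y words have been visited.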
5. The method according to any one of claims 1 to 4, wherein the step of performing word segmentation processing on the reading comprehension source text according to phrase types, so that the reading comprehension source text has multiple characteristic phrases of different phrase types, includes:

Perform segmentation processing on the reading comprehension source text according to semantic rules to obtain multiple paragraph texts;

Perform word segmentation processing on each paragraph text according to the phrase type, so that each paragraph text has multiple characteristic phrases of different phrase types;

The step of acquiring the position information of the target characteristic phrase in the reading comprehension source text and generating a position vector corresponding to the position information specifically includes:

The position information of the target feature phrase in the paragraph text is acquired, and a position vector corresponding to the position information is generated.
6. The method of claim 5, wherein the phrase type includes at least one of a person phrase type, a time phrase type, and a location phrase type.
7. The method according to any one of claims 1 to 4, wherein after the step of sending the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model to generate a question title text corresponding to the target phrase type, the method further includes:

Obtaining a preset target answer corresponding to the preset target answer vector;

A mapping relationship between the preset target answer and the question title text is established, and the mapping relationship and the question title text are stored in the preset storage area.
8. A device for generating reading comprehension questions, wherein the device includes:

The acquisition module is used to acquire the source text for reading comprehension to be processed;

The word segmentation module is used to segment the reading comprehension source text according to phrase types, so that the reading comprehension source text has multiple characteristic phrases of different phrase types;

The determining module is configured to determine the target phrase type from the phrase types, and obtain a preset target answer vector corresponding to the target phrase type from a preset storage area, wherein there is a preset mapping relationship between the target phrase type and the preset target answer vector;

The selection module is used to select the target feature phrase corresponding to the target phrase type from each feature phrase, and generate a target word vector corresponding to the target feature phrase;

A recording module, configured to obtain position information of the target characteristic phrase in the reading comprehension source text, and generate a position vector corresponding to the position information;

A generating module, configured to send the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the preset sequence-to-sequence seq2seq model to generate the target phrase Question title text corresponding to the type.
9. A device for generating reading comprehension questions, wherein the device includes: a memory, a processor, and a program for generating reading comprehension questions that is stored on the memory and executable on the processor, the program for generating reading comprehension questions being configured to implement the following steps:

Obtain the source text for reading comprehension to be processed;

Perform word segmentation processing on the reading comprehension source text according to phrase types, so that the reading comprehension source text has multiple characteristic phrases of different phrase types;

Determine a target phrase type from the phrase types, and obtain a preset target answer vector corresponding to the target phrase type from a preset storage area, wherein there is a preset mapping relationship between the target phrase type and the preset target answer vector;

Selecting a target feature phrase corresponding to the target phrase type from each feature phrase, and generating a target word vector corresponding to the target feature phrase;

Acquiring position information of the target characteristic phrase in the reading comprehension source text, and generating a position vector corresponding to the position information;

Send the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model to generate a question title text corresponding to the target phrase type.
10. The device according to claim 9, wherein before the step of sending the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model to generate a question title text corresponding to the target phrase type, the program for generating reading comprehension questions, when executed by the processor, further implements the following steps:

Obtaining the target sample text corresponding to the target phrase type from the preset storage area;

Segmenting the target sample text so that the target sample text has sample text phrases;

Generating a sample word vector corresponding to the sample text phrase;

Adding a preset target answer vector corresponding to the target phrase type and the sample word vector, and using the addition result as a feature vector of the target sample text;

Sending the feature vector as an input sequence into the preset sequence-to-sequence (seq2seq) model for training, and using the training result as a question generation model;

The step of sending the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model to generate a question title text corresponding to the target phrase type includes:

The target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type are sent into the question generation model to generate a question title text corresponding to the target phrase type.
11. The device according to claim 10, wherein after the step of acquiring the position information of the target characteristic phrase in the reading comprehension source text and generating a position vector corresponding to the position information, the program for generating reading comprehension questions, when executed by the processor, further implements the following steps:

Determining the target sentence text corresponding to the target feature phrase according to the position information;

Performing word segmentation on the target sentence text, so that the target sentence text has multiple part-of-speech feature words with different parts of speech;

Converting each part-of-speech feature word of the target sentence text into a part-of-speech feature word vector;

Obtaining the position sequence of the part-of-speech feature words in the target sentence text;

The step of sending the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the question generation model to generate a question title text corresponding to the target phrase type includes:

Using the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type as the input feature sequence of the question generation model;

Traverse each part-of-speech feature word vector according to the position sequence, and use the traversed part-of-speech feature word vector as the output feature sequence of the question generation model;

Sending the input feature sequence and the output feature sequence to the question generation model for calculation until the traversal is completed, and using the calculation result as target vector data;

The target vector data is converted into question title text corresponding to the target phrase type.
12. The device according to claim 11, wherein the question generation model is characterized by the following formula:

where x represents the input feature sequence, y_t represents the part-of-speech feature word vector corresponding to the t-th part-of-speech feature word in the target sentence text, n_y represents the number of part-of-speech feature words in the target sentence text, and P(y|x) represents the target vector data.
13. The device according to any one of claims 9 to 12, wherein the step of performing word segmentation processing on the reading comprehension source text according to phrase types, so that the reading comprehension source text has multiple characteristic phrases of different phrase types, includes:

Perform segmentation processing on the reading comprehension source text according to semantic rules to obtain multiple paragraph texts;

Perform word segmentation processing on each paragraph text according to the phrase type, so that each paragraph text has multiple characteristic phrases of different phrase types;

The step of acquiring the position information of the target characteristic phrase in the reading comprehension source text and generating a position vector corresponding to the position information specifically includes:

The position information of the target feature phrase in the paragraph text is acquired, and a position vector corresponding to the position information is generated.
14. The device of claim 13, wherein the phrase type includes at least one of a person phrase type, a time phrase type, and a location phrase type.
15. The device according to any one of claims 9 to 12, wherein after the step of sending the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model to generate a question title text corresponding to the target phrase type, the program for generating reading comprehension questions, when executed by the processor, further implements the following steps:

Obtaining a preset target answer corresponding to the preset target answer vector;

A mapping relationship between the preset target answer and the question title text is established, and the mapping relationship and the question title text are stored in the preset storage area.
16. A storage medium, wherein the storage medium is a computer-readable storage medium; the computer-readable storage medium stores a program for generating reading comprehension questions, and the program for generating reading comprehension questions is configured to implement the following steps:

Obtain the source text for reading comprehension to be processed;

Perform word segmentation processing on the reading comprehension source text according to phrase types, so that the reading comprehension source text has multiple characteristic phrases of different phrase types;

Determine a target phrase type from the phrase types, and obtain a preset target answer vector corresponding to the target phrase type from a preset storage area, wherein there is a preset mapping relationship between the target phrase type and the preset target answer vector;

Selecting a target feature phrase corresponding to the target phrase type from each feature phrase, and generating a target word vector corresponding to the target feature phrase;

Acquiring position information of the target characteristic phrase in the reading comprehension source text, and generating a position vector corresponding to the position information;

Send the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model to generate a question title text corresponding to the target phrase type.
17. The storage medium according to claim 16, wherein before the step of sending the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model to generate a question title text corresponding to the target phrase type, the program for generating reading comprehension questions, when executed by the processor, further implements the following steps:

Obtaining the target sample text corresponding to the target phrase type from the preset storage area;

Segmenting the target sample text so that the target sample text has sample text phrases;

Generating a sample word vector corresponding to the sample text phrase;

Adding a preset target answer vector corresponding to the target phrase type and the sample word vector, and using the addition result as a feature vector of the target sample text;

Sending the feature vector as an input sequence into the preset sequence-to-sequence (seq2seq) model for training, and using the training result as a question generation model;

The step of sending the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into a preset sequence-to-sequence (seq2seq) model to generate a question title text corresponding to the target phrase type includes:

The target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type are sent into the question generation model to generate a question title text corresponding to the target phrase type.
18. The storage medium according to claim 17, wherein after the step of obtaining the position information of the target characteristic phrase in the reading comprehension source text and generating a position vector corresponding to the position information, the program for generating reading comprehension questions, when executed by the processor, further implements the following steps:

Determining the target sentence text corresponding to the target feature phrase according to the position information;

Performing word segmentation on the target sentence text, so that the target sentence text has multiple part-of-speech feature words with different parts of speech;

Converting each part-of-speech feature word of the target sentence text into a part-of-speech feature word vector;

Obtaining the position sequence of the part-of-speech feature words in the target sentence text;

The step of sending the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type into the question generation model to generate a question title text corresponding to the target phrase type includes:

Using the target word vector, the position vector, and the preset target answer vector corresponding to the target phrase type as the input feature sequence of the question generation model;

Traverse each part-of-speech feature word vector according to the position sequence, and use the traversed part-of-speech feature word vector as the output feature sequence of the question generation model;

Sending the input feature sequence and the output feature sequence to the question generation model for calculation until the traversal is completed, and using the calculation result as target vector data;

The target vector data is converted into question title text corresponding to the target phrase type.
19. The storage medium of claim 18, wherein the question generation model is characterized by the following formula:

where x represents the input feature sequence, y_t represents the part-of-speech feature word vector corresponding to the t-th part-of-speech feature word in the target sentence text, n_y represents the number of part-of-speech feature words in the target sentence text, and P(y|x) represents the target vector data.
20. The storage medium according to any one of claims 16 to 19, wherein the step of performing word segmentation processing on the reading comprehension source text according to phrase types, so that the reading comprehension source text has multiple characteristic phrases of different phrase types, includes:

Perform segmentation processing on the reading comprehension source text according to semantic rules to obtain multiple paragraph texts;

Perform word segmentation processing on each paragraph text according to the phrase type, so that each paragraph text has multiple characteristic phrases of different phrase types;

The step of acquiring the position information of the target characteristic phrase in the reading comprehension source text and generating a position vector corresponding to the position information specifically includes:

The position information of the target feature phrase in the paragraph text is acquired, and a position vector corresponding to the position information is generated.