CN108363743B - Intelligent question generation method and device, and computer-readable storage medium - Google Patents

Intelligent question generation method and device, and computer-readable storage medium

Info

Publication number: CN108363743B
Application number: CN201810068857.5A
Authority: CN (China)
Prior art keywords: question, training, sentence, model, questions
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN108363743A
Inventors: 韩金新, 郑海涛, 王伟, 陈金元, 肖喜
Current assignee: Shenzhen Graduate School Tsinghua University
Original assignee: Shenzhen Graduate School Tsinghua University
Application filed by Shenzhen Graduate School Tsinghua University
Priority application: CN201810068857.5A
Publications: CN108363743A (application), CN108363743B (grant)

Classifications

    • G06F16/322 — Information retrieval of unstructured textual data; indexing; data structures therefor; trees
    • G06F16/36 — Information retrieval of unstructured textual data; creation of semantic tools, e.g. ontology or thesauri
    • G06F40/211 — Handling natural language data; natural language analysis; syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

Abstract

The invention discloses an intelligent question generation method, an intelligent question generation device, and a computer-readable storage medium for automatically generating and outputting questions for an input article, comprising the following steps: S1, extracting the key content of the article using a seq2seq model; S2, performing syntactic analysis and named entity recognition on each sentence of the key content to build a syntax tree for each sentence; S3, matching each syntax tree against the question templates in a pre-built question template database and, if a matching template exists, converting the sentence corresponding to the syntax tree into an interrogative sentence based on the matched template, thereby generating a question; and S4, ranking the generated questions with a neural network and outputting them in ranked order.

Description

Intelligent question generation method and device, and computer-readable storage medium
Technical Field
The invention relates to the technical fields of computers and natural language processing, and in particular to a deep-learning-based intelligent question generation method and a corresponding device.
Background
With the rapid development of computer networks, ever more information is available online, and users cannot locate what interests them without reading everything in full. At present, when screening articles and documents on the network, most users still decide whether to read an article by browsing its title. This approach has a drawback: many titles do not accurately and comprehensively reflect the core content of the article, so users either fail to find the articles that interest them or miss them altogether. Natural language processing techniques can therefore be used to condense an article's content into several relevant questions and attract the user in question form: when a question touches on a topic the user cares about, the user is drawn into the article to find the answer, which can greatly increase reading interest.
At present, the technology of automatically generating questions from articles with natural language processing is applied mainly in education and teaching, for example helping a teacher generate a series of questions from a reading passage to assess students' comprehension of it. This can greatly reduce the teacher's workload and free more of the teacher's energy for teaching.
According to current surveys, no mature technology yet exists for intelligent question generation in Chinese. Existing question generation methods for English fall into three categories: question generation based on semantic structure, question generation based on templates, and sequence-based question generation.
Question generation based on semantic structure: the semantic roles in a sentence mainly include agent, patient, theme, goal, instrument, time, location, and predicate. Every sentence in a text is composed of such components, and identifying the role each word plays reveals the relations among the words; Kunichika, Mazidi, and others used these relations to generate questions. Kunichika et al. generated questions about English stories using grammatical relations to test different readers' comprehension, generating questions from five angles: questioning the content of the whole sentence, questioning via synonym and antonym correspondences, questioning temporal and spatial relations, questioning words in complex forms within the sentence, and questioning related phrases. Their results, however, showed that questions generated this way contain many grammatical errors. Mazidi et al. added natural language understanding on top of the semantic structure: they first performed grammatical role labeling and semantic role labeling on the text separately and then combined the two results, obtaining good question generation performance. For example, for the sentence "Xiaoming met Xiaohong in the park yesterday evening," semantic role labeling is performed first, as shown in Table 1 below:
TABLE 1

Input sequence:  Xiaoming | yesterday | evening | in the park | met       | Xiaohong
Semantic role:   agent    | time      | time    | location    | predicate | patient
In this example sentence, the semantic role labeling is "Xiaoming (agent) yesterday evening (time) in the park (location) met (predicate) Xiaohong (patient)," from which questions can be generated such as: 1. Who met Xiaohong in the park yesterday evening? 2. When did Xiaoming meet Xiaohong in the park? 3. Whom did Xiaoming meet in the park yesterday evening? However, in such methods based on semantic relations the generated questions are often too trivial: two questions are routinely built from the patient and the agent alone, while in practice not all of them are of interest.
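As an illustration of this role-driven approach, the following minimal sketch (hypothetical, not the cited authors' actual systems; the role inventory and wh-word mapping are illustrative) substitutes a wh-word for each role-bearing token:

```python
# Hypothetical sketch of semantic-role-based question generation; the role
# inventory and wh-word mapping are illustrative, not the cited systems.
WH = {"agent": "who", "patient": "whom", "time": "when", "location": "where"}

def role_questions(tokens):
    """tokens: list of (phrase, role) pairs; role may be None.
    For each role that has a wh-word, substitute the wh-word for that
    phrase and keep the rest of the sentence, yielding one question."""
    questions = []
    for i, (_, role) in enumerate(tokens):
        if role in WH:
            parts = [WH[role] if j == i else w for j, (w, _) in enumerate(tokens)]
            q = " ".join(parts)
            questions.append(q[0].upper() + q[1:] + "?")
    return questions

sentence = [("Xiaoming", "agent"), ("met", None), ("Xiaohong", "patient"),
            ("in the park", "location"), ("yesterday evening", "time")]
for q in role_questions(sentence):
    print(q)
```

Run on the Table 1 example this yields four questions, one per labeled role; the awkward in-situ substitutions for non-subject roles also illustrate why purely role-driven output tends to include trivial or unnatural questions.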
Question generation based on templates: fixed types of questions are generated by manually defined rules. To meet the need of generating specified questions from structured text for students, Mostow et al. studied the expression patterns of a large body of text and first designed three templates to generate what, why, and how questions; the specific templates are shown in Table 2 below:
TABLE 2

Question type | Question template
WHAT          | What did <character> <verb>?
HOW           | How did <character> <verb> <complement>?
WHY           | Why was/were <character> <past-participle>?
They experimented on more than five hundred articles and obtained reasonable results under manual evaluation. Later, Labutov et al. extracted low-dimensional ontologies from a large number of articles crawled from Wikipedia and manually designed templates over these ontologies to generate questions, such as (Person, early life) → "Who were the key influences on <Person> in their childhood?". Lindberg et al. combined semantic structures with templates: they first analyzed the fixed patterns present in sentence structures and then created 60 templates from those patterns to generate questions, achieving good results.
Sequence-based question generation: in natural language processing, recurrent neural networks (RNNs), which handle temporal dependence, are mainly used to build seq2seq models that treat question generation as sentence-to-sentence conversion. A large amount of text is converted into vectors with word2vec, and the model exploits word similarity to predict the next word with maximum probability until a terminal symbol appears. The recurrent neural network has various improved variants, one of which is the LSTM (Long Short-Term Memory); the LSTM mechanism is mature and flexible to implement. Serban et al. used such a model over logical triples (subject, relation, object) to construct potential questions; the model requires a large amount of labeled training data, and they trained an LSTM network on 100,000 manually labeled <text, question> pairs in English to generate fixed-form questions. Xinya Du et al. extracted <sentence, question, answer> triples from the SQuAD dataset, trained neural networks at both the sentence and paragraph level, and systematically applied manual and automatic evaluation; the whole scheme appears well rounded. Mostafazadeh proposed visual question generation, i.e., generating a question from a picture: building on the MSCOCO image-caption dataset, Microsoft enriched the data and had large numbers of crowd workers on Amazon label questions about the pictures, including event-based and object-based questions (75,000 questions across three databases), and then trained a seq2seq model to generate questions. It can be seen that a large, high-quality dataset is the key to implementing a sequence-based question generation algorithm.
However, the above methods have significant shortcomings when processing complex Chinese. Chinese is structurally complex and develops its point through layer-by-layer elaboration, unlike educational articles that state things directly, so questions generated from lengthy text by a prototype system cannot satisfy users: for example, repeated occurrences of similar questions directly lower satisfaction with the output, and users only want questions that pinpoint the key content of the text, so lengthy text is not a good input for question generation. Furthermore, outputting questions in their order of generation, or with a simple ranking method such as linear regression, cannot ensure that each question is placed in the most appropriate position.
The above background disclosure is intended only to aid understanding of the inventive concept and technical solutions of the present invention; it does not necessarily belong to the prior art of this patent application, and, absent clear evidence that the above content was disclosed before the filing date of this application, it should not be used to assess the novelty and inventive step of this application.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an intelligent question generation method and device based on deep learning.
One of the technical solutions proposed by the present invention to achieve the above object is as follows:
An intelligent question generation method for automatically generating and outputting questions for an input article, comprising the following steps:
S1, extracting the key content of the article using a seq2seq model;
S2, performing syntactic analysis and named entity recognition on each sentence of the key content to build a syntax tree for each sentence;
S3, matching each syntax tree against the question templates in a pre-built question template database and, if a matching template exists, converting the key sentence corresponding to that syntax tree into an interrogative sentence based on the matched template, thereby generating a question;
and S4, automatically scoring the generated questions with a question ranking model based on a neural network architecture and outputting them ranked by score.
The method provided by the invention automatically extracts the key content from a lengthy Chinese article and presents that key content (or abstract) in the form of questions, so that a user can decide whether to continue reading the article by browsing the questions. This saves the user's reading time and, by presenting the article's core content as questions, increases the user's interest in reading. The method can be used to build an intelligent assisted-reading system: a specific set of questions is generated for the user, who can select a question of interest from the set and read purposefully, increasing reading interest and improving reading quality.
In addition, based on this method, the invention provides a system device whose technical solution is as follows:
An intelligent question generation device for automatically generating and outputting questions for an input article, comprising: a seq2seq model for extracting the key content of the article; a syntax tree construction program for performing syntactic analysis and named entity recognition on each sentence of the key content to build a syntax tree for each sentence; a question construction program for matching each syntax tree against the question templates in a pre-built question template database and, when a matching template exists, converting the corresponding sentence into an interrogative sentence based on that template to generate a question; and a question ranking model based on a neural network architecture for automatically scoring the generated questions and outputting them ranked by score.
With this intelligent question generation device, specific question sets can be generated for the user, who can select a question of interest and read purposefully, increasing reading interest and improving reading quality.
In addition, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the aforementioned intelligent question generation method.
Drawings
FIG. 1 is a flow chart of an intelligent question generation method provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of an intelligent question generation apparatus provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the neural network architecture of the question ranking model;
FIG. 4 is a schematic diagram of the process of syntactic analysis and named entity recognition of a sentence;
FIG. 5 is a diagram of a syntax tree constructed from the syntactic analysis and named entity recognition results of a sentence.
Detailed Description
The invention is further described below with reference to the figures and specific embodiments.
The invention aims to generate high-quality questions for the key content of an article so as to increase the user's interest in reading the article and improve reading quality. To this end, the invention proposes an intelligent question generation method based on deep learning, by which questions can be automatically generated and output for an input article; referring to FIG. 1, the method comprises the following steps S1 to S4:
Step S1: extract the key content of the input article using the seq2seq model. Specifically, upon receiving an input article, the seq2seq model first calls a basic natural language processing unit to preprocess the text, including data cleansing: removing whitespace at both ends of the text, removing illegal symbols, normalizing English letter case, and so on. Next, sentences are segmented into words with a Chinese word segmenter, and the words are turned into word vectors by word embedding using Google's open-source word2vec; word vectors better express the correlations between words and provide better context information. The model framework is then built with the deep learning framework TensorFlow and trained; once trained, it can extract the key content from the preprocessed word vectors of the article. The training process comprises the following steps:
First, the key content of a number of training articles is extracted manually, and each training article together with its manually extracted key content forms a training sample, establishing a training set for the seq2seq model. The training set is then fed into the seq2seq model and iteratively trained until the model parameters converge, yielding the trained seq2seq model. The details are as follows:
For an input news text sequence $x_1, x_2, \ldots, x_m$, an embedded representation is first obtained with an embedding matrix:

$e_i = E x_i, \quad i = 1, 2, \ldots, m$

where $E \in \mathbb{R}^{l \times |V|}$ is the embedding matrix, $l$ is the embedding dimension, $|V|$ is the size of the vocabulary, and $x_i$ is the $i$-th word of the news text sequence (as a one-hot vector). The embedded representations of the text are then fed into the encoding module in order to obtain the forward hidden-state sequence:

$h_i^{f} = \mathrm{LSTM}_f(e_i, h_{i-1}^{f})$

where $h_i^{f}$ is a $k$-dimensional vector and $\mathrm{LSTM}_f$ denotes the forward LSTM unit. At the same time, to capture the reverse information of the sequence, the sequence is fed in reverse into a backward LSTM unit to obtain the backward hidden-state sequence:

$h_i^{b} = \mathrm{LSTM}_b(e_i, h_{i+1}^{b})$

where $\mathrm{LSTM}_b$ denotes the backward LSTM unit. Concatenating the forward and backward hidden-state sequences yields the hidden-state representation of the news sequence:

$h_i = [h_i^{f}; h_i^{b}]$

where $h_i$ is the hidden-state representation of the word $x_i$. The hidden state $h_m$ of the last word $x_m$ serves as the hidden-state representation of the whole news text: $h_c = h_m$. Through this series of encoding steps, the news text sequence has been converted into a vector representation, which is used as the context vector when the decoding module extracts key sentences. The decoding module is a single-layer LSTM network, initialized with the news context vector $h_c$ obtained from the encoding module. Then, for the key sentence sequence $y_1, y_2, \ldots, y_n$, its embedded representation is obtained as

$e'_j = E y_j$

noting that the same embedding matrix $E$ is used in encoding and decoding. The embedded representations of the key sentence sequence are likewise fed into the decoding module in order to obtain its hidden-state representations:

$s_j = \mathrm{LSTM}(e'_j, s_{j-1})$

where $s_j$ is the hidden-state representation of the $j$-th word of the key sentence sequence.
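The bidirectional encoding above can be sketched numerically as follows. This is an illustrative toy: a plain tanh RNN cell stands in for the LSTM units, and all sizes and weights are arbitrary, not the patent's trained model.

```python
import numpy as np

# Illustrative sketch of the bidirectional encoder: a plain tanh RNN cell
# stands in for LSTM_f / LSTM_b; dimensions and weights are toy values.
rng = np.random.default_rng(0)
l, k, V, m = 8, 4, 50, 6           # embed dim, hidden dim, vocab size, length

E = rng.normal(size=(l, V))        # embedding matrix E in R^{l x V}
W, U = rng.normal(size=(k, l)), rng.normal(size=(k, k))

def rnn_step(e, h_prev):
    return np.tanh(W @ e + U @ h_prev)   # stand-in for one LSTM step

x = rng.integers(0, V, size=m)     # word ids x_1..x_m
e = [E[:, xi] for xi in x]         # e_i = E x_i (x_i one-hot)

hf, hb = [np.zeros(k)], [np.zeros(k)]
for i in range(m):                 # forward pass over e_1..e_m
    hf.append(rnn_step(e[i], hf[-1]))
for i in reversed(range(m)):       # backward pass over e_m..e_1
    hb.append(rnn_step(e[i], hb[-1]))
hb = hb[:0:-1]                     # align backward states with positions 1..m

h = [np.concatenate([f, b]) for f, b in zip(hf[1:], hb)]  # h_i = [h_i^f ; h_i^b]
h_c = h[-1]                        # context vector h_c = h_m
print(h_c.shape)                   # (2k,) = (8,)
```

The concatenated states have dimension 2k, and the last one serves as the context vector handed to the decoder, mirroring $h_c = h_m$ above.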
In a more preferred embodiment, an attention mechanism is added on top of seq2seq (seq2seq + attention) to solve the out-of-vocabulary problem, making the extracted key-content sentences more fluent. Traditional seq2seq uses a fixed vocabulary: each word in the vocabulary is converted into a word vector by word2vec and fed into the LSTM for training, but when a new word appears in the test set the model cannot handle it well and usually substitutes <UNK> for it, so the output key content contains <UNK> symbols and the sentences are disfluent and unclear, which would lower the quality of the questions generated by the present invention. The invention therefore adds an attention mechanism on top of the seq2seq model: during key-content extraction, if a word not in the vocabulary is encountered, the word is copied from the original text into the output with a certain probability, guaranteeing that <UNK> does not appear during prediction and preserving sentence readability. "Copied from the original text into the output with a certain probability" means that a new word (possibly one mis-segmented during preprocessing, or one newly coined on the Internet) need not be generated by the model itself but can be copied directly into the output, solving the restriction a fixed vocabulary places on new words.
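The copy behavior described above can be sketched as a pointer-generator-style mixture of a vocabulary distribution and attention-weighted copying. This is an illustrative toy with made-up probabilities and a tiny vocabulary, not the patent's trained model:

```python
import numpy as np

# Sketch of the copy mechanism: with probability (1 - p_gen) the model
# copies a source word via its attention weight, so an OOV word can
# appear in the output instead of <UNK>. All values are illustrative.
vocab = ["<UNK>", "virus", "caused", "deaths"]
source = ["Ebola", "virus", "caused", "deaths"]   # "Ebola" is OOV

p_gen = 0.4                                       # generation probability
p_vocab = np.array([0.1, 0.5, 0.3, 0.1])          # softmax over fixed vocab
attention = np.array([0.7, 0.1, 0.1, 0.1])        # attention over source words

# Final distribution over the extended vocabulary (fixed vocab + OOV "Ebola")
extended = {w: p_gen * p for w, p in zip(vocab, p_vocab)}
for w, a in zip(source, attention):
    extended[w] = extended.get(w, 0.0) + (1 - p_gen) * a

best = max(extended, key=extended.get)
print(best)   # the OOV source word wins: "Ebola"
```

Because the copy term adds probability mass to the attended source word, the OOV word "Ebola" outscores every in-vocabulary word and <UNK> is never emitted.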
Step S2: process each sentence of the key content extracted by the seq2seq model in step S1 by performing syntactic analysis and named entity recognition, so as to build a syntax tree for each sentence. The process of syntactic analysis and named entity recognition of a sentence roughly comprises: first segmenting the sentence into words, then tagging each word with its part of speech, and then performing named entity recognition on each word (person names, organization names, place names, or other named entities). Once syntactic analysis and named entity recognition are complete, a syntax tree can be built for the sentence. This step is described later through a specific example.
Step S3: after a syntax tree has been built for each sentence of the key content, match the syntax tree against the question templates in a pre-built question template database; if a matching template exists, convert the sentence corresponding to the syntax tree into an interrogative sentence based on the matched template, thereby generating a question. The question templates in the database may take the following forms (to name just a few):
Question template representing "how many": QP < CD=number < CLP
Question template representing "ordinal number": QP < OD=number
Causal relationship question template: ((IP | PP = rea) < (because)) and (IP | PP | VP) << (IP | PP | VP < (so | then))
Turning relationship question template: ((IP | PP = front) . IP = however) | < (IP | PP = front (IP | PP | VP = however))
The symbols here follow the Stanford Natural Language Processing Group's definitions of syntax tree constituents. When a question template is successfully matched against the syntax tree of the current sentence, the current sentence is rewritten into an interrogative sentence based on that template, thereby generating a question.
The question template database is built by learning language rules from a large amount of article data, yielding a large number of question templates. For each sentence of the key content, its syntax tree is used to match question templates in the database; once a match succeeds, the matched template directly converts the sentence into the corresponding interrogative sentence, generating the corresponding question. If a sentence matches no template in the current database, it generates no question; new question templates are then crafted by examining batches of sentences that fail to generate questions, and the question template database is updated.
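The match-then-rewrite step can be sketched as follows. This is a deliberately simplified toy: the patented method matches patterns against the syntax tree, whereas this version keys on a surface connective only, and the sentence and rewrite rule are illustrative:

```python
import re

# Simplified sketch of the causal-relation template: the patented method
# matches Tregex-style patterns against the syntax tree, while this toy
# version keys on the surface connective "caused" only.
def causal_question(sentence):
    """If the sentence has the shape '<cause> caused <effect>.',
    rewrite it into a 'What caused ...?' question; else return None."""
    m = re.match(r"(?P<cause>.+?) caused (?P<effect>.+?)\.$", sentence)
    if not m:
        return None
    return "What caused {}?".format(m.group("effect"))

s = ("In 2014, an outbreak of the Ebola virus caused the deaths "
     "of more than 1,700 people worldwide.")
print(causal_question(s))
```

A sentence without the connective falls through and produces no question, mirroring the "no matching template, no question" behavior described above.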
Step S4: after steps S1, S2, and S3 have generated questions for the input article, this step ranks and outputs the generated questions using a question ranking model based on a neural network architecture, ensuring that each question appears in an appropriate position. Compared with traditional ranking methods, automatically scoring questions with a neural network and outputting them in ranked order is more accurate and requires no manually set parameters. The training process of the question ranking model comprises the following steps:
First, a number of training questions generated from several articles are scored manually, with a comprehensive score based on the linguistic logic of the generated question and its value: a logically sound, valuable, high-quality question scores higher, and vice versa. Next, features are extracted from each question to form a feature set; each feature set together with the manual score of the corresponding question forms a training sample, and the collected samples form the training set of the question ranking model. The training set is fed into the neural network and iteratively trained until the model parameters converge, yielding the trained question ranking model. During training, within a sample, the question's feature set is the input of the neural network and the question's manual score is its output. For example, FIG. 3 is a schematic diagram of the neural network of the question ranking model: the input layer is the feature set of a question (when the training sample is the k-th question, the input layer is its feature set $[P_{k1}, P_{k2}, \ldots, P_{k10}]$), and the output layer is the manual score $Q_k$ of the k-th question, from which the model parameters are trained. After training, a feature set is extracted from each generated question in the same way and fed into the model, which automatically outputs the question's score; the questions are then output in score order. This replaces manual scoring, saving labor, and continued learning can improve scoring accuracy and hence satisfaction with the question ranking.
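The scoring network of FIG. 3 can be sketched as a toy forward pass: a 10-dimensional feature vector in, a scalar score out. Layer sizes and weights here are hypothetical and random; the patent obtains the actual weights by iterative training:

```python
import numpy as np

# Toy forward pass of the ranking network of FIG. 3: a feature vector
# [P_k1 .. P_k10] in, a scalar quality score Q_k out. Layer sizes and
# weights are illustrative; the patent trains them until convergence.
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(16, 10)), np.zeros(16)
w2, b2 = rng.normal(size=16), 0.0

def score(features):                       # features: shape (10,)
    hidden = np.tanh(W1 @ features + b1)   # hidden layer
    return float(w2 @ hidden + b2)         # scalar score Q_k

questions = {f"q{i}": rng.uniform(size=10) for i in range(4)}
ranked = sorted(questions, key=lambda q: score(questions[q]), reverse=True)
print(ranked)                              # highest-scoring question first
```

Sorting the questions by the model's score, rather than by generation order, is exactly the output policy step S4 describes.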
When extracting features from a training question, the extracted features include, but are not limited to: the N-gram model score of the generated question; the index position of the answer part in the original text; the importance score of the labeled key sentence (obtainable when the key content is extracted with the seq2seq model); basic statistical information (such as the density of nouns, verbs, and stop words in the question); and the conversion rate of the sentence (how many words of a sentence are converted into the answer of the question). Of course, other valuable features may also be extracted; no limitation is imposed here.
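Two of the statistics listed above can be sketched as follows; the stop-word list, feature names, and example texts are hypothetical, and real noun/verb densities would additionally require a POS tagger:

```python
# Illustrative sketch of two of the listed features: stop-word density
# and the answer's index position in the original text. The stop-word
# list and feature names are hypothetical, not the patent's feature set.
STOP_WORDS = {"the", "a", "an", "of", "in", "did", "what", "who"}

def question_features(question, answer, article):
    tokens = question.lower().rstrip("?").split()
    stop_density = sum(t in STOP_WORDS for t in tokens) / len(tokens)
    answer_pos = article.find(answer)      # index of the answer in the source
    return {"stop_word_density": round(stop_density, 3),
            "answer_index": answer_pos,
            "length": len(tokens)}

article = "An outbreak of the Ebola virus caused many deaths in 2014."
feats = question_features("What caused many deaths in 2014?",
                          "the Ebola virus", article)
print(feats)
```

In a full system each such dictionary would be flattened into the 10-dimensional feature vector fed to the ranking network.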
The implementation of the foregoing step S2 is described in detail below through a specific example:
In one embodiment, the process of syntactic analysis and named entity recognition of a sentence is shown in FIG. 4 for the example sentence "In 2014, an outbreak of the Ebola virus caused the deaths of more than 1,700 people worldwide." First the sentence is segmented into words, with the symbol "|" marking the boundaries. Then each word is given a part-of-speech tag: for example, "2014" is tagged "NT", the symbol used in syntactic analysis for temporal nouns, and "NN" denotes a common noun; these symbols are standard in syntactic analysis and are not explained one by one here. Syntactic analysis is completed at the "Tagging" step, followed by named entity recognition at the "Named Entity" step. Then, based on the results in FIG. 4, a syntax tree is built for the sentence, as shown in FIG. 5.
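The resulting tree can be represented and searched programmatically, for example as nested tuples; the bracketing below is illustrative and loosely follows FIG. 5, not the exact Stanford parse:

```python
# Sketch of a syntax tree as nested tuples (label, children...), loosely
# following the structure of FIG. 5; the bracketing is illustrative.
tree = ("IP",
        ("NP", ("NT", "2014")),
        ("PP", ("P", "with"), ("NP", ("NN", "Ebola"), ("NN", "epidemic"))),
        ("VP", ("VV", "caused"),
               ("NP", ("QP", ("CD", "1700")), ("NN", "deaths"))))

def find(node, label):
    """Yield every subtree whose root label matches."""
    if isinstance(node, tuple):
        if node[0] == label:
            yield node
        for child in node[1:]:
            yield from find(child, label)

qps = list(find(tree, "QP"))
print(qps)   # the quantifier phrase node matches the "how many" template key
```

A template matcher would run such searches for each pattern, e.g. locating the QP node with a CD child before applying the "how many" template.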
The following example illustrates how a question is generated by matching the syntax tree against the question templates. Continuing with FIGS. 4 and 5, when matching templates against the syntax tree, the word "caused" with the syntactic label "PP" (prepositional phrase) is detected in the sentence, which essentially matches the causal-relationship question template, so a causal question can be generated, for example: "In 2014, what caused the deaths of more than 1,700 people worldwide?"
The process of ranking and outputting questions with the question ranking model is illustrated with the following example input:
At present, the technology of automatically generating questions from articles with natural language processing is applied mainly in education and teaching, for example helping a teacher generate a series of questions from a reading passage to assess students' comprehension of it, which can greatly reduce the teacher's workload and free more of the teacher's energy for teaching. According to current surveys, no mature technology yet exists for intelligent question generation in Chinese; existing question generation methods for English fall into three categories: question generation based on semantic structure, question generation based on templates, and sequence-based question generation.
Applying the foregoing intelligent question generation method of the present invention to the above example, the following 4 questions can be generated (this is only an example and does not mean that only these 4 questions can be generated):
1. What can a teacher be helped to do by techniques that automatically generate questions from articles using natural language processing?
2. What categories of English question generation methods currently exist?
3. What are the current related techniques for automatically generating questions from articles using natural language processing?
4. According to the current state of research, is there a mature technology in Chinese language processing for solving the intelligent question generation problem?
These 4 questions are generated, for example, in the above order; however, the present invention does not simply output them in generation order. Instead, the 4 questions are input into the aforementioned question ranking model of the present invention, which, after the training described above, can score each question and output the questions according to their scores. For example, if question 4 receives the highest score, it is ranked first in the output, so that the user sees the highest-quality question first.
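The score-then-rank output step admits a minimal sketch. Here `score_question` is a hypothetical placeholder heuristic standing in for the trained neural ranking model described above; only the sort-by-score-and-output structure is the point.

```python
# Minimal sketch of the ranking output step: score each generated question,
# then emit questions highest-score first, so the user sees the best first.

def score_question(question: str) -> float:
    # Placeholder heuristic standing in for the neural ranking model:
    # reward longer, more specific questions.
    return float(len(question.split()))

def rank_questions(questions):
    """Sort questions by model score, highest score first."""
    return sorted(questions, key=score_question, reverse=True)

questions = [
    "What can the technique help a teacher do?",
    "What categories of English question generation methods exist?",
    "Is there a mature technology for Chinese intelligent question "
    "generation according to the current state of research?",
]
for q in rank_questions(questions):
    print(q)
```

In the patent's design, the model input is the feature set extracted from each question rather than the raw string, but the downstream sort-and-output logic is the same.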
Another embodiment of the present invention provides a question generation apparatus (or system) based on the aforementioned question generation method, to automatically generate and output questions for an input article. Referring to fig. 2, the question generation apparatus includes a key content extraction device 10, a question construction device 20, and a question ranking output device 30. The key content extraction device 10 is a seq2seq model trained in advance (its training method and process are described above) and is used to extract the key content of the article. The question construction device 20 is mainly implemented by a syntax tree construction program and a question construction program: the syntax tree construction program performs syntactic analysis and named entity recognition on each sentence in the key content so as to establish the syntax tree corresponding to each sentence; the question construction program matches the syntax tree against question templates in a pre-established question template database and, when a matching question template exists, converts the sentence corresponding to the syntax tree into a question sentence based on the matched template to generate a question. The question ranking output device 30 is implemented by the trained neural-network-based question ranking model: by inputting the feature set of each question into the ranking model, the generated questions can be automatically scored and ranked by score. The question generation apparatus provided by the present invention may be presented as a mobile terminal application, and may also be implemented as a web page or computer software; the present invention is not limited in this respect.
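The three components of fig. 2 can be sketched as one pipeline object. All three callables below are toy stand-ins, assumptions for illustration, for the trained key content extractor (10), the syntax-tree and question construction programs (20), and the ranking model (30).

```python
# Sketch of wiring the three fig. 2 components into a single pipeline.
# Each component is injected as a callable, mirroring the apparatus design.

class QuestionGenerator:
    def __init__(self, extract_key_content, build_questions, rank):
        self.extract_key_content = extract_key_content  # component 10
        self.build_questions = build_questions          # component 20
        self.rank = rank                                # component 30

    def __call__(self, article: str):
        sentences = self.extract_key_content(article)
        questions = []
        for s in sentences:
            questions.extend(self.build_questions(s))
        return self.rank(questions)

# Toy stand-ins for the three trained components:
gen = QuestionGenerator(
    extract_key_content=lambda text: [s for s in text.split(". ") if s],
    build_questions=lambda s: [f"What is described by: '{s}'?"],
    rank=lambda qs: sorted(qs, key=len, reverse=True),
)
print(gen("Ebola spread in 2014. Many died"))
```

Injecting the components as callables keeps the pipeline shape fixed while allowing each stage (seq2seq extractor, template matcher, neural ranker) to be swapped or retrained independently.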
In addition, the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program can implement the steps of the aforementioned intelligent question generation method.
The foregoing is a further detailed description of the invention in connection with specific preferred embodiments, and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several equivalent substitutions or obvious modifications can be made without departing from the spirit of the invention, and all such substitutions and modifications are considered to be within the scope of the invention.

Claims (10)

1. An intelligent question generation method for automatically generating questions for an input article and outputting the questions, characterized by comprising the following steps:
S1, extracting key content of the article using a seq2seq model;
S2, performing syntactic analysis and named entity recognition on each sentence in the key content to establish a corresponding syntax tree for each sentence;
S3, matching the syntax tree against question templates in a pre-established question template database, and if a matching question template exists, converting the sentence corresponding to the syntax tree into a question sentence based on the matched question template so as to generate a question; when the syntax tree of a sentence matches none of the question templates in the question template database, no question can be generated for that sentence; counting the sentences for which no question could be generated so as to formulate new question templates and update the question template database;
and S4, automatically scoring the generated questions using a question ranking model based on a neural network architecture and outputting the questions ranked by score.
2. The intelligent question generation method of claim 1, wherein the seq2seq model is obtained by training as follows:
manually extracting the key content of a plurality of training articles, each training article together with its manually extracted key content forming a training sample, so as to establish a training set for the seq2seq model; and inputting the training set into the seq2seq model and iterating the training until the model parameters converge, to obtain the trained seq2seq model.
3. The intelligent question generation method of claim 2, wherein the seq2seq model further has an attention mechanism, used, when the key content is extracted in step S1, to directly generate or directly copy words that do not appear in a pre-established fixed vocabulary as part of the key content, so as to improve the readability of the generated sentences.
4. The intelligent question generation method of claim 1, wherein the question template database is established by learning language rules from article data; and when the syntax tree of a sentence matches a certain question template in the question template database, the sentence is directly converted into a question sentence using that question template, to generate the corresponding question.
5. The intelligent question generation method of claim 1, wherein the training of the question ranking model comprises:
manually scoring a plurality of training questions generated from a plurality of articles; extracting features of each training question to obtain a feature set, each feature set together with the manual score of the corresponding question forming a training sample, thereby obtaining a plurality of training samples that constitute the training set of the question ranking model; and training the neural network model with the training set, iterating until the model parameters converge, to obtain the trained question ranking model.
6. The intelligent question generation method of claim 5, wherein, when the training questions are manually scored, the score is given comprehensively according to the linguistic logic of the question and the degree of value of the question.
7. An intelligent question generation apparatus for automatically generating questions for an input article and outputting the questions, characterized by comprising:
a seq2seq model for extracting key content of the article;
a syntax tree construction program for performing syntactic analysis and named entity recognition on each sentence in the key content to establish the syntax tree corresponding to each sentence;
a question construction program for matching the syntax tree against question templates in a pre-established question template database and, when a matching question template exists, converting the sentence corresponding to the syntax tree into a question sentence based on the matched question template to generate a question; when no matching question template exists, generating no question for the sentence corresponding to the syntax tree, formulating new question templates through batch statistics of the sentences for which no question could be generated, and updating the question template database with the new question templates;
and a question ranking model based on a neural network architecture, for automatically scoring the generated questions and outputting the questions ranked by score.
8. The intelligent question generation apparatus of claim 7, wherein the seq2seq model also has an attention mechanism, used, when the key content is extracted, to directly generate or directly copy words that do not appear in a pre-established fixed vocabulary into the output as part of the key content.
9. The intelligent question generation apparatus of claim 7, wherein the question ranking model is a pre-trained neural network model; the training set used when training the neural network model comprises a plurality of training samples, each consisting of the feature set of a training question and the manual score of that feature set.
10. A computer-readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the computer program can implement the steps of the method of any one of claims 1 to 6.
CN201810068857.5A 2018-01-24 2018-01-24 Intelligent problem generation method and device and computer readable storage medium Active CN108363743B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810068857.5A CN108363743B (en) 2018-01-24 2018-01-24 Intelligent problem generation method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810068857.5A CN108363743B (en) 2018-01-24 2018-01-24 Intelligent problem generation method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN108363743A CN108363743A (en) 2018-08-03
CN108363743B true CN108363743B (en) 2020-06-02

Family

ID=63006763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810068857.5A Active CN108363743B (en) 2018-01-24 2018-01-24 Intelligent problem generation method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN108363743B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109446519B (en) * 2018-10-10 2020-05-22 西安交通大学 Text feature extraction method fusing data category information
CN109657041B (en) * 2018-12-04 2023-09-29 南京理工大学 Deep learning-based automatic problem generation method
CN111368536A (en) * 2018-12-07 2020-07-03 北京三星通信技术研究有限公司 Natural language processing method, apparatus and storage medium therefor
CN109726274B (en) * 2018-12-29 2021-04-30 北京百度网讯科技有限公司 Question generation method, device and storage medium
CN110196975A (en) * 2019-02-27 2019-09-03 北京金山数字娱乐科技有限公司 Problem generation method, device, equipment, computer equipment and storage medium
CN110209766B (en) * 2019-05-23 2021-01-29 招商局金融科技有限公司 Data display method, electronic device and storage medium
CN110162615B (en) * 2019-05-29 2021-08-24 北京市律典通科技有限公司 Intelligent question and answer method and device, electronic equipment and storage medium
CN110263312B (en) * 2019-06-19 2023-09-12 北京百度网讯科技有限公司 Article generating method, apparatus, server and computer readable medium
CN111124414B (en) * 2019-12-02 2024-02-06 东巽科技(北京)有限公司 Abstract grammar tree word-taking method based on operation link
CN111061851B (en) * 2019-12-12 2023-08-08 中国科学院自动化研究所 Question generation method and system based on given facts
CN111428467A (en) * 2020-02-19 2020-07-17 平安科技(深圳)有限公司 Method, device, equipment and storage medium for generating reading comprehension question topic
CN111339269B (en) * 2020-02-20 2023-09-26 来康科技有限责任公司 Knowledge graph question-answering training and application service system capable of automatically generating templates
CN111522921B (en) * 2020-03-06 2023-06-02 国网浙江省电力有限公司营销服务中心 Data enhancement method for end-to-end dialogue based on sentence rewriting
CN112417885A (en) * 2020-11-17 2021-02-26 平安科技(深圳)有限公司 Answer generation method and device based on artificial intelligence, computer equipment and medium
CN112417119A (en) * 2020-11-19 2021-02-26 上海交通大学 Open domain question-answer prediction method based on deep learning
CN113268564B (en) * 2021-05-24 2023-07-21 平安科技(深圳)有限公司 Method, device, equipment and storage medium for generating similar problems
CN116205234A (en) * 2023-04-24 2023-06-02 中国电子科技集团公司第二十八研究所 Text recognition and generation algorithm based on deep learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20010107113A (en) * 2000-05-25 2001-12-07 서정연 Reduction of Natural Language Queries into Boolen and Vector Queries Using Syntactic Tree in a Natural Language Information Retrieval System
CN102737042A (en) * 2011-04-08 2012-10-17 北京百度网讯科技有限公司 Method and device for establishing question generation model, and question generation method and device
CN105760546A (en) * 2016-03-16 2016-07-13 广州索答信息科技有限公司 Automatic generating method and device for Internet headlines
CN106815311A (en) * 2016-12-21 2017-06-09 杭州朗和科技有限公司 A kind of problem matching process and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10614126B2 (en) * 2015-05-21 2020-04-07 Oracle International Corporation Textual query editor for graph databases that performs semantic analysis using extracted information


Also Published As

Publication number Publication date
CN108363743A (en) 2018-08-03

Similar Documents

Publication Publication Date Title
CN108363743B (en) Intelligent problem generation method and device and computer readable storage medium
CN108287822B (en) Chinese similarity problem generation system and method
CN107798140B (en) Dialog system construction method, semantic controlled response method and device
CN104794169B (en) A kind of subject terminology extraction method and system based on sequence labelling model
CN110134954B (en) Named entity recognition method based on Attention mechanism
CN110851599B (en) Automatic scoring method for Chinese composition and teaching assistance system
CN106569998A (en) Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN111274790B (en) Chapter-level event embedding method and device based on syntactic dependency graph
CN112905736A (en) Unsupervised text emotion analysis method based on quantum theory
CN111368082A (en) Emotion analysis method for domain adaptive word embedding based on hierarchical network
CN111581364B (en) Chinese intelligent question-answer short text similarity calculation method oriented to medical field
CN113704416A (en) Word sense disambiguation method and device, electronic equipment and computer-readable storage medium
CN111339772B (en) Russian text emotion analysis method, electronic device and storage medium
CN116049387A (en) Short text classification method, device and medium based on graph convolution
CN110222344B (en) Composition element analysis algorithm for composition tutoring of pupils
CN115544252A (en) Text emotion classification method based on attention static routing capsule network
CN109815497B (en) Character attribute extraction method based on syntactic dependency
CN112417155B (en) Court trial query generation method, device and medium based on pointer-generation Seq2Seq model
CN114547303A (en) Text multi-feature classification method and device based on Bert-LSTM
CN111815426B (en) Data processing method and terminal related to financial investment and research
CN111159405B (en) Irony detection method based on background knowledge
CN113011154A (en) Job duplicate checking method based on deep learning
CN112349294A (en) Voice processing method and device, computer readable medium and electronic equipment
CN112632272A (en) Microblog emotion classification method and system based on syntactic analysis
CN116108840A (en) Text fine granularity emotion analysis method, system, medium and computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant