CN108363743B - Intelligent question generation method and device, and computer-readable storage medium - Google Patents

Intelligent question generation method and device, and computer-readable storage medium

Info

Publication number: CN108363743B
Application number: CN201810068857.5A
Authority: CN (China)
Prior art keywords: question, training, sentence, model, questions
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN108363743A
Inventors: 韩金新, 郑海涛, 王伟, 陈金元, 肖喜
Current assignee: Shenzhen Graduate School Tsinghua University
Original assignee: Shenzhen Graduate School Tsinghua University
Application filed by Shenzhen Graduate School Tsinghua University
Priority application: CN201810068857.5A
Publications: CN108363743A (application), CN108363743B (grant)

Classifications

    • G06F16/322 — Information retrieval of unstructured textual data; indexing; data structures therefor; trees
    • G06F16/36 — Information retrieval of unstructured textual data; creation of semantic tools, e.g. ontology or thesauri
    • G06F40/211 — Handling natural language data; natural language analysis; syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

Abstract

The invention discloses an intelligent question generation method, an intelligent question generation device, and a computer-readable storage medium for automatically generating and outputting questions for an input article, comprising the following steps: S1, extracting the key content of the article using a seq2seq model; S2, performing syntactic analysis and named entity recognition on each sentence of the key content to build a syntax tree for each sentence; S3, matching each syntax tree against the question templates in a pre-built question template database and, if a matching template exists, converting the sentence corresponding to the syntax tree into an interrogative sentence based on the matched template, thereby generating a question; and S4, ranking the generated questions with a neural network and outputting them in ranked order.

Description

Intelligent question generation method and device, and computer-readable storage medium
Technical Field
The invention relates to the technical fields of computers and natural language processing, and in particular to a deep-learning-based intelligent question generation method and a corresponding device.
Background
With the rapid development of computer networks, ever more information is available online, and users cannot locate what interests them without reading everything in full. At present, when screening articles and documents on the network, most users still decide whether to read an article by browsing its title. This approach has a drawback: many titles do not accurately and comprehensively reflect the core content of the article, so users either fail to find the articles that interest them or miss them altogether. Natural language processing techniques can therefore be used to condense an article's content into several relevant questions and attract the user in question form: when a question touches on a topic the user cares about, the user is drawn into the article to find the answer, which can greatly increase reading interest.
At present, the technology of automatically generating questions from articles with natural language processing is applied mainly in education and teaching, for example helping a teacher generate a series of questions from a reading passage to assess students' comprehension of it. This can greatly reduce the teacher's workload and free more of the teacher's energy for teaching.
According to current surveys, no mature technology yet exists for intelligent question generation in Chinese. Existing question generation methods for English fall into three categories: question generation based on semantic structure, question generation based on templates, and sequence-based question generation.
Question generation based on semantic structure: the semantic roles in a sentence mainly include agent, patient, theme, goal, instrument, time, location, and predicate. Every sentence in a text is composed of such components, and identifying the role each word plays reveals the relations among the words; Kunichika, Mazidi, and others used these relations to generate questions. Kunichika et al. generated questions about English stories using grammatical relations to test different readers' comprehension, generating questions from five angles: questioning the content of the whole sentence, questioning via synonym and antonym correspondences, questioning temporal and spatial relations, questioning words in complex forms within the sentence, and questioning related phrases. Their results, however, showed that questions generated this way contain many grammatical errors. Mazidi et al. added natural language understanding on top of the semantic structure: they first performed grammatical role labeling and semantic role labeling on the text separately and then combined the two results, obtaining good question generation performance. For example, for the sentence "Xiaoming met Xiaohong in the park yesterday evening," semantic role labeling is performed first, as shown in Table 1 below:
TABLE 1

Input sequence:  Xiaoming | yesterday | evening | in the park | met       | Xiaohong
Semantic role:   agent    | time      | time    | location    | predicate | patient
In this example sentence, the semantic role labeling is "Xiaoming (agent) yesterday evening (time) in the park (location) met (predicate) Xiaohong (patient)," from which questions can be generated such as: 1. Who met Xiaohong in the park yesterday evening? 2. When did Xiaoming meet Xiaohong in the park? 3. Whom did Xiaoming meet in the park yesterday evening? However, in such methods based on semantic relations the generated questions are often too trivial: two questions are routinely built from the patient and the agent alone, while in practice not all of them are of interest.
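As an illustration of this role-driven approach, the following minimal sketch (hypothetical, not the cited authors' actual systems; the role inventory and wh-word mapping are illustrative) substitutes a wh-word for each role-bearing token:

```python
# Hypothetical sketch of semantic-role-based question generation; the role
# inventory and wh-word mapping are illustrative, not the cited systems.
WH = {"agent": "who", "patient": "whom", "time": "when", "location": "where"}

def role_questions(tokens):
    """tokens: list of (phrase, role) pairs; role may be None.
    For each role that has a wh-word, substitute the wh-word for that
    phrase and keep the rest of the sentence, yielding one question."""
    questions = []
    for i, (_, role) in enumerate(tokens):
        if role in WH:
            parts = [WH[role] if j == i else w for j, (w, _) in enumerate(tokens)]
            q = " ".join(parts)
            questions.append(q[0].upper() + q[1:] + "?")
    return questions

sentence = [("Xiaoming", "agent"), ("met", None), ("Xiaohong", "patient"),
            ("in the park", "location"), ("yesterday evening", "time")]
for q in role_questions(sentence):
    print(q)
```

Run on the Table 1 example this yields four questions, one per labeled role; the awkward in-situ substitutions for non-subject roles also illustrate why purely role-driven output tends to include trivial or unnatural questions.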
Question generation based on templates: fixed types of questions are generated by manually defined rules. To meet the need of generating specified questions from structured text for students, Mostow et al. studied the expression patterns of a large body of text and first designed three templates to generate what, why, and how questions; the specific templates are shown in Table 2 below:
TABLE 2

Question type | Question template
WHAT          | What did <character> <verb>?
HOW           | How did <character> <verb> <complement>?
WHY           | Why was/were <character> <past-participle>?
They experimented on more than five hundred articles and obtained reasonable results under manual evaluation. Later, Labutov et al. extracted low-dimensional ontologies from a large number of articles crawled from Wikipedia and manually designed templates over these ontologies to generate questions, such as (Person, early life) → "Who were the key influences on <Person> in their childhood?". Lindberg et al. combined semantic structures with templates: they first analyzed the fixed patterns present in sentence structures and then created 60 templates from those patterns to generate questions, achieving good results.
Sequence-based question generation: in natural language processing, recurrent neural networks (RNNs), which handle temporal dependence, are mainly used to build seq2seq models that treat question generation as sentence-to-sentence conversion. A large amount of text is converted into vectors with word2vec, and the model exploits word similarity to predict the next word with maximum probability until a terminal symbol appears. The recurrent neural network has various improved variants, one of which is the LSTM (Long Short-Term Memory); the LSTM mechanism is mature and flexible to implement. Serban et al. used such a model over logical triples (subject, relation, object) to construct potential questions; the model requires a large amount of labeled training data, and they trained an LSTM network on 100,000 manually labeled <text, question> pairs in English to generate fixed-form questions. Xinya Du et al. extracted <sentence, question, answer> triples from the SQuAD dataset, trained neural networks at both the sentence and paragraph level, and systematically applied manual and automatic evaluation; the whole scheme appears well rounded. Mostafazadeh proposed visual question generation, i.e., generating a question from a picture: building on the MSCOCO image-caption dataset, Microsoft enriched the data and had large numbers of crowd workers on Amazon label questions about the pictures, including event-based and object-based questions (75,000 questions across three databases), and then trained a seq2seq model to generate questions. It can be seen that a large, high-quality dataset is the key to implementing a sequence-based question generation algorithm.
However, the above methods have significant shortcomings when processing complex Chinese. Chinese is structurally complex and develops its point through layer-by-layer elaboration, unlike educational articles that state things directly, so questions generated from lengthy text by a prototype system cannot satisfy users: for example, repeated occurrences of similar questions directly lower satisfaction with the output, and users only want questions that pinpoint the key content of the text, so lengthy text is not a good input for question generation. Furthermore, outputting questions in their order of generation, or with a simple ranking method such as linear regression, cannot ensure that each question is placed in the most appropriate position.
The above background disclosure is intended only to aid understanding of the inventive concept and technical solutions of the present invention; it does not necessarily belong to the prior art of this patent application, and, absent clear evidence that the above content was disclosed before the filing date of this application, it should not be used to assess the novelty and inventive step of this application.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an intelligent question generation method and device based on deep learning.
One of the technical solutions proposed by the present invention to achieve the above object is as follows:
An intelligent question generation method for automatically generating and outputting questions for an input article, comprising the following steps:
S1, extracting the key content of the article using a seq2seq model;
S2, performing syntactic analysis and named entity recognition on each sentence of the key content to build a syntax tree for each sentence;
S3, matching each syntax tree against the question templates in a pre-built question template database and, if a matching template exists, converting the key sentence corresponding to that syntax tree into an interrogative sentence based on the matched template, thereby generating a question;
and S4, automatically scoring the generated questions with a question ranking model based on a neural network architecture and outputting them ranked by score.
The method provided by the invention automatically extracts the key content from a lengthy Chinese article and presents that key content (or abstract) in the form of questions, so that a user can decide whether to continue reading the article by browsing the questions. This saves the user's reading time and, by presenting the article's core content as questions, increases the user's interest in reading. The method can be used to build an intelligent assisted-reading system: a specific set of questions is generated for the user, who can select a question of interest from the set and read purposefully, increasing reading interest and improving reading quality.
In addition, based on this method, the invention provides a system device whose technical solution is as follows:
An intelligent question generation device for automatically generating and outputting questions for an input article, comprising: a seq2seq model for extracting the key content of the article; a syntax tree construction program for performing syntactic analysis and named entity recognition on each sentence of the key content to build a syntax tree for each sentence; a question construction program for matching each syntax tree against the question templates in a pre-built question template database and, when a matching template exists, converting the corresponding sentence into an interrogative sentence based on that template to generate a question; and a question ranking model based on a neural network architecture for automatically scoring the generated questions and outputting them ranked by score.
With this intelligent question generation device, specific question sets can be generated for the user, who can select a question of interest and read purposefully, increasing reading interest and improving reading quality.
In addition, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the aforementioned intelligent question generation method.
Drawings
FIG. 1 is a flow chart of an intelligent question generation method provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of an intelligent question generation apparatus provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the neural network architecture of the question ranking model;
FIG. 4 is a schematic diagram of the process of syntactic analysis and named entity recognition of a sentence;
FIG. 5 is a diagram of a syntax tree constructed from the syntactic analysis and named entity recognition results of a sentence.
Detailed Description
The invention is further described below with reference to the figures and specific embodiments.
The invention aims to generate high-quality questions for the key content of an article so as to increase the user's interest in reading the article and improve reading quality. To this end, the invention proposes an intelligent question generation method based on deep learning, by which questions can be automatically generated and output for an input article; referring to FIG. 1, the method comprises the following steps S1 to S4:
Step S1: extract the key content of the input article using the seq2seq model. Specifically, upon receiving an input article, the seq2seq model first calls a basic natural language processing unit to preprocess the text, including data cleansing: removing whitespace at both ends of the text, removing illegal symbols, normalizing English letter case, and so on. Next, sentences are segmented into words with a Chinese word segmenter, and the words are turned into word vectors by word embedding using Google's open-source word2vec; word vectors better express the correlations between words and provide better context information. The model framework is then built with the deep learning framework TensorFlow and trained; once trained, it can extract the key content from the preprocessed word vectors of the article. The training process comprises the following steps:
First, the key content of a number of training articles is extracted manually, and each training article together with its manually extracted key content forms a training sample, establishing a training set for the seq2seq model. The training set is then fed into the seq2seq model and iteratively trained until the model parameters converge, yielding the trained seq2seq model. The details are as follows:
For an input news text sequence $x_1, x_2, \ldots, x_m$, an embedded representation is first obtained with an embedding matrix:

$e_i = E x_i, \quad i = 1, 2, \ldots, m$

where $E \in \mathbb{R}^{l \times |V|}$ is the embedding matrix, $l$ is the embedding dimension, $|V|$ is the size of the vocabulary, and $x_i$ is the $i$-th word of the news text sequence (as a one-hot vector). The embedded representations of the text are then fed into the encoding module in order to obtain the forward hidden-state sequence:

$h_i^{f} = \mathrm{LSTM}_f(e_i, h_{i-1}^{f})$

where $h_i^{f}$ is a $k$-dimensional vector and $\mathrm{LSTM}_f$ denotes the forward LSTM unit. At the same time, to capture the reverse information of the sequence, the sequence is fed in reverse into a backward LSTM unit to obtain the backward hidden-state sequence:

$h_i^{b} = \mathrm{LSTM}_b(e_i, h_{i+1}^{b})$

where $\mathrm{LSTM}_b$ denotes the backward LSTM unit. Concatenating the forward and backward hidden-state sequences yields the hidden-state representation of the news sequence:

$h_i = [h_i^{f}; h_i^{b}]$

where $h_i$ is the hidden-state representation of the word $x_i$. The hidden state $h_m$ of the last word $x_m$ serves as the hidden-state representation of the whole news text: $h_c = h_m$. Through this series of encoding steps, the news text sequence has been converted into a vector representation, which is used as the context vector when the decoding module extracts key sentences. The decoding module is a single-layer LSTM network, initialized with the news context vector $h_c$ obtained from the encoding module. Then, for the key sentence sequence $y_1, y_2, \ldots, y_n$, its embedded representation is obtained as

$e'_j = E y_j$

noting that the same embedding matrix $E$ is used in encoding and decoding. The embedded representations of the key sentence sequence are likewise fed into the decoding module in order to obtain its hidden-state representations:

$s_j = \mathrm{LSTM}(e'_j, s_{j-1})$

where $s_j$ is the hidden-state representation of the $j$-th word of the key sentence sequence.
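The bidirectional encoding above can be sketched numerically as follows. This is an illustrative toy: a plain tanh RNN cell stands in for the LSTM units, and all sizes and weights are arbitrary, not the patent's trained model.

```python
import numpy as np

# Illustrative sketch of the bidirectional encoder: a plain tanh RNN cell
# stands in for LSTM_f / LSTM_b; dimensions and weights are toy values.
rng = np.random.default_rng(0)
l, k, V, m = 8, 4, 50, 6           # embed dim, hidden dim, vocab size, length

E = rng.normal(size=(l, V))        # embedding matrix E in R^{l x V}
W, U = rng.normal(size=(k, l)), rng.normal(size=(k, k))

def rnn_step(e, h_prev):
    return np.tanh(W @ e + U @ h_prev)   # stand-in for one LSTM step

x = rng.integers(0, V, size=m)     # word ids x_1..x_m
e = [E[:, xi] for xi in x]         # e_i = E x_i (x_i one-hot)

hf, hb = [np.zeros(k)], [np.zeros(k)]
for i in range(m):                 # forward pass over e_1..e_m
    hf.append(rnn_step(e[i], hf[-1]))
for i in reversed(range(m)):       # backward pass over e_m..e_1
    hb.append(rnn_step(e[i], hb[-1]))
hb = hb[:0:-1]                     # align backward states with positions 1..m

h = [np.concatenate([f, b]) for f, b in zip(hf[1:], hb)]  # h_i = [h_i^f ; h_i^b]
h_c = h[-1]                        # context vector h_c = h_m
print(h_c.shape)                   # (2k,) = (8,)
```

The concatenated states have dimension 2k, and the last one serves as the context vector handed to the decoder, mirroring $h_c = h_m$ above.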
In a more preferred embodiment, an attention mechanism is added on top of seq2seq (seq2seq + attention) to solve the out-of-vocabulary problem, making the extracted key-content sentences more fluent. Traditional seq2seq uses a fixed vocabulary: each word in the vocabulary is converted into a word vector by word2vec and fed into the LSTM for training, but when a new word appears in the test set the model cannot handle it well and usually substitutes <UNK> for it, so the output key content contains <UNK> symbols and the sentences are disfluent and unclear, which would lower the quality of the questions generated by the present invention. The invention therefore adds an attention mechanism on top of the seq2seq model: during key-content extraction, if a word not in the vocabulary is encountered, the word is copied from the original text into the output with a certain probability, guaranteeing that <UNK> does not appear during prediction and preserving sentence readability. "Copied from the original text into the output with a certain probability" means that a new word (possibly one mis-segmented during preprocessing, or one newly coined on the Internet) need not be generated by the model itself but can be copied directly into the output, solving the restriction a fixed vocabulary places on new words.
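The copy behavior described above can be sketched as a pointer-generator-style mixture of a vocabulary distribution and attention-weighted copying. This is an illustrative toy with made-up probabilities and a tiny vocabulary, not the patent's trained model:

```python
import numpy as np

# Sketch of the copy mechanism: with probability (1 - p_gen) the model
# copies a source word via its attention weight, so an OOV word can
# appear in the output instead of <UNK>. All values are illustrative.
vocab = ["<UNK>", "virus", "caused", "deaths"]
source = ["Ebola", "virus", "caused", "deaths"]   # "Ebola" is OOV

p_gen = 0.4                                       # generation probability
p_vocab = np.array([0.1, 0.5, 0.3, 0.1])          # softmax over fixed vocab
attention = np.array([0.7, 0.1, 0.1, 0.1])        # attention over source words

# Final distribution over the extended vocabulary (fixed vocab + OOV "Ebola")
extended = {w: p_gen * p for w, p in zip(vocab, p_vocab)}
for w, a in zip(source, attention):
    extended[w] = extended.get(w, 0.0) + (1 - p_gen) * a

best = max(extended, key=extended.get)
print(best)   # the OOV source word wins: "Ebola"
```

Because the copy term adds probability mass to the attended source word, the OOV word "Ebola" outscores every in-vocabulary word and <UNK> is never emitted.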
Step S2: process each sentence of the key content extracted by the seq2seq model in step S1 by performing syntactic analysis and named entity recognition, so as to build a syntax tree for each sentence. The process of syntactic analysis and named entity recognition of a sentence roughly comprises: first segmenting the sentence into words, then tagging each word with its part of speech, and then performing named entity recognition on each word (person names, organization names, place names, or other named entities). Once syntactic analysis and named entity recognition are complete, a syntax tree can be built for the sentence. This step is described later through a specific example.
Step S3: after a syntax tree has been built for each sentence of the key content, match the syntax tree against the question templates in a pre-built question template database; if a matching template exists, convert the sentence corresponding to the syntax tree into an interrogative sentence based on the matched template, thereby generating a question. The question templates in the database may take the following forms (to name just a few):
Question template representing "how many": QP < CD=number < CLP
Question template representing "ordinal number": QP < OD=number
Causal relationship question template: ((IP | PP = rea) < (because)) and (IP | PP | VP) << (IP | PP | VP < (so | then))
Turning relationship question template: ((IP | PP = front) . IP = however) | < (IP | PP = front (IP | PP | VP = however))
The symbols here follow the Stanford Natural Language Processing Group's definitions of syntax tree constituents. When a question template is successfully matched against the syntax tree of the current sentence, the current sentence is rewritten into an interrogative sentence based on that template, thereby generating a question.
The question template database is built by learning language rules from a large amount of article data, yielding a large number of question templates. For each sentence of the key content, its syntax tree is used to match question templates in the database; once a match succeeds, the matched template directly converts the sentence into the corresponding interrogative sentence, generating the corresponding question. If a sentence matches no template in the current database, it generates no question; new question templates are then crafted by examining batches of sentences that fail to generate questions, and the question template database is updated.
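The match-then-rewrite step can be sketched as follows. This is a deliberately simplified toy: the patented method matches patterns against the syntax tree, whereas this version keys on a surface connective only, and the sentence and rewrite rule are illustrative:

```python
import re

# Simplified sketch of the causal-relation template: the patented method
# matches Tregex-style patterns against the syntax tree, while this toy
# version keys on the surface connective "caused" only.
def causal_question(sentence):
    """If the sentence has the shape '<cause> caused <effect>.',
    rewrite it into a 'What caused ...?' question; else return None."""
    m = re.match(r"(?P<cause>.+?) caused (?P<effect>.+?)\.$", sentence)
    if not m:
        return None
    return "What caused {}?".format(m.group("effect"))

s = ("In 2014, an outbreak of the Ebola virus caused the deaths "
     "of more than 1,700 people worldwide.")
print(causal_question(s))
```

A sentence without the connective falls through and produces no question, mirroring the "no matching template, no question" behavior described above.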
Step S4: after steps S1, S2, and S3 have generated questions for the input article, this step ranks and outputs the generated questions using a question ranking model based on a neural network architecture, ensuring that each question appears in an appropriate position. Compared with traditional ranking methods, automatically scoring questions with a neural network and outputting them in ranked order is more accurate and requires no manually set parameters. The training process of the question ranking model comprises the following steps:
First, a number of training questions generated from several articles are scored manually, with a comprehensive score based on the linguistic logic of the generated question and its value: a logically sound, valuable, high-quality question scores higher, and vice versa. Next, features are extracted from each question to form a feature set; each feature set together with the manual score of the corresponding question forms a training sample, and the collected samples form the training set of the question ranking model. The training set is fed into the neural network and iteratively trained until the model parameters converge, yielding the trained question ranking model. During training, within a sample, the question's feature set is the input of the neural network and the question's manual score is its output. For example, FIG. 3 is a schematic diagram of the neural network of the question ranking model: the input layer is the feature set of a question (when the training sample is the k-th question, the input layer is its feature set $[P_{k1}, P_{k2}, \ldots, P_{k10}]$), and the output layer is the manual score $Q_k$ of the k-th question, from which the model parameters are trained. After training, a feature set is extracted from each generated question in the same way and fed into the model, which automatically outputs the question's score; the questions are then output in score order. This replaces manual scoring, saving labor, and continued learning can improve scoring accuracy and hence satisfaction with the question ranking.
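The scoring network of FIG. 3 can be sketched as a toy forward pass: a 10-dimensional feature vector in, a scalar score out. Layer sizes and weights here are hypothetical and random; the patent obtains the actual weights by iterative training:

```python
import numpy as np

# Toy forward pass of the ranking network of FIG. 3: a feature vector
# [P_k1 .. P_k10] in, a scalar quality score Q_k out. Layer sizes and
# weights are illustrative; the patent trains them until convergence.
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(16, 10)), np.zeros(16)
w2, b2 = rng.normal(size=16), 0.0

def score(features):                       # features: shape (10,)
    hidden = np.tanh(W1 @ features + b1)   # hidden layer
    return float(w2 @ hidden + b2)         # scalar score Q_k

questions = {f"q{i}": rng.uniform(size=10) for i in range(4)}
ranked = sorted(questions, key=lambda q: score(questions[q]), reverse=True)
print(ranked)                              # highest-scoring question first
```

Sorting the questions by the model's score, rather than by generation order, is exactly the output policy step S4 describes.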
When extracting features from a training question, the extracted features include, but are not limited to: the N-gram model score of the generated question; the index position of the answer part in the original text; the importance score of the labeled key sentence (obtainable when the key content is extracted with the seq2seq model); basic statistical information (such as the density of nouns, verbs, and stop words in the question); and the conversion rate of the sentence (how many words of a sentence are converted into the answer of the question). Of course, other valuable features may also be extracted; no limitation is imposed here.
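Two of the statistics listed above can be sketched as follows; the stop-word list, feature names, and example texts are hypothetical, and real noun/verb densities would additionally require a POS tagger:

```python
# Illustrative sketch of two of the listed features: stop-word density
# and the answer's index position in the original text. The stop-word
# list and feature names are hypothetical, not the patent's feature set.
STOP_WORDS = {"the", "a", "an", "of", "in", "did", "what", "who"}

def question_features(question, answer, article):
    tokens = question.lower().rstrip("?").split()
    stop_density = sum(t in STOP_WORDS for t in tokens) / len(tokens)
    answer_pos = article.find(answer)      # index of the answer in the source
    return {"stop_word_density": round(stop_density, 3),
            "answer_index": answer_pos,
            "length": len(tokens)}

article = "An outbreak of the Ebola virus caused many deaths in 2014."
feats = question_features("What caused many deaths in 2014?",
                          "the Ebola virus", article)
print(feats)
```

In a full system each such dictionary would be flattened into the 10-dimensional feature vector fed to the ranking network.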
The implementation of the foregoing step S2 is described in detail below through a specific example:
In one embodiment, the process of syntactic analysis and named entity recognition of a sentence is shown in FIG. 4 for the example sentence "In 2014, an outbreak of the Ebola virus caused the deaths of more than 1,700 people worldwide." First the sentence is segmented into words, with the symbol "|" marking the boundaries. Then each word is given a part-of-speech tag: for example, "2014" is tagged "NT", the symbol used in syntactic analysis for temporal nouns, and "NN" denotes a common noun; these symbols are standard in syntactic analysis and are not explained one by one here. Syntactic analysis is completed at the "Tagging" step, followed by named entity recognition at the "Named Entity" step. Then, based on the results in FIG. 4, a syntax tree is built for the sentence, as shown in FIG. 5.
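The resulting tree can be represented and searched programmatically, for example as nested tuples; the bracketing below is illustrative and loosely follows FIG. 5, not the exact Stanford parse:

```python
# Sketch of a syntax tree as nested tuples (label, children...), loosely
# following the structure of FIG. 5; the bracketing is illustrative.
tree = ("IP",
        ("NP", ("NT", "2014")),
        ("PP", ("P", "with"), ("NP", ("NN", "Ebola"), ("NN", "epidemic"))),
        ("VP", ("VV", "caused"),
               ("NP", ("QP", ("CD", "1700")), ("NN", "deaths"))))

def find(node, label):
    """Yield every subtree whose root label matches."""
    if isinstance(node, tuple):
        if node[0] == label:
            yield node
        for child in node[1:]:
            yield from find(child, label)

qps = list(find(tree, "QP"))
print(qps)   # the quantifier phrase node matches the "how many" template key
```

A template matcher would run such searches for each pattern, e.g. locating the QP node with a CD child before applying the "how many" template.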
The following example illustrates how a question is generated by matching the syntax tree against the question templates. Continuing with FIGS. 4 and 5, when matching templates against the syntax tree, the word "caused" with the syntactic label "PP" (prepositional phrase) is detected in the sentence, which essentially matches the causal-relationship question template, so a causal question can be generated, for example: "In 2014, what caused the deaths of more than 1,700 people worldwide?"
The process of ranking and outputting questions with the question ranking model is illustrated with the following example input:
At present, the technology of automatically generating questions from articles with natural language processing is applied mainly in education and teaching, for example helping a teacher generate a series of questions from a reading passage to assess students' comprehension of it, which can greatly reduce the teacher's workload and free more of the teacher's energy for teaching. According to current surveys, no mature technology yet exists for intelligent question generation in Chinese; existing question generation methods for English fall into three categories: question generation based on semantic structure, question generation based on templates, and sequence-based question generation.
Applying the foregoing intelligent question generation method of the present invention to the above example, the following 4 questions can be generated (this is only an example and does not mean that only these 4 questions can be generated):
1. What can a teacher be helped to do by techniques that automatically generate questions from articles using natural language processing?
2. What categories of English question generation methods currently exist?
3. What are the current related techniques for automatically generating questions from articles using natural language processing?
4. According to the current state of research, is there a mature technology in Chinese language processing for solving the intelligent question generation problem?
These 4 questions are generated, for example, in the above order; however, the present invention does not simply output them in generation order. Instead, the 4 questions are input into the aforementioned question ranking model of the present invention, which, after the training described above, can score each question and output the questions according to their scores. For example, if question 4 receives the highest score, it is ranked first in the output, so that the user sees the highest-quality question first.
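The score-then-rank output step admits a minimal sketch. Here `score_question` is a hypothetical placeholder heuristic standing in for the trained neural ranking model described above; only the sort-by-score-and-output structure is the point.

```python
# Minimal sketch of the ranking output step: score each generated question,
# then emit questions highest-score first, so the user sees the best first.

def score_question(question: str) -> float:
    # Placeholder heuristic standing in for the neural ranking model:
    # reward longer, more specific questions.
    return float(len(question.split()))

def rank_questions(questions):
    """Sort questions by model score, highest score first."""
    return sorted(questions, key=score_question, reverse=True)

questions = [
    "What can the technique help a teacher do?",
    "What categories of English question generation methods exist?",
    "Is there a mature technology for Chinese intelligent question "
    "generation according to the current state of research?",
]
for q in rank_questions(questions):
    print(q)
```

In the patent's design, the model input is the feature set extracted from each question rather than the raw string, but the downstream sort-and-output logic is the same.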
Another embodiment of the present invention provides a question generation apparatus (or system) based on the aforementioned question generation method, to automatically generate and output questions for an input article. Referring to fig. 2, the question generation apparatus includes a key content extraction device 10, a question construction device 20, and a question ranking output device 30. The key content extraction device 10 is a seq2seq model trained in advance (its training method and process are described above) and is used to extract the key content of the article. The question construction device 20 is mainly implemented by a syntax tree construction program and a question construction program: the syntax tree construction program performs syntactic analysis and named entity recognition on each sentence in the key content so as to establish the syntax tree corresponding to each sentence; the question construction program matches the syntax tree against question templates in a pre-established question template database and, when a matching question template exists, converts the sentence corresponding to the syntax tree into a question sentence based on the matched template to generate a question. The question ranking output device 30 is implemented by the trained neural-network-based question ranking model: by inputting the feature set of each question into the ranking model, the generated questions can be automatically scored and ranked by score. The question generation apparatus provided by the present invention may be presented as a mobile terminal application, and may also be implemented as a web page or computer software; the present invention is not limited in this respect.
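The three components of fig. 2 can be sketched as one pipeline object. All three callables below are toy stand-ins, assumptions for illustration, for the trained key content extractor (10), the syntax-tree and question construction programs (20), and the ranking model (30).

```python
# Sketch of wiring the three fig. 2 components into a single pipeline.
# Each component is injected as a callable, mirroring the apparatus design.

class QuestionGenerator:
    def __init__(self, extract_key_content, build_questions, rank):
        self.extract_key_content = extract_key_content  # component 10
        self.build_questions = build_questions          # component 20
        self.rank = rank                                # component 30

    def __call__(self, article: str):
        sentences = self.extract_key_content(article)
        questions = []
        for s in sentences:
            questions.extend(self.build_questions(s))
        return self.rank(questions)

# Toy stand-ins for the three trained components:
gen = QuestionGenerator(
    extract_key_content=lambda text: [s for s in text.split(". ") if s],
    build_questions=lambda s: [f"What is described by: '{s}'?"],
    rank=lambda qs: sorted(qs, key=len, reverse=True),
)
print(gen("Ebola spread in 2014. Many died"))
```

Injecting the components as callables keeps the pipeline shape fixed while allowing each stage (seq2seq extractor, template matcher, neural ranker) to be swapped or retrained independently.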
In addition, the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program can implement the steps of the aforementioned intelligent question generation method.
The foregoing is a further detailed description of the invention in connection with specific preferred embodiments, and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several equivalent substitutions or obvious modifications can be made without departing from the spirit of the invention, and all such substitutions and modifications are considered to be within the scope of the invention.

Claims (10)

1. An intelligent question generation method for automatically generating questions for an input article and outputting the questions, characterized by comprising the following steps:
S1, extracting key content of the article using a seq2seq model;
S2, performing syntactic analysis and named entity recognition on each sentence in the key content to establish a corresponding syntax tree for each sentence;
S3, matching the syntax tree against question templates in a pre-established question template database, and if a matching question template exists, converting the sentence corresponding to the syntax tree into a question sentence based on the matched question template so as to generate a question; when the syntax tree of a sentence matches none of the question templates in the question template database, no question can be generated for that sentence; counting the sentences for which no question could be generated so as to formulate new question templates and update the question template database;
and S4, automatically scoring the generated questions using a question ranking model based on a neural network architecture and outputting the questions ranked by score.
2. The intelligent question generation method of claim 1, wherein the seq2seq model is obtained by training as follows:
manually extracting the key content of a plurality of training articles, each training article together with its manually extracted key content forming a training sample, so as to establish a training set for the seq2seq model; and inputting the training set into the seq2seq model and iterating the training until the model parameters converge, to obtain the trained seq2seq model.
3. The intelligent question generation method of claim 2, wherein the seq2seq model further has an attention mechanism, used, when the key content is extracted in step S1, to directly generate or directly copy words that do not appear in a pre-established fixed vocabulary as part of the key content, so as to improve the readability of the generated sentences.
4. The intelligent question generation method of claim 1, wherein the question template database is established by learning language rules from article data; and when the syntax tree of a sentence matches a certain question template in the question template database, the sentence is directly converted into a question sentence using that question template, to generate the corresponding question.
5. The intelligent question generation method of claim 1, wherein the training of the question ranking model comprises:
manually scoring a plurality of training questions generated from a plurality of articles; extracting features of each training question to obtain a feature set, each feature set together with the manual score of the corresponding question forming a training sample, thereby obtaining a plurality of training samples that constitute the training set of the question ranking model; and training the neural network model with the training set, iterating until the model parameters converge, to obtain the trained question ranking model.
6. The intelligent question generation method of claim 5, wherein, when the training questions are manually scored, the score is given comprehensively according to the linguistic logic of the question and the degree of value of the question.
7. An intelligent question generation apparatus for automatically generating questions for an input article and outputting the questions, characterized by comprising:
a seq2seq model for extracting key content of the article;
a syntax tree construction program for performing syntactic analysis and named entity recognition on each sentence in the key content to establish the syntax tree corresponding to each sentence;
a question construction program for matching the syntax tree against question templates in a pre-established question template database and, when a matching question template exists, converting the sentence corresponding to the syntax tree into a question sentence based on the matched question template to generate a question; when no matching question template exists, generating no question for the sentence corresponding to the syntax tree, formulating new question templates through batch statistics of the sentences for which no question could be generated, and updating the question template database with the new question templates;
and a question ranking model based on a neural network architecture, for automatically scoring the generated questions and outputting the questions ranked by score.
8. The intelligent question generation apparatus of claim 7, wherein the seq2seq model also has an attention mechanism, used, when the key content is extracted, to directly generate or directly copy words that do not appear in a pre-established fixed vocabulary into the output as part of the key content.
9. The intelligent question generation apparatus of claim 7, wherein the question ranking model is a pre-trained neural network model; the training set used when training the neural network model comprises a plurality of training samples, each consisting of the feature set of a training question and the manual score of that feature set.
10. A computer-readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the computer program can implement the steps of the method of any one of claims 1 to 6.
CN201810068857.5A 2018-01-24 2018-01-24 Intelligent problem generation method and device and computer readable storage medium Active CN108363743B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810068857.5A CN108363743B (en) 2018-01-24 2018-01-24 Intelligent problem generation method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810068857.5A CN108363743B (en) 2018-01-24 2018-01-24 Intelligent problem generation method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN108363743A CN108363743A (en) 2018-08-03
CN108363743B true CN108363743B (en) 2020-06-02

Family

ID=63006763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810068857.5A Active CN108363743B (en) 2018-01-24 2018-01-24 Intelligent problem generation method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN108363743B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109446519B (en) * 2018-10-10 2020-05-22 西安交通大学 Text feature extraction method fusing data category information
CN109657041B (en) * 2018-12-04 2023-09-29 南京理工大学 Deep learning-based automatic problem generation method
CN111368536A (en) * 2018-12-07 2020-07-03 北京三星通信技术研究有限公司 Natural language processing method, apparatus and storage medium therefor
CN109726274B (en) * 2018-12-29 2021-04-30 北京百度网讯科技有限公司 Question generation method, device and storage medium
CN110196975A (en) * 2019-02-27 2019-09-03 北京金山数字娱乐科技有限公司 Problem generation method, device, equipment, computer equipment and storage medium
CN110209766B (en) * 2019-05-23 2021-01-29 招商局金融科技有限公司 Data display method, electronic device and storage medium
CN110162615B (en) * 2019-05-29 2021-08-24 北京市律典通科技有限公司 Intelligent question and answer method and device, electronic equipment and storage medium
CN110263312B (en) * 2019-06-19 2023-09-12 北京百度网讯科技有限公司 Article generating method, apparatus, server and computer readable medium
CN111124414B (en) * 2019-12-02 2024-02-06 东巽科技(北京)有限公司 Abstract grammar tree word-taking method based on operation link
CN111061851B (en) * 2019-12-12 2023-08-08 中国科学院自动化研究所 Question generation method and system based on given facts
CN111428467A (en) * 2020-02-19 2020-07-17 平安科技(深圳)有限公司 Method, device, equipment and storage medium for generating reading comprehension question topic
CN111339269B (en) * 2020-02-20 2023-09-26 来康科技有限责任公司 Knowledge graph question-answering training and application service system capable of automatically generating templates
CN111522921B (en) * 2020-03-06 2023-06-02 国网浙江省电力有限公司营销服务中心 Data enhancement method for end-to-end dialogue based on sentence rewriting
CN112417885A (en) * 2020-11-17 2021-02-26 平安科技(深圳)有限公司 Answer generation method and device based on artificial intelligence, computer equipment and medium
CN112417119A (en) * 2020-11-19 2021-02-26 上海交通大学 Open domain question-answer prediction method based on deep learning
CN113268564B (en) * 2021-05-24 2023-07-21 平安科技(深圳)有限公司 Method, device, equipment and storage medium for generating similar problems
CN116205234A (en) * 2023-04-24 2023-06-02 中国电子科技集团公司第二十八研究所 Text recognition and generation algorithm based on deep learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20010107113A (en) * 2000-05-25 2001-12-07 서정연 Reduction of Natural Language Queries into Boolen and Vector Queries Using Syntactic Tree in a Natural Language Information Retrieval System
CN102737042A (en) * 2011-04-08 2012-10-17 北京百度网讯科技有限公司 Method and device for establishing question generation model, and question generation method and device
CN105760546A (en) * 2016-03-16 2016-07-13 广州索答信息科技有限公司 Automatic generating method and device for Internet headlines
CN106815311A (en) * 2016-12-21 2017-06-09 杭州朗和科技有限公司 A kind of problem matching process and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10614126B2 (en) * 2015-05-21 2020-04-07 Oracle International Corporation Textual query editor for graph databases that performs semantic analysis using extracted information


Also Published As

Publication number Publication date
CN108363743A (en) 2018-08-03

Similar Documents

Publication Publication Date Title
CN108363743B (en) Intelligent problem generation method and device and computer readable storage medium
CN108287822B (en) Chinese similarity problem generation system and method
CN107798140B (en) Dialog system construction method, semantic controlled response method and device
CN104794169B (en) A kind of subject terminology extraction method and system based on sequence labelling model
CN110134954B (en) Named entity recognition method based on Attention mechanism
CN110851599B (en) Automatic scoring method for Chinese composition and teaching assistance system
CN106569998A (en) Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN111274790B (en) Chapter-level event embedding method and device based on syntactic dependency graph
CN112905736A (en) Unsupervised text emotion analysis method based on quantum theory
CN111368082A (en) Emotion analysis method for domain adaptive word embedding based on hierarchical network
CN111581364B (en) Chinese intelligent question-answer short text similarity calculation method oriented to medical field
CN113704416A (en) Word sense disambiguation method and device, electronic equipment and computer-readable storage medium
CN111339772B (en) Russian text emotion analysis method, electronic device and storage medium
CN116049387A (en) Short text classification method, device and medium based on graph convolution
CN110222344B (en) Composition element analysis algorithm for composition tutoring of pupils
CN115544252A (en) Text emotion classification method based on attention static routing capsule network
CN109815497B (en) Character attribute extraction method based on syntactic dependency
CN112417155B (en) Court trial query generation method, device and medium based on pointer-generation Seq2Seq model
CN114547303A (en) Text multi-feature classification method and device based on Bert-LSTM
CN111815426B (en) Data processing method and terminal related to financial investment and research
CN111159405B (en) Irony detection method based on background knowledge
CN113011154A (en) Job duplicate checking method based on deep learning
CN112349294A (en) Voice processing method and device, computer readable medium and electronic equipment
CN112632272A (en) Microblog emotion classification method and system based on syntactic analysis
CN116108840A (en) Text fine granularity emotion analysis method, system, medium and computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant