CN108363743A - A kind of intelligence questions generation method, device and computer readable storage medium - Google Patents
- Publication number
- CN108363743A (application number CN201810068857.5A)
- Authority
- CN
- China
- Prior art keywords
- question
- training
- sentence
- model
- questions
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
  - G06F16/30—Information retrieval of unstructured textual data
    - G06F16/31—Indexing; Data structures therefor; Storage structures
      - G06F16/316—Indexing structures
        - G06F16/322—Trees
    - G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F40/00—Handling natural language data
  - G06F40/20—Natural language analysis
    - G06F40/205—Parsing
      - G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
Abstract
The invention discloses an intelligent question generation method, device and computer-readable storage medium for automatically generating and outputting questions for an input article, comprising the following steps: S1, extracting the key content of the article using a seq2seq model; S2, performing syntactic analysis and named entity recognition on each sentence of the key content to build a corresponding syntax tree for each sentence; S3, matching each syntax tree against the question templates in a pre-established question template database and, if a matching template exists, converting the sentence corresponding to the syntax tree into an interrogative sentence based on the matched template, thereby generating a question; S4, ranking the generated questions with a neural network and outputting them.
Description
Technical Field
The invention relates to the technical field of computers and natural language processing, and in particular to an intelligent question generation method based on deep learning and a corresponding device.
Background
With the rapid development of computer networks, ever more information is available online, and users cannot locate what interests them without reading everything in full. At present, most screening of articles and documents on the network is still done by browsing titles to decide whether to continue reading. This approach has a defect: many titles do not accurately and comprehensively reflect the core content of an article, so users may fail to find articles of interest or may miss them altogether. Natural language processing techniques can therefore be used to condense an article's content into several relevant questions and attract the user's interest in question form: when the questions touch on topics the user cares about, the user is drawn to look for the answers in the article, which can greatly increase reading interest.
At present, the technology of automatically generating questions for articles using natural language processing is mainly applied in education and teaching, for example to help a teacher generate a series of questions from a reading document to assess students' comprehension of it. This can greatly reduce the teacher's workload and free more of the teacher's energy for teaching.
According to the current state of investigation, no mature technology exists for intelligent question generation in Chinese language processing. Existing English-oriented question generation methods can be classified into three categories: question generation algorithms based on semantic structure, based on templates, and based on sequences.
Question generation based on semantic structure: the semantic roles in a sentence include the agent, patient, theme, target, instrument, time, location, predicate and so on. Every sentence in a text is composed of such components, and by identifying the role of each word the correlations among the words can be found; Kunichika, Mazidi and others use these correlations to generate questions. Kunichika et al. generated questions about English stories using grammatical relations, to test how well different readers understood the stories. Their questions were generated from five perspectives: questioning the content of a whole sentence, questioning via synonym and antonym correspondences, questioning relations of time and space, questioning words in complex sentence constructions, and questioning related phrases; however, the results showed that questions generated this way contain many grammatical errors. Mazidi et al. added natural language understanding on top of the semantic structure, performing grammatical role labelling and semantic role labelling on the text and combining the two results, achieving good question generation performance. For example, for the sentence "Xiao Ming met Xiao Hong in the park yesterday evening", semantic role labelling is performed first, as shown in Table 1 below:
TABLE 1
| Input sequence | Xiao Ming | yesterday | evening | in | the park | met | (particle) | Xiao Hong | 。 |
| Semantic role | agent | time | time | — | location | predicate | — | patient | — |
In this example, the semantic roles are "Xiao Ming (agent) yesterday evening (time) met (predicate) Xiao Hong (patient) in the park (location)", and questions are then generated from the correlations between the roles, such as: "1. Who met Xiao Hong in the park yesterday evening? 2. Whom did Xiao Ming meet in the park yesterday evening? 3. When did Xiao Ming meet Xiao Hong in the park?" and the like. However, in such methods based on semantic relations the generated questions are often too trivial: two questions are constructed from a single relation between agent and patient, while in reality not all of those questions are of interest.
Question generation based on templates: fixed types of questions are generated by manually written rules. To meet the need of generating specified questions from structured text for students, Mostow et al. studied the expression patterns of a large number of texts and first designed three templates to generate what, how and why type questions; the templates are shown in Table 2 below:
TABLE 2
| Question type | Question template |
| WHAT | What did <character> <verb>? |
| HOW | How did <character> <verb> <complement>? |
| WHY | Why was/were <character> <past-participle>? |
They experimented on more than five hundred articles and obtained reasonable results under manual evaluation. Later, Labutov et al. extracted low-dimensional ontologies from a large number of articles crawled from Wikipedia and then manually designed templates over these ontologies to generate questions, e.g. for (Person, early life): "Who were the key influences on <Person> in their childhood?". Lindberg et al. combined semantic structures with templates: they first analysed the fixed patterns that occur in sentence structures and then created 60 templates from those patterns to generate questions, with good results.
Question generation based on sequences: in natural language processing, a recurrent neural network (RNN) is used to build a seq2seq model that handles temporal dependence, and the question generation task is treated as a sentence-to-sentence conversion. A large amount of text is converted into vectors with word2vec, and the model exploits word similarity to predict the next word with maximum probability until a terminal symbol appears. Recurrent networks come in various improved forms, one of which is the LSTM (Long Short-Term Memory), whose gating mechanism is mature and flexible to implement. Serban et al. used such a model over logical triples (subject, relation, entity) to construct potential questions; implementing the model requires a large amount of labelled training data, and they trained an LSTM network on 100,000 manually labelled English <text, question> pairs to generate fixed-form questions. Xinya Du et al. extracted <sentence, question, answer> triples from the SQuAD dataset, then trained neural networks at sentence and paragraph level, systematically adopting both manual and automatic evaluation; the whole scheme appears well rounded. Mostafazadeh et al. proposed visual question generation, i.e. generating a question from a picture: building on the MSCOCO image captioning dataset, Microsoft enriched a large number of datasets and used large numbers of crowd workers on Amazon to label questions for the pictures, including event-based and object-related questions, some 75,000 questions across three databases, and then trained a seq2seq model to generate questions. Evidently, a large, high-quality dataset is the key to implementing sequence-based question generation algorithms.
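The max-probability decoding loop just described (predict the most probable next word until a terminal symbol appears) can be sketched as follows; `step`, the next-word distribution function, is a hypothetical stand-in for a trained seq2seq model:

```python
def greedy_decode(step, start_token, end_token, max_len=20):
    """Greedy seq2seq decoding: repeatedly take the most probable next
    word until the terminal symbol appears or max_len is reached.
    step(prefix) returns a {word: probability} distribution."""
    out = [start_token]
    for _ in range(max_len):
        dist = step(out)
        word = max(dist, key=dist.get)  # maximum-probability prediction
        out.append(word)
        if word == end_token:
            break
    return out[1:]  # drop the start token
```

A real model would compute `step` from LSTM hidden states over word2vec vectors; the loop structure is the same.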
However, the above methods have significant shortcomings when processing complex Chinese. Chinese has a complex structure, and expository articles often build up their point layer by layer, unlike educational texts, so questions generated from such lengthy texts by the prototype systems cannot meet users' needs. For example, repeated occurrences of similar questions directly lower satisfaction with the generated questions, and users only want questions that point to the key content of the text, so a lengthy text is a poor input for question generation. Furthermore, outputting questions in their order of generation, or with a simple ranking method such as linear regression, cannot ensure that every question is placed in the most appropriate position.
The above background disclosure is only intended to assist in understanding the inventive concept and technical solutions of the present invention; it does not necessarily belong to the prior art of the present patent application and, absent clear evidence that the above contents were disclosed before the filing date of the present application, should not be used to evaluate the novelty and inventive step of the present application.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an intelligent question generation method and device based on deep learning.
One of the technical solutions proposed by the present invention to achieve the above object is as follows:
an intelligent question generation method, used for automatically generating and outputting questions for an input article, comprising the following steps:
S1, extracting the key content of the article using a seq2seq model;
S2, performing syntactic analysis and named entity recognition on each sentence in the key content to establish a corresponding syntax tree for each sentence;
S3, matching the syntax tree with the question templates in a pre-established question template database and, if a matching question template exists, converting the key sentence corresponding to the syntax tree into an interrogative sentence based on the matched template, thereby generating a question;
S4, automatically scoring the generated questions using a question ranking model based on a neural network architecture and outputting them ranked by score.
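The four steps above can be sketched as a pipeline. Every component here (model, parser, template objects, scorer) is a hypothetical stub intended only to show the data flow between the steps, not the invention's actual implementation:

```python
def extract_key_content(article, seq2seq_model):
    """S1: the trained seq2seq model returns a list of key sentences."""
    return seq2seq_model(article)

def build_syntax_trees(sentences, parser):
    """S2: syntactic analysis + named entity recognition, one tree per sentence."""
    return [parser(s) for s in sentences]

def generate_questions(sentences, trees, templates):
    """S3: when a template matches a sentence's tree, rewrite it as a question."""
    questions = []
    for sentence, tree in zip(sentences, trees):
        for template in templates:
            if template.matches(tree):
                questions.append(template.rewrite(sentence))
                break  # first matching template wins in this sketch
    return questions

def rank_questions(questions, scorer):
    """S4: score each question with the neural ranking model, best first."""
    return sorted(questions, key=scorer, reverse=True)
```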
The method aims to automatically extract the key content from a lengthy Chinese article and present that key content (or abstract) in question form, so that the user can decide whether to continue reading the article by browsing the questions. This saves the user's reading time, and presenting the core content of the article as questions increases reading interest. The method can be used to build an intelligent assisted-reading system: a set of specific questions is generated for the user, who can pick the questions of interest from the set and read purposefully, increasing reading interest and improving reading quality.
In addition, the technical scheme of the system device provided by the invention based on the method is as follows:
an intelligent question generation device for automatically generating and outputting questions for an input article, comprising: a seq2seq model for extracting the key content of the article; a syntax tree construction program for performing syntactic analysis and named entity recognition on each sentence in the key content to establish a syntax tree corresponding to each sentence; a question construction program for matching the syntax tree with the question templates in a pre-established question template database and, when a matching question template exists, converting the sentence corresponding to the syntax tree into an interrogative sentence based on the matched template to generate a question; and a question ranking model based on a neural network architecture for automatically scoring the generated questions and outputting them ranked by score.
By adopting this intelligent question generation device, a set of specific questions can be generated for the user, who can select the questions of interest and read purposefully, increasing reading interest and improving reading quality.
In addition, the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the aforementioned intelligent question generation method.
Drawings
FIG. 1 is a flow chart of a method for generating an intelligent question provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of an intelligent question generation apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the neural network architecture of the question ranking model;
FIG. 4 is a schematic diagram of a process for syntactic analysis and named entity recognition of a sentence;
FIG. 5 is a diagram of a syntax tree constructed based on the syntactic analysis of a sentence and the named entity recognition results.
Detailed Description
The invention is further described with reference to the following figures and detailed description of embodiments.
The invention aims to generate high-quality questions for the key content of an article, so as to increase the user's interest in reading the article and improve reading quality. To this end, the invention proposes an intelligent question generation method based on deep learning, by which questions can be automatically generated and output for an input article. Referring to FIG. 1, the method comprises the following steps S1 to S4:
Step S1, extracting the key content of the input article using the seq2seq model. Specifically, on receiving an input article, the seq2seq model first calls a basic natural language processing unit to perform text preprocessing, including data cleansing: removing whitespace at both ends of the text, removing illegal symbols, normalising English letter case, and so on. Next, each sentence is split into words by Chinese word segmentation, and the words are turned into word vectors via word embedding using Google's open-source word2vec; word vectors express the correlation between words well and provide better context information. The model is then built with the deep learning framework TensorFlow and trained; after training, the key content can be extracted from the word vectors produced by preprocessing the article. The training process comprises the following steps:
First, the key content of a number of training articles is extracted manually, and each training article together with its manually extracted key content forms a training sample, establishing the training set of the seq2seq model. The training set is then fed into the seq2seq model and iteratively trained until the model parameters converge, yielding the trained seq2seq model. The specific steps are as follows:
For an input news text sequence x_1, x_2, ..., x_m, an embedded representation is first obtained through the embedding matrix:

e_i = E · x_i, i = 1, 2, ..., m

where E ∈ R^(l×V) is the embedding matrix, l is the dimension of the embedding, V is the size of the entire vocabulary, and x_i is the i-th word of the news text sequence. The embedded representation sequence of the text is then fed into the encoding module in order, giving the forward hidden state sequence:

h_i^f = LSTM_f(e_i, h_(i-1)^f)

where h_i^f is a k-dimensional vector and LSTM_f denotes a forward LSTM unit. Meanwhile, to capture the reverse-order information of the sequence, the sequence is also fed in reverse into a backward LSTM unit, giving the backward hidden state sequence:

h_i^b = LSTM_b(e_i, h_(i+1)^b)

where LSTM_b denotes a backward LSTM unit. Concatenating the forward and backward hidden state sequences yields the hidden state representation of the news sequence:

h_i = [h_i^f ; h_i^b]

where h_i is the hidden state representation of the word x_i. The hidden state h_m of the last word x_m of the news sequence is taken as the hidden state representation of the whole news text, h_c = h_m. Through this series of encoding steps, the news text sequence has been converted into a vector representation, which serves as the context vector when the decoding module extracts key sentences. The decoding module uses a single-layer LSTM network, first initialised with the news context vector h_c obtained from the encoding module. Then the key sentence sequence y_1, y_2, ..., y_n is processed to obtain its embedded representation:

d_j = E · y_j, j = 1, 2, ..., n

Note that the same embedding matrix E is used in encoding and decoding. Likewise, the embedded representation of the key sentence sequence is fed into the decoding module in order to obtain its hidden state representation:

s_j = LSTM_d(d_j, s_(j-1))

where s_j is the hidden state representation of the j-th word of the key sentence sequence.
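The bidirectional encoding above (forward and backward passes whose states are concatenated, with the final state serving as the context vector h_c) can be sketched with a toy scalar recurrent cell standing in for the LSTM units; this illustrates only the state flow, not a real LSTM:

```python
def run_rnn(cell, embedded, reverse=False):
    """Run a recurrent cell over the embedded sequence, one hidden state
    per position; reverse=True is the backward pass (LSTM_b above)."""
    seq = list(reversed(embedded)) if reverse else list(embedded)
    h, states = 0.0, []
    for e in seq:
        h = cell(e, h)
        states.append(h)
    return list(reversed(states)) if reverse else states

def bidirectional_encode(cell_f, cell_b, embedded):
    """h_i = (h_i^f, h_i^b); the last concatenated state doubles as the
    context vector h_c = h_m that initialises the decoder."""
    fwd = run_rnn(cell_f, embedded)
    bwd = run_rnn(cell_b, embedded, reverse=True)
    hidden = list(zip(fwd, bwd))
    return hidden, hidden[-1]  # (all h_i, context vector h_c)
```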
In a more preferred embodiment, an attention mechanism is added on top of seq2seq (seq2seq + attention) to handle out-of-vocabulary words, making the extracted key-content sentences more fluent. Traditional seq2seq computes over a fixed vocabulary: each word in the vocabulary is converted into a word vector by word2vec and fed into the LSTM for training, but when a new word appears in the test set the model cannot handle it well and usually substitutes <UNK>, so the output key content contains <UNK> symbols and the sentences are disfluent and unclear, which would reduce the quality of the questions generated by the invention. The invention therefore adds an attention mechanism on the upper layer of the seq2seq model: during key-content extraction, when a word not in the vocabulary is encountered, it is copied from the original text into the output with a certain probability, ensuring that <UNK> does not appear during prediction and preserving the readability of the sentences. Copying a word from the original text into the output with a certain probability means that a new word (possibly a segmentation error, or a word newly current on the network) need not be generated from the model's vocabulary at all: it can be copied directly into the output result, overcoming the fixed vocabulary's limitation on new words.
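The copy behaviour described above can be sketched as a pointer-generator-style mixture (an assumption about the exact form, in the spirit of See et al.'s pointer-generator): with probability p_gen a vocabulary word is emitted, otherwise a source word is copied in proportion to its attention weight, so an out-of-vocabulary source word receives nonzero probability and <UNK> need never be emitted for it:

```python
def output_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix the fixed-vocabulary distribution with attention-weighted
    copying of source tokens. If vocab_dist and the attention weights
    each sum to 1, the returned distribution also sums to 1."""
    final = {w: p_gen * p for w, p in vocab_dist.items()}
    for w, a in zip(source_tokens, attention):
        final[w] = final.get(w, 0.0) + (1.0 - p_gen) * a
    return final
```

With p_gen = 0.5, a source-only word such as "Ebola" keeps half of its attention mass as output probability instead of collapsing to <UNK>.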
Step S2, each sentence in the key content extracted by the seq2seq model in step S1 is processed with syntactic analysis and named entity recognition, so as to build a corresponding syntax tree for each sentence. The process is roughly: first, segment the sentence into words; then attach a part-of-speech tag to each word; then perform named entity recognition on each word, identifying person names, organisation names, place names or other named entities. Once syntactic analysis and named entity recognition are complete, a corresponding syntax tree can be built for the sentence. This step is described later through a specific example.
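A minimal syntax-tree structure for step S2 might look as follows. This is a sketch under the assumption that a constituency parser (e.g. the Stanford parser) supplies the labels during tagging and NER; the class only holds and traverses the result:

```python
class Node:
    """One constituent of a syntax tree; leaves carry the actual word."""
    def __init__(self, label, children=(), word=None):
        self.label = label           # constituent/POS label, e.g. "IP", "PP", "NT"
        self.children = list(children)
        self.word = word             # set only on leaf nodes

    def labels(self):
        """All labels in the subtree (used later for template matching)."""
        out = [self.label]
        for c in self.children:
            out.extend(c.labels())
        return out

    def words(self):
        """The sentence fragment covered by this subtree, left to right."""
        if self.word is not None:
            return [self.word]
        return [w for c in self.children for w in c.words()]
```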
Step S3, after a corresponding syntax tree is established for each sentence in the key content, matching the syntax tree with a question template in a question template database established in advance, and if a matched question template exists, converting the sentence corresponding to the syntax tree into a question sentence based on the matched question template, thereby generating a question. The format of the question template in the database may be as follows (to name a few):
question template representing "how many": QP < CD ═ number < CLP
Question template representing "number of days": QP < OD ═ number
Causal relationship problem template: ((IP | PP ═ rea) < (because). |, and (IP | PP | VP) < < (IP | PP | VP < (so | then))
Turning relationship problem template: ((IP | PP ═ front.). IP ═ however) | < (IP | PP ═ front (IP | PP | VP ═ however) >))
The symbols are taken from the Stanford NLP Group's definitions of the constituents appearing in syntax trees. When a question template is successfully matched against the syntax tree of the current sentence, the sentence is rewritten into an interrogative sentence based on that question template, and a question is thus generated.
The question template database is built by learning linguistic rules from a large amount of article data, yielding a large number of question templates. For each sentence in the key content, its syntax tree is matched against the question templates in the database; once a match succeeds, the sentence is directly converted into a corresponding interrogative sentence using the matched template, generating the corresponding question. A sentence that matches no question template in the current database generates no question; by collecting sentences that fail to generate questions in batches, new question templates are produced and the question template database is updated.
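A much-simplified sketch of the matching and rewriting in step S3. Real templates use Tregex-style patterns over the syntax tree; here the causal check is reduced to constituent labels plus a cue word, and the rewrite to string surgery — both are illustrative assumptions, not the invention's actual matcher:

```python
def matches_causal_template(tree_labels, tree_words, cue="导致"):
    """Rough stand-in for the Tregex-style causal pattern: the tree must
    contain an IP or PP constituent and the causal cue word ("caused")."""
    return ("IP" in tree_labels or "PP" in tree_labels) and cue in tree_words

def rewrite_as_causal_question(sentence, cue="导致"):
    """Rewrite '<cause> 导致 <effect>。' into '什么导致<effect>?'
    ("What caused <effect>?"); returns None when the cue is absent."""
    if cue not in sentence:
        return None
    effect = sentence.split(cue, 1)[1].rstrip("。.?")
    return "什么" + cue + effect + "?"
```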
Step S4: after steps S1, S2 and S3, questions have been generated for the input article; this step ranks the generated questions with a question ranking model based on a neural network architecture and outputs them, ensuring that each question appears in an appropriate position. Compared with traditional ranking methods, automatically scoring the questions with a neural network and outputting them in ranked order is more accurate and requires no manually set parameters. The training process of the question ranking model comprises the following steps:
First, a number of training questions generated from a number of articles are scored manually, with a comprehensive score based on the linguistic logic of the generated question and its degree of value: a question with strong logic, value and high quality receives a higher score, and a lower score otherwise. Next, features are extracted from each question to obtain a feature set; each feature set together with the manual score of the corresponding question forms a training sample, and the collected samples form the training set of the question ranking model. The training set is fed into the neural network and iteratively trained until the model parameters converge, yielding the trained question ranking model. During training, within a training sample the question's feature set serves as the input of the neural network and the question's manual score as the output. For example, referring to FIG. 3, the neural network diagram of the question ranking model: the input layer is the feature set of each question; when the training sample is the k-th question, the input layer is the k-th question's feature set [P_k1, P_k2, ..., P_k10] and the output layer is the k-th question's manual score Q_k, and the model parameters are trained accordingly. After training, a feature set is extracted from each generated question in the same way and fed into the model, which outputs the question's score; the questions are then automatically output in order of score. This replaces the manual scoring process, saving labour, and through continued learning the scoring accuracy can improve, raising satisfaction with the question ordering.
When features are extracted from a training question, the extracted features include but are not limited to: the N-gram model score of the generated question; the index position in the original text of the answer part; the importance score of the labelled key sentence (obtainable when the key content is extracted with the seq2seq model); basic unit statistics (such as the densities of nouns, verbs and stop words in the question); and the sentence conversion rate (how many words of the sentence were converted into the question's answer). Of course, other, more valuable features may also be extracted; no limitation is intended here.
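Two of the cited features, plus a single linear scoring unit standing in for the trained ranking network, can be sketched as follows. The real model is a multi-feature neural network trained on manual scores; the feature formulas and weights here are illustrative assumptions:

```python
def question_features(question, stopwords):
    """Toy feature set: token count and stop-word density (two of the
    basic unit statistics mentioned above); upstream models would supply
    the N-gram score, answer position and key-sentence importance."""
    tokens = question.split()
    stop = sum(1 for t in tokens if t in stopwords)
    return [float(len(tokens)), stop / max(len(tokens), 1)]

def score(features, weights, bias=0.0):
    """One linear unit in place of the trained network."""
    return bias + sum(w * f for w, f in zip(weights, features))

def rank_by_score(questions, stopwords, weights, bias=0.0):
    """Output the questions ranked by their automatic score, best first."""
    return sorted(questions,
                  key=lambda q: score(question_features(q, stopwords), weights, bias),
                  reverse=True)
```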
The implementation of the foregoing step S2 is described in detail below by way of a specific example:
In one embodiment, the process of syntactic analysis and named entity recognition of a sentence is shown in FIG. 4, for the example sentence "In 2014, an outbreak of Ebola virus caused the deaths of more than 1,700 people worldwide." First the sentence is segmented, the symbol "|" marking the divisions. Then a part-of-speech tag is attached to each word; for example, "2014" is tagged "NT", the symbol used in syntactic analysis for temporal nouns, and "NN" denotes a common noun; these symbols are standard in syntactic analysis and are not explained further here. Syntactic analysis is completed at the "Tagging" step, followed by named entity recognition at the "NamedEntity" step. Then, based on the results obtained in FIG. 4, a syntax tree is built for the sentence, yielding the syntax tree shown in FIG. 5.
The following example illustrates how a corresponding question is generated by matching the syntax tree against the question templates. With continued reference to figs. 4 and 5, consider a question template keyed to the word "cause" with the syntactic analysis label "PP" (prepositional phrase). When it is detected that the sentence contains "cause" together with this syntactic analysis label, the question template with a cause-effect relationship can be matched, so that a question with a cause-effect relationship is generated, for example: "What caused the deaths of 1700 people worldwide in 2014?".
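The matching step above can be sketched as a lookup keyed on (word, label) pairs. This is a simplification: the template store and trigger below are hypothetical, and real matching would walk the full syntax tree of fig. 5 rather than a flat list of tagged phrases.

```python
# Sketch of matching a tagged sentence against a causal question template.
# The template database and (word, label) trigger are hypothetical; real
# matching operates on the syntax tree of fig. 5.

TEMPLATES = {
    ("caused", "PP"): "What caused {effect}?",   # cause-effect template
}

def match_and_generate(tagged_phrases, effect_phrase):
    """Return a question if some (word, label) pair triggers a template."""
    for word, label in tagged_phrases:
        template = TEMPLATES.get((word, label))
        if template:
            return template.format(effect=effect_phrase)
    return None   # no matching template: this sentence generates no question

tagged = [("In 2014", "PP"), ("the Ebola outbreak", "NP"), ("caused", "PP")]
question = match_and_generate(tagged, "the deaths of 1700 people worldwide in 2014")
```

Sentences for which `match_and_generate` returns `None` are the ones the method counts in order to formulate new templates and update the template database.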
The question ranking output process using the question ranking model is illustrated below with the following example input:
at present, the related technology of automatically generating questions for articles using natural language processing is mainly applied in the field of education and teaching; for example, it can help a teacher generate a series of questions from a reading document to evaluate students' comprehension of the document, which greatly reduces the teacher's workload and allows more of the teacher's energy to be devoted to teaching. According to the current survey, there is no mature technology for intelligent question generation in Chinese language processing; existing English question generation methods can be classified into three categories: question generation algorithms based on semantic structures, question generation algorithms based on templates, and question generation algorithms based on sequences.
With the foregoing intelligent question generation method of the present invention applied to the above example, the following 4 questions can be generated (this is only an example and does not mean that only these 4 questions can be generated):
1. What can the related technology of automatically generating questions for articles using natural language processing help teachers do?
2. What categories of English question generation methods currently exist?
3. What are the current related technologies for automatically generating questions for articles using natural language processing?
4. According to the current survey, is there a mature technology in Chinese language processing for solving intelligent question generation?
For these 4 questions, the order of generation is, for example, the order listed above. However, the present invention does not simply output them in the order of generation; instead, the 4 questions are input into the aforementioned question ranking model of the present invention, which, after the aforementioned training, scores each question and outputs the questions according to their scores. For example, if question 4 receives the highest score, question 4 is ranked first in the output, so that the user sees the highest-quality question first.
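The output step reduces to scoring each question and sorting in descending score order. In the sketch below the scores are invented stand-ins for the ranking model's outputs.

```python
# Sketch of the ranking output step: each generated question receives a
# score from the (already trained) ranking model and the questions are
# output in descending score order. Scores here are invented.

scored = {
    "Question 1": 0.41,
    "Question 2": 0.55,
    "Question 3": 0.62,
    "Question 4": 0.88,   # highest quality -> shown to the user first
}

ranked = sorted(scored, key=scored.get, reverse=True)
```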
Another embodiment of the present invention provides a question generation apparatus (or system) based on the aforementioned question generation method, which automatically generates and outputs questions for an input article. Referring to fig. 2, the question generation apparatus includes a key content extraction device 10, a question construction device 20, and a question ranking output device 30. The key content extraction device 10 is a pre-trained seq2seq model (whose training method and process are described above) and is used for extracting the key content of the article. The question construction device 20 is mainly implemented by a syntax tree construction program and a question construction program: the syntax tree construction program performs syntactic analysis and named entity recognition on each sentence in the key content to establish the syntax tree corresponding to each sentence, and the question construction program matches each syntax tree against the question templates in a pre-established question template database and, when a matching question template exists, converts the sentence corresponding to the syntax tree into an interrogative sentence based on the matched template to generate a question. The question ranking output device 30 is realized by the trained neural-network-based question ranking model: by inputting the feature set of each generated question into the question ranking model, the questions can be automatically scored and ranked according to their scores. The question generation apparatus provided by the present invention can be presented in the form of a mobile terminal application, or implemented as a web page or computer software, which is not limited by the present invention.
In addition, the present invention also provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the steps of the aforementioned intelligent question generation method.
The foregoing is a further detailed description of the invention in connection with specific preferred embodiments, and the specific implementation of the invention is not to be considered limited to these descriptions. For those of ordinary skill in the art to which the invention pertains, several equivalent substitutions or obvious modifications can be made without departing from the concept of the invention, and all such substitutions or modifications of the same performance or use shall be deemed to fall within the protection scope of the invention.
Claims (10)
1. An intelligent question generation method is used for automatically generating questions for input articles and outputting the questions, and is characterized by comprising the following steps:
S1, extracting the key content of the article by using a seq2seq model;
S2, carrying out syntactic analysis and named entity recognition on each sentence in the key content, to establish a syntax tree corresponding to each sentence;
S3, matching the syntax tree with question templates in a pre-established question template database, and, if a matching question template exists, converting the sentence corresponding to the syntax tree into an interrogative sentence based on the matched question template, so as to generate a question;
and S4, automatically scoring the generated questions by using a question ranking model based on a neural network architecture, and outputting the questions ranked according to their scores.
2. The intelligent question generation method of claim 1, wherein: the seq2seq model is obtained through the following training:
manually extracting the key content of a plurality of articles for training, each training article and its manually extracted key content forming one training sample, so as to establish a training set of the seq2seq model; and inputting the training set into the seq2seq model and performing continuous iterative training until the model parameters converge, to obtain the trained seq2seq model.
3. The intelligent question generation method of claim 2, wherein: the seq2seq model further has an attention mechanism for, when the key content is extracted in step S1, directly generating or directly copying words that do not appear in the pre-established fixed vocabulary as part of the key content, so as to improve the readability of the generated sentences.
4. The intelligent question generation method of claim 1, wherein: the question template database is established by learning language rules from article data; when the syntax tree of a sentence matches a certain question template in the question template database, the sentence is directly converted into an interrogative sentence using that question template, to generate a corresponding question; when the syntax tree of a sentence matches no question template in the question template database, the sentence cannot generate a question; and the sentences that cannot generate questions are counted, so as to formulate new question templates and update the question template database.
5. The intelligent question generation method of claim 1, wherein: the training of the question ranking model comprises:
manually scoring a plurality of questions for training generated from a plurality of articles; extracting features from each question used for training to obtain a feature set, each feature set and the manual score of the corresponding question forming one training sample, so as to obtain a plurality of training samples forming a training set of the question ranking model; and training the neural network model with the training set through continuous iterative training until the model parameters converge, to obtain the trained question ranking model.
6. The intelligent question generation method of claim 5, wherein: when the questions for training are manually scored, the scoring is performed comprehensively according to the language logic of each question and its value.
7. An intelligent question generation device for automatically generating questions for input articles and outputting the questions, comprising:
the seq2seq model is used for extracting key contents of the article;
a syntax tree construction program for performing syntactic analysis and named entity recognition on each sentence in the key content to establish a syntax tree corresponding to each sentence;
a question construction program for matching the grammar tree with question templates in a question template database established in advance and converting sentences corresponding to the grammar tree into question sentences based on the matched question templates to generate questions when the matched question templates exist;
and the question ranking model based on a neural network architecture, for automatically scoring the generated questions and outputting them ranked according to their scores.
8. The intelligent question generation apparatus of claim 7, wherein: the seq2seq model further has an attention mechanism for, when the key content is extracted, directly generating or directly copying words that do not appear in the pre-established fixed vocabulary into the output result as part of the key content.
9. The intelligent question generation apparatus of claim 7, wherein: the question ranking model is a pre-trained neural network model; the training set used when training the neural network model comprises a plurality of training samples, each consisting of the feature set of a question used for training and the manual score of the corresponding question.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810068857.5A CN108363743B (en) | 2018-01-24 | 2018-01-24 | Intelligent problem generation method and device and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810068857.5A CN108363743B (en) | 2018-01-24 | 2018-01-24 | Intelligent problem generation method and device and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108363743A true CN108363743A (en) | 2018-08-03 |
CN108363743B CN108363743B (en) | 2020-06-02 |
Family
ID=63006763
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810068857.5A Active CN108363743B (en) | 2018-01-24 | 2018-01-24 | Intelligent problem generation method and device and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108363743B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20010107113A (en) * | 2000-05-25 | 2001-12-07 | 서정연 | Reduction of Natural Language Queries into Boolen and Vector Queries Using Syntactic Tree in a Natural Language Information Retrieval System |
CN102737042A (en) * | 2011-04-08 | 2012-10-17 | 北京百度网讯科技有限公司 | Method and device for establishing question generation model, and question generation method and device |
CN105760546A (en) * | 2016-03-16 | 2016-07-13 | 广州索答信息科技有限公司 | Automatic generating method and device for Internet headlines |
US20160342628A1 (en) * | 2015-05-21 | 2016-11-24 | Oracle International Corporation | Textual query editor for graph databases that performs semantic analysis using extracted information |
CN106815311A (en) * | 2016-12-21 | 2017-06-09 | 杭州朗和科技有限公司 | A kind of problem matching process and device |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109446519A (en) * | 2018-10-10 | 2019-03-08 | 西安交通大学 | A kind of text feature of fused data classification information |
CN109657041B (en) * | 2018-12-04 | 2023-09-29 | 南京理工大学 | Deep learning-based automatic problem generation method |
CN109657041A (en) * | 2018-12-04 | 2019-04-19 | 南京理工大学 | The problem of based on deep learning automatic generation method |
CN111368536A (en) * | 2018-12-07 | 2020-07-03 | 北京三星通信技术研究有限公司 | Natural language processing method, apparatus and storage medium therefor |
CN109726274A (en) * | 2018-12-29 | 2019-05-07 | 北京百度网讯科技有限公司 | Problem generation method, device and storage medium |
CN109726274B (en) * | 2018-12-29 | 2021-04-30 | 北京百度网讯科技有限公司 | Question generation method, device and storage medium |
CN110196975A (en) * | 2019-02-27 | 2019-09-03 | 北京金山数字娱乐科技有限公司 | Problem generation method, device, equipment, computer equipment and storage medium |
CN110209766A (en) * | 2019-05-23 | 2019-09-06 | 招商局金融科技有限公司 | Method for exhibiting data, electronic device and storage medium |
CN110162615A (en) * | 2019-05-29 | 2019-08-23 | 北京市律典通科技有限公司 | A kind of intelligent answer method, apparatus, electronic equipment and storage medium |
CN110263312B (en) * | 2019-06-19 | 2023-09-12 | 北京百度网讯科技有限公司 | Article generating method, apparatus, server and computer readable medium |
CN110263312A (en) * | 2019-06-19 | 2019-09-20 | 北京百度网讯科技有限公司 | Article generation method, device, server and computer-readable medium |
CN111124414A (en) * | 2019-12-02 | 2020-05-08 | 东巽科技(北京)有限公司 | Abstract syntax tree word-taking method based on operation link |
CN111124414B (en) * | 2019-12-02 | 2024-02-06 | 东巽科技(北京)有限公司 | Abstract grammar tree word-taking method based on operation link |
CN111061851B (en) * | 2019-12-12 | 2023-08-08 | 中国科学院自动化研究所 | Question generation method and system based on given facts |
CN111061851A (en) * | 2019-12-12 | 2020-04-24 | 中国科学院自动化研究所 | Given fact-based question generation method and system |
WO2021164284A1 (en) * | 2020-02-19 | 2021-08-26 | 平安科技(深圳)有限公司 | Method, apparatus and device for generating reading comprehension question, and storage medium |
CN111428467B (en) * | 2020-02-19 | 2024-05-07 | 平安科技(深圳)有限公司 | Method, device, equipment and storage medium for generating problem questions for reading and understanding |
CN111428467A (en) * | 2020-02-19 | 2020-07-17 | 平安科技(深圳)有限公司 | Method, device, equipment and storage medium for generating reading comprehension question topic |
CN111339269B (en) * | 2020-02-20 | 2023-09-26 | 来康科技有限责任公司 | Knowledge graph question-answering training and application service system capable of automatically generating templates |
CN111339269A (en) * | 2020-02-20 | 2020-06-26 | 来康科技有限责任公司 | Knowledge graph question-answer training and application service system with automatically generated template |
CN111522921B (en) * | 2020-03-06 | 2023-06-02 | 国网浙江省电力有限公司营销服务中心 | Data enhancement method for end-to-end dialogue based on sentence rewriting |
CN111522921A (en) * | 2020-03-06 | 2020-08-11 | 国网浙江省电力有限公司电力科学研究院 | Statement rewriting-based end-to-end dialogue oriented data enhancement method |
CN112417885A (en) * | 2020-11-17 | 2021-02-26 | 平安科技(深圳)有限公司 | Answer generation method and device based on artificial intelligence, computer equipment and medium |
CN112417119A (en) * | 2020-11-19 | 2021-02-26 | 上海交通大学 | Open domain question-answer prediction method based on deep learning |
CN113111663A (en) * | 2021-04-28 | 2021-07-13 | 东南大学 | Abstract generation method fusing key information |
CN113268564A (en) * | 2021-05-24 | 2021-08-17 | 平安科技(深圳)有限公司 | Method, device and equipment for generating similar problems and storage medium |
CN113268564B (en) * | 2021-05-24 | 2023-07-21 | 平安科技(深圳)有限公司 | Method, device, equipment and storage medium for generating similar problems |
CN113705208A (en) * | 2021-09-01 | 2021-11-26 | 国网江苏省电力有限公司电力科学研究院 | Chinese question automatic generation method and device based on domain terms and key sentences |
CN113705208B (en) * | 2021-09-01 | 2024-05-28 | 国网江苏省电力有限公司电力科学研究院 | Automatic Chinese problem generation method and device based on field terms and key sentences |
CN113743087B (en) * | 2021-09-07 | 2024-04-26 | 珍岛信息技术(上海)股份有限公司 | Text generation method and system based on neural network vocabulary extension paragraph |
CN113743087A (en) * | 2021-09-07 | 2021-12-03 | 珍岛信息技术(上海)股份有限公司 | Text generation method and system based on neural network vocabulary extension paragraphs |
CN116205234A (en) * | 2023-04-24 | 2023-06-02 | 中国电子科技集团公司第二十八研究所 | Text recognition and generation algorithm based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN108363743B (en) | 2020-06-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108363743B (en) | Intelligent problem generation method and device and computer readable storage medium | |
CN107798140B (en) | Dialog system construction method, semantic controlled response method and device | |
CN111475629A (en) | Knowledge graph construction method and system for math tutoring question-answering system | |
CN111274790B (en) | Chapter-level event embedding method and device based on syntactic dependency graph | |
CN110134954B (en) | Named entity recognition method based on Attention mechanism | |
CN110851599A (en) | Automatic scoring method and teaching and assisting system for Chinese composition | |
CN111159405B (en) | Irony detection method based on background knowledge | |
CN111581364B (en) | Chinese intelligent question-answer short text similarity calculation method oriented to medical field | |
CN110222344B (en) | Composition element analysis algorithm for composition tutoring of pupils | |
CN112417155B (en) | Court trial query generation method, device and medium based on pointer-generation Seq2Seq model | |
CN115510814B (en) | Chapter-level complex problem generation method based on dual planning | |
CN111368082A (en) | Emotion analysis method for domain adaptive word embedding based on hierarchical network | |
CN112349294B (en) | Voice processing method and device, computer readable medium and electronic equipment | |
CN112905736A (en) | Unsupervised text emotion analysis method based on quantum theory | |
CN116049387A (en) | Short text classification method, device and medium based on graph convolution | |
Da et al. | Deep learning based dual encoder retrieval model for citation recommendation | |
Lee | Natural Language Processing: A Textbook with Python Implementation | |
CN111815426B (en) | Data processing method and terminal related to financial investment and research | |
CN116842168B (en) | Cross-domain problem processing method and device, electronic equipment and storage medium | |
CN113011154A (en) | Job duplicate checking method based on deep learning | |
Žitko et al. | Automatic question generation using semantic role labeling for morphologically rich languages | |
CN116108840A (en) | Text fine granularity emotion analysis method, system, medium and computing device | |
Bai et al. | Gated character-aware convolutional neural network for effective automated essay scoring | |
CN116186241A (en) | Event element extraction method and device based on semantic analysis and prompt learning, electronic equipment and storage medium | |
Abdulwahab et al. | Deep Learning Models for Paraphrases Identification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |