CN112380836A - Intelligent Chinese message question generating method - Google Patents
Intelligent Chinese message question generating method
- Publication number
- CN112380836A (application number CN202011261252.1A)
- Authority
- CN
- China
- Prior art keywords
- question
- template
- model
- sub
- answer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/186—Templates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method for generating intelligent Qiaoqing questions, comprising the following steps. S1: question-answer pairs related to Qiaoqing are obtained using crawler technology, and a triple corpus that can be used for model training is generated through manual processing and triple extraction. S2: a seq2seq-based template learning algorithm is adopted, and a template question generation model M is constructed through training to generate template questions based on relation and subject; subject text replacement is then performed on the template question to obtain the final generated question $q_r$. S3: an interface of the intelligent Qiaoqing question generation system receives the parameters required by the server, runs the model, and returns the structured result. The template learning algorithm uses an LSTM deep learning model to learn general question templates and can learn the question generation mechanism at the semantic level, so that the generated questions are better formed; the method has theoretical significance and practical value.
Description
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a method for generating intelligent Qiaoqing (overseas Chinese affairs) questions.
Background
Among the various methods for question generation, template matching is a classical and mature one: once question templates for the relevant subjects have been established, questions can be generated rapidly in batches for the named entities of a given domain. However, questions generated this way tend to share a single sentence pattern and fall into a few broad categories, so it is difficult to vary or deepen the angle of questioning; the language lacks diversity, and misleading information is sometimes produced. In the field of Qiaoqing question answering in particular, generating questions with a specified subject and relation from a rich and domain-specific Qiaoqing corpus is even more complicated.
Disclosure of Invention
The main purpose of the invention is to overcome the rigid sentence patterns and poor diversity of questions generated by the traditional template matching method, and to provide an intelligent Qiaoqing question generation method.
The invention adopts the following technical scheme:
A method for generating intelligent Qiaoqing questions comprises the following steps:
S1: Obtain Qiaoqing-related question-answer pairs using crawler technology, and generate a triple corpus that can be used for model training through manual processing and triple extraction;
S2: Adopt a seq2seq-based template learning algorithm and construct a template question generation model M through training, so as to generate template questions based on relation and subject; then perform subject text replacement on the template question to obtain the final generated question $q_r$;
S3: An interface of the intelligent Qiaoqing question generation system receives the parameters required by the server, runs the model, and returns the structured result.
Specifically, step S1 further comprises the following steps:
S11: Using "Chinese" or "qiangbi" or "waduo" as keywords, crawl web pages with crawler technology to obtain Qiaoqing question-answer pairs;
S12: Manually screen out the usable question-answer corpus $B_{QA} = \{Q, A\}$, where $Q$ represents the question set and $A$ represents the answer set;
S13: Using dependency parsing, extract triples $\langle T, R, O \rangle$ from the question-answer corpus $B_{QA}$, where $T$ represents the subject entity, $R$ the relation, and $O$ the object; the extracted triples together form the triple corpus.
Specifically, step S2 further comprises the following steps:
S21: Based on the obtained relation set $R = \{r_1, r_2, \dots, r_n\}$, where $n$ denotes the total number of relations, obtain for a given relation $r_i$ the set of questions $Q'$ related to it, where $Q' \subseteq Q$; select from $Q'$ the subset $Q''$ of questions with a complete subject, predicate and object, and then select the optimal question $q_i$ from $Q''$;
S22: Replace the subject text of the optimal question $q_i$ with the subject template tag SUB, thereby forming the template question $q_i'$;
S23: Construct the set of input-output data pairs $Q_{train} = \{ q_{train} = [(SUB, SEP, r_i), q_i'] \}$ $(i = 1, 2, \dots, n)$ from the relation set $R$, where SEP is a separator, the input data is $(SUB, SEP, r_i)$ and the output data is $q_i'$;
S24: Use $Q_{train}$ as the training set of the seq2seq model and train it to obtain the model M, which maps the input data $(SUB, SEP, r)$ formed from an arbitrary relation $r$ to the corresponding template question $q'$;
S25: Replace the SUB tag in the template question $q'$ with the subject $t$ to obtain the final generated question $q_r$.
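Steps S21 to S23 and S25 can be illustrated with a small Python sketch. It is not the patent's implementation: the function names, the example relation `founding_year`, and the example questions are hypothetical, and the sketch assumes that the "optimal" question is simply the shortest candidate and that the subject text of each candidate is already known.

```python
from typing import Dict, List, Tuple

SUB, SEP = "SUB", "SEP"  # template tag and separator named in S22/S23

# One training pair: ((SUB, SEP, relation), template question)
Pair = Tuple[Tuple[str, str, str], str]


def make_template(question: str, subject: str) -> str:
    """S22: replace the subject text with the SUB tag."""
    return question.replace(subject, SUB)


def build_training_pairs(by_relation: Dict[str, List[Tuple[str, str]]]) -> List[Pair]:
    """S21 + S23: pick one question per relation and pair it with its input."""
    pairs: List[Pair] = []
    for relation, candidates in by_relation.items():
        # Assumption: the optimal question is the shortest candidate.
        question, subject = min(candidates, key=lambda c: len(c[0]))
        pairs.append(((SUB, SEP, relation), make_template(question, subject)))
    return pairs


def fill_template(template: str, subject: str) -> str:
    """S25: replace the SUB tag with the requested subject t."""
    return template.replace(SUB, subject)


# Hypothetical candidates: (question, subject text) grouped by relation.
by_relation = {
    "founding_year": [
        ("In which year was Example School first founded?", "Example School"),
        ("When was Example School founded?", "Example School"),
    ]
}
pairs = build_training_pairs(by_relation)
question = fill_template(pairs[0][1], "Example College")
```

In the method itself the mapping from `(SUB, SEP, r)` to the template is learned by the model M (S24) rather than looked up; the sketch only shows how the training pairs and the final substitution fit together.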
Specifically, step S3 further comprises the following steps:
S31: Receive through the interface the first two elements $t$ and $r$ of a fact triple, where $t$ is the subject of the question the user wants to obtain and $r$ is the relation the user wants to ask about with respect to $t$;
S32: Model processing: the input data $(SUB, SEP, r)$ formed from the parameter $r$ is fed into the model M to obtain the output template question $q'$, and the SUB tag in $q'$ is replaced with the subject $t$ to obtain the final generated question $q_r$;
S33: Structure the output question $q_r$ using a JSON interface.
Specifically, extracting triples from the question-answer corpus $B_{QA}$ comprises: the subject words constitute $T$ in the triple.
Specifically, extracting triples from the question-answer corpus $B_{QA}$ comprises: the predicate words constitute $R$ in the triple.
Specifically, extracting triples from the question-answer corpus $B_{QA}$ comprises: the object words constitute $O$ in the triple.
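The mapping from a dependency parse to a triple (subject words to T, predicate words to R, object words to O) can be sketched as below. The patent does not name a parser or a label set, so this sketch assumes the sentence has already been parsed and that arcs carry Universal Dependencies style labels (`root`, `nsubj`, `obj`); the `Arc` type and the example sentence are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Arc(NamedTuple):
    head: int   # index of the governing token (-1 for the sentence root)
    dep: int    # index of the dependent token
    label: str  # dependency label, assumed Universal Dependencies style


def extract_triple(tokens: List[str], arcs: List[Arc]) -> Optional[Tuple[str, str, str]]:
    """Return (subject, predicate, object), or None if any part is missing."""
    root = next((a.dep for a in arcs if a.label == "root"), None)
    if root is None:
        return None
    subj = next((a.dep for a in arcs if a.head == root and a.label == "nsubj"), None)
    obj = next((a.dep for a in arcs if a.head == root and a.label == "obj"), None)
    if subj is None or obj is None:
        return None
    return tokens[subj], tokens[root], tokens[obj]


# Hypothetical pre-parsed sentence: "the association organizes festivals"
tokens = ["the", "association", "organizes", "festivals"]
arcs = [
    Arc(-1, 2, "root"),
    Arc(2, 1, "nsubj"),
    Arc(1, 0, "det"),
    Arc(2, 3, "obj"),
]
triple = extract_triple(tokens, arcs)  # subject -> T, predicate -> R, object -> O
```

Returning `None` for incomplete parses mirrors the requirement that only sentences with a complete subject, predicate and object are used.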
Specifically, feeding the input data $(SUB, SEP, r)$ formed from the parameter $r$ into the model M further comprises: before the data is input into the model M, generating a vector $u_i$ by one-hot encoding, $u_i \in \mathbb{R}^{|D|}$, where $\mathbb{R}$ is the set of real numbers and $D$ is the dictionary generated from the subjects and relations in the obtained Qiaoqing question-answer pairs.
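A minimal sketch of the one-hot step: each token of the input is mapped to a vector of length |D| with a single 1 at the token's dictionary index. The dictionary contents below are hypothetical examples, and the helper names are introduced only for illustration.

```python
from typing import Dict, List


def build_dictionary(tokens: List[str]) -> Dict[str, int]:
    """Assign each distinct token an index in order of first appearance."""
    dictionary: Dict[str, int] = {}
    for tok in tokens:
        if tok not in dictionary:
            dictionary[tok] = len(dictionary)
    return dictionary


def one_hot(token: str, dictionary: Dict[str, int]) -> List[float]:
    """Return u_i in R^{|D|}: all zeros except a 1 at the token's index."""
    vec = [0.0] * len(dictionary)
    vec[dictionary[token]] = 1.0
    return vec


# Hypothetical dictionary D built from template tags and relation names.
D = build_dictionary(["SUB", "SEP", "location", "population"])
encoded = [one_hot(tok, D) for tok in ("SUB", "SEP", "location")]
```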
As can be seen from the above description of the present invention, compared with the prior art, the present invention has the following advantages:
(1) On the premise that corpora which are as rich and domain-specific as possible have been collected and preprocessed, the method builds, from the fact triples extracted from the question-answer corpus, a set of training data pairs based on the subjects and relations in the triples and uses it as the model input; it generates template questions based on relation and subject, and then replaces the subject text of the generated template question to obtain the final target question;
(2) The invention uses an LSTM (long short-term memory network) within an Encoder-Decoder seq2seq framework. Compared with a conventional RNN, the LSTM adds a structure that judges whether input information is useful, so it can selectively filter and forget information, and it performs well in processing and predicting important events separated by long intervals in a time series;
(3) Starting from the limitations of the traditional template matching method, the invention combines the template method with a seq2seq model and proposes a seq2seq-based template learning algorithm that generates questions automatically. This improves on the current practice of generating questions by manual editing and overcomes the rigid sentence patterns and poor diversity of questions produced by template matching. In actual use of the Qiaoqing question generation system, the target question can be obtained quickly simply by calling the corresponding interface and passing the subject and relation of the target question as parameters, which is accurate and convenient.
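The gating behaviour mentioned in advantage (2) is commonly written as the standard LSTM cell equations below. This is the textbook formulation, given as background only; the patent does not state which LSTM variant or parameterization it uses. The symbols ($x_t$ for the input at step $t$, $h_t$ for the hidden state, $c_t$ for the cell state, $W$ and $b$ for weights and biases, $\sigma$ for the logistic sigmoid, $\odot$ for element-wise product) are notation introduced here.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

The forget gate $f_t$ is what lets the network discard information it judges no longer useful, while the additive update of $c_t$ is what helps it retain information across long intervals.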
Drawings
FIG. 1 is a diagram of the project framework of the intelligent Qiaoqing question generation system according to an embodiment of the present invention;
FIG. 2 is a diagram of question-answer pairs obtained from Baidu Zhidao web pages using crawler technology in an embodiment of the present invention;
FIG. 3 is a diagram of a question-answer pair result generated using manual editing in an embodiment of the present invention;
FIG. 4 is a diagram illustrating the results of the data of the triplet set after structured specification according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating a process of extracting an optimal question and forming a template question based on a question-answer relationship set according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating the Encoder-Decoder sequence mapping process according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating a result of a one-hot encoding dictionary generated in an embodiment of the present invention;
FIG. 8 is a diagram illustrating an example of a sequence set of an input Encoder according to an embodiment of the present invention;
FIG. 9 is a diagram of an example sequence set for training a Decoder according to an embodiment of the present invention;
FIG. 10 is a diagram of the questions generated by model prediction in an embodiment of the present invention;
FIG. 11 shows the result of the HTTP interface test in an embodiment of the present invention.
The invention is described in further detail below with reference to the figures and specific examples.
Detailed Description
The invention is further described below by means of specific embodiments.
The technical solutions in the embodiments of the present invention will be described in detail below with reference to the drawings in the embodiments of the present invention, and the described embodiments are only a part of the embodiments of the present invention, but not all of the embodiments.
The invention provides a system and a method for generating intelligent Qiaoqing questions; the steps of the construction method are shown in FIG. 1:
First, crawler technology is used to crawl pages of the Baidu Zhidao platform and obtain Qiaoqing-related question-answer pairs, and a triple corpus for model training is generated through manual processing and triple extraction;
Secondly, a seq2seq-based template learning algorithm is adopted and a template question generation model M is constructed through training, which generates template questions based on relation and subject; subject text replacement is then performed on the template question to obtain the final generated question $q_r$;
Finally, an HTTP interface of the intelligent Qiaoqing question generation system is developed, which receives the parameters required by the server and returns a structured result within a short time.
Specifically, in the step of obtaining the initial corpus, crawler technology is first used to crawl pages of the Baidu Zhidao platform, and question-answer pairs are obtained using "Chinese" or "qiaoneng" or "wading" as keywords; the result is shown in FIG. 2.
Secondly, the obtained question-answer pairs are processed by manual editing to obtain the question-answer corpus $B_{QA} = \{Q, A\}$, where $Q$ represents the question set and $A$ represents the answer set. Manual editing copes effectively with the flexible grammar and complex structure of Chinese; it is highly targeted and precise, and performs well in both automatic and manual evaluation. For some good declarative corpora, high-quality question-answer pairs can be generated in this way; the result is shown in FIG. 3.
Finally, using dependency parsing, triples $\langle T, R, O \rangle$ are extracted from the question-answer corpus $B_{QA}$, where $T$ represents the subject entity, $R$ the relation, and $O$ the object. The extracted triples together form the triple corpus; the result after structured normalization is shown in FIG. 4.
Based on the triples extracted in the above steps, the Qiaoqing question-answer relation set $R = \{r_1, r_2, \dots, r_n\}$ is obtained, where $n$ denotes the total number of relations. For a given relation $r_i$, the set of questions $Q'$ related to it is obtained (where $Q' \subseteq Q$). From $Q'$, the subset $Q''$ of questions with a complete subject, predicate and object is selected; from $Q''$, the question $q_i$ whose predicate is the relation $r_i$ and which has the fewest words is taken as the optimal question. The subject text of $q_i$ is then replaced with the subject template tag SUB, forming the template question $q_i'$. The flow of these steps is shown in FIG. 5.
A set of input-output data pairs is constructed from the Qiaoqing relation set $R$: $Q_{train} = \{ q_{train} = [(SUB, SEP, r_i), q_i'] \}$ $(i = 1, 2, \dots, n)$, where SEP is a separator, the input data is $(SUB, SEP, r_i)$ and the output data is $q_i'$. $Q_{train}$ is used as the training set of the seq2seq model, and the model M is obtained through training; M can map the input data $(SUB, SEP, r)$ formed from an arbitrary relation $r$ to the corresponding template question $q'$. The Encoder-Decoder sequence mapping is shown in FIG. 6.
The model input is a sequence of numeric vectors produced by one-hot encoding. The one-hot dictionary is generated from the subjects and relations in the obtained Qiaoqing question-answer pairs, and the result is shown in FIG. 7.
The encoded sequence set fed to the Encoder is shown in FIG. 8. The encoded sequence set used to train the Decoder is shown in FIG. 9.
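How the Encoder and Decoder sequence sets might be prepared can be sketched as follows. The figures are not reproduced here, so this is an assumption-laden illustration rather than the patent's data format: it assumes the common teacher-forcing setup in which the Decoder input is the target shifted right behind a start token and the Decoder target ends with an end token. The tokens `<s>` and `</s>`, the whitespace tokenization, and the example template are all hypothetical.

```python
from typing import Dict, List, Tuple

START, END = "<s>", "</s>"  # assumed start/end markers, not named in the source


def build_vocab(sequences: List[List[str]]) -> Dict[str, int]:
    """Index every distinct token in order of first appearance."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


def make_sequences(
    source: List[str], template: List[str], vocab: Dict[str, int]
) -> Tuple[List[int], List[int], List[int]]:
    """Return (encoder input, decoder input, decoder target) as index lists."""
    encoder_input = [vocab[t] for t in source]
    decoder_input = [vocab[START]] + [vocab[t] for t in template]
    decoder_target = [vocab[t] for t in template] + [vocab[END]]
    return encoder_input, decoder_input, decoder_target


source = ["SUB", "SEP", "location"]                # input (SUB, SEP, r)
template = ["Where", "is", "SUB", "located", "?"]  # template question q'
vocab = build_vocab([[START, END], source, template])
enc, dec_in, dec_out = make_sequences(source, template, vocab)
```

Each index would then be expanded to its one-hot vector before being fed to the LSTM Encoder and Decoder.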
A simple test result of the model is shown in FIG. 10. The result shows that the proposed method can generate Qiaoqing questions for a specified subject and relation from the subject and relation in a triple, and that the resulting questions are more diverse.
When calling the HTTP interface of the intelligent Qiaoqing question generation system, the user supplies the subject of the desired question and the relation to be asked about that subject. The interface uses JSON to structure the transmitted data. In testing, the interface worked normally: after receiving the subject and relation parameters sent by POST, it returned the corresponding result in JSON format. The test results are shown in FIG. 11.
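An HTTP interface of this kind could be sketched as a small WSGI application using only the Python standard library. The patent does not describe the server framework, URL, or JSON schema, so everything here is an assumption: the field names `subject`, `relation` and `question`, the error responses, and the `TEMPLATES` table that stands in for the trained model M.

```python
import io
import json
from typing import Tuple

# Hypothetical stand-in for model M: relation -> template question.
TEMPLATES = {"location": "Where is SUB located?"}
JSON_HEADER = [("Content-Type", "application/json")]


def app(environ, start_response):
    """WSGI app: POST a JSON body with subject and relation, get a question."""
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", JSON_HEADER)
        return [b'{"error": "POST required"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    params = json.loads(environ["wsgi.input"].read(length) or b"{}")
    template = TEMPLATES.get(params.get("relation"))
    if template is None or "subject" not in params:
        start_response("400 Bad Request", JSON_HEADER)
        return [b'{"error": "unknown relation or missing subject"}']
    question = template.replace("SUB", params["subject"])
    body = json.dumps({"question": question}, ensure_ascii=False).encode("utf-8")
    start_response("200 OK", JSON_HEADER)
    return [body]


def call(payload: dict) -> Tuple[str, dict]:
    """Invoke the app in-process, without opening a network socket."""
    raw = json.dumps(payload).encode("utf-8")
    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(raw)),
        "wsgi.input": io.BytesIO(raw),
    }
    status = []
    chunks = app(environ, lambda s, headers: status.append(s))
    return status[0], json.loads(b"".join(chunks))


status, response = call({"subject": "Example Town", "relation": "location"})
```

For a quick local trial, the same `app` could be served with `wsgiref.simple_server.make_server` from the standard library; a production deployment would normally sit behind a dedicated WSGI server.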
The above is only one embodiment of the present invention, but the design concept of the present invention is not limited thereto; any insubstantial modification made using this design concept shall be regarded as an act infringing the protection scope of the present invention.
Claims (8)
1. A method for generating intelligent Qiaoqing questions, characterized by comprising the following steps:
S1: Obtain Qiaoqing-related question-answer pairs using crawler technology, and generate a triple corpus that can be used for model training through manual processing and triple extraction;
S2: Adopt a seq2seq-based template learning algorithm and construct a template question generation model M through training, so as to generate template questions based on relation and subject; then perform subject text replacement on the template question to obtain the final generated question $q_r$;
S3: An interface of the intelligent Qiaoqing question generation system receives the parameters required by the server, runs the model, and returns the structured result.
2. The method of claim 1, wherein step S1 further comprises the following steps:
S11: Using "Chinese" or "qiangbi" or "waduo" as keywords, crawl web pages with crawler technology to obtain Qiaoqing question-answer pairs;
S12: Manually screen out the usable question-answer corpus $B_{QA} = \{Q, A\}$, where $Q$ represents the question set and $A$ represents the answer set;
3. The method of claim 2, wherein step S2 further comprises the following steps:
S21: Based on the obtained relation set $R = \{r_1, r_2, \dots, r_n\}$, where $n$ denotes the total number of relations, obtain for a given relation $r_i$ the set of questions $Q'$ related to it, where $Q' \subseteq Q$; select from $Q'$ the subset $Q''$ of questions with a complete subject, predicate and object, and then select the optimal question $q_i$ from $Q''$;
S22: Replace the subject text of the optimal question $q_i$ with the subject template tag SUB, thereby forming the template question $q_i'$;
S23: Construct the set of input-output data pairs $Q_{train} = \{ q_{train} = [(SUB, SEP, r_i), q_i'] \}$ $(i = 1, 2, \dots, n)$ from the relation set $R$, where SEP is a separator, the input data is $(SUB, SEP, r_i)$ and the output data is $q_i'$;
S24: Use $Q_{train}$ as the training set of the seq2seq model and train it to obtain the model M, which maps the input data $(SUB, SEP, r)$ formed from an arbitrary relation $r$ to the corresponding template question $q'$;
S25: Replace the SUB tag in the template question $q'$ with the subject $t$ to obtain the final generated question $q_r$.
4. The method of claim 3, wherein step S3 further comprises the following steps:
S31: Receive through the interface the first two elements $t$ and $r$ of a fact triple, where $t$ is the subject of the question the user wants to obtain and $r$ is the relation the user wants to ask about with respect to $t$;
S32: Model processing: the input data $(SUB, SEP, r)$ formed from the parameter $r$ is fed into the model M to obtain the output template question $q'$, and the SUB tag in $q'$ is replaced with the subject $t$ to obtain the final generated question $q_r$;
S33: Structure the output question $q_r$ using a JSON interface.
8. The system of claim 4, wherein feeding the input data $(SUB, SEP, r)$ formed from the parameter $r$ into the model M further comprises: before the data is input into the model M, generating a vector $u_i$ by one-hot encoding, $u_i \in \mathbb{R}^{|D|}$, where $\mathbb{R}$ is the set of real numbers and $D$ is the dictionary generated from the subjects and relations in the obtained Qiaoqing question-answer pairs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011261252.1A CN112380836A (en) | 2020-11-12 | 2020-11-12 | Intelligent Chinese message question generating method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011261252.1A CN112380836A (en) | 2020-11-12 | 2020-11-12 | Intelligent Chinese message question generating method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112380836A (en) | 2021-02-19 |
Family
ID=74583320
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011261252.1A Pending CN112380836A (en) | 2020-11-12 | 2020-11-12 | Intelligent Chinese message question generating method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112380836A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112905744A (en) * | 2021-02-25 | 2021-06-04 | 华侨大学 | Qiaoqing question and answer method, device, equipment and storage device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017181834A1 (en) * | 2016-04-19 | 2017-10-26 | 中兴通讯股份有限公司 | Intelligent question and answer method and device |
CN109062939A (en) * | 2018-06-20 | 2018-12-21 | 广东外语外贸大学 | A kind of intelligence towards Chinese international education leads method |
CN110196896A (en) * | 2019-05-23 | 2019-09-03 | 华侨大学 | A kind of intelligence questions generation method towards the study of external Chinese characters spoken language |
CN111566654A (en) * | 2018-01-10 | 2020-08-21 | 国际商业机器公司 | Machine learning integrating knowledge and natural language processing |
CN111753101A (en) * | 2020-06-30 | 2020-10-09 | 华侨大学 | Knowledge graph representation learning method integrating entity description and type |
-
2020
- 2020-11-12 CN CN202011261252.1A patent/CN112380836A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104050160B (en) | Interpreter's method and apparatus that a kind of machine is blended with human translation | |
CN108363743B (en) | Intelligent problem generation method and device and computer readable storage medium | |
WO2021031480A1 (en) | Text generation method and device | |
CN114116994A (en) | Welcome robot dialogue method | |
CN110717018A (en) | Industrial equipment fault maintenance question-answering system based on knowledge graph | |
CN112765345A (en) | Text abstract automatic generation method and system fusing pre-training model | |
CN112417134B (en) | Automatic abstract generation system and method based on voice text deep fusion features | |
CN113672708A (en) | Language model training method, question and answer pair generation method, device and equipment | |
CN113157885B (en) | Efficient intelligent question-answering system oriented to knowledge in artificial intelligence field | |
CN110245253B (en) | Semantic interaction method and system based on environmental information | |
CN112364132A (en) | Similarity calculation model and system based on dependency syntax and method for building system | |
CN115759042A (en) | Sentence-level problem generation method based on syntax perception prompt learning | |
CN111428104A (en) | Epilepsy auxiliary medical intelligent question-answering method based on viewpoint type reading understanding | |
CN111523328B (en) | Intelligent customer service semantic processing method | |
CN103885924A (en) | Field-adaptive automatic open class subtitle generating system and field-adaptive automatic open class subtitle generating method | |
CN114238645A (en) | Relationship selection method based on BERT twin attention network and fusion graph embedding characteristics | |
CN114444481B (en) | Sentiment analysis and generation method of news comment | |
CN115858750A (en) | Power grid technical standard intelligent question-answering method and system based on natural language processing | |
CN115455167A (en) | Geographic examination question generation method and device based on knowledge guidance | |
CN109933773A (en) | A kind of multiple semantic sentence analysis system and method | |
CN112349294B (en) | Voice processing method and device, computer readable medium and electronic equipment | |
CN112380836A (en) | Intelligent Chinese message question generating method | |
CN116522165B (en) | Public opinion text matching system and method based on twin structure | |
CN117271745A (en) | Information processing method and device, computing equipment and storage medium | |
CN114281966A (en) | Question template generation method, question answering device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210219 |