CN112256851A - Method and device for generating educational robot dialogue data set and storage medium - Google Patents

Method and device for generating educational robot dialogue data set and storage medium Download PDF

Info

Publication number
CN112256851A
CN112256851A CN202011147186.5A CN202011147186A CN112256851A CN 112256851 A CN112256851 A CN 112256851A CN 202011147186 A CN202011147186 A CN 202011147186A CN 112256851 A CN112256851 A CN 112256851A
Authority
CN
China
Prior art keywords
conversation
dialogue
robot
data set
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011147186.5A
Other languages
Chinese (zh)
Inventor
闫晓宇
于丹
李雪
马壮
王宇
管浩言
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Neusoft Education Technology Group Co ltd
Original Assignee
Dalian Neusoft Education Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Neusoft Education Technology Group Co ltd filed Critical Dalian Neusoft Education Technology Group Co ltd
Priority to CN202011147186.5A priority Critical patent/CN112256851A/en
Publication of CN112256851A publication Critical patent/CN112256851A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • G06F40/186Templates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a method and a device for generating a dialogue data set of an educational robot and a storage medium. The method comprises the following steps: acquiring a knowledge point list of a target course, and constructing a closed domain of the conversation robot according to the knowledge point list; setting a conversation intention and a word slot, wherein the word slot is key information in a conversation and is selected according to the type of the conversation intention, and the type of the conversation intention comprises a notification class and a requirement class; constructing a question template, wherein the question template comprises a first question template and a second question template; generating a conversation target according to the closed domain of the conversation robot; and generating a plurality of rounds of dialogue sentences based on the dialogue targets and the question templates. The invention can be applied to the fields of natural language processing and data generation, in particular to the generation of a multi-turn question data set, and the task-driven multi-turn dialogue robot in a specific closed domain is trained based on the generated data set.

Description

Method and device for generating educational robot dialogue data set and storage medium
Technical Field
The present invention relates to the field of natural language processing and data generation, and in particular, to a method, an apparatus, and a storage medium for generating a dialogue data set for an educational robot.
Background
At present, most of task-driven multi-turn dialog data sets for closed domains are generated artificially, which can obtain dialogs closest to human natural language, such as DSTC series data sets, WOZ, MultiWOZ, CrossWOZ, and the like. The data sets are generated for specific domains such as restaurants, scenic spots, hotels and the like, and the contained dialogue contents are tightly coupled to the domains, so that the data sets are difficult to apply to the dialogue robot training process of other domains. Furthermore, the time period required to generate a data set manually is long, and more importantly, manually generating a dialogue data set is expensive, which provides resistance to the study of many rounds of dialogue.
On the other hand, although we can write simple codes according to the requirements to automatically generate some dialogue sentences, such text structure is difficult to apply to the algorithm of the dialogue robot. There is currently no established method to automatically generate a data set suitable as input for a multi-turn dialog-related algorithm.
Disclosure of Invention
The invention discloses a method and a device for generating a multi-turn dialogue data set of an educational robot and a storage medium. The technical problem that a data set suitable for being input as a multi-turn dialogue related algorithm is not generated automatically in the prior art is solved.
The technical means adopted by the invention are as follows:
a method of generating an educational robot dialog data set, comprising:
acquiring a knowledge point list of a target course, and constructing a closed domain of the conversation robot according to the knowledge point list;
setting a conversation intention and a word slot, wherein the word slot is key information in a conversation and is selected according to the type of the conversation intention, and the type of the conversation intention comprises a notification class and a requirement class;
constructing a question template, wherein the question template comprises a first question template and a second question template, the first question template comprises necessary information for multi-turn conversation, and the second question is called when any one turn of question behind the first turn of multi-turn conversation is generated;
generating a conversation target according to the closed domain of the conversation robot;
and generating a plurality of rounds of dialogue sentences based on the dialogue targets and the question templates.
Further, the constructing the closed domain of the dialogue robot according to the knowledge point list further comprises switching the closed domain of the dialogue robot by updating the knowledge point list.
Further, the notification class represents an intention that information needs to be provided to the robot; the request class represents an intent to desire information from the robot.
An educational robot dialogue model training method, comprising:
repeatedly executing the method for generating an educational robot dialogue data set according to any one of the above items to generate an educational robot dialogue data set including question sentences of a plurality of rounds of dialogue;
training an education robot dialogue model by using the education robot dialogue data set as a training data set;
and realizing man-machine conversation based on the education robot conversation model.
An apparatus for generating a dialogue data set for an educational robot, comprising:
the acquisition unit is used for acquiring a knowledge point list of the target course and constructing a closed domain of the conversation robot according to the knowledge point list;
the device comprises a setting unit, a processing unit and a processing unit, wherein the setting unit is used for setting a conversation intention and a word slot, the word slot is key information in a conversation and is selected according to the type of the conversation intention, and the type of the conversation intention comprises a notification class and a requirement class;
the system comprises a construction unit, a query unit and a query unit, wherein the construction unit is used for constructing a query template, the query template comprises a first query template and a re-query template, the first query template comprises necessary information for multi-turn conversation, and the re-query is called when any one turn of query after the first turn of the multi-turn conversation is generated;
the object generation unit is used for generating a conversation object according to the closed domain of the conversation robot;
and the sentence generating unit is used for generating a plurality of rounds of dialogue sentences based on the dialogue targets and the question sentence templates.
An educational robot dialogue model training method, comprising:
a training data generating unit for repeatedly performing the method of generating an educational robot dialogue data set including question sentences of a plurality of rounds of dialogue as described in any one of the above, generating an educational robot dialogue data set;
a training unit for training an educational robot dialogue model using the educational robot dialogue data set as a training data set;
and the man-machine conversation unit is used for realizing man-machine conversation based on the education robot conversation model.
A computer-readable storage medium having a set of computer instructions stored therein; the set of computer instructions, when executed by the processor, implement the method of generating an educational robot dialog data set or the method of training an educational robot dialog model described above.
Compared with the prior art, the invention has the following advantages:
1. the method can automatically generate the dialogue data set in the task-driven multi-turn dialogue research field, reduces the cost of manually generating data, and provides flexible and automatic data set support for research in this direction. The method can generate the data set related to the specific closed domain, and the method for generating the data set has good generalization, namely the method can be switched to the closed domain generating data set in any education field, and the generated dialogue data structure can be applied to algorithms of multi-turn dialogue robots.
2. The invention can be only used for generating question sentences in a plurality of rounds of conversations, reasonably avoids the problem that the question answers are not easy to match, and can generate the effect of progressive conversation rounds.
Based on the reasons, the invention can be widely popularized in the field of man-machine conversation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a method for generating a dialogue data set of an educational robot according to the present invention.
FIG. 2 is a diagram of a data structure of a knowledge point word slot in an embodiment.
FIG. 3 is a diagram illustrating a target data structure of a multi-turn dialog according to an embodiment.
Fig. 4 is first wheel session data generated in the example.
Fig. 5 is second wheel session data generated in the example.
Fig. 6 is a third round of dialogue data generated in the embodiment.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The dialogue type of educational dialogue robots is task-driven, and a large volume of data sets is required to train this type of multi-turn dialogue algorithm model. The data set is in a closed domain, the structure of the data set is a plurality of rounds of conversations with questions and answers, and most of the existing generation methods are generated manually. Faced with the expensive cost of manual generation, automatically generating multiple rounds of conversational data sets faces several difficulties: first, how correctly questions and answers in a certain turn in a multi-turn dialog are matched; second, how to generate a data set for a particular closed domain; thirdly, whether the method for generating the data set has good generalization, namely whether the method can be switched to a closed domain of any education field to generate the data set; fourth, how to generate the effect of the progress of the conversation turns; fifth, whether the generated dialogue data structure can be applied to algorithms of a multi-turn dialogue robot, such as intention recognition and dialogue management, etc. The method of the present patent presents a solution to the above problems.
As shown in fig. 1, the present invention discloses a method for generating a dialogue data set for an educational robot, comprising:
and S1, acquiring the knowledge point list of the target course, and constructing a closed domain of the conversation robot according to the knowledge point list. This step is easy to implement since the knowledge points of a certain course can be exhaustive. Further, switching the closed domain of the dialogue robot can be realized by updating the knowledge point list.
S2, setting a conversation intention and a word slot, wherein the word slot is key information in the conversation and is selected according to the type of the conversation intention, and the type of the conversation intention comprises a notification class and a requirement class. When generating a conversation around a knowledge point, the conversation robot wants to obtain human conversation intents and word slots and corresponding slot values. In the present method, the intention of the dialog is set to two: notification and demand. Wherein the notification is an intention of the human being to provide some information to the robot and the request is an intention of the human being to wish to obtain the relevant information from the robot. The word slot is key information in the dialog, such as definition or properties of a knowledge point.
S3, constructing a question template, wherein the question template comprises a first question template and a second question template, the first question template comprises necessary information for multi-turn conversation, the second question is called when any one turn of question after the first turn of multi-turn conversation is generated, and the constructed question template comprises two types of first question and second question. The first question may contain information necessary for the progress of multiple rounds of the dialog, such as the name of the knowledge point, which is called when the first question of multiple rounds of the dialog is generated. The question re-question omits some information and is called when any question after the first turn of the multi-turn dialog is generated.
And S4, generating a dialogue target according to the closed domain of the dialogue robot. In daily life, people do not aim at performing task-driven conversations in a closed domain, i.e., the conversations are spread around some objects. The method also generates dialogue targets before generating dialogue sentences, and the generation of sentences aims at completing the targets.
And S5, generating a plurality of rounds of dialogue sentences based on the dialogue targets and the question sentence templates.
The embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Specifically, taking the generation of the machine learning course as an example, the process of generating the multiple rounds of dialogue data sets in the closed domain is as follows:
firstly, a knowledge point list is obtained.
For machine learning this course, instead of getting a list of knowledge point names in the relevant books, we can for example choose a knowledge point named "0/1 loss function" as the subject of a multi-turn conversation, which will also be spread around. Table 1 is a list of acquired knowledge points.
Table 1 list of knowledge points
Knowledge points
0/1 loss function
LASSO
5x2 Cross validation
And secondly, setting an intention and word slot.
Preferably, the attributes of all knowledge points are initialized first, which in turn comprises: "Domain" is the "knowledge point" domain; "name" is the name of the knowledge point; the 6 attributes "define", "properties", "apply", "benefit and disadvantage", "implementation tool", "optimization improvement" are initialized to an empty list.
Specifically, in the dialogue of the knowledge points, the word slot related to the intention of the notification is set as the name of the knowledge point, and the word slot related to the intention of the demand is set as the definition, the property, the application, the advantages and the disadvantages, the implementation tool and the optimization improvement of the knowledge point. Thus, the word slot data structure for each knowledge point may be initialized. Taking the "0/1 loss function" as an example, as shown in fig. 2, it has 6 word slots available for expanding a dialog in the field of "knowledge points".
The setting of the word slots depends on the actual initial setting, and the word slots can be added or reduced at will according to the actual use requirements, for example, the word slots such as "reference learning website", "recommendation book" and the like can be added.
And thirdly, constructing a question template.
The constructed question template comprises two types of first question and second question. The first question template contains necessary information for multiple rounds of conversation, and the preferable necessary information in the method is optional knowledge point names. It will be called randomly when the first turn of the dialog is generated, where the "name" will be replaced with the name of the knowledge point when called.
The question re-template is called after the first question, and no matter how many turns of the dialog are, the specified number of question re-templates can be easily generated by randomly calling the question re-template. The underline "_" in the first and second questions will be randomly replaced by one of the 6 word slots to which the "requirement" intent is related when the template is called.
TABLE 2 first question template and second question template
First question Question again
What is the _ of name A _?
What is the name _ of in machine learning What is again that
I want to know about name \ u What its _ is
And fourthly, generating a dialogue object.
In daily life, people do not aim at performing task-driven conversations in a closed domain, i.e., the conversations are spread around some objects. The method also generates dialogue targets before generating dialogue sentences, and the generation of sentences aims at completing the targets.
The targets are all randomly generated, and the target data in the multiple rounds of dialogue with the number 0 is shown in fig. 3. Where the number "1" represents the number of the closed field and "false" represents that the object has not been completed. The goal at this point indicates that our multi-turn dialog will expand around the knowledge point of "0/1 loss function" and ask what its "definitions", "applications" and "properties" are. Further, for a certain round of dialog, it is set that the contents in the target list are to be involved in the dialog process. Each item in the target list is a sub-target, and the sub-targets are a list, wherein the list sequentially comprises: a ranking number of the dialog; the domain to which the conversation relates; the slot name that needs to be asked; the slot value corresponding to the slot name; the sub-goal complete status.
And fifthly, generating a dialogue statement.
Based on the question template and the dialog target, multiple turns of dialog text can be finally generated. Wherein "context" is a generated dialog statement, "role" is user, which means that the dialog is generated by a simulation user, "dialog _ act" is a dialog sub-target to be completed in a certain turn of the multi-turn dialog, "user _ state" is continuously updated as the number of turns of the dialog increases and the sub-target is completed, and finally, the states of all targets are updated from "false" to "true".
It should be noted that the number of dialog turns can be set according to the actual application requirements, and 3 dialog turns are preferably performed in this embodiment.
The 1 st round provides the name of the knowledge point as '0/1 loss function', a word slot is randomly selected for asking questions, the definition of the word slot is defined in the round, and therefore the first question template is called to generate a first dialog sentence 'which can be interpreted by you for defining 0/1 loss functions'. It can be seen that the "name" and "definition" words in "user _ state" are both updated to "true" as shown in fig. 4. Specifically, based on the target list of fig. 3, first wheel session data is first generated. The format is dictionary, the keys of the dictionary are 4, including 'content', 'role', 'conversation action' and 'target completion state'. Wherein the value of the "content" key is the text information of the first round of dialog; the value of the "role" key is "user"; the value of the "dialogue act" key is a sub-target in the target list that appears in the first round of text information; the value of the 'target completion state' key is a list, the sub-targets in the 'conversation behavior' are sequentially found in the target list, the completion state of the sub-targets is changed into true, and the updated list is the 'target completion state' list.
The 2 nd round continues to randomly select a word slot for questioning against the knowledge point of "0/1 loss function", and for its nature in this round, a re-question template is called to generate a second conversational sentence "how to correctly understand its nature". Also, the "property" word slot states in "user _ state" are all updated to "true" as shown in FIG. 5. Specifically, second wheel session data is generated based on the target list of fig. 3. The format is dictionary, the keys of the dictionary are 4, including 'content', 'role', 'conversation action' and 'target completion state'. Wherein the value of the "content" key is the text information of the second round of dialog; the value of the "role" key is "user"; the value of the "dialogue action" key is a sub-target in the target list appearing in the second round of text information; the value of the "target completion state" key is a list, the sub-targets in the "conversation behavior" are sequentially found in the first round of "target completion state" list, the completion states of the sub-targets are changed into true, and the updated list is the "target completion state" list.
The 3 rd round continues to randomly select a word slot to ask questions, and for its application in this round, a re-question template is called to generate a third dialogue sentence "explain which bars are applied again". Similarly, the word slot status of "application" in "user _ state" is updated to "true", and all the word slot statuses at this time are "true" indicating that all the conversation objects are completed and the conversation is ended, as shown in fig. 6.
When the number of the set dialog turns increases, more turns of dialog data can be obtained according to the above-described procedure.
So far, we have described a data sample automatically generated by the method, and the number of multiple rounds of conversations to be obtained can be arbitrarily set manually when the method is applied, and the data structure of each multiple round of conversations is similar to the sample.
An educational robot dialogue model training method, comprising:
repeatedly executing the method for generating an educational robot dialogue data set according to any one of the above items to generate an educational robot dialogue data set;
training an education robot dialogue model by using the education robot dialogue data set as an input data set;
and realizing man-machine conversation based on the education robot conversation model.
An apparatus for generating a dialogue data set for an educational robot, comprising:
the acquisition unit is used for acquiring a knowledge point list of the target course and constructing a closed domain of the conversation robot according to the knowledge point list;
the device comprises a setting unit, a processing unit and a processing unit, wherein the setting unit is used for setting a conversation intention and a word slot, the word slot is key information in a conversation and is selected according to the type of the conversation intention, and the type of the conversation intention comprises a notification class and a requirement class;
the system comprises a construction unit, a query unit and a query unit, wherein the construction unit is used for constructing a query template, the query template comprises a first query template and a re-query template, the first query template comprises necessary information for multi-turn conversation, and the re-query is called when any one turn of query after the first turn of the multi-turn conversation is generated;
the object generation unit is used for generating a conversation object according to the closed domain of the conversation robot;
and the sentence generating unit is used for generating a plurality of rounds of dialogue sentences based on the dialogue targets and the question sentence templates.
An educational robot dialogue model training method, comprising:
a training data generating unit for repeatedly performing the method of generating an educational robot dialogue data set including question sentences of a plurality of rounds of dialogue as described in any one of the above, generating an educational robot dialogue data set;
a training unit for training an educational robot dialogue model using the educational robot dialogue data set as a training data set;
and the man-machine conversation unit is used for realizing man-machine conversation based on the education robot conversation model.
For the embodiments of the present invention, the description is simple because it corresponds to the above embodiments, and for the related similarities, please refer to the description in the above embodiments, and the detailed description is omitted here.
A computer-readable storage medium having a set of computer instructions stored therein; the set of computer instructions, when executed by the processor, implement the method of generating an educational robot dialog data set or the method of training an educational robot dialog model described above.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A method for generating a dialogue data set for an educational robot, comprising:
acquiring a knowledge point list of a target course, and constructing a closed domain of the conversation robot according to the knowledge point list;
setting a conversation intention and a word slot, wherein the word slot is key information in a conversation and is selected according to the type of the conversation intention, and the type of the conversation intention comprises a notification class and a requirement class;
constructing a question template, wherein the question template comprises a first question template and a second question template, the first question template comprises necessary information for multi-turn conversation, and the second question is called when any one turn of question behind the first turn of multi-turn conversation is generated;
generating a conversation target according to the closed domain of the conversation robot;
and generating a plurality of rounds of dialogue sentences based on the dialogue targets and the question templates.
2. The method for generating a dialogue data set for an educational robot according to claim 1, wherein the constructing of the closed domain of the dialogue robot based on the knowledge point list further comprises switching the closed domain of the dialogue robot by updating the knowledge point list.
3. The generation method of a dialogue data set for an educational robot according to claim 1, wherein the notification class represents an intention to be provided to a robot information; the request class represents an intent to desire information from the robot.
4. A training method for a dialogue model of an educational robot, comprising:
repeatedly executing the method for generating an educational robot dialogue data set according to any one of claims 1 to 3, generating an educational robot dialogue data set including question sentences of a plurality of rounds of dialogue;
training an education robot dialogue model by using the education robot dialogue data set as a training data set;
and realizing man-machine conversation based on the education robot conversation model.
5. An apparatus for generating a dialogue data set for an educational robot, comprising:
the acquisition unit is used for acquiring a knowledge point list of the target course and constructing a closed domain of the conversation robot according to the knowledge point list;
the device comprises a setting unit, a processing unit and a processing unit, wherein the setting unit is used for setting a conversation intention and a word slot, the word slot is key information in a conversation and is selected according to the type of the conversation intention, and the type of the conversation intention comprises a notification class and a requirement class;
the system comprises a construction unit, a query unit and a query unit, wherein the construction unit is used for constructing a query template, the query template comprises a first query template and a re-query template, the first query template comprises necessary information for multi-turn conversation, and the re-query is called when any one turn of query after the first turn of the multi-turn conversation is generated;
the object generation unit is used for generating a conversation object according to the closed domain of the conversation robot;
and the sentence generating unit is used for generating a plurality of rounds of dialogue sentences based on the dialogue targets and the question sentence templates.
6. A training method for a dialogue model of an educational robot, comprising:
a training data generation unit for repeatedly executing the method of generating an educational robot dialogue data set according to any one of claims 1 to 3, generating an educational robot dialogue data set including question sentences of a plurality of rounds of dialogue;
a training unit for training an educational robot dialogue model using the educational robot dialogue data set as a training data set;
and the man-machine conversation unit is used for realizing man-machine conversation based on the education robot conversation model.
7. A computer-readable storage medium having a set of computer instructions stored therein; the set of computer instructions, when executed by a processor, implement the method of generating an educational robot dialog data set according to any of claims 1-3 or the method of training an educational robot dialog model according to claim 4.
CN202011147186.5A 2020-10-23 2020-10-23 Method and device for generating educational robot dialogue data set and storage medium Pending CN112256851A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011147186.5A CN112256851A (en) 2020-10-23 2020-10-23 Method and device for generating educational robot dialogue data set and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011147186.5A CN112256851A (en) 2020-10-23 2020-10-23 Method and device for generating educational robot dialogue data set and storage medium

Publications (1)

Publication Number Publication Date
CN112256851A true CN112256851A (en) 2021-01-22

Family

ID=74263582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011147186.5A Pending CN112256851A (en) 2020-10-23 2020-10-23 Method and device for generating educational robot dialogue data set and storage medium

Country Status (1)

Country Link
CN (1) CN112256851A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118101A (en) * 2021-11-26 2022-03-01 北京百度网讯科技有限公司 Dialogue data generation method and device, equipment and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018133761A1 (en) * 2017-01-17 2018-07-26 华为技术有限公司 Method and device for man-machine dialogue
CN110008319A (en) * 2019-02-27 2019-07-12 百度在线网络技术(北京)有限公司 Model training method and device based on dialog template
CN110110050A (en) * 2018-01-22 2019-08-09 北京大学 A kind of generation method of media event production question and answer data set
CN110188182A (en) * 2019-05-31 2019-08-30 中国科学院深圳先进技术研究院 Model training method, dialogue generation method, device, equipment and medium
CN110955767A (en) * 2019-12-04 2020-04-03 中国太平洋保险(集团)股份有限公司 Algorithm and device for generating intention candidate set list set in robot dialogue system
CN111078844A (en) * 2018-10-18 2020-04-28 上海交通大学 Task-based dialog system and method for software crowdsourcing
CN111209381A (en) * 2020-01-03 2020-05-29 北京搜狗科技发展有限公司 Time management method and device in conversation scene

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018133761A1 (en) * 2017-01-17 2018-07-26 华为技术有限公司 Method and device for man-machine dialogue
CN110110050A (en) * 2018-01-22 2019-08-09 北京大学 A kind of generation method of media event production question and answer data set
CN111078844A (en) * 2018-10-18 2020-04-28 上海交通大学 Task-based dialog system and method for software crowdsourcing
CN110008319A (en) * 2019-02-27 2019-07-12 百度在线网络技术(北京)有限公司 Model training method and device based on dialog template
CN110188182A (en) * 2019-05-31 2019-08-30 中国科学院深圳先进技术研究院 Model training method, dialogue generation method, device, equipment and medium
CN110955767A (en) * 2019-12-04 2020-04-03 中国太平洋保险(集团)股份有限公司 Algorithm and device for generating intention candidate set list set in robot dialogue system
CN111209381A (en) * 2020-01-03 2020-05-29 北京搜狗科技发展有限公司 Time management method and device in conversation scene

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈勣: "在线教育客服数据挖掘与对话机器人设计", 中国优秀硕士学位论文全文数据库 信息科技辑 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118101A (en) * 2021-11-26 2022-03-01 北京百度网讯科技有限公司 Dialogue data generation method and device, equipment and medium
CN114118101B (en) * 2021-11-26 2022-12-09 北京百度网讯科技有限公司 Dialogue data generation method and device, equipment and medium

Similar Documents

Publication Publication Date Title
CN107704563B (en) Question recommendation method and system
Latham et al. A conversational intelligent tutoring system to automatically predict learning styles
Mansour Science teachers' views of science and religion vs. the Islamic perspective: Conflicting or compatible?
CN109446509B (en) Dialogue corpus intention analysis method and system and electronic equipment
Berland et al. Supporting the scientific practices through epistemologically responsive science teaching
CN109389870A (en) A kind of data adaptive method of adjustment and its device applied in electronic instruction
CN111078856B (en) Group chat conversation processing method and device and electronic equipment
CN112667796B (en) Dialogue reply method and device, electronic equipment and readable storage medium
CN109524008A (en) A kind of audio recognition method, device and equipment
CN112256851A (en) Method and device for generating educational robot dialogue data set and storage medium
Rafferty et al. Greater learnability is not sufficient to produce cultural universals
CN111444729A (en) Information processing method, device, equipment and readable storage medium
McDaniel et al. Individual differences in structure building: Impacts on comprehension and learning, theoretical underpinnings, and support for less able structure builders
CN115114404A (en) Question and answer method and device for intelligent customer service, electronic equipment and computer storage medium
CN110555100A (en) Multi-product demand matching method and system based on graph generation free dialogue
CN115470329A (en) Dialog generation method and device, computer equipment and storage medium
CN110275946A (en) A kind of FAQ automatic question-answering method and device
Harrison Realigning Philosophy and Wisdom in the 21st Century
CN115132353A (en) Method, device and equipment for generating psychological question automatic response model
Maffia Exploiting the potential of primary historical sources in primary school: a focus on teacher’s actions
Thomas et al. Automatic answer assessment in LMS using latent semantic analysis
Koshinda et al. Machine-learned ranking based non-task-oriented dialogue agent using twitter data
CN113051375A (en) Question-answering data processing method and device based on question-answering equipment
Hill et al. Promoting and assessing mathematical generalising
CN110459079A (en) Text new word based on voice vocabulary spells training method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 116000 room 206, no.8-9, software garden road, Ganjingzi District, Dalian City, Liaoning Province

Applicant after: Neusoft Education Technology Group Co.,Ltd.

Address before: 116000 room 206, no.8-9, software garden road, Ganjingzi District, Dalian City, Liaoning Province

Applicant before: Dalian Neusoft Education Technology Group Co.,Ltd.