CN115438170A - Dialog model generation method, dialog model application method, dialog model generation system, dialog model application system, dialog model generation equipment and dialog model application equipment - Google Patents


Info

Publication number
CN115438170A
CN115438170A (application CN202211394964.XA)
Authority
CN
China
Prior art keywords
data, dialogue, text, inputting, sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211394964.XA
Other languages
Chinese (zh)
Inventor
连怡鑫
刘剑锋
杜晓薇
王宝元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Hongmian Xiaoice Technology Co Ltd
Original Assignee
Beijing Hongmian Xiaoice Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Hongmian Xiaoice Technology Co Ltd
Priority to CN202211394964.XA
Publication of CN115438170A
Legal status: Pending

Classifications

    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification
    • G06F40/30 Semantic analysis
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a dialogue model generation method, application method, system, device and storage medium, and relates to the field of computer technology. The dialogue model generation method comprises: obtaining a preset dialogue sample; preprocessing the dialogue sample to obtain triple data; inputting the triple data into a preset natural language model for semantic understanding, and outputting a semantic intention corresponding to the triple data; and inputting the triple data and the semantic intention into an initial neural network for training to obtain a dialogue model. A dialogue model application method then applies the dialogue model generated by the generation method. A dialogue model with good dialogue performance is thereby trained from multi-source multi-modal data, and applying it greatly improves the model's ability to migrate across dialogue scenes and enhances the richness and accuracy of dialogue replies.

Description

Dialog model generation method, dialog model application method, dialog model generation system, dialog model application system, dialog model generation equipment and dialog model application equipment
Technical Field
The invention relates to the field of computer technology, and in particular to a dialogue model generation method, application method, system, device and storage medium.
Background
In the related art, academic dialogue systems are typified by generative dialogue pre-training, which generally does not support dialogue reply generation based on multiple kinds of knowledge (such as text, pictures, and user-portrait knowledge). Industrial dialogue systems are typified by task-oriented and retrieval-based dialogue; for example, some intelligent customer-service systems only support reply generation based on the dialogue context and dialogue-state management. The generalization of some intelligent dialogue systems is limited by their training corpus, so their dialogue performance is poor.
Current mainstream dialogue systems generally suffer from insufficient model knowledge and an inability to migrate dialogue capabilities across different scenes. In a rapidly changing information world, the topics, knowledge, and scenes users talk about vary constantly, which poses a severe challenge to these systems.
Disclosure of Invention
In view of the above, the present invention provides a dialogue model generation method, application method, system, device and storage medium, so as to train a dialogue model with good dialogue performance from multi-source multi-modal data; applying the dialogue model greatly improves its ability to migrate across dialogue scenes and enhances the richness and accuracy of dialogue replies.
In a first aspect, the present invention provides a method for generating a dialogue model, including:
acquiring a preset conversation sample;
preprocessing the dialogue sample to obtain triple data;
inputting the triple data into a preset natural language model for semantic understanding, and outputting a semantic intention corresponding to the triple data;
and inputting the triple data and the semantic intention into an initial neural network for training to obtain a dialogue model, wherein the dialogue model at least comprises a visual encoder, a text encoder and a decoder.
Preferably, according to the dialogue model generation method provided by the invention,
the obtaining of the preset dialog sample includes:
determining a target extraction path from a plurality of extraction paths by using a preset sample extraction strategy;
and acquiring the dialogue sample according to the sample extraction strategy and the target extraction path.
Preferably, according to the dialogue model generation method provided by the invention,
the preprocessing of the dialogue sample to obtain triple data comprises:
cleaning the dialogue sample to obtain a standard dialogue sample;
and formatting the standard dialogue sample according to a preset triple format to generate the triple data.
Preferably, according to the dialogue model generation method provided by the invention,
the step of formatting the standard dialogue sample according to a preset triple format to generate the triple data includes:
formatting the standard dialogue sample according to the triple format to obtain initial triple data;
detecting the initial triple data according to a preset detection strategy, and identifying abnormal initial triple data;
and screening the triple data out of the initial triple data according to the abnormal initial triple data.
In a second aspect, the present invention also provides a dialogue model application method for applying the dialogue model generated by the dialogue model generation method according to the first aspect, wherein the dialogue model at least comprises a visual encoder, a text encoder and a decoder;
the dialogue model application method comprises the following steps:
acquiring user data of a user in different scenes, wherein the user data at least comprises image data and text data;
inputting the image data into a visual encoder to perform image coding to obtain image coded data, and inputting the text data into a text encoder to perform text coding to obtain text coded data;
inputting the image coded data and the text coded data into a perceiver sampler for processing to obtain corresponding image hidden variables and text hidden variables;
and inputting the image hidden variable and the text hidden variable into a decoder for processing, and outputting the current reply sentence.
Preferably, according to the dialogue model application method provided by the present invention,
the inputting the image hidden variable and the text hidden variable into a decoder for processing and outputting the current reply sentence corresponding to the user data comprises:
generating a dialog generation task under the condition that the decoder detects the image hidden variable and/or the text hidden variable;
and inputting the dialogue generation task into the dialogue model, and outputting the current reply sentence.
In a third aspect, the present invention further provides a system for generating a dialogue model, the system comprising:
the conversation sample acquisition module is used for acquiring a preset conversation sample;
the preprocessing module is used for preprocessing the dialogue sample to obtain triple data;
the semantic understanding module is used for inputting the triple data into a preset natural language model for semantic understanding and outputting a semantic intention corresponding to the triple data;
and the dialogue model generation module is used for inputting the triple data and the semantic intention into an initial neural network for training to obtain a dialogue model.
In a fourth aspect, the present invention further provides a dialog model application system, including:
the system comprises a user data acquisition module, a user data acquisition module and a user data processing module, wherein the user data acquisition module is used for acquiring user data of a user in different scenes, and the user data at least comprises image data and text data;
the encoding module is used for inputting the image data into a visual encoder to perform image encoding to obtain image encoding data, and inputting the text data into a text encoder to perform text encoding to obtain text encoding data;
the hidden-variable generation module is used for inputting the image coded data and the text coded data into a perceiver sampler for processing to obtain corresponding image hidden variables and text hidden variables;
and the reply output module is used for inputting the image hidden variables and the text hidden variables into a decoder for processing, and outputting the current reply sentence corresponding to the user data.
In a fifth aspect, the present invention further provides an electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the dialogue model generation method according to any one of the first aspect or to implement the steps of the dialogue model application method according to the second aspect.
In a sixth aspect, the present invention also provides a non-transitory computer-readable storage medium, on which a computer program is stored, which computer program, when executed by a processor, implements the steps of the dialogue model generation method according to any one of the first aspects described above, or implements the steps of the dialogue model application method according to the second aspect described above.
In a seventh aspect, the present invention also provides a computer program product comprising a computer program which, when being executed by a processor, carries out the steps of the dialogue model generation method according to any one of the above first aspects, or carries out the steps of the dialogue model application method according to the second aspect.
The invention provides a dialogue model generation method, application method, system, device and storage medium. The dialogue model generation method comprises: obtaining a preset dialogue sample; preprocessing the dialogue sample to obtain triple data; inputting the triple data into a preset natural language model for semantic understanding, and outputting a semantic intention corresponding to the triple data;
and inputting the triple data and the semantic intention into an initial neural network for training to obtain a dialogue model. The dialogue model application method comprises: collecting user data of a user in different scenes, the user data at least comprising image data and text data; inputting the image data into a visual encoder for image encoding to obtain image coded data, and inputting the text data into a text encoder for text encoding to obtain text coded data; inputting the image coded data and the text coded data into a perceiver sampler for processing to obtain corresponding image hidden variables and text hidden variables; and inputting the image hidden variables and the text hidden variables into a decoder for processing, and outputting the current reply sentence. The method trains a dialogue model with good dialogue performance from multi-source multi-modal data; applying the dialogue model greatly improves its ability to migrate across dialogue scenes and enhances the richness and accuracy of dialogue replies.
Drawings
In order to illustrate the technical solutions of the present invention or the prior art more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart diagram of a dialogue model generation method provided by the present invention;
FIG. 2 is the first schematic flow chart of the dialogue model application method provided by the present invention;
FIG. 3 is the second schematic flow chart of the dialogue model application method provided by the present invention;
FIG. 4 is a schematic diagram of a dialogue model generation system provided by the present invention;
FIG. 5 is a schematic diagram of the structure of a dialogue model application system provided by the present invention;
FIG. 6 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A method, system, device and storage medium for generating and applying a dialogue model according to the present invention are described with reference to fig. 1-6.
As shown in fig. 1, which is a schematic diagram of an implementation flow of a dialog model generation method according to an embodiment of the present invention, a dialog model generation method may include, but is not limited to, steps S100 to S400.
S100, acquiring a preset conversation sample;
S200, preprocessing the dialogue sample to obtain triple data;
S300, inputting the triple data into a preset natural language model for semantic understanding, and outputting a semantic intention corresponding to the triple data;
S400, inputting the triple data and the semantic intention into an initial neural network for training to obtain a dialogue model, wherein the dialogue model at least comprises a visual encoder, a text encoder and a decoder.
In step S100 of some embodiments, a preset dialog sample is obtained.
It can be understood that the executing computer program may determine a target extraction path from a plurality of extraction paths by using a preset sample extraction strategy, and then obtain the dialogue sample according to the sample extraction strategy and the target extraction path.
It should be noted that the dialogue sample at least includes a sample dialogue format, an extraction path, and a dialogue statement.
In step S200 of some embodiments, the dialogue sample is preprocessed to obtain triple data.
It can be understood that, after step S100 of obtaining the preset dialogue sample, the specific steps may be: first, cleaning the dialogue sample to obtain a standard dialogue sample; then formatting the standard dialogue sample according to a triple format to obtain initial triple data; detecting the initial triple data according to a preset detection strategy to identify abnormal initial triple data; and screening the triple data out of the initial triple data according to the abnormal initial triple data.
In step S300 of some embodiments, the triple data is input into a preset natural language model for semantic understanding, and a semantic intention corresponding to the triple data is output.
It can be understood that, after the step of preprocessing the dialogue sample to obtain triple data is completed in step S200, the specific execution step may be to input the triple data into a preset natural language model for semantic understanding, and output a semantic intention corresponding to the triple data.
It should be noted that inputting the triple data into the natural language model performs natural language understanding. Natural Language Processing (NLP) is the discipline of analyzing, understanding and processing natural language with computer technology: the computer serves as a powerful tool for language research, language information is studied quantitatively with its support, and language descriptions usable by both humans and computers are provided. NLP comprises two parts: Natural Language Understanding (NLU) and Natural Language Generation (NLG).
Natural language processing can divide its work into a number of subtasks, and traditional machine learning methods can handle these subtasks with techniques such as support vector machines (SVM), Markov models, and conditional random fields (CRF), further improving the accuracy of the processing results.
In some embodiments, deep learning is a major branch of machine learning; a deep learning model, such as a convolutional neural network or a recurrent neural network, is applied in natural language processing to complete classification and understanding of natural language by learning the generated word vectors.
Optionally, the triple data is input into a preset natural language model for semantic understanding, and the semantic intention corresponding to the triple data is output. This can also be understood as follows: when performing semantic understanding on the "knowledge" of the triple data, the understanding can be assisted by the "historical dialogue" of the triple data.
In some embodiments of the present application, the natural language model is a deep learning model, and the triple data is input into a preset natural language model for semantic understanding, so as to output a semantic intention corresponding to the triple data.
In step S400 of some embodiments, the triple data and the semantic intent are input to an initial neural network for training, resulting in a dialogue model.
It can be understood that, after the step S300 of inputting the triplet data into a preset natural language model for semantic understanding and outputting a semantic intention corresponding to the triplet data is completed, the specific implementation step may be to input the triplet data and the semantic intention into an initial neural network for training to obtain a dialogue model. The generated dialogue model can be used for intelligently recommending appropriate reply sentences.
Processing massive natural language with semi-supervised or unsupervised machine learning methods also mirrors the development of machine learning, which can be roughly divided into two stages: traditional machine learning based on linear models over discrete representations, and deep learning based on non-linear models over continuous representations.
Deep learning is an automatic computer learning algorithm comprising an input layer, hidden layers and an output layer. The input layer receives the large amount of data provided by researchers, which is the processing object of the algorithm, namely the triple data and the semantic intention. The number of hidden layers is determined by the training strategy; in the hidden layers the algorithm marks features in the data, finds regularities, and establishes connections between feature points. The output layer yields the training result. Generally, the more data the input layer receives and the more hidden layers there are, the better the data is discriminated, and the more accurate the dialogue sentences output by the obtained dialogue model.
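The input layer / hidden layers / output layer structure just described can be sketched as a minimal feed-forward network in NumPy; the layer sizes and the ReLU activation are illustrative assumptions, not the architecture claimed by the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, weights):
    """Pass the input through each hidden layer (ReLU), then a linear output layer."""
    for w in weights[:-1]:
        x = np.maximum(x @ w, 0.0)   # hidden layers: extract features from the data
    return x @ weights[-1]           # output layer: the training result

# An input layer of size 8, two hidden layers of size 16, an output layer of size 4.
weights = [rng.normal(size=(8, 16)),
           rng.normal(size=(16, 16)),
           rng.normal(size=(16, 4))]
y = forward(rng.normal(size=(1, 8)), weights)
```

In a real system the weights would be fitted to the triple data and semantic intentions rather than drawn at random.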
In some embodiments of the present invention, the obtaining of the preset dialog sample includes:
determining a target extraction path from a plurality of extraction paths by using a preset sample extraction strategy;
and acquiring the dialogue sample according to the sample extraction strategy and the target extraction path.
It can be understood that the executing computer program determines a target extraction path from the plurality of extraction paths according to the configured sample extraction strategy, and thereby obtains the dialogue sample according to the sample extraction strategy and the target extraction path.
It should be noted that the plurality of extraction paths at least include a local extraction path, a web page extraction path, a cloud data extraction path, and the like.
The target extraction path is, for example, the web-page extraction path, and the dialogue sample is extracted according to the sample extraction strategy and the web-page extraction path.
Further, the dialogue samples can be extracted automatically by a dialogue-sample extraction program developed in advance in the Python language.
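The extraction step can be sketched as follows; the strategy layout, the path names, and the `fetch` callback are hypothetical stand-ins for the pre-developed extraction program, not an API from the patent.

```python
# Candidate extraction paths: local, web page, and cloud (illustrative values).
EXTRACTION_PATHS = {
    "local": "file:///corpus/dialogues",
    "web": "https://example.com/dialogues",
    "cloud": "s3://bucket/dialogues",
}

def choose_target_path(strategy, paths):
    """Pick the first configured path listed in the strategy's priority order."""
    for source in strategy["priority"]:
        if source in paths:
            return source, paths[source]
    raise ValueError("no extraction path matches the strategy")

def extract_samples(strategy, paths, fetch):
    """Resolve the target path, then delegate the actual retrieval to `fetch`."""
    source, path = choose_target_path(strategy, paths)
    return fetch(source, path)

# Usage with a stubbed fetcher standing in for a real crawler:
strategy = {"priority": ["web", "local"]}
samples = extract_samples(strategy, EXTRACTION_PATHS,
                          lambda src, path: [{"source": src, "text": "hello"}])
```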
In some embodiments of the present invention, the preprocessing the dialog sample to obtain triple data includes:
cleaning the dialogue sample to obtain a standard dialogue sample;
and formatting the standard dialogue sample according to a preset triple format to generate the triple data.
It will be appreciated that the computer program first performs a cleaning process on the dialogue sample to obtain a standard dialogue sample.
For example, the extracted dialogue sample 1 is as follows. Dialogue context 1: "Xiaobing, may I ask how your mood is today [inline image]". Reply sentence 1: "My mood is very good today, and yours [inline image]".
Dialogue sample 2 is as follows. Dialogue context 2: "Xiaobing, may I ask how your mood is today [inline image]". Reply sentence 2 is null.
Dialogue sample 2 is abnormal dialogue sample data: part of its data is missing, and a dialogue model trained on it would not achieve a good training effect, so the extracted dialogue samples need to be cleaned to obtain standard dialogue samples.
After the dialogue sample is cleaned to obtain a standard dialogue sample, the standard dialogue sample is formatted according to the triple format to obtain initial triple data; the initial triple data is detected according to a preset detection strategy to identify abnormal initial triple data; and the triple data is screened out of the initial triple data according to the abnormal initial triple data.
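A minimal sketch of this cleaning-then-formatting pipeline, under the assumption that each raw sample is a dict with `knowledge`, `context`, and `reply` fields (the field names are illustrative):

```python
def clean_samples(samples):
    """Drop abnormal samples whose dialogue context or reply is missing."""
    return [s for s in samples
            if s.get("context") and s.get("reply")]

def to_triples(samples):
    """Arrange standard samples into [knowledge, context, reply] triples."""
    return [[s.get("knowledge", ""), s["context"], s["reply"]]
            for s in samples]

raw = [
    {"knowledge": "persona", "context": "How is your mood today?",
     "reply": "Very good, and you?"},
    {"context": "How is your mood today?", "reply": ""},  # abnormal: empty reply
]
triples = to_triples(clean_samples(raw))
```

The second raw sample mirrors dialogue sample 2 above: its empty reply is detected during cleaning and it never reaches the triple data.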
In some embodiments of the present invention, the formatting the standard dialog sample according to a preset triplet format to generate the triplet data includes:
formatting the standard dialogue sample according to the triple format to obtain initial triple data;
detecting the initial triple data according to a preset detection strategy, and identifying abnormal initial triple data;
and screening the ternary group data from the initial ternary group data according to the abnormal initial ternary group data.
It will be appreciated that the standard dialog sample is formatted according to the triplet format resulting in initial triplet data.
It should be noted that the triple format is "[knowledge, historical dialogue context, current reply sentence]".
Formatting the standard dialogue sample according to the triple format yields initial triple data, which may include: [personality portrait, historical dialogue context, current reply sentence], [scene tag, historical dialogue context, current reply sentence], [knowledge, historical dialogue context, current reply sentence], [local memory, historical dialogue context, current reply sentence].
The initial triple data is detected according to a preset detection strategy, and abnormal initial triple data is identified, so that normal triple data can be screened out after formatting.
The abnormal initial triple data may be, for example, initial triple data lacking a relationship, or initial triple data lacking local memory.
These abnormal initial triple data are identified so that they can be screened out.
The preset detection strategy may be to detect each field of the initial triple and the corresponding field length, so as to identify abnormal initial triple data according to each field and its length.
It can be understood that after the abnormal initial triple data is identified, the triple data is screened out of the initial triple data according to the abnormal initial triple data.
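The field-and-field-length detection strategy can be sketched as follows; the arity check, the non-empty requirement on context and reply, and the length bound are all illustrative assumptions rather than values from the patent.

```python
MAX_FIELD_LEN = 1000  # assumed upper bound on a field's length

def is_abnormal(triple):
    """Flag triples with a wrong arity, an empty field, or an oversized field."""
    if len(triple) != 3:
        return True
    # knowledge (index 0) may be empty; context and reply must not be
    if not triple[1] or not triple[2]:
        return True
    return any(len(str(field)) > MAX_FIELD_LEN for field in triple)

def screen(triples):
    """Split initial triple data into normal data and the abnormal data to discard."""
    normal = [t for t in triples if not is_abnormal(t)]
    abnormal = [t for t in triples if is_abnormal(t)]
    return normal, abnormal
```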
It should be further noted that the triple data is multi-modal, that is, it may be in text form or in picture form; the personality portrait and the scene tag are both data in text form.
The triple data may be in picture form when, for example, it is an illustration in news web-page data, or picture information discussed by speakers, such as a food picture one speaker has and begins to discuss with another speaker. Such pictures can come either from a web search or from a speaker's local album (local database). It should be further noted that video information can be converted into multiple pictures by frame-by-frame processing, so the picture processing can also be extended to support video input.
The invention provides a dialogue model generation method, system, device and storage medium. The dialogue model generation method comprises: obtaining a preset dialogue sample; preprocessing the dialogue sample to obtain triple data; inputting the triple data into a preset natural language model for semantic understanding, and outputting a semantic intention corresponding to the triple data; and inputting the triple data and the semantic intention into an initial neural network for training to obtain a dialogue model. The method trains a dialogue model with good dialogue performance from multi-source multi-modal data; applying the dialogue model greatly improves its ability to migrate across dialogue scenes and enhances the richness and accuracy of dialogue replies.
As shown in fig. 2, which is a schematic flow chart illustrating an implementation of a dialog model application method applying the above-mentioned dialog model generation method according to an embodiment of the present invention, the dialog model application method may include, but is not limited to, steps S210 to S240.
S210, collecting user data of a user in different scenes, wherein the user data at least comprises image data and text data;
S220, inputting the image data into a visual encoder for image encoding to obtain image coded data, and inputting the text data into a text encoder for text encoding to obtain text coded data;
S230, inputting the image coded data and the text coded data into a perceiver sampler for processing to obtain corresponding image hidden variables and text hidden variables;
S240, inputting the image hidden variables and the text hidden variables into a decoder for processing, and outputting the current reply sentence.
In step S210 of some embodiments, user data of the user in different scenarios is collected.
It will be appreciated that the computer program collects user data for different scenarios of the user.
It should be noted that the user data at least comprises image data and text data. The user data is dialogue text data, for example a user asking: "Xiaobing, may I ask how your mood is today [inline image]".
The scenes at least comprise: finance, news, sports, entertainment, live broadcasting, question answering, travel, makeup, e-commerce and other scenes.
In step S220 of some embodiments, the image data is input to a visual encoder for image encoding to obtain image encoded data, and the text data is input to a text encoder for text encoding to obtain text encoded data.
It is understood that after the step of collecting user data of the user in different scenarios in step S210 is completed, the specific steps may be: and inputting the image data into a visual encoder to perform image coding to obtain image coded data, and inputting the text data into a text encoder to perform text coding to obtain text coded data.
TextEncoder Basic is a simple and practical text encoder that can help a user change the encoding or line-break type of multiple text files, and can also open, edit and save each file individually, which is simple and convenient.
In step S230 of some embodiments, the image coded data and the text coded data are input to a perceptual sampler for processing, so as to obtain corresponding image hidden variables and text hidden variables.
It can be understood that after step S220 of inputting the image data into the visual encoder for image coding to obtain image coded data and inputting the text data into the text encoder for text coding to obtain text coded data is completed, the specific step may be: inputting the image coded data and the text coded data obtained in step S220 into the perceptual sampler for processing to obtain the corresponding image hidden variables and text hidden variables.
The perceptual sampler compresses input data of arbitrary length to a fixed length. Taking text coded data as an example, a text encoding of length 1000 fed into the perceptual sampler finally yields a text hidden variable of length 100.
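One plausible reading of such a sampler, assumed here purely for illustration, is a Perceiver-style resampler: a fixed set of latent query vectors cross-attends to the variable-length encoding, so any input length maps to the same output length. The dimensions, the single attention head, and the random (rather than learned) queries below are assumptions.

```python
import numpy as np

def perceptual_sample(encoded, latent_queries):
    """Cross-attend fixed latent queries over a variable-length encoding."""
    # encoded: (seq_len, dim); latent_queries: (latent_len, dim)
    scores = latent_queries @ encoded.T / np.sqrt(encoded.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over input positions
    return weights @ encoded                        # (latent_len, dim)

rng = np.random.default_rng(0)
text_encoded = rng.normal(size=(1000, 64))   # length-1000 text encoding
queries = rng.normal(size=(100, 64))         # 100 fixed latent queries (learned in practice)
text_latent = perceptual_sample(text_encoded, queries)   # shape (100, 64)
```

Whatever the input length, the output always has `latent_len` rows, which is what lets the decoder consume a fixed-size representation.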
In step S240 of some embodiments, the image hidden variable and the text hidden variable are input into a decoder for processing, and a current reply sentence is output.
It can be understood that after step S230 of inputting the image coded data and the text coded data into the perceptual sampler for processing to obtain the corresponding image hidden variables and text hidden variables is completed, the specific steps may be: generating a dialogue generation task, inputting the dialogue generation task into the dialogue model, and outputting the current reply sentence.
In some embodiments of the present invention, the inputting the image hidden variable and the text hidden variable into a decoder for processing and outputting a current reply sentence corresponding to the user data includes:
generating a dialog generation task under the condition that the decoder detects the image hidden variable and/or the text hidden variable;
and inputting the conversation generation task into the conversation model and outputting the current reply statement.
It can be understood that, when the decoder detects the image hidden variable and/or the text hidden variable, a dialogue generation task is generated, the dialogue generation task is input into the dialogue model, and the current reply sentence is output. More specifically, after the dialogue generation task is input into the dialogue model, the dialogue model can automatically generate the current reply sentence based on the first two elements of the triple (knowledge, historical dialogue context, current reply sentence) and output the generated current reply sentence.
For example, the user asks: "Xiaobing, how is the weather today?" The dialogue model generation system can retrieve the day's weather data published on the network in real time and learn the corresponding knowledge in the triple data, thereby generating the current reply sentence: "The weather in Beijing today is cloudy, 8-15 °C, with haze; please take precautions."
Fig. 3 is a schematic diagram of the dialogue model application provided by the present invention. An example of the triple data [scene tag, historical dialogue context, current reply sentence] is as follows: for the scene tag "live broadcast", when actually input to the model, the tag is expanded into several detailed natural-language descriptions, such as: "a live-broadcast scene, in which a host uses a computer or mobile phone to broadcast what he or she is doing, and viewers can pay to watch a favorite host or favorite content for entertainment or instruction", or "a live-broadcast scene, in which the host mainly attracts users through entertainment or teaching in order to make a profit", rather than only the bare two-character tag.
With this kind of augmentation, as shown in Fig. 3, when the new scene is a sales-robot scene, we likewise give a natural-language description of the "sales" vertical: "a sales scene, in which a salesperson mainly provides products to customers in order to serve them and make a profit". Because the model has already learned the "live broadcast" scene during training, which is also about "making a profit" and "serving users", the model can transfer the dialogue capability learned for the "live broadcast" scene to the new "sales" scene.
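The tag-expansion idea above can be sketched as a lookup from scene tags to natural-language descriptions, plus a crude word-overlap check that hints at why a new scene can reuse a trained one. The tag table, the wording of the descriptions, and the matching heuristic are all illustrative assumptions, not the patent's mechanism.

```python
import random

# Hypothetical tag table: each bare scene tag expands to one or more
# natural-language descriptions (wording is an assumption for illustration).
SCENE_DESCRIPTIONS = {
    "live broadcast": [
        "a live-broadcast scene where a host streams to entertain or teach "
        "viewers, aiming to serve users for profit",
    ],
    "sales": [
        "a sales scene where a salesperson mainly provides products to "
        "serve users for profit",
    ],
}

def describe_scene(tag):
    """Replace a bare scene tag with a sampled natural-language description."""
    return random.choice(SCENE_DESCRIPTIONS[tag])

def shared_terms(tag_a, tag_b):
    """Crude word overlap suggesting transferable dialogue capability."""
    return set(describe_scene(tag_a).split()) & set(describe_scene(tag_b).split())

overlap = shared_terms("live broadcast", "sales")
```

In the patent's framing, the shared phrases ("serve users", "profit") are what let the model carry the "live broadcast" dialogue capability over to the unseen "sales" scene; here the overlap is computed literally only to make that intuition concrete.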
The present invention provides a dialogue model application method, system, device, and storage medium. The dialogue model application method collects user data of a user in different scenes, wherein the user data at least comprises image data and text data; inputs the image data into a visual encoder for image coding to obtain image coded data, and inputs the text data into a text encoder for text coding to obtain text coded data; inputs the image coded data and the text coded data into a perceptual sampler for processing to obtain the corresponding image hidden variables and text hidden variables; and inputs the image hidden variable and the text hidden variable into a decoder for processing and outputs the current reply sentence. The method trains a dialogue model with good dialogue performance from multi-source, multi-modal data, greatly improves the model's ability to transfer its dialogue capability to new scenes when applied, and enhances the richness and accuracy of the information in dialogue replies.
In the following, the dialogue model generation system provided by the present invention is described; the dialogue model generation system described below and the dialogue model generation method described above may be referred to in correspondence with each other.
Referring to fig. 4, a schematic structural diagram of a dialog model generation system provided by the present invention is shown, where the dialog model generation system includes:
a session sample obtaining module 410, configured to obtain a preset session sample;
the preprocessing module 420 is configured to preprocess the dialog sample to obtain triple data;
the semantic understanding module 430 is configured to input the triple data into a preset natural language model for semantic understanding, and output a semantic intention corresponding to the triple data;
and a dialogue model generation module 440, configured to input the triple data and the semantic intent into an initial neural network for training, so as to obtain a dialogue model, where the dialogue model at least includes a visual encoder, a text encoder, and a decoder.
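The four modules (410 to 440) compose into a simple training pipeline. A minimal sketch follows, with toy stand-ins for each module; the stand-in behaviors and names are assumptions for illustration, not the patent's actual implementations.

```python
def generate_dialogue_model(get_sample, preprocess, understand, train):
    samples = get_sample()                       # module 410: preset dialogue samples
    triples = preprocess(samples)                # module 420: clean + format into triples
    intents = [understand(t) for t in triples]   # module 430: semantic intent per triple
    return train(triples, intents)               # module 440: train the dialogue model

# Toy stand-ins (assumptions) to exercise the flow.
get_sample = lambda: ["Xiaobing, how is the weather today?"]
preprocess = lambda s: [("weather-knowledge", "history", text) for text in s]
understand = lambda triple: "ask_weather"
train = lambda triples, intents: {"n_triples": len(triples), "intents": intents}

model = generate_dialogue_model(get_sample, preprocess, understand, train)
```

The point of the sketch is the data flow: samples become triples, triples gain semantic intents, and both feed the training step.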
According to the dialogue model generation system of the present invention, the dialogue sample obtaining module 410 is configured to determine a target extraction path from a plurality of extraction paths by using a preset sample extraction strategy;
and acquiring the dialogue sample according to the sample extraction strategy and the target extraction path.
According to the dialog model generation system, the preprocessing module 420 is used for cleaning the dialog sample to obtain a standard dialog sample;
and formatting the standard dialogue sample according to a preset triple format to generate the triple data.
According to the dialog model generation system of the present invention, the preprocessing module 420 is configured to format the standard dialog sample according to the triplet format to obtain initial triplet data;
detecting the initial triple data according to a preset detection strategy, and identifying abnormal initial triple data;
and screening the triple data from the initial triple data according to the abnormal initial triple data.
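The preprocessing flow described for module 420 (clean, format into triples, detect anomalies, filter) can be sketched as follows; the `|`-separated sample format and the emptiness-based anomaly rule are illustrative assumptions standing in for the patent's preset detection strategy.

```python
def clean(sample):
    """Cleaning step: strip stray whitespace from a raw dialogue sample."""
    return sample.strip()

def to_triple(sample, scene_tag="live"):
    """Format a cleaned sample into (scene tag, historical context, reply)."""
    *history, reply = sample.split("|")
    return (scene_tag, " ".join(history), reply)

def is_abnormal(triple):
    """Hypothetical detection rule: an empty reply or scene tag is abnormal."""
    scene, history, reply = triple
    return not reply or not scene

def preprocess(samples):
    initial = [to_triple(clean(s)) for s in samples]          # initial triple data
    return [t for t in initial if not is_abnormal(t)]         # screen out abnormal ones

triples = preprocess(["  hi|how are you|fine  ", "broken sample|"])
```

The second sample has no reply, so the detection step flags it and only the well-formed triple survives the screening.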
The present invention provides a dialogue model generation and application method, system, device, and storage medium. The dialogue model generation method acquires a preset dialogue sample; preprocesses the dialogue sample to obtain triple data; inputs the triple data into a preset natural language model for semantic understanding and outputs the semantic intention corresponding to the triple data;
and inputs the triple data and the semantic intention into an initial neural network for training to obtain a dialogue model. The dialogue model application method collects user data of a user in different scenes, wherein the user data at least comprises image data and text data; inputs the image data into a visual encoder for image coding to obtain image coded data, and inputs the text data into a text encoder for text coding to obtain text coded data; inputs the image coded data and the text coded data into a perceptual sampler for processing to obtain the corresponding image hidden variables and text hidden variables; and inputs the image hidden variable and the text hidden variable into a decoder for processing and outputs the current reply sentence. The method trains a dialogue model with good dialogue performance from multi-source, multi-modal data, greatly improves the model's ability to transfer its dialogue capability to new scenes when applied, and enhances the richness and accuracy of the information in dialogue replies.
In the following, a description is given of an application system of a dialogue model according to the present invention, and a corresponding reference may be made between the application system of a dialogue model described below and the application method of a dialogue model described above.
Referring to fig. 5, a schematic structural diagram of a dialog model application system provided in the present invention is shown, where the dialog model application system includes:
a collect user data module 510, configured to collect user data of a user in different scenes, where the user data at least includes image data and text data;
the encoding module 520 is configured to input the image data into a visual encoder to perform image encoding to obtain image encoded data, and input the text data into a text encoder to perform text encoding to obtain text encoded data;
a hidden variable generation module 530, configured to input the image coded data and the text coded data into a perceptual sampler for processing to obtain the corresponding image hidden variables and text hidden variables;
and an output reply statement module 540, configured to input the image hidden variable and the text hidden variable into a decoder for processing, and output a current reply statement corresponding to the user data.
According to the dialogue model application system provided by the present invention, the output reply statement module 540 is configured to generate a dialogue generation task when the decoder detects the image hidden variable and/or the text hidden variable;
and inputting the conversation generation task into the conversation model and outputting the current reply statement.
The present invention provides a dialogue model generation and application method, system, device, and storage medium. The dialogue model generation method acquires a preset dialogue sample; preprocesses the dialogue sample to obtain triple data; inputs the triple data into a preset natural language model for semantic understanding and outputs the semantic intention corresponding to the triple data;
and inputs the triple data and the semantic intention into an initial neural network for training to obtain a dialogue model. The dialogue model application method collects user data of a user in different scenes, wherein the user data at least comprises image data and text data; inputs the image data into a visual encoder for image coding to obtain image coded data, and inputs the text data into a text encoder for text coding to obtain text coded data; inputs the image coded data and the text coded data into a perceptual sampler for processing to obtain the corresponding image hidden variables and text hidden variables; and inputs the image hidden variable and the text hidden variable into a decoder for processing and outputs the current reply sentence. The method trains a dialogue model with good dialogue performance from multi-source, multi-modal data, greatly improves the model's ability to transfer its dialogue capability to new scenes when applied, and enhances the richness and accuracy of the information in dialogue replies.
Fig. 6 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 6, the electronic device may include: a processor 610, a communication interface 620, a memory 630, and a communication bus 640, wherein the processor 610, the communication interface 620, and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a dialogue model generation method, the method comprising: acquiring a preset dialogue sample; preprocessing the dialogue sample to obtain triple data; inputting the triple data into a preset natural language model for semantic understanding, and outputting the semantic intention corresponding to the triple data; and inputting the triple data and the semantic intention into an initial neural network for training to obtain a dialogue model. The processor may also perform a dialogue model application method, the method comprising: collecting user data of a user in different scenes, wherein the user data at least comprises image data and text data; inputting the image data into a visual encoder for image coding to obtain image coded data, and inputting the text data into a text encoder for text coding to obtain text coded data; inputting the image coded data and the text coded data into a perceptual sampler for processing to obtain the corresponding image hidden variables and text hidden variables; and inputting the image hidden variable and the text hidden variable into a decoder for processing, and outputting the current reply sentence.
In addition, the logic instructions in the memory 630 may be implemented in the form of software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention or a part thereof which substantially contributes to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product comprising a computer program. The computer program may be stored on a non-transitory computer-readable storage medium, and when executed by a processor, can perform the dialogue model generation method provided above, the method comprising: acquiring a preset dialogue sample; preprocessing the dialogue sample to obtain triple data; inputting the triple data into a preset natural language model for semantic understanding, and outputting the semantic intention corresponding to the triple data; and inputting the triple data and the semantic intention into an initial neural network for training to obtain a dialogue model. The computer program can also perform the dialogue model application method, the method comprising: collecting user data of a user in different scenes, wherein the user data at least comprises image data and text data; inputting the image data into a visual encoder for image coding to obtain image coded data, and inputting the text data into a text encoder for text coding to obtain text coded data; inputting the image coded data and the text coded data into a perceptual sampler for processing to obtain the corresponding image hidden variables and text hidden variables; and inputting the image hidden variable and the text hidden variable into a decoder for processing, and outputting the current reply sentence.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the dialogue model generation method provided above, the method comprising: acquiring a preset dialogue sample; preprocessing the dialogue sample to obtain triple data; inputting the triple data into a preset natural language model for semantic understanding, and outputting the semantic intention corresponding to the triple data; and inputting the triple data and the semantic intention into an initial neural network for training to obtain a dialogue model. The computer program can also implement the dialogue model application method, the method comprising: collecting user data of a user in different scenes, wherein the user data at least comprises image data and text data; inputting the image data into a visual encoder for image coding to obtain image coded data, and inputting the text data into a text encoder for text coding to obtain text coded data; inputting the image coded data and the text coded data into a perceptual sampler for processing to obtain the corresponding image hidden variables and text hidden variables; and inputting the image hidden variable and the text hidden variable into a decoder for processing, and outputting the current reply sentence.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A dialogue model generation method, comprising:
acquiring a preset conversation sample;
preprocessing the dialogue sample to obtain triple data;
inputting the triple data into a preset natural language model for semantic understanding, and outputting a semantic intention corresponding to the triple data;
and inputting the triple data and the semantic intention into an initial neural network for training to obtain a dialogue model, wherein the dialogue model at least comprises a visual encoder, a text encoder and a decoder.
2. The dialogue model generation method of claim 1,
the obtaining of the preset dialog sample includes:
determining a target extraction path from a plurality of extraction paths by using a preset sample extraction strategy;
and acquiring the dialogue sample according to the sample extraction strategy and the target extraction path.
3. The dialogue model generation method of claim 1,
the preprocessing the dialogue sample to obtain triple data comprises the following steps:
cleaning the dialogue sample to obtain a standard dialogue sample;
and formatting the standard dialogue sample according to a preset triple format to generate the triple data.
4. The dialogue model generation method of claim 3,
the step of formatting the standard dialog sample according to a preset triple format to generate the triple data includes:
formatting the standard dialogue sample according to the triple format to obtain initial triple data;
detecting the initial triple data according to a preset detection strategy, and identifying abnormal initial triple data;
and screening the triple data from the initial triple data according to the abnormal initial triple data.
5. A dialogue model application method applied to the dialogue model generation method of any one of claims 1 to 4, wherein the dialogue model includes at least a visual encoder, a text encoder, and a decoder;
the dialogue model application method comprises the following steps:
acquiring user data of a user in different scenes, wherein the user data at least comprises image data and text data;
inputting the image data into the visual encoder to perform image coding to obtain image coding data, and inputting the text data into the text encoder to perform text coding to obtain text coding data;
inputting the image coded data and the text coded data into a perceptual sampler for processing to obtain corresponding image hidden variables and text hidden variables;
and inputting the image hidden variable and the text hidden variable into the decoder for processing, and outputting the current reply sentence.
6. The dialogue model application method of claim 5,
the inputting the image hidden variable and the text hidden variable into a decoder for processing and outputting the current reply sentence corresponding to the user data comprises:
generating a dialog generation task under the condition that the decoder detects the image hidden variable and/or the text hidden variable;
and inputting the conversation generation task into the conversation model and outputting the current reply statement.
7. A dialogue model generation system, the system comprising:
the conversation sample obtaining module is used for obtaining a preset conversation sample;
the preprocessing module is used for preprocessing the dialogue sample to obtain triple data;
the semantic understanding module is used for inputting the triple data into a preset natural language model for semantic understanding and outputting a semantic intention corresponding to the triple data;
and the dialogue model generation module is used for inputting the triple data and the semantic intention into an initial neural network for training to obtain a dialogue model, wherein the dialogue model at least comprises a visual encoder, a text encoder and a decoder.
8. A dialogue model application system, the system comprising:
the system comprises a user data acquisition module, a user data acquisition module and a user data processing module, wherein the user data acquisition module is used for acquiring user data of a user in different scenes, and the user data at least comprises image data and text data;
the encoding module is used for inputting the image data into a visual encoder to perform image encoding to obtain image encoding data, and inputting the text data into a text encoder to perform text encoding to obtain text encoding data;
the hidden variable generation module is used for inputting the image coded data and the text coded data into a perceptual sampler for processing to obtain corresponding image hidden variables and text hidden variables;
and the output reply statement module is used for inputting the image hidden variable and the text hidden variable into a decoder for processing and outputting the current reply statement corresponding to the user data.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the dialogue model generation method according to any one of claims 1 to 4 or the steps of the dialogue model application method according to claim 5 or 6.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the steps of the dialogue model generation method according to any one of claims 1 to 4 or implements the steps of the dialogue model application method according to claim 5 or 6.
CN202211394964.XA 2022-11-09 2022-11-09 Dialog model generation method, dialog model application method, dialog model generation system, dialog model application system, dialog model generation equipment and dialog model application equipment Pending CN115438170A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211394964.XA CN115438170A (en) 2022-11-09 2022-11-09 Dialog model generation method, dialog model application method, dialog model generation system, dialog model application system, dialog model generation equipment and dialog model application equipment


Publications (1)

Publication Number Publication Date
CN115438170A true CN115438170A (en) 2022-12-06

Family

ID=84253112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211394964.XA Pending CN115438170A (en) 2022-11-09 2022-11-09 Dialog model generation method, dialog model application method, dialog model generation system, dialog model application system, dialog model generation equipment and dialog model application equipment

Country Status (1)

Country Link
CN (1) CN115438170A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626059A (en) * 2020-04-30 2020-09-04 联想(北京)有限公司 Information processing method and device
CN111651557A (en) * 2020-05-09 2020-09-11 清华大学深圳国际研究生院 Automatic text generation method and device and computer readable storage medium
CN111984770A (en) * 2020-07-17 2020-11-24 深思考人工智能科技(上海)有限公司 Man-machine conversation method and device
CN113553418A (en) * 2021-07-27 2021-10-26 天津大学 Visual dialog generation method and device based on multi-modal learning
CN114139553A (en) * 2021-11-29 2022-03-04 平安科技(深圳)有限公司 Dialog text generation method and device, electronic equipment and storage medium
CN114817467A (en) * 2022-04-20 2022-07-29 中国人民解放军国防科技大学 Intention recognition response method, device, equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20221206