CN115396831A - Interaction model generation method, device, equipment and storage medium - Google Patents

Interaction model generation method, device, equipment and storage medium Download PDF

Info

Publication number
CN115396831A
CN115396831A (application CN202110503070.9A)
Authority
CN
China
Prior art keywords
scene
interaction
interaction model
target
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110503070.9A
Other languages
Chinese (zh)
Inventor
邢彪
丁东
胡皓
陈嫦娇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Zhejiang Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202110503070.9A
Publication of CN115396831A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 4/00: Services specially adapted for wireless communication networks; Facilities therefor
    • H04W 4/20: Services signalling; Auxiliary data signalling, i.e. transmitting data via a non-traffic channel
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W 16/00: Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W 16/22: Traffic simulation tools or models

Abstract

The invention discloses an interaction model generation method, device, equipment and storage medium, belonging to the technical field of network communication. When the target scene corresponding to a current message creation request does not belong to any existing scene, the method searches, according to the current message scene description, for the to-be-migrated interaction model with the highest matching degree with the target scene; it then migrates the weights of that model, except those of the layer to be trained, to a newly built interaction model, and trains the newly built model to generate the target interaction model corresponding to the target scene. Because the new interaction model is generated by migrating an existing interaction model, the complexity and time cost of building an interaction model from scratch are avoided, and the generation speed of the new interaction model is improved.

Description

Interaction model generation method, device, equipment and storage medium
Technical Field
The present invention relates to the field of network communication technologies, and in particular, to a method, an apparatus, a device, and a storage medium for generating an interaction model.
Background
The fifth generation mobile communication technology (5G) message service is oriented to a wide range of industries. The scenarios it serves are diverse, differentiated and personalized, and the interaction modes of different 5G message scenarios may differ considerably.
In the prior art, a single interaction model for 5G messages is applied directly to multiple scenarios. Owing to the diversity of interaction scenarios, however, one interaction model cannot adapt well to all of them. To make an interaction model fit a required scenario, a new interaction model corresponding to that scenario has to be built, but the cycle of building a new interaction model is too long, which degrades the message interaction experience of end users.
Disclosure of Invention
The main purpose of the present invention is to provide an interaction model generation method, apparatus, device and storage medium, aiming to solve the technical problem that the long cycle of building a new interaction model degrades the message interaction experience of end users.
In order to achieve the above object, the present invention provides an interaction model generation method, which includes:
when a current message creating request is received, extracting a current message scene description from the current message creating request;
when the target scene corresponding to the current message creation request does not belong to the existing scene, searching the interactive model to be migrated with the highest matching degree with the target scene according to the description of the current message scene;
migrating weights except for the layer to be trained in the interactive model to be migrated to a newly-built interactive model;
and training the newly-built interaction model to generate a target interaction model corresponding to the target scene.
Preferably, when the target scene corresponding to the current message creation request does not belong to an existing scene, the step of searching the interaction model to be migrated with the highest matching degree with the target scene according to the description of the current message scene includes:
when the target scene corresponding to the current message creation request does not belong to the existing scene, acquiring an interaction characteristic sequence of the interaction model corresponding to each existing scene;
serializing the current message scene description to obtain a current scene sequence;
determining the matching degree between the current scene sequence and the interactive characteristic sequence of the interactive model corresponding to each existing scene through a pre-trained deep neural network;
and taking the interaction model corresponding to the interaction characteristic sequence with the maximum matching degree as the interaction model to be migrated with the highest matching degree with the target scene.
Preferably, before the step of determining, through a pre-trained deep neural network, the matching degree between the current message scene description and the interaction characteristic information of the interaction model corresponding to each existing scene, the interaction model generation method further includes:
acquiring historical scene description corresponding to the historical scene creation message, and acquiring interaction characteristic information of interaction models corresponding to all existing scenes;
serializing the historical scene description to obtain a historical scene sequence;
carrying out serialization processing on the interaction characteristic information of the interaction model corresponding to each existing scene to obtain an interaction characteristic sequence of the interaction model corresponding to each existing scene;
acquiring a preset matching degree between the historical scene sequence and an interactive characteristic sequence of an interactive model corresponding to each existing scene;
and training the initial deep neural network based on the historical scene sequence, the interactive characteristic sequence of the interactive model corresponding to each existing scene and the acquired preset matching degree to obtain a pre-trained deep neural network.
Preferably, the step of training the initial deep neural network based on the historical scene sequence, the interaction characteristic sequence of the interaction model corresponding to each existing scene, and the obtained preset matching degree to obtain a pre-trained deep neural network includes:
and training the initial deep neural network through a gradient descent algorithm based on the historical scene sequence, the interactive characteristic sequence of the interactive model corresponding to each existing scene and the acquired preset matching degree to obtain a pre-trained deep neural network.
Preferably, the layer to be trained is a fully connected layer;
the step of migrating the weights except for the layer to be trained in the interactive model to be migrated to the newly-built interactive model comprises the following steps:
acquiring weights except for the full connection layer in the interactive model to be migrated;
and taking the weight except the full connection layer in the interactive model to be migrated as the initial weight of the newly-built interactive model.
Preferably, before the step of training the newly-built interaction model to generate the target interaction model corresponding to the target scene, the interaction model generation method further includes:
acquiring target interaction training data corresponding to the target scene;
the step of training the newly-built interaction model to generate a target interaction model corresponding to the target scene includes:
and training the newly-built interaction model according to the target interaction training data to generate a target interaction model corresponding to the target scene.
Preferably, the step of training the newly-built interaction model according to the target interaction training data to generate a target interaction model corresponding to the target scene includes:
training the newly-built interaction model through a gradient descent algorithm according to the target interaction training data, to generate a target interaction model corresponding to the target scene.
In addition, to achieve the above object, the present invention further provides an interaction model generation apparatus, including:
the description extraction module is used for extracting the scene description of the current message from the current message creation request when the current message creation request is received;
the model searching module is used for searching the interactive model to be migrated with the highest matching degree with the target scene according to the description of the current message scene when the target scene corresponding to the current message creating request does not belong to the existing scene;
the weight migration module is used for migrating the weights except for the layer to be trained in the interactive model to be migrated to the newly-built interactive model;
and the model training module is used for training the newly-built interaction model to generate a target interaction model corresponding to the target scene.
In addition, to achieve the above object, the present invention also provides an interaction model generation apparatus, including: a memory, a processor and an interaction model generation program stored on the memory and executable on the processor, the interaction model generation program being configured to implement the steps of the interaction model generation method as described above.
In addition, to achieve the above object, the present invention further provides a storage medium having an interaction model generation program stored thereon, the interaction model generation program implementing the steps of the interaction model generation method as described above when executed by a processor.
According to the method and the device, when the target scene corresponding to the current message creation request does not belong to any existing scene, the to-be-migrated interaction model with the highest matching degree with the target scene is searched for according to the current message scene description; the weights of that model, except those of the layer to be trained, are then migrated to a newly built interaction model, and the newly built model is trained to generate the target interaction model corresponding to the target scene. Because the new interaction model is generated by migrating an existing interaction model, the complexity and time cost of building an interaction model from scratch are avoided, and the generation speed of the new interaction model is improved.
Drawings
Fig. 1 is a schematic structural diagram of an interaction model generation device of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart diagram illustrating a first embodiment of an interaction model generation method according to the present invention;
FIG. 3 is a flowchart illustrating a second embodiment of an interaction model generation method according to the present invention;
FIG. 4 is a schematic flow chart illustrating pre-training of a deep neural network according to a second embodiment of the interaction model generation method of the present invention;
FIG. 5 is a flowchart illustrating a third exemplary embodiment of an interaction model generation method according to the present invention;
FIG. 6 is a block diagram of a newly created interaction model in a third embodiment of an interaction model generation method according to the present invention;
FIG. 7 is a flowchart illustrating a fourth exemplary embodiment of an interaction model generation method according to the present invention;
fig. 8 is a block diagram showing the structure of the interaction model generation apparatus according to the first embodiment of the present invention.
The implementation, functional features and advantages of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an interaction model generation device of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the interaction model generation device may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to realize connection and communication between these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the architecture shown in FIG. 1 does not constitute a limitation of the interaction model generation apparatus and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
As shown in fig. 1, the memory 1005, which is a storage medium, may include therein an operating system, a data storage module, a network communication module, a user interface module, and an interaction model generation program.
In the interaction model generation device shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The device calls, through the processor 1001, the interaction model generation program stored in the memory 1005 and executes the interaction model generation method provided by the embodiments of the present invention.
An embodiment of the present invention provides an interaction model generation method, and referring to fig. 2, fig. 2 is a schematic flow diagram of a first embodiment of the interaction model generation method of the present invention.
In this embodiment, the method for generating an interaction model includes the following steps:
s10: when a current message creation request is received, extracting a current message scene description from the current message creation request.
It should be noted that the current message creation request is usually sent by an industry client using a terminal device, and usually carries a current message scene description.
It can be understood that the current message scene description is an expression of the functions to be implemented, used to characterize the scene. For example, the current message scene description of an education-class scene may be: collect and analyze the daily health information reported by students and teachers, and issue epidemic-prevention health guidance to them.
In a specific implementation, the execution subject of the method of this embodiment (i.e., the interaction model generation device) may be a 5G message open platform. The platform can help industry clients implement multi-scenario A2P (Application-to-Person) messaging on demand: an industry client can quickly complete the deployment of a message application through the platform without complex code development, and can thereby create its own 5G message application simply and conveniently.
A 5G message is a rich media messaging application based on the GSMA international communication standard, providing direct, convenient, high-capacity and content-rich video and rich media information services, and thereby bringing comprehensive content services and communication value to customers. 5G messages are based on the Rich Communication Services (RCS) and Messaging as a Platform (MaaP) standards, have intelligent interaction capability, and can deliver multimedia content such as rich media cards. The 5G message system relies on the 5GMC (5G Message Center) and MaaP: the 5GMC can process short messages and basic multimedia messages and provides 5G message/short message sending and receiving for 5G message terminals, while the MaaP platform is an industry gateway supporting rich media information and provides 5G multimedia message services for industry users.
It should be understood that 5G messages provide an enhanced personal and application messaging service, realize "message as a service", and introduce a new message interaction mode, the chat robot (Chatbot), so that users can intuitively and conveniently enjoy various 5G application services such as payment and recharge, ticket booking, hotel reservation, logistics inquiry, restaurant reservation and take-out ordering within a message window. A Chatbot is a service offered by industry clients to end users in conversational form; based on artificial intelligence software, it simulates intelligent human conversation to provide specific service functions to users.
S20: and when the target scene corresponding to the current message creation request does not belong to the existing scene, searching the interaction model to be migrated with the highest matching degree with the target scene according to the description of the current message scene.
It should be noted that whether the target scene corresponding to the current message creation request belongs to an existing scene may be determined from the current message scene description. Specifically, the current message scene description may be compared with the interaction characteristic information of the interaction model corresponding to each existing scene: if the description differs from the interaction characteristic information of every existing scene's interaction model, it can be determined that the target scene does not belong to any existing scene; if the description is the same as the interaction characteristic information of some existing scene's interaction model, it can be determined that the target scene belongs to that existing scene.
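A minimal sketch of this membership check, assuming (hypothetically) that scene descriptions and interaction characteristic information are plain strings and that "the same" means exact equality:

```python
# Hypothetical sketch of the existing-scene check described above.
# Scene ids, descriptions and the exact-match rule are illustrative
# assumptions, not taken from the patent text.

def find_existing_scene(scene_description, existing_features):
    """Return the id of the existing scene whose interaction feature
    information equals the description, or None when the scene is new."""
    for scene_id, feature_info in existing_features.items():
        if feature_info == scene_description:
            return scene_id
    return None  # no match: a new interaction model must be generated

existing = {
    "weather": "query temperature and humidity",
    "education": "collect daily health reports",
}
print(find_existing_scene("collect daily health reports", existing))  # education
print(find_existing_scene("book restaurant tables", existing))        # None
```

In practice the comparison would more plausibly be a similarity score than exact string equality; the exact match here only mirrors the "same/different" wording above.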
It can be understood that, when the target scene corresponding to the current message creation request does not belong to the existing scene, since the interaction model of the corresponding existing scene does not exist, the interaction model to be migrated with the highest matching degree with the target scene can be searched according to the description of the current message scene.
In a specific implementation, when the target scene corresponding to the current message creation request belongs to an existing scene, the corresponding interaction model may not need to be re-created, and at this time, the corresponding interaction model may be directly invoked, that is, the interaction model of the corresponding existing scene may be directly invoked.
S30: and migrating the weights except for the layer to be trained in the interactive model to be migrated to the newly-built interactive model.
It should be noted that, for a newly-created interaction model, a layer to be trained may be set therein, and weights except for the layer to be trained may all be obtained by migrating the interaction model to be migrated.
It can be understood that the fully connected layer is the last layer of the interaction model, i.e., the interaction result of the newly built interaction model is determined by the fully connected layer. Therefore, to further increase the generation speed of the newly built interaction model, in this embodiment the layer to be trained may be the fully connected layer. Specifically, in step S30, the weights of the to-be-migrated interaction model other than those of the fully connected layer may first be obtained, and then used as the initial weights of the newly built interaction model.
In a specific implementation, migration means freezing the weight parameters of part of the network layers of a model trained on a large-scale source domain, transferring them to a small-scale target domain, retraining only the model's final fully connected layer, and saving the parameters after training; this addresses the overfitting problem caused by small sample data sets. In transfer learning, the existing knowledge is called the source domain and the new knowledge to be learned is called the target domain. Migration is needed because data labels can be hard to obtain: when the labels for a task are difficult to acquire, learning can be transferred from similar tasks whose labels are easy to acquire. Moreover, building a model from scratch is complex and time-consuming, so transfer learning is used to accelerate learning.
The definition of migration may be stated as follows: given a source domain Ds = {Xs, fs(x)} with a learning task Ts, and a target domain Dt = {Xt, ft(x)} with a learning task Tt, transfer learning uses the knowledge acquired from the source domain Ds and the learning task Ts to assist in learning the target prediction function ft(·) on the target domain Dt, under the condition that the source domain differs from the target domain or the learning task Tt differs from the learning task Ts.
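The freezing-and-migration step above can be sketched as follows; the layer names and the dictionary representation of weights are illustrative assumptions, not the patent's actual network:

```python
# Sketch of weight migration: copy every layer's weights from the source
# (to-be-migrated) model except the layer to be trained, which here is the
# final fully connected layer and will be retrained on the target scene.
# Layer names are hypothetical.

def migrate_weights(source_weights, layer_to_train="fully_connected"):
    """Build the initial weights of a new model from the source model,
    omitting the layer that will be trained from scratch."""
    return {
        layer: list(weights)  # copy so the source model stays untouched
        for layer, weights in source_weights.items()
        if layer != layer_to_train
    }

source = {
    "embedding": [0.1, 0.2],
    "hidden": [0.3, 0.4],
    "fully_connected": [0.5, 0.6],
}
new_model = migrate_weights(source)
print(sorted(new_model))  # ['embedding', 'hidden']
```

In a real framework the same idea is expressed by copying layer weights and marking the copied layers as non-trainable (frozen) before training the final layer.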
S40: and training the newly-built interaction model to generate a target interaction model corresponding to the target scene.
In a specific implementation, after the newly-built interaction model is trained in multiple rounds, weights after the multiple rounds of training convergence can be used as the weights of the newly-built interaction model, so as to generate a target interaction model corresponding to the target scene.
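The multi-round training to convergence can be illustrated with a toy gradient-descent loop; the quadratic loss and single scalar weight below are stand-ins for the real network, not the patent's training procedure:

```python
# Toy gradient-descent training loop: iterate until the update step is
# small enough (convergence) and keep the converged weight.

def train(weight, grad_fn, lr=0.1, rounds=100, tol=1e-6):
    for _ in range(rounds):
        step = lr * grad_fn(weight)
        weight -= step
        if abs(step) < tol:  # converged: keep this weight
            break
    return weight

# Fit w to minimise the loss (w - 3)^2, whose gradient is 2 * (w - 3).
final_w = train(0.0, lambda w: 2 * (w - 3))
print(round(final_w, 3))  # 3.0
```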
It should be understood that the newly built interaction model is also usually implemented with a Deep Neural Network (DNN), where "deep" means that there are many hidden layers in the middle; deep learning is thus essentially a neural network with many hidden layers. A neuron (Neuron), also called a node (Node), is the basic unit of a neural network: it receives inputs (Input) from outside or from other nodes and computes an output (Output) through an activation function (Activation Function). Each input has a corresponding weight (Weight), i.e., the relative importance of that input to the node, and a bias (Bias) can be understood as a special input.
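A single neuron as described above can be written directly; the sigmoid activation and the sample weights are illustrative choices, not prescribed by the text:

```python
import math

# One neuron: weighted sum of inputs plus a bias, passed through an
# activation function (sigmoid here).

def neuron(inputs, weights, bias):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation

out = neuron([1.0, 2.0], [0.5, -0.25], bias=0.0)  # z = 0.5 - 0.5 + 0 = 0
print(round(out, 4))  # 0.5, since sigmoid(0) = 0.5
```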
In this embodiment, when the target scene corresponding to the current message creation request does not belong to any existing scene, the to-be-migrated interaction model with the highest matching degree with the target scene is found according to the current message scene description; the weights of that model, except those of the layer to be trained, are migrated to a newly built interaction model, and the newly built model is then trained to generate the target interaction model corresponding to the target scene.
Referring to fig. 3, fig. 3 is a flowchart illustrating an interactive model generating method according to a second embodiment of the present invention.
Based on the first embodiment described above, in the present embodiment, the step S20 includes:
s201: and when the target scene corresponding to the current message creation request does not belong to the existing scene, acquiring an interaction characteristic sequence of the interaction model corresponding to each existing scene.
It should be noted that each existing scene has corresponding interaction characteristic information that characterizes the scene. For example, the interaction characteristic information of the interaction model of a weather service scene may be: users mainly query the temperature, humidity, precipitation, clothing advice and other matters for their city within the next three to seven days. To facilitate subsequent processing by the pre-trained deep neural network model, this interaction characteristic information can be serialized.
Specifically, the interaction characteristic information may first undergo text cleaning, i.e., word segmentation, giving the segments "user", "mainly used", "query", "within three to seven days in the future", "city of place", "temperature", "humidity", "precipitation", "dressing" and "other matters". Some common words in the segmentation result carry little meaning (e.g., "mainly used" and "other matters") and may be removed, so the cleaned text becomes "user query within three to seven days in the future city of place temperature humidity precipitation dressing". The cleaning result can then be converted into an integer sequence based on a correspondence between words and integers, for example: ["user": 40, "query": 105, "within three to seven days in the future": 8, "city of place": 278, "temperature": 89, "humidity": 164, "precipitation": 59, "rain": 21, "dressing": 303], at which point the interaction characteristic sequence is [40, 105, 8, 278, 89, 164, 59, 21, 303].
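The serialisation step can be sketched with a word-to-integer mapping like the example above (the helper function name is hypothetical):

```python
# Map cleaned word segments to their integer ids to form the sequence.
vocab = {
    "user": 40, "query": 105,
    "within three to seven days in the future": 8,
    "city of place": 278, "temperature": 89, "humidity": 164,
    "precipitation": 59, "rain": 21, "dressing": 303,
}

def serialize(tokens, vocabulary):
    """Convert a token list to an integer sequence, dropping unknown words."""
    return [vocabulary[t] for t in tokens if t in vocabulary]

seq = serialize(["user", "query", "temperature", "dressing"], vocab)
print(seq)  # [40, 105, 89, 303]
```

This is the same token-to-id scheme used by common NLP tokenizers, shown here in its simplest form.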
Of course, when the pre-trained deep neural network is obtained, the interaction feature sequence corresponding to the interaction model in each existing scene is needed, so the interaction feature sequence corresponding to the interaction model in each existing scene can be obtained and stored in advance, and therefore, the interaction feature sequence is directly obtained in step S201.
S202: and carrying out serialization processing on the current message scene description to obtain a current scene sequence.
It should be noted that, for the current message scene description, in order to determine the matching degree between the current message scene description and each current scene, the current message scene description may be serialized in the same manner as the interactive feature information, where text cleaning is performed first, and then the text cleaning result is converted into an integer sequence to obtain the current scene sequence.
S203: and determining the matching degree between the current scene sequence and the interactive characteristic sequence of the interactive model corresponding to each existing scene through a pre-trained deep neural network.
It can be understood that the pre-trained deep neural network, having been trained for this purpose, can determine the matching degree between two sequences; therefore, the matching degree between the current scene sequence and the interaction characteristic sequence of the interaction model corresponding to each existing scene can be determined through it.
S204: and taking the interactive model corresponding to the interactive characteristic sequence with the maximum matching degree as the interactive model to be migrated with the highest matching degree with the target scene.
In a specific implementation, the interaction characteristic sequence with the greatest matching degree is the one most similar to the current message scene description, so in theory its interaction model fits the target scene best. Therefore, the interaction model corresponding to the interaction characteristic sequence with the greatest matching degree can be taken as the to-be-migrated interaction model with the highest matching degree with the target scene.
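Step S204 reduces to an argmax over the matching degrees; the model names and scores below are made up for illustration:

```python
# Pick the existing interaction model whose interaction characteristic
# sequence received the highest matching degree from the network.

def best_match(matching_degrees):
    """matching_degrees: {model_name: score} -> name of the best model."""
    return max(matching_degrees, key=matching_degrees.get)

scores = {"weather_bot": 0.91, "education_bot": 0.34, "logistics_bot": 0.58}
print(best_match(scores))  # weather_bot
```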
In this embodiment, the current message scene description is serialized to obtain a current scene sequence; the matching degree between the current scene sequence and the interaction characteristic sequence of the interaction model corresponding to each existing scene is then determined through a pre-trained deep neural network, and the interaction model corresponding to the interaction characteristic sequence with the greatest matching degree is taken as the to-be-migrated interaction model with the highest matching degree with the target scene. In this way, subjective human judgment is avoided as far as possible, and the to-be-migrated interaction model can be determined efficiently and objectively.
Referring to fig. 4, fig. 4 is a flowchart illustrating a method for generating an interaction model according to a third embodiment of the present invention.
Based on the second embodiment, in this embodiment, before the step S203, the interaction model generation method further includes:
s2021: and acquiring historical scene description corresponding to the historical scene creation message, and acquiring interaction characteristic information of the interaction model corresponding to each existing scene.
It should be noted that the historical information may contain historical message creation requests sent by industry clients, that is, message creation requests sent before the current time. Some of those requests may have required a new interaction model; such requests can be treated as historical scene creation messages. Since these historical scene creation messages also carry historical scene descriptions, the historical scene description corresponding to each historical scene creation message can be obtained.
It can be understood, from the above description, that each existing scene has corresponding interaction feature information, so the interaction feature information of the interaction model corresponding to each existing scene can also be obtained. Of course, an existing scene in this step means a scene that already existed when the historical scene creation message was received; that is, after a historical scene creation message is received and a new interaction model is generated for it, that new interaction model is likewise treated as the interaction model corresponding to an existing scene.
S2022: and carrying out serialization processing on the historical scene description to obtain a historical scene sequence.
It should be noted that, to facilitate training of the initial deep neural network, in this embodiment the historical scene description may be serialized in the same manner as the interaction feature information: text cleaning is performed first, and the cleaning result is then converted into an integer sequence to obtain the historical scene sequence.
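The cleaning-and-indexing step above can be sketched in a few lines of plain Python — a minimal illustration, not the patent's actual implementation; the regular expression, the vocabulary scheme, and the example scene descriptions are all assumptions:

```python
import re

def clean_text(text):
    # Lowercase the text and keep only word tokens (a simple stand-in
    # for the text-cleaning step described in the embodiment)
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocab(corpus):
    # Assign each distinct word a positive integer index; 0 is reserved
    # for padding and out-of-vocabulary words
    vocab = {}
    for doc in corpus:
        for word in clean_text(doc):
            vocab.setdefault(word, len(vocab) + 1)
    return vocab

def serialize(text, vocab, max_len):
    # Convert cleaned text into a fixed-length integer sequence, zero-padded
    seq = [vocab.get(w, 0) for w in clean_text(text)]
    return (seq + [0] * max_len)[:max_len]

corpus = ["Ticket booking scene: user asks for train times.",
          "Account top-up scene: user recharges a phone number."]
vocab = build_vocab(corpus)
print(serialize(corpus[0], vocab, 12))
```

A framework tokenizer (for example Keras's `Tokenizer` together with `pad_sequences`) performs the same mapping; reserving index 0 for padding is the usual convention.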
S2023: and carrying out serialization processing on the interaction characteristic information of the interaction model corresponding to each existing scene so as to obtain an interaction characteristic sequence of the interaction model corresponding to each existing scene.
It is understood that the specific serialization processing procedure in this step can refer to the related description in S201.
S2024: and acquiring the preset matching degree between the historical scene sequence and the interactive characteristic sequence of the interactive model corresponding to each existing scene.
In a specific implementation, the preset matching degree between the historical scene sequence and the interaction feature sequence of the interaction model corresponding to each existing scene may be generated by manual labeling; that is, the preset matching degree input by a worker may be received. Of course, the value range may be set to 0 to 5.
S2025: and training the initial deep neural network based on the historical scene sequence, the interactive characteristic sequence of the interactive model corresponding to each existing scene and the acquired preset matching degree to obtain a pre-trained deep neural network.
It should be noted that, a total data set for training may be generated based on the historical scene sequence, the interaction feature sequence of the interaction model corresponding to each existing scene, and the acquired preset matching degree, and in order to facilitate subsequent processing, in this embodiment, the total data set may be divided into a training set and a test set, where 90% of the total data set is classified as the training set and 10% of the total data set is classified as the test set. The training set is used for training the deep neural network, and the testing set is used for testing the deep neural network.
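The 90/10 partition described above can be sketched as follows — a minimal example; the deterministic seed is an assumption added so that the split is reproducible:

```python
import random

def split_dataset(samples, train_frac=0.9, seed=42):
    # Shuffle a copy of the samples deterministically, then slice the
    # first 90% into the training set and the rest into the test set
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train_set, test_set = split_dataset(list(range(100)))
print(len(train_set), len(test_set))
```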
It can be understood that, when the initial deep neural network is trained, referring to fig. 5, a multi-branch deep neural network model can be built with a deep learning framework. The historical scene sequence with a preset matching degree and the interaction feature sequence of the interaction model corresponding to each existing scene are converted into multi-dimensional space vectors through a word embedding layer; after the two branches are combined, the model extracts feature vectors through fully connected (Dense) layers and dropout (Dropout) layers, automatically learns the pairwise similarity relation, and finally outputs the predicted similarity score y_i.
For the deep neural network model, there are two parallel input layers, used respectively for inputting the historical scene sequence a_k and the interaction feature sequence s_i of the interaction model corresponding to each existing scene; of course, only the interaction feature sequence of one existing scene's interaction model is usually input at a time;
the method is characterized by further comprising two word embedding layers (embedding) which are arranged in parallel, the dimension of input data is set to be scene _ length and NLP _ length respectively, and the dimension of output data is set to be 128 dimensions of the size of a vector space which needs to be converted from a word. The word embedding layer is used for performing vector mapping (word embedding) on each word in the input sequence, namely converting the input sequence into a vector with a fixed shape of 128 dimensions, wherein scene _ length is the maximum length of a historical scene sequence, and NLP _ length is the maximum length of an interactive characteristic sequence of an interactive model corresponding to the existing scene;
wherein there are also two parallel Reshape layers, used to convert the data shape from (batch_size, input_length, embedding_size) to (batch_size, input_length × embedding_size);
the system also comprises a merging layer (merge) used for splicing the space vectors of the two types of data according to the dimension of a column;
and the hidden layer comprises 2 fully connected layers and 2 dropout layers. The first fully-connected layer contains 128 neurons, the second fully-connected layer contains 64 neurons, and both use the 'relu' activation function. A Dropout layer is introduced after each fully connected layer to effectively avoid overfitting; a Dropout layer discards neurons with a probability p and keeps the other neurons with a probability q = 1 - p. In this embodiment the discarding probability may be set to 0.2, that is, 20% of the neurons may be randomly ignored and disabled; of course, other values may be set as needed.
Wherein, the output layer contains 1 Dense neuron with the activation function set as 'relu', and it outputs the predicted matching degree y_i between the historical scene sequence and the interaction feature sequence of the interaction model corresponding to each existing scene.
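Putting the layers above together, a forward pass of the two-branch matching network can be sketched in plain NumPy — embedding lookup, flatten (the Reshape step), column-wise concatenation (the merge step), Dense(128, relu), Dense(64, relu), and a single output neuron. This is an illustrative sketch only: the vocabulary sizes, sequence lengths, and random weights are assumptions, and dropout (active only during training) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(seq, table):
    # Word-embedding lookup: integer sequence -> (sequence_length, 128) matrix
    return table[np.asarray(seq)]

def forward(scene_seq, feat_seq, params):
    # Embed both branches, flatten each, concatenate along the column
    # dimension, then pass through the two relu Dense layers and the
    # single output neuron that predicts the matching degree
    x = np.concatenate([embed(scene_seq, params["emb_a"]).ravel(),
                        embed(feat_seq, params["emb_b"]).ravel()])
    h1 = np.maximum(0.0, x @ params["W1"] + params["b1"])
    h2 = np.maximum(0.0, h1 @ params["W2"] + params["b2"])
    return float(h2 @ params["w3"] + params["b3"])

scene_length, nlp_length, emb_dim = 10, 12, 128     # illustrative sizes
params = {
    "emb_a": rng.normal(size=(500, emb_dim)),       # scene vocab (assumed 500 words)
    "emb_b": rng.normal(size=(800, emb_dim)),       # feature vocab (assumed 800 words)
    "W1": rng.normal(scale=0.01, size=((scene_length + nlp_length) * emb_dim, 128)),
    "b1": np.zeros(128),
    "W2": rng.normal(scale=0.01, size=(128, 64)),
    "b2": np.zeros(64),
    "w3": rng.normal(scale=0.01, size=64),
    "b3": 0.0,
}
score = forward(rng.integers(0, 500, scene_length),
                rng.integers(0, 800, nlp_length), params)
```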
In a specific implementation, the deep neural network model may be trained for 1000 rounds (epochs = 1000) with the batch size set to 10 (batch_size = 10). The Mean Squared Error (MSE) is selected as the loss function, i.e., the objective function (loss = 'mse'), to calculate the error between the predicted matching degree and the preset matching degree, with the training objective being to minimize this error:
loss = (1/n) · Σ_{i=1}^{n} (y_i − ŷ_i)²

wherein y_i is the predicted matching degree between the historical scene sequence and the interaction feature sequence of the interaction model corresponding to existing scene i, and ŷ_i is the preset matching degree between the historical scene sequence and the interaction feature sequence of the interaction model corresponding to existing scene i.
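The loss above is ordinary mean squared error over the n labeled (scene sequence, feature sequence) pairs; a direct sketch with illustrative numbers:

```python
import numpy as np

def mse(y_pred, y_true):
    # Mean squared error between predicted and preset matching degrees
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))

# Illustrative predicted vs. manually labeled matching degrees (0-5 scale)
print(mse([4.1, 2.0, 0.5], [4.0, 3.0, 0.0]))
```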
In order to improve the training efficiency, in this embodiment, the initial deep neural network may be trained through a gradient descent algorithm based on the historical scene sequence, the interaction feature sequence of the interaction model corresponding to each existing scene, and the obtained preset matching degree, so as to obtain a pre-trained deep neural network.
In a specific implementation, the gradient descent algorithm may select an adam optimizer, and the adam optimizer improves the learning speed of the conventional gradient descent (optimizer = 'adam'). The neural network can find the optimal weight value which enables the objective function to be minimum through gradient descent, the training error gradually descends along with the increase of the number of training rounds, and the model gradually converges. After the training is completed, the calculated neural network weight can be substituted into the initial deep neural network to obtain the pre-trained deep neural network.
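For illustration, the Adam update rule the optimizer applies at each step can be sketched as follows — a minimal scalar version with the standard default hyperparameters, minimizing a toy quadratic rather than the network's actual loss:

```python
import numpy as np

def adam_step(w, grad, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update: exponential moving averages of the gradient and
    # its square, bias-corrected, scale the learning rate per parameter
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w = np.float64(0.0)
state = {"t": 0, "m": np.float64(0.0), "v": np.float64(0.0)}
for _ in range(6000):
    w = adam_step(w, 2 * (w - 3), state)
```

As the number of training rounds grows, the iterate approaches the minimizer w = 3, mirroring the "error gradually descends, model gradually converges" behavior described above.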
In this embodiment, the initial deep neural network is trained based on the historical scene sequence, the interaction feature sequence of the interaction model corresponding to each existing scene, and the obtained preset matching degree, so as to obtain the pre-trained deep neural network, and the pre-trained deep neural network can be obtained very quickly and conveniently.
Referring to fig. 6, fig. 6 is a schematic flowchart of an interaction model generation method according to a fourth embodiment of the present invention.
Based on the foregoing first embodiment, in this embodiment, before the step S40, the interaction model generation method further includes:
s300: and acquiring target interactive training data corresponding to the target scene.
The step S40 includes:
s41: and training the newly-built interaction model according to the target interaction training data to generate a target interaction model corresponding to the target scene.
It should be noted that the model structure of the newly-created interaction model is consistent with that of the interaction model to be migrated, but different target scenes correspond to different target interaction training data, so the target interaction training data corresponding to the target scene needs to be obtained in advance. Referring to fig. 7, the newly-created interaction model may be built as an attention mechanism neural network with a deep learning framework, and may specifically include three parts: an end-user state attribute feature extractor, an end-user question semantic extractor, and a chatbot answer generator. The end-user state attribute feature extractor uses a Long Short-Term Memory network (LSTM) to extract features from the user state attributes collected on the user's terminal side; the end-user question semantic extractor uses an LSTM to extract features from the question the end user initiates through a 5G message; and finally the chatbot answer generator uses an attention neural network to focus attention on the user state attributes according to the user's question.
In a specific implementation, for the target interaction scenario, in this embodiment, the target interaction training data may include: an end user state attribute set, an end user question set, and a chatbot answer set.
The end-user state attribute set may be represented as V = {v1, v2, ..., vN}, where vN is the feature vector of the Nth word. The state attributes can include the user's current position information, terminal information, user subscription information, and the like;

the end-user question set may be expressed as Q = {q1, q2, ..., qT}, where qT is the feature vector of the Tth word;

the chatbot answer set may be expressed as A = {a1, a2, ..., aM}, where aM is the feature vector of the Mth word.
The end-user state attribute feature extractor takes the indexed attribute text as input, and the length of each index sequence is terminal_length. First, the word embedding layer converts each word into a vector: the input data dimension is terminal_vocab_size, the output is set to the 128-dimensional vector space into which each word is to be converted, and the input sequence length is terminal_length, so the word embedding layer outputs data of shape (None, terminal_length, 128). The word embedding layer performs vector mapping on the input words, converting each word's index into a fixed 128-dimensional vector. Then, the feature vector V of the end-user attribute text is extracted through three LSTM layers (each containing 64 LSTM neurons with a relu activation function) and three dropout layers;
the end-user question semantic extractor takes the indexed question text as input, and the length of each index sequence is query_length, so this layer's input data has the shape (None, query_length). First, the word embedding layer converts each word into a vector: the input data dimension is query_vocab_size, the output is set to the 128-dimensional vector space into which each word is to be converted, and the input sequence length is query_length, so the word embedding layer outputs data of shape (None, query_length, 128). The word embedding layer performs vector mapping on the input words, converting each word's index into a fixed 128-dimensional vector. Then, the feature vector Q of the end-user question text is extracted through three LSTM layers (each containing 64 LSTM neurons with a relu activation function) and three dropout layers;
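The LSTM feature extraction performed by the two extractors can be sketched with a single minimal LSTM cell in NumPy — an illustrative sketch, with random weights in place of trained ones; the final hidden state serves as the sequence's feature vector:

```python
import numpy as np

def lstm_forward(x_seq, W, U, b):
    # Minimal single-layer LSTM: iterate over the embedded time steps
    # and return the last hidden state as the sequence's feature vector
    hidden = b.shape[0] // 4
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in x_seq:
        z = W @ x + U @ h + b
        i, f, o, g = np.split(z, 4)          # input, forget, output, candidate
        i = 1.0 / (1.0 + np.exp(-i))
        f = 1.0 / (1.0 + np.exp(-f))
        o = 1.0 / (1.0 + np.exp(-o))
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
    return h

rng = np.random.default_rng(1)
emb_dim, hidden, steps = 128, 64, 10         # 128-dim embeddings, 64 LSTM neurons
W = rng.normal(scale=0.1, size=(4 * hidden, emb_dim))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
V = lstm_forward(rng.normal(size=(steps, emb_dim)), W, U, b)  # attribute feature vector
```

A framework LSTM layer (for example three stacked 64-unit layers, as described) adds trained weights, dropout, and batching on top of this recurrence.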
the chatbot answer generator comprises two fully connected attention layers and one fully connected layer. In general, the answer to an end user's question relates to some specific attribute of that user. The attention mechanism combines the information of the end user's question with the end user's state attributes to generate an attention weight for each state attribute and to weight the state attribute information, thereby focusing attention on specific state attributes and linking the user's question with the relevant attributes. The model learns to assign greater attention weights to the state attributes that are more relevant to the question. Introducing an attention mechanism allows the model to focus on the relevant parts of the input sequence as required: the attention network assigns each input an attention weight, recalculated at each output step, which is closer to 1 the more relevant that input is to the current operation, and closer to 0 otherwise.
The operation of each fully connected attention layer is as follows: first, the feature vector V of the end-user attribute text and the feature vector Q of the end-user question text are combined and input into the fully connected attention layer, which outputs an intermediate vector h_a; the intermediate vector h_a is then input into the softmax function, which outputs the attention distribution attention_v over the user's state attributes:

h_a = tanh(W_V · V + b_V + W_Q · Q + b_Q)

attention_v = softmax(W_h · h_a + b_h)

wherein W_V, b_V, W_Q, b_Q, W_h and b_h are all weights;

thereafter, the context vector c is calculated as the sum of the products of each attention weight attention_i and the corresponding user state attribute feature vector v_i:

c = Σ_i attention_i · v_i
The output layer is a fully connected (Dense) layer; the number of Dense neurons is answer_vocab_size, the activation function is set to 'softmax', and the softmax output is fed into the categorical cross entropy loss function. The output data of this layer has the shape (None, answer_vocab_size), converting the output shape of the attention decoding layer into the dimension of the final output.
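The attention computation described above can be sketched in NumPy — combine the attribute and question vectors through a tanh layer, score each attribute, normalize with softmax, and form the weighted sum. The dimensions and the random weights here are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(attr_vectors, q, Wv, Wq, bh, Wh):
    # Score each state attribute via h_a = tanh(Wv·v + Wq·q + bh),
    # normalize the scores into attention weights with softmax, and
    # return the weights plus the weighted-sum context vector
    scores = np.array([Wh @ np.tanh(Wv @ v + Wq @ q + bh) for v in attr_vectors])
    weights = softmax(scores)
    context = (weights[:, None] * attr_vectors).sum(axis=0)
    return weights, context

rng = np.random.default_rng(2)
n_attrs, dim = 5, 64
attr_vectors = rng.normal(size=(n_attrs, dim))   # user state-attribute feature vectors
q = rng.normal(size=dim)                         # question feature vector
Wv = rng.normal(scale=0.1, size=(dim, dim))
Wq = rng.normal(scale=0.1, size=(dim, dim))
bh = np.zeros(dim)
Wh = rng.normal(scale=0.1, size=dim)
weights, context = attend(attr_vectors, q, Wv, Wq, bh, Wh)
```

The weights sum to 1, and attributes more compatible with the question receive larger weights, which is exactly the "focus attention on a specific state attribute" behavior described above.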
In order to improve the training efficiency, in this embodiment, the newly-built interaction model may be trained through a gradient descent algorithm according to the target interaction training data, so as to generate a target interaction model corresponding to the target scene.
In a specific implementation, the number of training rounds may be set to 1000 (epochs = 1000) and the batch size to 100 (batch_size = 100); the categorical cross entropy may be selected as the loss function, i.e., the objective function (loss = 'categorical_crossentropy'); and the gradient descent algorithm may use the adam optimizer, which improves the learning speed of conventional gradient descent (optimizer = 'adam'). Through gradient descent the neural network can find the optimal weights that minimize the objective function, learning the weights automatically through training. After training is finished, the computed neural network weights can be substituted into the newly-built interaction model to generate the target interaction model corresponding to the target scene.
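The categorical cross entropy named above is simply the negative log-probability that the softmax output assigns to the correct answer token; a minimal sketch with an assumed 3-word answer vocabulary:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the answer vocabulary
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def categorical_crossentropy(probs, target_index):
    # Negative log-likelihood of the correct answer token
    return float(-np.log(probs[target_index]))

# Illustrative logits for a toy answer vocabulary of size 3
probs = softmax(np.array([2.0, 0.5, 0.1]))
loss = categorical_crossentropy(probs, 0)  # small, since index 0 dominates
```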
In this embodiment, the target interaction training data corresponding to the target scene is acquired, and then the newly-built interaction model is trained according to the target interaction training data to generate a target interaction model corresponding to the target scene, so that the adaptation degree of the target interaction model and the target scene is ensured.
In addition, an embodiment of the present invention further provides a storage medium, where the storage medium stores an interaction model generation program, and the interaction model generation program, when executed by a processor, implements the steps of the interaction model generation method described above.
Referring to fig. 8, fig. 8 is a block diagram illustrating a first embodiment of an interaction model generation apparatus according to the present invention.
As shown in fig. 8, an interaction model generation apparatus provided in an embodiment of the present invention includes:
a description extracting module 801, configured to, when a current message creation request is received, extract a current message scene description from the current message creation request;
a model searching module 802, configured to search, according to the description of the current message scene, an interaction model to be migrated that has a highest matching degree with the target scene when the target scene corresponding to the current message creation request does not belong to an existing scene;
the weight migration module 803 is configured to migrate weights of the interaction model to be migrated, except for the layer to be trained, to the newly-built interaction model;
and the model training module 804 is configured to train the newly-established interaction model to generate a target interaction model corresponding to the target scene.
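The weight migration performed by module 803 can be sketched as follows — a minimal NumPy illustration that assumes the model's weights are held in a dict keyed by layer name (the layer names and shapes here are hypothetical); every layer's weights are copied except the layer to be retrained, which keeps a fresh random initialization:

```python
import numpy as np

def migrate_weights(source_weights, layers_to_train=("output_dense",)):
    # Copy every layer's weights into the new model except the layers
    # to be (re)trained, which receive fresh random initializations
    rng = np.random.default_rng(0)
    migrated = {}
    for name, w in source_weights.items():
        if name in layers_to_train:
            migrated[name] = rng.normal(scale=0.05, size=w.shape)
        else:
            migrated[name] = w.copy()
    return migrated

# Hypothetical layer names and shapes for the interaction model to be migrated
source = {"embedding": np.ones((100, 128)),
          "lstm_1": np.ones((256, 64)),
          "output_dense": np.ones((64, 10))}
new_model_weights = migrate_weights(source)
```

In a framework model the same effect is typically achieved by loading the source weights layer by layer and leaving the trainable output layer at its initialization.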
In this embodiment, when a current message creation request is received, the current message scene description is extracted from it; when the target scene corresponding to the request does not belong to an existing scene, the interaction model to be migrated with the highest matching degree with the target scene is searched according to the current message scene description; the weights of the interaction model to be migrated, except for the layer to be trained, are migrated to a newly-built interaction model; and the newly-built interaction model is trained to generate the target interaction model corresponding to the target scene. In this way, artificial subjective judgment can be avoided as much as possible, and the interaction model can be generated efficiently and objectively.
Further, other embodiments or specific implementation manners of the interaction model generation apparatus of the present invention may refer to the above method embodiments, and are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An interaction model generation method, wherein the interaction model generation method comprises:
when a current message creating request is received, extracting a current message scene description from the current message creating request;
when the target scene corresponding to the current message creation request does not belong to the existing scene, searching the interactive model to be migrated with the highest matching degree with the target scene according to the description of the current message scene;
migrating weights except for the layer to be trained in the interactive model to be migrated to a newly-built interactive model;
and training the newly-built interaction model to generate a target interaction model corresponding to the target scene.
2. The interaction model generation method according to claim 1, wherein when the target scene corresponding to the current message creation request does not belong to an existing scene, the step of searching for the interaction model to be migrated that has the highest matching degree with the target scene according to the current message scene description comprises:
when the target scene corresponding to the current message creation request does not belong to the existing scene, acquiring an interaction characteristic sequence of the interaction model corresponding to each existing scene;
serializing the current message scene description to obtain a current scene sequence;
determining the matching degree between the current scene sequence and the interactive characteristic sequence of the interactive model corresponding to each existing scene through a pre-trained deep neural network;
and taking the interaction model corresponding to the interaction characteristic sequence with the maximum matching degree as the interaction model to be migrated with the highest matching degree with the target scene.
3. The interaction model generation method of claim 2, wherein before the step of determining the degree of matching between the current message scene description and the interaction feature information of the interaction model corresponding to each existing scene by the pre-trained deep neural network, the interaction model generation method further comprises:
acquiring historical scene description corresponding to the historical scene creation message, and acquiring interaction characteristic information of interaction models corresponding to all existing scenes;
serializing the historical scene description to obtain a historical scene sequence;
carrying out serialization processing on the interaction characteristic information of the interaction model corresponding to each existing scene to obtain an interaction characteristic sequence of the interaction model corresponding to each existing scene;
acquiring a preset matching degree between the historical scene sequence and an interactive characteristic sequence of an interactive model corresponding to each existing scene;
and training the initial deep neural network based on the historical scene sequence, the interactive characteristic sequence of the interactive model corresponding to each existing scene and the acquired preset matching degree to obtain a pre-trained deep neural network.
4. The interaction model generation method according to claim 3, wherein the step of training the initial deep neural network based on the historical scene description, the interaction feature sequence of the interaction model corresponding to each existing scene, and the obtained preset matching degree to obtain a pre-trained deep neural network comprises:
and training the initial deep neural network through a gradient descent algorithm based on the historical scene sequence, the interactive characteristic sequence of the interactive model corresponding to each existing scene and the acquired preset matching degree to obtain a pre-trained deep neural network.
5. The interaction model generation method of any one of claims 1 to 4, wherein the layer to be trained is a fully connected layer;
the step of migrating the weights of the interactive model to be migrated except for the layer to be trained to the newly-built interactive model comprises the following steps:
acquiring weights of the interactive model to be migrated except for a full connection layer;
and taking the weight except the full connection layer in the interactive model to be migrated as the initial weight of the newly-built interactive model.
6. The interaction model generation method of any one of claims 1 to 4, wherein before the step of training the newly-created interaction model to generate the target interaction model corresponding to the target scene, the interaction model generation method further comprises:
acquiring target interaction training data corresponding to the target scene;
the step of training the newly-built interaction model to generate a target interaction model corresponding to the target scene includes:
and training the newly-built interaction model according to the target interaction training data to generate a target interaction model corresponding to the target scene.
7. The interaction model generation method of claim 6, wherein the step of training the newly-created interaction model according to the target interaction training data to generate a target interaction model corresponding to the target scene comprises:
and training the newly-built interaction model through a gradient descent algorithm according to the target interaction training data to generate a target interaction model corresponding to the target scene.
8. An interaction model generation apparatus, characterized in that the interaction model generation apparatus comprises:
the description extraction module is used for extracting the scene description of the current message from the current message creation request when the current message creation request is received;
the model searching module is used for searching the interactive model to be migrated with the highest matching degree with the target scene according to the description of the current message scene when the target scene corresponding to the current message creating request does not belong to the existing scene;
the weight migration module is used for migrating the weights except for the layer to be trained in the interactive model to be migrated to the newly-built interactive model;
and the model training module is used for training the newly-built interaction model to generate a target interaction model corresponding to the target scene.
9. An interaction model generation apparatus, characterized in that the apparatus comprises: memory, a processor and an interaction model generation program stored on the memory and executable on the processor, the interaction model generation program being configured to implement the steps of the interaction model generation method of any of claims 1 to 7.
10. A storage medium, characterized in that the storage medium has stored thereon an interaction model generation program which, when executed by a processor, implements the steps of the interaction model generation method according to any one of claims 1 to 7.
CN202110503070.9A 2021-05-08 2021-05-08 Interaction model generation method, device, equipment and storage medium Pending CN115396831A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110503070.9A CN115396831A (en) 2021-05-08 2021-05-08 Interaction model generation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115396831A true CN115396831A (en) 2022-11-25

Family

ID=84114185

Country Status (1)

Country Link
CN (1) CN115396831A (en)

Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107392783A * 2017-07-05 2017-11-24 龚少卓 Social contact method and device based on virtual reality
CN108337362A * 2017-12-26 2018-07-27 百度在线网络技术(北京)有限公司 Voice interaction method, apparatus, device and storage medium
CN108629380A * 2018-05-11 2018-10-09 西北大学 Cross-scene wireless signal recognition method based on transfer learning
CN108898174A * 2018-06-25 2018-11-27 Oppo(重庆)智能科技有限公司 Contextual data acquisition method, contextual data acquisition apparatus and electronic device
CN109359541A * 2018-09-17 2019-02-19 南京邮电大学 Sketch face recognition method based on deep transfer learning
CN109359793A * 2018-08-03 2019-02-19 阿里巴巴集团控股有限公司 Prediction model training method and apparatus for new scenes
CN109753566A * 2019-01-09 2019-05-14 大连民族大学 Model training method for cross-domain sentiment analysis based on convolutional neural networks
WO2019134306A1 * 2018-01-02 2019-07-11 武汉斗鱼网络科技有限公司 Message processing method, device, terminal apparatus, and readable storage medium
CN110381524A * 2019-07-15 2019-10-25 安徽理工大学 Bi-LSTM-based online prediction method, system and storage medium for large-scene mobile traffic
CN110838020A * 2019-09-16 2020-02-25 平安科技(深圳)有限公司 Recommendation method and apparatus based on vector migration, computer device and storage medium
CN111444958A * 2020-03-25 2020-07-24 北京百度网讯科技有限公司 Model migration training method, device, equipment and storage medium
CN111476708A * 2020-04-03 2020-07-31 广州市百果园信息技术有限公司 Model generation method, model acquisition method, device, equipment and storage medium
CN111626827A * 2020-05-28 2020-09-04 苏州大学 Method, device, equipment and medium for recommending articles based on a sequential recommendation model
CN111680160A * 2020-06-16 2020-09-18 西北师范大学 Deep transfer learning method for text sentiment classification
CN111729305A * 2020-06-23 2020-10-02 网易(杭州)网络有限公司 Map scene preloading method, model training method, device and storage medium
CN111797873A * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Scene recognition method and device, storage medium and electronic equipment
WO2020232874A1 * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Modeling method and apparatus based on transfer learning, computer device and storage medium
CN112182166A * 2020-10-29 2021-01-05 腾讯科技(深圳)有限公司 Text matching method and device, electronic equipment and storage medium
CN112187785A * 2020-09-25 2021-01-05 北京自如信息科技有限公司 Message processing method and device, electronic equipment and storage medium
AU2020103613A4 * 2020-11-23 2021-02-04 Agricultural Information and Rural Economic Research Institute of Sichuan Academy of Agricultural Sciences CNN and transfer learning based disease intelligent identification method and system
WO2021022521A1 * 2019-08-07 2021-02-11 华为技术有限公司 Method for processing data, and method and device for training neural network model
CN112650833A * 2020-12-25 2021-04-13 哈尔滨工业大学(深圳) API matching model establishment method and cross-city government-affairs API matching method
WO2021068325A1 * 2019-10-12 2021-04-15 平安科技(深圳)有限公司 Facial action recognition model training method, facial action recognition method and apparatus, computer device, and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
余东行; 张保明; 赵传; 郭海涛; 卢俊: "Remote sensing image scene classification combining convolutional neural networks and ensemble learning", Journal of Remote Sensing, no. 06 *
汤鹏杰; 谭云兰; 李金忠: "Image caption generation model fusing scene and object prior knowledge", Journal of Image and Graphics, no. 09 *
闫美阳; 李原: "Two-stream deep transfer learning with multi-source domain confusion", Journal of Image and Graphics, no. 12 *

Similar Documents

Publication Publication Date Title
CN110046221B (en) Machine dialogue method, device, computer equipment and storage medium
CN109493166B (en) Construction method for task type dialogue system aiming at e-commerce shopping guide scene
CN110162749B (en) Information extraction method, information extraction device, computer equipment and computer readable storage medium
US20230028944A1 (en) Dialogue generation method and network training method and apparatus, storage medium, and device
CN110427617B (en) Push information generation method and device
CN111026842B (en) Natural language processing method, natural language processing device and intelligent question-answering system
CN113127624B (en) Question-answer model training method and device
CN110704641A (en) Ten-thousand-level intention classification method and device, storage medium and electronic equipment
CN105094315A (en) Method and apparatus for smart man-machine chat based on artificial intelligence
CN110990543A (en) Intelligent conversation generation method and device, computer equipment and computer storage medium
CN113268609B (en) Knowledge graph-based dialogue content recommendation method, device, equipment and medium
CN111625658A (en) Voice interaction method, device and equipment based on knowledge graph and storage medium
CN111368548A (en) Semantic recognition method and device, electronic equipment and computer-readable storage medium
CN110598070B (en) Application type identification method and device, server and storage medium
CN110704576A (en) Text-based entity relationship extraction method and device
CN108304376B (en) Text vector determination method and device, storage medium and electronic device
Windiatmoko et al. Developing FB chatbot based on deep learning using RASA framework for university enquiries
CN115017919B (en) Multi-scenario dialogue system and method for supporting training data rapid construction and process customization
US11036996B2 (en) Method and apparatus for determining (raw) video materials for news
CN115062617A (en) Task processing method, device, equipment and medium based on prompt learning
CN109002498B (en) Man-machine conversation method, device, equipment and storage medium
CN115396831A (en) Interaction model generation method, device, equipment and storage medium
CN115700579A (en) Advertisement text generation method and device, equipment and medium thereof
CN115688758A (en) Statement intention identification method and device and storage medium
CN115345669A (en) Method and device for generating file, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination