CN114139553A - Dialog text generation method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN114139553A
CN114139553A (application CN202111473465.5A)
Authority
CN
China
Prior art keywords
dialogue
target
data
dialog
hidden variable
Prior art date
Legal status
Pending
Application number
CN202111473465.5A
Other languages
Chinese (zh)
Inventor
袁梦菲
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202111473465.5A priority Critical patent/CN114139553A/en
Publication of CN114139553A publication Critical patent/CN114139553A/en
Pending legal-status Critical Current

Classifications

    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

An embodiment of the present application provides a dialog text generation method and apparatus, an electronic device, and a storage medium, belonging to the technical field of artificial intelligence. The method comprises the following steps: acquiring original dialogue data; performing feature extraction on the original dialogue data to obtain target dialogue data; encoding the target dialogue data to obtain an initial dialogue hidden variable; resampling the initial dialogue hidden variable according to preset attribute feature data to obtain a standard dialogue hidden variable; decoding the standard dialogue hidden variable to obtain a target dialogue text; and performing semantic analysis processing on the target dialogue text to obtain a standard dialogue text. The method and apparatus can improve the accuracy of the dialogue text.

Description

Dialog text generation method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for generating a dialog text, an electronic device, and a storage medium.
Background
With the development of artificial intelligence technology, conversation robots are used more and more widely: a conversation robot identifies the user's semantics from the information the user inputs and then generates a corresponding reply. At present, conversation robots mainly generate dialogue text with generation models in encoder-decoder or seq2seq form, whose reply sentences are often monotonous and cannot adapt to different application scenarios, which affects the accuracy of the dialogue text. How to apply dialogue text to different dialogue scenarios and improve its accuracy has therefore become an urgent technical problem.
Disclosure of Invention
The embodiment of the application mainly aims to provide a dialog text generation method, a dialog text generation device, an electronic device and a storage medium, and aims to improve the accuracy of dialog texts.
In order to achieve the above object, a first aspect of an embodiment of the present application provides a dialog text generation method, where the method includes:
acquiring original dialogue data;
carrying out feature extraction on the original dialogue data to obtain target dialogue data;
coding the target dialogue data to obtain an initial dialogue hidden variable;
resampling the initial dialogue hidden variable according to preset attribute feature data to obtain a standard dialogue hidden variable;
Decoding the standard dialogue hidden variable to obtain a target dialogue text;
and carrying out semantic analysis processing on the target dialog text to obtain a standard dialog text.
In some embodiments, the step of performing feature extraction on the original dialog data to obtain target dialog data includes:
extracting entity dialogue features in the original dialogue data;
classifying the entity dialogue features through a pre-trained sequence classifier to obtain dialogue parameter features;
and carrying out convolution processing on the dialogue parameter characteristics to obtain the target dialogue data.
In some embodiments, the step of encoding the target dialog data to obtain an initial dialog hidden variable includes:
mapping the target dialogue data to a preset vector space to obtain target dialogue characteristics;
coding the target dialogue features according to a preset coding sequence and a preset coding dimension to obtain a first dialogue hidden variable;
and performing downsampling processing on the first dialogue hidden variable to obtain the initial dialogue hidden variable.
In some embodiments, the step of resampling the initial dialogue hidden variable according to preset attribute feature data to obtain a standard dialogue hidden variable includes:
analyzing the attribute characteristic data to generate a target Gaussian distribution network;
and resampling the initial dialogue hidden variable through the target Gaussian distribution network to obtain a standard dialogue hidden variable.
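Purely as an illustration of this resampling step, the sketch below assumes the attribute feature data yields the mean and standard deviation of the target Gaussian (the names `mu_attr` and `sigma_attr` and the shift-and-scale rule are hypothetical, not the patent's actual network):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical parameters of the target Gaussian distribution network,
# assumed to be derived from the preset attribute feature data.
mu_attr, sigma_attr = 1.0, 0.2

# Initial dialogue hidden variable produced by the encoder (toy values).
z_init = rng.normal(size=(4,))

# Resampling: shift and scale the initial hidden variable by the target
# Gaussian's parameters to obtain the standard dialogue hidden variable.
z_std = mu_attr + sigma_attr * z_init
```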
In some embodiments, the step of decoding the standard dialog hidden variable to obtain the target dialog text includes:
decoding the standard dialogue hidden variable to obtain a second dialogue hidden variable;
performing upsampling processing on the second dialogue hidden variable to obtain a target dialogue hidden variable;
and activating the target dialogue hidden variable through a preset function to obtain the target dialogue text.
In some embodiments, the step of performing semantic analysis processing on the target dialog text to obtain a standard dialog text includes:
extracting semantic features of the target dialog text;
segmenting the target dialogue text according to the semantic features to obtain target dialogue fields;
calculating the similarity of the target dialogue field and the reference dialogue field;
and obtaining the standard dialog text according to the similarity.
In some embodiments, the step of obtaining the standard dialog text according to the similarity includes:
screening the dialogue sentences in a preset sentence database according to the similarity to obtain a candidate sentence set;
performing supplementary processing on the dialogue sentences in the candidate sentence set to obtain standard dialogue sentences;
and performing fusion processing on the standard dialogue sentences to obtain the standard dialogue texts.
In order to achieve the above object, a second aspect of an embodiment of the present application proposes a dialog text generation apparatus, including:
the original dialogue data acquisition module is used for acquiring original dialogue data;
the characteristic extraction module is used for extracting the characteristics of the original dialogue data to obtain target dialogue data;
the coding module is used for coding the target dialogue data to obtain an initial dialogue hidden variable;
the resampling module is used for resampling the initial dialogue hidden variable according to preset attribute feature data to obtain a standard dialogue hidden variable;
the decoding module is used for decoding the standard dialogue hidden variable to obtain a target dialogue text;
and the semantic analysis module is used for performing semantic analysis processing on the target dialog text to obtain a standard dialog text.
To achieve the above object, a third aspect of the embodiments of the present application provides an electronic device, which includes a memory, a processor, a computer program stored on the memory and executable on the processor, and a data bus for implementing connection communication between the processor and the memory, wherein the computer program, when executed by the processor, implements the method of the first aspect.
To achieve the above object, a fourth aspect of the embodiments of the present application proposes a storage medium, which is a computer-readable storage medium for computer-readable storage, and the storage medium stores one or more computer programs, which are executable by one or more processors to implement the method of the first aspect.
According to the dialog text generation method and apparatus, the electronic device, and the storage medium provided by the embodiments of the present application, original dialogue data is first acquired, and feature extraction is performed on it to obtain target dialogue data; this effectively removes weakly relevant data, reduces the total amount of data, and improves its rationality. The target dialogue data is then encoded to obtain an initial dialogue hidden variable, which is resampled according to preset attribute feature data to obtain a standard dialogue hidden variable; this adds the required text attribute information to the initial hidden variable and thereby improves the accuracy and diversity of the generated dialogue text. The standard dialogue hidden variable is then decoded to obtain the target dialogue text, which makes each dialogue sentence in the generated text more reasonable, improves the match between query sentences and reply sentences, and allows the dialogue text to meet the requirements of different dialogue scenarios. Finally, semantic analysis, including part-of-speech analysis and semantic error correction of the sentences in the target dialogue text, yields the standard dialogue text, further improving its quality.
Drawings
Fig. 1 is a flowchart of a dialog text generation method provided in an embodiment of the present application;
FIG. 2 is a flowchart of step S102 in FIG. 1;
FIG. 3 is a flowchart of step S103 in FIG. 1;
FIG. 4 is a flowchart of step S104 in FIG. 1;
fig. 5 is a flowchart of step S105 in fig. 1;
FIG. 6 is a flowchart of step S106 in FIG. 1;
fig. 7 is a flowchart of step S604 in fig. 6;
fig. 8 is a schematic structural diagram of a dialog text generation device according to an embodiment of the present application;
fig. 9 is a schematic hardware structure diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It should be noted that although functional blocks are partitioned in a schematic diagram of an apparatus and a logical order is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the partitioning of blocks in the apparatus or the order in the flowchart. The terms first, second and the like in the description and in the claims, and the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
First, several terms referred to in the present application are explained:
Artificial Intelligence (AI): a new technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. As a branch of computer science, artificial intelligence attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems. Artificial intelligence can simulate the information processes of human consciousness and thinking. It is also a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
Natural Language Processing (NLP): the use of computers to process, understand, and apply human languages (such as Chinese and English). It is a branch of artificial intelligence and an interdisciplinary field between computer science and linguistics, often called computational linguistics. Natural language processing includes syntactic analysis, semantic analysis, discourse understanding, and so on. It is commonly used in machine translation, recognition of handwritten and printed characters, speech recognition and text-to-speech conversion, intent recognition, information extraction and filtering, text classification and clustering, and public-opinion analysis and opinion mining, and it involves data mining, machine learning, knowledge acquisition, knowledge engineering, artificial-intelligence research, and linguistic research related to language computation.
Information Extraction (IE): a text-processing technique that extracts specified types of factual information, such as entities, relations, and events, from natural language text and outputs it as structured data. Text data is composed of specific units such as sentences, paragraphs, and chapters, and text information is composed of smaller units such as words, phrases, sentences, and paragraphs, or combinations of them. Extracting noun phrases, person names, place names, and the like from text data is text information extraction; the information extracted by this technique can of course be of many types.
A variational auto-encoder (VAE) is an important generative model. Unlike a plain auto-encoder, a VAE adds regularization during training to prevent overfitting and to ensure that the latent space has the properties needed for generation. The distribution produced by the encoder is chosen to be a Gaussian, so the encoder is trained to return the mean and covariance matrix that describe this Gaussian. Encoding an input as a distribution expresses regularization of the latent space very naturally: locally through control of the variance and globally through control of the mean. The loss function of a variational auto-encoder consists of a reconstruction term (on the last layer) and a regularization term (on the latent layer); the regularization term is the KL divergence between the encoder's distribution and the standard normal distribution. Regularization makes the latent space suitable for generation, which requires two properties: continuity, meaning that two nearby points in the latent space should decode to similar content; and completeness, meaning that a point sampled from the distribution should decode to meaningful content. Simply placing points in the latent space is not enough to satisfy these two properties, so a good regularization term must be defined: the distributions produced by the encoder should be close to the standard normal distribution, with the covariance matrix close to the identity matrix and the mean close to 0. This regularization term prevents the model from encoding data far apart in the latent space and encourages the returned distributions to "overlap" as much as possible, satisfying the expected continuity and completeness conditions.
Because the regularization term increases the reconstruction error, training must trade off the two losses.
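Under the simplifying assumptions of a diagonal Gaussian encoder and a mean-squared-error reconstruction term (the patent does not fix these choices), the two-part VAE loss described above can be sketched as:

```python
import numpy as np

def vae_regularized_loss(x, x_recon, mu, log_var):
    """Toy VAE loss: a reconstruction term plus the KL divergence between
    the encoder's diagonal Gaussian N(mu, sigma^2) and the standard normal
    N(0, I). A hypothetical sketch, not the patent's exact formulation."""
    # Reconstruction term over the last layer's output (mean squared error).
    recon = np.mean((x - x_recon) ** 2)
    # Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)) for a diagonal Gaussian.
    kl = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))
    return recon + kl

# When the encoder outputs exactly the standard normal and reconstruction is
# perfect, both terms vanish and the loss is zero.
loss_at_prior = vae_regularized_loss(np.zeros(4), np.zeros(4),
                                     mu=np.zeros(2), log_var=np.zeros(2))
```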
Encoding (encoder): converts the input sequence into a fixed-length vector.
Decoding (decoder): decodes the fixed vector generated by the encoder into an output sequence; the input sequence may be text, speech, images, or video, and the output sequence may be text or images.
Latent variables: hidden variables are random variables that cannot be observed directly; inferences about them are usually made from samples of the observable variables. Taking a Gaussian mixture model as an example, the hidden variable in a GMM is the Gaussian component corresponding to each observation; it is called hidden because the generation process is not observable. Properties of the hidden variables can be estimated from collected samples of the observed variables.
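A minimal sketch of the GMM example: the hidden variable z selects the component that generates each observation, only x is observed, and a property of z (here, a mixing weight) is inferred from the observed samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-component GMM: the hidden variable z picks the Gaussian component
# that generates each observation; only the observations x are seen.
weights = np.array([0.3, 0.7])
means = np.array([-5.0, 5.0])

z = rng.choice(2, size=1000, p=weights)   # hidden (unobserved) variable
x = rng.normal(means[z], 1.0)             # observable samples

# With well-separated components, the mixing weight of component 1 is
# approximated by the fraction of samples on its side of zero.
est_weight_1 = float(np.mean(x > 0))
```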
Upsampling (upsampling): enlarging an image, also called image interpolation, mainly so that the original image can be displayed on a higher-resolution display device. Image enlargement almost always uses interpolation: a suitable interpolation algorithm inserts new elements between the pixels of the original image. Interpolation algorithms mainly include edge-based and region-based image interpolation algorithms.
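For instance, the simplest interpolation scheme, nearest-neighbour, inserts the new elements by repeating each original pixel in an s × s block (one illustrative choice; the patent does not prescribe an algorithm):

```python
import numpy as np

def upsample_nearest(img, s):
    """Nearest-neighbour interpolation: each original pixel is repeated in
    an s x s block, inserting new elements between the original pixels."""
    return np.repeat(np.repeat(img, s, axis=0), s, axis=1)

img = np.array([[1.0, 2.0],
                [3.0, 4.0]])
big = upsample_nearest(img, 2)   # a 2x2 image becomes 4x4
```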
Downsampling (subsampling): reducing an image, mainly so that the image fits the size of the display area or to generate a thumbnail of the image. For an image I of size M × N, s-fold downsampling yields an image of size (M/s) × (N/s), where s should be a common divisor of M and N. Viewed as a matrix, each s × s window of the original image becomes a single pixel whose value is the average of all pixels in the window.
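The windowed-average rule above can be sketched directly:

```python
import numpy as np

def downsample(img, s):
    """s-fold downsampling: each s x s window of an M x N image collapses
    to one pixel equal to the window mean, giving (M/s) x (N/s) output."""
    m, n = img.shape
    assert m % s == 0 and n % s == 0, "s must be a common divisor of M and N"
    return img.reshape(m // s, s, n // s, s).mean(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
small = downsample(img, 2)   # 4x4 image -> 2x2 image of window averages
```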
Back propagation: the general principle is as follows. Training data is fed into the input layer of a neural network, passes through the hidden layers, and finally reaches the output layer, which outputs a result. Because the network's output differs from the actual result, the error between the estimate and the actual value is computed and propagated backwards from the output layer through the hidden layers to the input layer; during this back propagation, the values of the parameters are adjusted according to the error. The process is iterated until convergence.
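A toy illustration of this loop, assuming a one-hidden-layer network trained on a simple linear target (not the patent's actual model or data):

```python
import numpy as np

rng = np.random.default_rng(1)

# One hidden layer, scalar output, trained by back propagation on y = 2x.
x = rng.uniform(-1, 1, (64, 1))
y = 2 * x
w1 = rng.normal(0, 0.5, (1, 8))
w2 = rng.normal(0, 0.5, (8, 1))
lr = 0.1

for _ in range(2000):
    h = np.tanh(x @ w1)                   # forward: input -> hidden layer
    pred = h @ w2                         # forward: hidden -> output layer
    err = pred - y                        # error between estimate and target
    grad_w2 = h.T @ err / len(x)          # backward: output-layer gradient
    grad_h = (err @ w2.T) * (1 - h ** 2)  # backward: propagate into hidden
    grad_w1 = x.T @ grad_h / len(x)       # backward: input-layer gradient
    w1 -= lr * grad_w1                    # adjust parameters by the error
    w2 -= lr * grad_w2

final_loss = float(np.mean((np.tanh(x @ w1) @ w2 - y) ** 2))
```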
Collaborative filtering: a well-known and widely used recommendation algorithm. It mines users' historical behaviour data to discover their preferences and predicts and recommends products they may like, either by finding similar users (user-based) or similar items (item-based). Implementing user-based collaborative filtering mainly requires solving two problems: how to find users whose tastes are similar to yours, which in turn requires computing the similarity between data.
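The underlying similarity computation can be illustrated with cosine similarity over hypothetical rating vectors (the patent does not prescribe a particular similarity measure):

```python
import numpy as np

def cosine_similarity(u, v):
    """User-based collaborative filtering hinges on a similarity measure
    between users' rating vectors; cosine similarity is a common choice."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

alice = np.array([5.0, 4.0, 0.0, 1.0])
bob   = np.array([4.0, 5.0, 0.0, 1.0])  # tastes close to alice's
carol = np.array([0.0, 1.0, 5.0, 4.0])  # opposite preference pattern

sim_ab = cosine_similarity(alice, bob)    # high: recommend bob's items
sim_ac = cosine_similarity(alice, carol)  # low: weak basis for recommendation
```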
As noted in the background, conversation robots currently generate dialogue text mainly with generation models in encoder-decoder or seq2seq form, whose reply sentences are often monotonous and cannot adapt to different application scenarios, which affects the accuracy of the dialogue text. How to apply dialogue text to different dialogue scenarios and improve its accuracy has therefore become an urgent technical problem.
Based on this, embodiments of the present application provide a dialog text generation method, apparatus, electronic device, and storage medium, which aim to improve accuracy of a dialog text.
The method, the apparatus, the electronic device, and the storage medium for generating a dialog text provided in the embodiments of the present application are described in detail with reference to the following embodiments, in which the method for generating a dialog text in the embodiments of the present application is first described.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The embodiment of the application provides a dialog text generation method, and relates to the technical field of artificial intelligence. The dialog text generation method provided by the embodiment of the application can be applied to a terminal, a server side and software running in the terminal or the server side. In some embodiments, the terminal may be a smartphone, tablet, laptop, desktop computer, or the like; the server side can be configured into an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, and cloud servers for providing basic cloud computing services such as cloud service, a cloud database, cloud computing, cloud functions, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN (content delivery network) and big data and artificial intelligence platforms; the software may be an application or the like that implements a dialog text generation method, but is not limited to the above form.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Fig. 1 is an alternative flowchart of a dialog text generation method provided in an embodiment of the present application, where the method in fig. 1 may include, but is not limited to, steps S101 to S106.
Step S101, acquiring original dialogue data;
step S102, extracting the characteristics of the original dialogue data to obtain target dialogue data;
step S103, encoding the target dialogue data to obtain an initial dialogue hidden variable;
step S104, resampling the initial dialogue hidden variables according to preset attribute feature data to obtain standard dialogue hidden variables;
step S105, decoding the standard dialogue hidden variable to obtain a target dialogue text;
and step S106, performing semantic analysis processing on the target dialog text to obtain a standard dialog text.
In steps S101 to S106 of this embodiment, feature extraction on the original dialogue data to obtain the target dialogue data effectively removes weakly relevant data, reduces the total amount of data, and improves its rationality. Encoding the target dialogue data yields an initial dialogue hidden variable, and resampling it according to preset attribute feature data yields a standard dialogue hidden variable, adding the required text attribute information and thereby improving the accuracy and diversity of the generated dialogue text. Decoding the standard dialogue hidden variable yields the target dialogue text, making each generated dialogue sentence more reasonable, improving the match between query and reply sentences, and allowing the text to meet the requirements of different dialogue scenarios. Finally, part-of-speech analysis and semantic error correction of the sentences in the target dialogue text yield the standard dialogue text, further improving its quality.
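Purely as an illustration of the data flow in steps S101 to S106, a toy numpy sketch with hypothetical weights and attribute parameters (not the patent's actual encoder, decoder, or attribute handling):

```python
import numpy as np

rng = np.random.default_rng(2)

def encode(x, w_enc):
    """Steps S102-S103: map extracted features to an initial hidden variable."""
    return np.tanh(x @ w_enc)

def resample(z, mu_attr, sigma_attr):
    """Step S104: re-draw the hidden variable using assumed Gaussian
    parameters carrying the preset attribute features."""
    return mu_attr + sigma_attr * z

def decode(z, w_dec):
    """Step S105: map the standard hidden variable back to token scores."""
    return z @ w_dec

x = rng.normal(size=(1, 6))             # S101-S102: target dialogue features
w_enc = rng.normal(size=(6, 3))
w_dec = rng.normal(size=(3, 10))
z0 = encode(x, w_enc)                   # initial dialogue hidden variable
z = resample(z0, mu_attr=0.5, sigma_attr=0.1)   # standard hidden variable
scores = decode(z, w_dec)               # one score per vocabulary token
token = int(np.argmax(scores))          # S106 would then refine the text
```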
In step S101 of some embodiments, the original dialogue data of a user may be collected through image-and-text consultation, return-visit calls, video question answering, and the like, or extracted from the user's platform tags and the evaluation questionnaires the user has taken part in. In practice, technical means such as a web crawler may also be used to crawl the data: a crawler is written and a data source is set, and data is then crawled in a targeted manner to obtain the original dialogue data, which helps guarantee its rationality.
Referring to fig. 2, in some embodiments, step S102 may include, but is not limited to, step S201 to step S203:
step S201, extracting entity dialogue characteristics in original dialogue data;
step S202, classifying the entity dialogue features through a pre-trained sequence classifier to obtain dialogue parameter features;
step S203, performing convolution processing on the dialogue parameter features to obtain target dialogue data.
In step S201 of some embodiments, the entity dialogue features in the original dialogue data are identified by a preset lexical analysis model. For example, a dialogue word library is constructed in advance, which may include the names of various dialogue scenes as well as proper nouns, terms, non-proper names, and the like related to the dialogue type. With this word library, the preset lexical analysis model can identify entity dialogue features in the original dialogue data according to the specific dialogue corpora it contains and the preset part-of-speech categories; the entity dialogue features may include entity words of multiple dimensions related to the dialogue, such as proper nouns, terms, non-proper names, modifiers, and time information.
In step S202 of some embodiments, the entity dialogue features may be classified by a pre-trained sequence classifier, which may be a model based on a conditional random field (CRF) algorithm or on a bidirectional long short-term memory (Bi-LSTM) network. For example, a sequence classifier can be constructed on a Bi-LSTM: the input words wi and their characters are embedded, the sequence is processed by a left-to-right LSTM and a right-to-left LSTM, and their outputs are concatenated into a single output layer. The output layer passes the entity dialogue features to a softmax classifier, which creates a probability distribution over the preset part-of-speech category labels; the entity dialogue features are then labelled and classified according to this distribution to obtain the dialogue parameter features, i.e., the entity dialogue features that include the target dialogue parameters.
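The labelling step can be sketched roughly as follows; the Bi-LSTM itself is abstracted away as a per-token score matrix, and the label set and scores are illustrative assumptions, not part of the embodiment:

```python
import numpy as np

def softmax(scores):
    # numerically stable softmax over the last axis
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical part-of-speech category labels for entity dialogue features.
LABELS = ["proper_noun", "term", "modifier", "time"]

def classify_tokens(output_layer_scores):
    """Map per-token output-layer scores (as a Bi-LSTM would produce by
    concatenating its left-to-right and right-to-left passes) to a
    probability distribution over labels, then take the argmax."""
    probs = softmax(np.asarray(output_layer_scores, dtype=float))
    return [LABELS[i] for i in probs.argmax(axis=-1)], probs

# Toy scores for three tokens.
labels, probs = classify_tokens([[4.0, 1.0, 0.2, 0.1],
                                 [0.1, 3.5, 0.2, 0.3],
                                 [0.2, 0.1, 0.3, 5.0]])
```

Each row of `probs` sums to one, and the argmax over each row gives the predicted part-of-speech category for that token.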
In step S203 of some embodiments, a convolution layer performs convolution processing on the dialogue parameter features to extract them, so as to obtain the required target dialogue data, where the target dialogue data is the original dialogue data containing the target dialogue parameters.
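A minimal sketch of the convolution step, assuming the dialogue parameter features arrive as a 1-D feature sequence (the kernel and values here are illustrative, not the embodiment's actual weights):

```python
import numpy as np

def conv1d_valid(features, kernel):
    """'Valid' 1-D convolution (really cross-correlation, as in most
    deep-learning frameworks), used to extract local patterns from the
    dialogue parameter features."""
    f = np.asarray(features, dtype=float)
    k = np.asarray(kernel, dtype=float)
    n = len(f) - len(k) + 1  # output length for 'valid' mode
    return np.array([np.dot(f[i:i + len(k)], k) for i in range(n)])

# A length-4 feature sequence convolved with a length-3 difference kernel.
out = conv1d_valid([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0])
# output length is 4 - 3 + 1 = 2
```

In a trained model the kernel weights would be learned; here a fixed difference kernel simply shows the sliding-window computation.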
In some embodiments, before step S103, the method further includes training the dialogue model in advance, where the training process specifically includes:
step a, obtaining sample dialogue data and sample attribute feature data;
b, inputting sample dialogue data and sample attribute feature data into a dialogue model;
step c, coding the sample dialogue data to obtain a sample initial hidden variable;
step d, resampling the initial hidden variables of the sample according to the attribute characteristic data of the sample to obtain standard hidden variables of the sample;
e, decoding the sample standard hidden variable to obtain a sample dialogue text;
step f, calculating the similarity between the sample dialog text and a preset reference dialog text through a loss function of the dialog model;
and g, optimizing the loss function of the dialogue model according to the similarity so as to update the dialogue model.
In step a of some embodiments, sample dialogue data may be collected through image-text consultation, telephone return visits, video question answering, and the like, or may be extracted from the user's platform tags and the evaluation questionnaires the user has participated in. The attribute feature data mainly includes feature information meeting the requirements of different dialogue scenes and different dialogue topics, such as customer portrait tags, customer interest points, and product explanation information, and can be preset according to those requirements.
In step b of some embodiments, the sample dialogue data and the sample attribute feature data are input into a dialogue model, which is a variational self-decoding model.
In step c of some embodiments, an MLP network maps the sample dialogue data from the semantic space into a preset vector space to obtain sample dialogue features. The sample dialogue features are then encoded in a bottom-up encoding order along the encoding dimensions to obtain a first sample dialogue hidden variable, which is downsampled to obtain the sample initial hidden variable.
In step d of some embodiments, the MLP network maps the sample attribute feature data from the semantic space into the preset vector space to obtain sample attribute features; the resulting sample attribute feature vectors are then analyzed according to the dialogue topics and dialogue keywords corresponding to different dialogue scenes, generating a sample Gaussian distribution network for the multiple dialogue topics. The sample initial hidden variables are resampled through this sample Gaussian distribution network to obtain the sample standard hidden variables.
In step e of some embodiments, the standard dialogue hidden variables of the samples in different dimensions are decoded to obtain second sample dialogue hidden variables, and then upsampling is performed layer by layer to realize decoding processing and upsampling processing of the second sample dialogue hidden variables in all dimensions, so that the target sample dialogue hidden variables are obtained. And then, activating the target sample dialogue hidden variable by adopting a swish function to obtain a sample dialogue text.
In step f of some embodiments, the similarity between the sample dialog text and the preset reference dialog text is calculated through a loss function of the dialog model, for example, the sample dialog text and the preset reference dialog text are converted into a sample dialog text vector and a reference dialog text vector, so that the cosine similarity between the two vectors is calculated, and a similarity value is obtained.
In step g of some embodiments, the similarity value is compared with a similarity threshold, and the model loss of the loss function of the dialogue model is back-propagated according to the relation between the two in order to fine-tune the model parameters; once the similarity value exceeds the similarity threshold, optimization of the dialogue model stops.
When calculating the model loss of the loss function, in addition to the reconstruction loss and the KL divergence loss of the dialogue model itself, the model loss includes an attribute loss that determines whether the input sample attribute feature data matches the attributes of the sample dialogue text; the attribute loss can be expressed by a cross-entropy function. For example, the model loss L(x, x′) of the loss function can be expressed as shown in formula (1):

L(x, x′) = L_recon + L_KL + L_bce    formula (1)

where L_recon is the reconstruction loss, L_KL is the KL divergence loss, and L_bce is the attribute loss.
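Formula (1) can be sketched numerically as follows, assuming a diagonal Gaussian posterior N(μ, σ²) against a standard-normal prior for the KL term, mean squared error for the reconstruction term, and binary cross-entropy for the attribute term — all of these concrete choices are illustrative assumptions, not fixed by the embodiment:

```python
import math

def recon_loss(x, x_hat):
    # L_recon: mean squared error between input and reconstruction
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def kl_loss(mu, sigma):
    # L_KL: KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions
    return sum(0.5 * (m * m + s * s - 1.0 - 2.0 * math.log(s))
               for m, s in zip(mu, sigma))

def bce_loss(attr_true, attr_prob):
    # L_bce: cross-entropy attribute loss — does the generated text
    # match the input attribute feature data?
    eps = 1e-12
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for t, p in zip(attr_true, attr_prob)) / len(attr_true)

def model_loss(x, x_hat, mu, sigma, attr_true, attr_prob):
    # L(x, x') = L_recon + L_KL + L_bce, as in formula (1)
    return (recon_loss(x, x_hat) + kl_loss(mu, sigma)
            + bce_loss(attr_true, attr_prob))
```

With a perfect reconstruction, a standard-normal posterior, and perfectly predicted attributes, every term vanishes, which is a quick sanity check on the implementation.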
Referring to fig. 3, in some embodiments, step S103 may include, but is not limited to, step S301 to step S303:
step S301, mapping target dialogue data to a preset vector space to obtain target dialogue characteristics;
step S302, encoding the target dialogue features according to a preset encoding order and encoding dimensions to obtain a first dialogue hidden variable;
step S303, the first session hidden variable is subjected to down-sampling processing to obtain an initial session hidden variable.
In step S301 of some embodiments, an MLP network may be used to perform multiple mapping processes on target dialog data from a semantic space to a vector space, and map the target dialog data into a preset vector space to obtain target dialog features, where the target dialog features may be text features or image features.
In step S302 and step S303 of some embodiments, the coding module of the dialog model may perform coding processing on the target dialog feature according to a bottom-up coding order and a coding dimension to obtain a first dialog hidden variable, and then perform downsampling processing on the first dialog hidden variable by using a downsampling unit of the dialog model to obtain an initial dialog hidden variable. It should be noted that the encoding order may be from bottom to top or from top to bottom, and the encoding dimension may include a plurality of feature dimensions set according to the conversation type, the conversation scene, and the like, for example, a conversation emotion dimension, a conversation topic dimension, and the like, without limitation.
For example, the target dialogue feature is first encoded to obtain the bottom-most first dialogue hidden variable z1, and downsampling is then performed layer by layer upwards to obtain the initial dialogue hidden variables [z2, z3, …, zk] corresponding to each layer.
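The layer-by-layer downsampling described above can be sketched as follows; average pooling that halves the dimension at each layer is an illustrative stand-in for the model's actual downsampling unit:

```python
import numpy as np

def downsample(z):
    # halve the dimension by averaging adjacent pairs of components
    z = np.asarray(z, dtype=float)
    return z.reshape(-1, 2).mean(axis=1)

def hierarchical_latents(z1, num_layers):
    """From the bottom-most hidden variable z1, produce the ladder of
    initial dialogue hidden variables [z2, z3, ..., zk]."""
    latents, z = [], np.asarray(z1, dtype=float)
    for _ in range(num_layers):
        z = downsample(z)
        latents.append(z)
    return latents

# An 8-dimensional bottom latent z1 shrinks 8 -> 4 -> 2 -> 1 over three layers.
z1 = [1.0, 3.0, 2.0, 4.0, 6.0, 8.0, 0.0, 2.0]
ladder = hierarchical_latents(z1, 3)
```

Each rung of the ladder is a coarser summary of z1, which is the structural point of the hierarchical encoding.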
Compared with prior-art methods that map high-dimensional text information to a single low-dimensional hidden variable layer z, this hierarchical method can effectively avoid loss of the target dialogue data and effectively improve the text quality of the reconstructed text.
Referring to fig. 4, in some embodiments, step S104 may include, but is not limited to, step S401 to step S402:
step S401, analyzing the attribute feature data to generate a target Gaussian distribution network;
and step S402, resampling the initial dialogue hidden variables through a target Gaussian distribution network to obtain standard dialogue hidden variables.
In step S401 of some embodiments, the attribute feature data mainly includes feature information meeting the requirements of different dialog scenarios and different dialog topics, such as customer portrait tags, customer interest points, product explanation information, and so on.
In some embodiments, the customer portrait tag includes the user's age, gender, occupation, and the like. In practical applications, by dividing the attribute feature data according to persona attributes and training the model on them, responses matching the characteristics of different users can be generated: for the same question, users of different ages, genders, and occupations reply differently. For example, when the agent says to the user: "I would like to recommend XX insurance to you", an elderly user may answer: "I do not think it is useful"; a young user may answer: "I am in good health and do not need insurance"; a low-income user may answer: "Money is tight for me, I cannot afford insurance"; and a user with children may ask in return whether insurance needs to be bought for the children, and so on.
It should be noted that the persona attributes may be various, such as age and gender, and an attribute classifier needs to be trained in advance for each specific persona attribute, e.g., one for age and one for gender. When training an attribute classifier for a certain persona attribute: categories are set for that attribute; dialogue corpora and behaviour data of different users are collected; data corresponding to the different categories are extracted from the dialogue corpora and behaviour data as training data; and the training data are used to train the attribute classifier for that attribute. Taking the age classifier as an example, the age intervals may be set as: 0-10 years old, 10-25 years old, 25-40 years old, 40-60 years old, and over 60 years old. When this classifier judges the user's age from the user's historical query information, if content related to buying a car or a house frequently appears in the queries, the 25-40 interval scores higher than the others, so the user can be judged to be 25-40 years old. Similarly, when the gender classifier judges the user's gender, if female-related content such as dresses and lipstick often appears in the historical queries, the user's gender is judged to be female.
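The interval-scoring idea can be illustrated with a toy keyword-count classifier; the keywords and intervals below are assumptions for illustration, whereas a real attribute classifier would be trained on dialogue corpora and behaviour data as described:

```python
# Hypothetical keyword evidence for each age interval.
AGE_KEYWORDS = {
    "0-10":  ["cartoon", "toy"],
    "10-25": ["exam", "campus"],
    "25-40": ["car", "house", "mortgage"],
    "40-60": ["pension", "health check"],
    "60+":   ["retirement", "blood pressure"],
}

def score_age_intervals(history_queries):
    """Score each age interval by how often its keywords appear in the
    user's historical query information; return the best interval plus
    the full score table."""
    text = " ".join(history_queries).lower()
    scores = {interval: sum(text.count(k) for k in kws)
              for interval, kws in AGE_KEYWORDS.items()}
    return max(scores, key=scores.get), scores

# Car/house/mortgage queries push the 25-40 interval above the others.
best, scores = score_age_intervals(["looking to buy a car",
                                    "house prices this year",
                                    "mortgage rates"])
```

A trained classifier would replace the keyword counts with learned scores, but the decision rule — pick the highest-scoring interval — is the same.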
In some embodiments, the customer interest points refer to attributes generated from the dialogue content in the original text data combined with the dialogue context. For example, if much content about adventure has appeared in the original text data, the conversation robot needs to generate a reply about adventure; if much content about children's education has appeared, the conversation robot needs to generate content about an educational program.
In other embodiments, the product explanation information may be divided based on the conversation phase, which is generally divided into different phases such as idea introduction, product introduction, promotion, demand stimulation, service introduction, product sale, and the like, and the conversation data is processed in stages, so that the conversation robot can reply specifically to the different phases. For example, in the idea introduction stage, the conversation robot should reply more about why insurance buying and insurance not buying would result in what risk, etc., in the product sale stage, the conversation robot should reply more about product price and product details, etc.
In step S401 of some embodiments, the attribute feature data is mapped by an MLP network into the preset vector space to obtain attribute features; the resulting attribute feature vectors are then analyzed according to the dialogue topics and dialogue keywords corresponding to the different dialogue scenes, generating a target Gaussian distribution network corresponding to the multiple dialogue topics. The target Gaussian distribution network can be expressed as N ~ (μ, σ), where N is the number of dialogue topics, and μ and σ respectively represent the mean and standard deviation of the different attribute feature vectors.
In step S402 of some embodiments, when the initial dialogue hidden variables are resampled through the target Gaussian distribution network, at least one of nearest-neighbour interpolation, bilinear interpolation, and cubic convolution interpolation may be used: gray values of the initial dialogue hidden variables are collected at certain intervals and analyzed. When a collected gray value falls outside the range of the original function's values at the sampling points, the sampled point is interpolated by one of these methods, yielding a plurality of distributions [Y1, Y2, …, Yk] of the target dialogue data over different dimensions, i.e., the standard dialogue hidden variables.
Introducing the attribute feature data during resampling means the process is no longer sampling from a standard Gaussian distribution: different attribute feature data act as an independent Gaussian mixture model, forming a plurality of target Gaussian distributions. Text sentences meeting the requirements of different topic scenes can then be generated in the decoding stage, improving the accuracy of the dialogue text; meanwhile, guided by the attribute feature data, the required vocabulary and corpora can be determined more conveniently during decoding, reducing the computation of the decoding process and improving the generation efficiency of the dialogue text.
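The attribute-conditioned resampling can be sketched as drawing from a topic-specific Gaussian N(μ, σ) instead of the standard normal; here a deterministic shift-and-scale of the initial hidden variable stands in for the random draw, and the per-topic parameters are illustrative assumptions:

```python
import numpy as np

def resample(z_init, mu, sigma):
    """Resample the initial dialogue hidden variable through a target
    Gaussian N(mu, sigma) by shifting and scaling: z = mu + sigma * z_init.
    (A deterministic stand-in for drawing z ~ N(mu, sigma).)"""
    return (np.asarray(mu, dtype=float)
            + np.asarray(sigma, dtype=float) * np.asarray(z_init, dtype=float))

# One (mu, sigma) pair per dialogue topic: a mixture of target Gaussians
# replaces the single standard Gaussian of a plain VAE.
topic_gaussians = {
    "product_intro": ([0.5, -0.2], [0.1, 0.1]),
    "idea_intro":    ([-0.3, 0.4], [0.2, 0.2]),
}

z_init = [1.0, -1.0]
standard_latents = {topic: resample(z_init, mu, sigma)
                    for topic, (mu, sigma) in topic_gaussians.items()}
```

Because each topic supplies its own (μ, σ), the same initial hidden variable lands in a different region of the latent space per topic, which is what lets the decoder produce topic-specific text.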
Referring to fig. 5, in some embodiments, step S105 may further include, but is not limited to, step S501 to step S503:
step S501, decoding the standard dialogue hidden variable to obtain a second dialogue hidden variable;
step S502, carrying out up-sampling processing on the second dialogue hidden variable to obtain a target dialogue hidden variable;
and step S503, activating the target dialogue hidden variable through a preset function to obtain a target dialogue text.
In steps S501 and S502 of some embodiments, the standard dialogue hidden variable is decoded by the decoding module of the dialogue model to obtain a second dialogue hidden variable, which is then upsampled by the upsampling unit to obtain the target dialogue hidden variable.
For example, decoding processing is performed on standard dialogue hidden variables in different dimensions, and then upsampling processing is performed layer by layer upwards to realize decoding processing and upsampling processing on second dialogue hidden variables in all dimensions, so that target dialogue hidden variables are obtained.
In step S503 of some embodiments, the preset activation function may be the swish function, which is used to activate the target dialogue hidden variable. The swish function may be expressed as f(x) = x · sigmoid(βx), where x is the target dialogue hidden variable.
The curve of the swish function is smooth, non-monotonic, bounded below, and unbounded above; by inputting the target dialogue hidden variable into the swish function for calculation, a target dialogue text meeting the requirements can be obtained.
It should be noted that, in some other embodiments, a preset activation function such as a relu function, a tanh function, and the like may also be used to perform activation processing on the target dialog hidden variable, which is not limited to this.
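A minimal sketch of the activation step, comparing swish with the relu alternative mentioned above (β = 1 is assumed; the input values are illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def swish(x, beta=1.0):
    # f(x) = x * sigmoid(beta * x): smooth, non-monotonic, bounded below
    return x * sigmoid(beta * x)

def relu(x):
    # the simpler alternative: zero for all negative inputs
    return max(0.0, x)

values = [-2.0, 0.0, 2.0]
activated = [swish(v) for v in values]
```

Unlike relu, swish lets small negative activations pass through (slightly attenuated) rather than zeroing them, which is one reason it can be preferred for latent-variable decoding.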
Referring to fig. 6, in some embodiments, step S106 further includes, but is not limited to, steps S601 to S604:
step S601, extracting semantic features of a target dialog text;
step S602, segmenting the target dialogue text according to the semantic features to obtain a target dialogue field;
step S603, calculating the similarity of the target dialogue field and the reference dialogue field;
and step S604, obtaining a standard dialog text according to the similarity.
In step S601 of some embodiments, different types of dialogue keywords may be preset according to different dialogue scenes, requirements, and types, and the part-of-speech categories of these keywords are labelled, giving dialogue keywords with part-of-speech category tags. The semantic features of the target dialogue text are then recognized according to these preset tagged keywords, and the corresponding semantic features are extracted.
In step S602 of some embodiments, the target dialogue text is segmented according to the number of characters of each semantic feature and its position, dividing the text into a plurality of target dialogue fields; a target dialogue field may be a triple, a quadruple, or a single character.
In steps S603 and S604 of some embodiments, each target dialogue field and each reference dialogue field is vectorized to obtain target dialogue vectors and reference dialogue vectors. The similarity between each target dialogue vector and each reference dialogue vector is calculated with an algorithm such as cosine similarity. The required dialogue sentences are then selected from a preset sentence database according to the computed similarity and a preset similarity threshold, and the final standard dialogue text is constructed from these dialogue sentences.
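The similarity computation and threshold selection in steps S603 and S604 can be sketched as follows (the vectors and the threshold value are illustrative assumptions):

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = a·b / (|a| |b|)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def select_candidates(target_vec, reference_vecs, threshold=0.8):
    """Return the indices of reference dialogue vectors whose cosine
    similarity with the target dialogue vector meets the threshold."""
    return [i for i, ref in enumerate(reference_vecs)
            if cosine_similarity(target_vec, ref) >= threshold]

# Two of the three reference vectors point nearly the same way as the target.
refs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.2]]
picked = select_candidates([1.0, 0.1], refs)
```

The picked indices would then index into the preset sentence database to retrieve the corresponding dialogue sentences.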
Referring to fig. 7, in some embodiments, step S604 may further include, but is not limited to, step S701 to step S703:
step S701, screening the dialogue sentences in a preset sentence database according to the similarity to obtain a candidate sentence set;
step S702, supplementary processing is carried out on the dialogue sentences in the candidate sentence set to obtain standard dialogue sentences;
and step S703, performing fusion processing on the standard dialogue sentences to obtain standard dialogue texts.
In step S701 of some embodiments, the similarity between the target dialogue vectors and the reference dialogue vectors is calculated, the reference dialogue vectors whose similarity is greater than or equal to the similarity threshold are determined, and the corresponding dialogue sentences are screened from the preset sentence database. These dialogue sentences are then labelled, where a label may be a text, number, letter, or graphic label, and the labelled dialogue sentences are gathered into the same set to obtain the candidate sentence set.
In step S702 of some embodiments, the dialogue sentences in the candidate sentence set are supplemented. Specifically, according to the corresponding target dialogue fields, each dialogue sentence is supplemented by filling in synonyms and near-synonyms or by copying and supplementing, so as to obtain standard dialogue sentences and improve the completeness of the dialogue sentences.
In step S703 of some embodiments, the standard dialog statements are converted into SQL statements, and the SQL statements are merged and fused by the database platform to obtain a standard dialog text meeting the requirements.
The method obtains the original dialogue data and performs feature extraction on it to obtain the target dialogue data, which effectively removes data with low relevance, reduces the total amount of data, and improves data rationality. The target dialogue data is then encoded to obtain an initial dialogue hidden variable, and the initial dialogue hidden variable is resampled according to preset attribute feature data to obtain a standard dialogue hidden variable; in this way the required text attribute information is added to the initial dialogue hidden variable, improving the accuracy and diversity of the generated dialogue text. The standard dialogue hidden variable is then decoded to obtain the target dialogue text, making each dialogue sentence more reasonable, improving the matching between query sentences and reply sentences, and allowing the dialogue text to meet the requirements of different dialogue scenes. Finally, semantic analysis, with part-of-speech analysis and semantic error correction of the sentences, is performed on the target dialogue text to obtain the standard dialogue text, further improving the quality of the dialogue text.
Referring to fig. 8, an embodiment of the present application further provides a dialog text generation device, which can implement the dialog text generation method described above, where the dialog text generation device includes:
an original dialogue data acquisition module 801, configured to acquire original dialogue data;
the feature extraction module 802 is configured to perform feature extraction on the original dialogue data to obtain target dialogue data;
the encoding module 803 is configured to perform encoding processing on the target dialog data to obtain an initial dialog hidden variable;
the resampling module 804 is configured to perform resampling processing on the initial dialogue hidden variable according to preset attribute feature data to obtain a standard dialogue hidden variable;
the decoding module 805 is configured to perform decoding processing on the standard dialog hidden variable to obtain a target dialog text;
and the semantic analysis module 806 is configured to perform semantic analysis processing on the target dialog text to obtain a standard dialog text.
The specific implementation of the dialog text generation apparatus is basically the same as the specific implementation of the dialog text generation method, and is not described herein again.
An embodiment of the present application further provides an electronic device, where the electronic device includes: the device comprises a memory, a processor, a program stored on the memory and capable of running on the processor, and a data bus for realizing connection communication between the processor and the memory, wherein the program realizes the dialog text generation method when being executed by the processor. The electronic equipment can be any intelligent terminal including a tablet computer, a vehicle-mounted computer and the like.
Referring to fig. 9, fig. 9 illustrates a hardware structure of an electronic device according to another embodiment, where the electronic device includes:
the processor 901 may be implemented by a general-purpose CPU (central processing unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute a relevant program to implement the technical solution provided in the embodiment of the present application;
the memory 902 may be implemented in the form of a Read Only Memory (ROM), a static storage device, a dynamic storage device, or a Random Access Memory (RAM). The memory 902 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 902 and called by the processor 901 to execute the dialog text generation method according to the embodiments of the present application;
an input/output interface 903 for implementing information input and output;
a communication interface 904, configured to implement communication interaction between the device and another device, where communication may be implemented in a wired manner (e.g., USB, network cable, etc.), or in a wireless manner (e.g., mobile network, WIFI, bluetooth, etc.);
a bus 905 that transfers information between various components of the device (e.g., the processor 901, the memory 902, the input/output interface 903, and the communication interface 904);
wherein the processor 901, the memory 902, the input/output interface 903 and the communication interface 904 enable a communication connection within the device with each other through a bus 905.
An embodiment of the present application further provides a storage medium, which is a computer-readable storage medium for computer-readable storage, and the storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement the above dialog text generation method.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
According to the dialogue text generation method and device, the electronic device, and the storage medium, feature extraction is performed on the original dialogue data to obtain the target dialogue data, which effectively eliminates data with low relevance, reduces the total amount of data, and improves data rationality. The target dialogue data is then encoded to obtain an initial dialogue hidden variable, and the initial dialogue hidden variable is resampled according to preset attribute feature data to obtain a standard dialogue hidden variable; in this way the required text attribute information is added to the initial dialogue hidden variable, improving the accuracy and diversity of the generated dialogue text. The standard dialogue hidden variable is then decoded to obtain the target dialogue text, making each dialogue sentence more reasonable, improving the matching between query sentences and reply sentences, and allowing the dialogue text to meet the requirements of different dialogue scenes. Finally, semantic analysis, with part-of-speech analysis and semantic error correction of the sentences, is performed on the target dialogue text to obtain the standard dialogue text, further improving the quality of the dialogue text.
The embodiments described in the embodiments of the present application are for more clearly illustrating the technical solutions of the embodiments of the present application, and do not constitute a limitation to the technical solutions provided in the embodiments of the present application, and it is obvious to those skilled in the art that the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems with the evolution of technology and the emergence of new application scenarios.
It will be appreciated by those skilled in the art that the solutions shown in fig. 1-7 are not intended to limit the embodiments of the present application and may include more or fewer steps than those shown, or some of the steps may be combined, or different steps may be included.
The above-described embodiments of the apparatus are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may also be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
One of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more and "a plurality" means two or more. "And/or" describes an association between objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: only A, only B, or both A and B, where A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, or c may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where a, b, and c may each be singular or plural.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of units described above is only one type of logical-function division, and other divisions may be used in practice; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the part of the technical solution of the present application that in essence contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product; this software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing programs, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, and the scope of the claims of the embodiments of the present application is not limited thereto. Any modifications, equivalents and improvements that may occur to those skilled in the art without departing from the scope and spirit of the embodiments of the present application are intended to be within the scope of the claims of the embodiments of the present application.

Claims (10)

1. A dialog text generation method, the method comprising:
acquiring original dialogue data;
carrying out feature extraction on the original dialogue data to obtain target dialogue data;
coding the target dialogue data to obtain an initial dialogue hidden variable;
resampling the initial dialogue hidden variable according to preset attribute feature data to obtain a standard dialogue hidden variable;
decoding the standard dialogue hidden variable to obtain a target dialogue text;
and carrying out semantic analysis processing on the target dialog text to obtain a standard dialog text.
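The six steps of claim 1 follow the shape of a conditional variational autoencoder: encode the dialogue into a latent variable, resample it under an attribute-conditioned Gaussian, then decode. The following is a minimal, hypothetical NumPy sketch of that flow; the patent discloses no implementation, and every function and variable name here is our own illustration rather than the claimed method.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(features):
    """Map target dialogue features to the parameters of an initial latent."""
    mu = features.mean(axis=0)                    # toy "encoder": pooled statistics
    log_var = np.log(features.var(axis=0) + 1e-6)
    return mu, log_var

def resample(mu, log_var, attribute):
    """Reparameterisation trick: draw a standard dialogue latent,
    shifted by preset attribute feature data (the conditioning signal)."""
    eps = rng.standard_normal(mu.shape)
    return mu + attribute + np.exp(0.5 * log_var) * eps

def decode(z, vocab):
    """Toy decoder: score each vocabulary token against the latent
    and normalise with a softmax."""
    logits = vocab @ z
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

features = rng.standard_normal((4, 8))   # 4 utterance vectors, dimension 8
vocab = rng.standard_normal((5, 8))      # 5-token toy vocabulary
mu, log_var = encode(features)
z = resample(mu, log_var, attribute=np.full(8, 0.1))
token_probs = decode(z, vocab)
print(token_probs.shape)  # (5,)
```

The attribute vector is what makes the generation controllable: changing it shifts the region of latent space the decoder samples from.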
2. The dialog text generation method according to claim 1, wherein the step of extracting features of the original dialog data to obtain target dialog data includes:
extracting entity dialogue features in the original dialogue data;
classifying the entity dialogue features through a pre-trained sequence classifier to obtain dialogue parameter features;
and carrying out convolution processing on the dialogue parameter characteristics to obtain the target dialogue data.
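The convolution step of claim 2 can be pictured as a 1-D convolution sliding over the dialogue parameter features produced by the sequence classifier. This is a hypothetical valid-mode sketch with invented data and names, not the patented processing:

```python
import numpy as np

def conv1d(seq, kernel):
    """Valid-mode 1-D convolution over a sequence of dialogue parameter features."""
    n, k = len(seq), len(kernel)
    return np.array([seq[i:i + k] @ kernel for i in range(n - k + 1)])

params = np.array([0.2, 0.5, 0.1, 0.9, 0.4])  # toy dialogue parameter features
kernel = np.array([0.25, 0.5, 0.25])          # illustrative smoothing kernel
target = conv1d(params, kernel)
print(target)
```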
3. The dialog text generation method according to claim 1, wherein the step of encoding the target dialog data to obtain an initial dialog hidden variable includes:
mapping the target dialogue data to a preset vector space to obtain target dialogue characteristics;
coding the target dialogue features according to a preset coding sequence and a preset coding dimension to obtain a first dialogue hidden variable;
and performing downsampling processing on the first dialogue hidden variable to obtain the initial dialogue hidden variable.
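Claim 3 maps tokens into a preset vector space and then shrinks the sequence. One common way to realise such a downsampling is average pooling over adjacent positions; the sketch below assumes that choice (the patent does not specify one) and uses invented names throughout:

```python
import numpy as np

rng = np.random.default_rng(1)
embedding = rng.standard_normal((100, 16))   # preset vector space: vocab 100, dim 16

def embed(token_ids):
    """Map target dialogue data into the preset vector space."""
    return embedding[token_ids]

def downsample(h, factor=2):
    """Average-pool adjacent positions to shrink the first dialogue hidden variable."""
    t = (len(h) // factor) * factor
    return h[:t].reshape(-1, factor, h.shape[1]).mean(axis=1)

tokens = np.array([3, 17, 42, 8])
first_hidden = embed(tokens)               # (4, 16): "first dialogue hidden variable"
initial_hidden = downsample(first_hidden)  # (2, 16): "initial dialogue hidden variable"
print(initial_hidden.shape)  # (2, 16)
```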
4. The dialog text generation method according to claim 1, wherein the step of resampling the initial dialog hidden variables according to preset attribute feature data to obtain standard dialog hidden variables comprises:
analyzing the attribute characteristic data to generate a target Gaussian distribution network;
and resampling the initial dialogue hidden variable through the target Gaussian distribution network to obtain a standard dialogue hidden variable.
5. The dialog text generation method according to claim 1, wherein the step of decoding the standard dialog hidden variable to obtain the target dialog text comprises:
decoding the standard dialogue hidden variable to obtain a second dialogue hidden variable;
performing upsampling processing on the second dialogue hidden variable to obtain a target dialogue hidden variable;
and activating the target dialogue hidden variable through a preset function to obtain the target dialogue text.
6. The dialog text generation method according to any one of claims 1 to 5, wherein the step of performing semantic analysis processing on the target dialog text to obtain a standard dialog text comprises:
extracting semantic features of the target dialog text;
segmenting the target dialogue text according to the semantic features to obtain target dialogue fields;
calculating the similarity of the target dialogue field and the reference dialogue field;
and obtaining the standard dialog text according to the similarity.
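Claim 6 leaves the similarity measure unspecified; cosine similarity between field vectors is a plausible choice and is shown below purely as an illustration, with invented inputs:

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity between a target dialogue field and a reference dialogue field."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target_field = np.array([1.0, 2.0, 0.0])
reference_field = np.array([2.0, 4.0, 0.0])
print(cosine_similarity(target_field, reference_field))  # ≈ 1.0 (parallel vectors)
```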
7. The method according to claim 6, wherein the step of obtaining the standard dialog text based on the similarity includes:
screening the dialogue sentences in a preset sentence database according to the similarity to obtain a candidate sentence set;
performing supplementary processing on the dialogue sentences in the candidate sentence set to obtain standard dialogue sentences;
and performing fusion processing on the standard dialogue sentences to obtain the standard dialogue texts.
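The screening-and-fusion flow of claim 7 can be sketched as a threshold filter over a sentence database followed by a merge step. The threshold, the fusion rule (plain concatenation here), and all names below are our own assumptions:

```python
def screen_and_fuse(candidates, similarities, threshold=0.8):
    """Screen a preset sentence database by similarity, then fuse the
    surviving dialogue sentences into a single standard dialogue text."""
    kept = [s for s, sim in zip(candidates, similarities) if sim >= threshold]
    return " ".join(kept)

sentences = ["How can I help?", "Goodbye.", "Please describe the issue."]
scores = [0.91, 0.40, 0.85]
print(screen_and_fuse(sentences, scores))  # How can I help? Please describe the issue.
```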
8. An apparatus for generating dialog text, the apparatus comprising:
the original dialogue data acquisition module is used for acquiring original dialogue data;
the characteristic extraction module is used for extracting the characteristics of the original dialogue data to obtain target dialogue data;
the coding module is used for coding the target dialogue data to obtain an initial dialogue hidden variable;
the resampling module is used for resampling the initial dialogue hidden variable according to preset attribute feature data to obtain a standard dialogue hidden variable;
the decoding module is used for decoding the standard dialogue hidden variable to obtain a target dialogue text;
and the semantic analysis module is used for performing semantic analysis processing on the target dialog text to obtain a standard dialog text.
9. An electronic device, characterized in that the electronic device comprises a memory, a processor, a computer program stored on the memory and executable on the processor, and a data bus enabling communication between the processor and the memory; when executed by the processor, the computer program implements the steps of the dialog text generation method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the storage medium stores one or more computer programs executable by one or more processors to implement the steps of the dialog text generation method according to any one of claims 1 to 7.
CN202111473465.5A 2021-11-29 2021-11-29 Dialog text generation method and device, electronic equipment and storage medium Pending CN114139553A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111473465.5A CN114139553A (en) 2021-11-29 2021-11-29 Dialog text generation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111473465.5A CN114139553A (en) 2021-11-29 2021-11-29 Dialog text generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114139553A true CN114139553A (en) 2022-03-04

Family

ID=80388123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111473465.5A Pending CN114139553A (en) 2021-11-29 2021-11-29 Dialog text generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114139553A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114969195A (en) * 2022-05-27 2022-08-30 北京百度网讯科技有限公司 Dialogue content mining method and dialogue content evaluation model generation method
CN114969195B (en) * 2022-05-27 2023-10-27 北京百度网讯科技有限公司 Dialogue content mining method and dialogue content evaluation model generation method
CN115346690A (en) * 2022-07-08 2022-11-15 中国疾病预防控制中心慢性非传染性疾病预防控制中心 System for guiding operator to ask help seeker
CN115346690B (en) * 2022-07-08 2023-12-01 中国疾病预防控制中心慢性非传染性疾病预防控制中心 System for guiding operator to ask help seeker
CN115438170A (en) * 2022-11-09 2022-12-06 北京红棉小冰科技有限公司 Dialog model generation method, dialog model application method, dialog model generation system, dialog model application system, dialog model generation equipment and dialog model application equipment
CN116932726A (en) * 2023-08-04 2023-10-24 重庆邮电大学 Open domain dialogue generation method based on controllable multi-space feature decoupling
CN116932726B (en) * 2023-08-04 2024-05-10 重庆邮电大学 Open domain dialogue generation method based on controllable multi-space feature decoupling

Similar Documents

Publication Publication Date Title
CN114139553A (en) Dialog text generation method and device, electronic equipment and storage medium
CN111858944B (en) Entity aspect level emotion analysis method based on attention mechanism
CN109740158B (en) Text semantic parsing method and device
CN113887215A (en) Text similarity calculation method and device, electronic equipment and storage medium
CN114358007A (en) Multi-label identification method and device, electronic equipment and storage medium
CN114626097A (en) Desensitization method, desensitization device, electronic apparatus, and storage medium
CN114240552A (en) Product recommendation method, device, equipment and medium based on deep clustering algorithm
CN113704428A (en) Intelligent inquiry method, device, electronic equipment and storage medium
CN115222066A (en) Model training method and device, behavior prediction method and device, and storage medium
CN114519395A (en) Model training method and device, text abstract generating method and device, and equipment
CN114358201A (en) Text-based emotion classification method and device, computer equipment and storage medium
CN114818691A (en) Article content evaluation method, device, equipment and medium
CN116543768A (en) Model training method, voice recognition method and device, equipment and storage medium
CN116541493A (en) Interactive response method, device, equipment and storage medium based on intention recognition
CN116578688A (en) Text processing method, device, equipment and storage medium based on multiple rounds of questions and answers
CN114064894A (en) Text processing method and device, electronic equipment and storage medium
CN116258137A (en) Text error correction method, device, equipment and storage medium
CN114492661A (en) Text data classification method and device, computer equipment and storage medium
CN114613462A (en) Medical data processing method and device, electronic equipment and storage medium
CN114358020A (en) Disease part identification method and device, electronic device and storage medium
CN114091475A (en) Dialog text generation method and device, electronic equipment and storage medium
CN116580704A (en) Training method of voice recognition model, voice recognition method, equipment and medium
CN116719999A (en) Text similarity detection method and device, electronic equipment and storage medium
CN114998041A (en) Method and device for training claim settlement prediction model, electronic equipment and storage medium
CN116364054A (en) Voice synthesis method, device, equipment and storage medium based on diffusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination