CN117216206A - Session processing method and device, electronic equipment and storage medium

Info

Publication number: CN117216206A
Authority: CN (China)
Prior art keywords: dialogue, information, current, tag, reply
Legal status: Pending
Application number: CN202311101662.3A
Other languages: Chinese (zh)
Inventors: 邵纪春, 梁鑫, 冯华文
Current Assignee: Shenzhen Tencent Computer Systems Co Ltd
Original Assignee: Shenzhen Tencent Computer Systems Co Ltd
Application filed by: Shenzhen Tencent Computer Systems Co Ltd
Priority to: CN202311101662.3A
Publication of: CN117216206A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)

Abstract

The application relates to a session processing method and device, an electronic device and a storage medium. The method comprises the following steps: in response to current dialogue information of a target object in a session, determining the content category of a target event described by the current dialogue information; extracting dialogue elements from the current dialogue information based on the content category, to obtain dialogue elements matched with the content category; matching the dialogue elements in dialogue tag configuration information corresponding to the target event, to obtain a current dialogue tag matched with the current dialogue information; and generating reply information corresponding to the current dialogue information according to the current dialogue tag and a historical dialogue tag, the historical dialogue tag being a dialogue tag matched with historical dialogue information of the target object in the session. The technical scheme provided by the application can improve the flexibility and efficiency of the session.

Description

Session processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of session service technologies, and in particular, to a session processing method, a session processing device, an electronic device, and a storage medium.
Background
Session services conducted through man-machine conversations, such as data statistics and consultation, are widely used in the prior art. However, the machine's session policy generally carries out the session in a fixed, predetermined sequence, so the session process is inflexible and cannot be effectively adapted to different session requirements, resulting in low session efficiency.
Disclosure of Invention
The application provides a session processing method, a session processing device, electronic equipment and a storage medium, so as to improve the flexibility and the session efficiency of a session. The technical scheme of the application is as follows:
according to a first aspect of an embodiment of the present application, there is provided a session processing method, including:
determining the content category of a target event described by the current dialogue information in response to the current dialogue information of the target object in the conversation process;
extracting dialogue elements from the current dialogue information based on the content category to obtain dialogue elements matched with the content category;
matching the dialogue element in dialogue tag configuration information corresponding to the target event, to obtain a current dialogue tag matched with the current dialogue information;
generating reply information corresponding to the current dialogue information according to the current dialogue tag and the history dialogue tag; the history dialogue tag is a dialogue tag matched with the history dialogue information of the target object in the conversation process.
According to a second aspect of an embodiment of the present application, there is provided a session processing apparatus including:
the content category determining module is used for responding to the current dialogue information of the target object in the conversation process and determining the content category of the target event described by the current dialogue information;
the dialogue element extraction module is used for extracting dialogue elements from the current dialogue information based on the content category to obtain dialogue elements matched with the content category;
the current dialogue tag acquisition module is used for matching in dialogue tag configuration information corresponding to the target event by using the dialogue element to obtain a current dialogue tag matched with the current dialogue information;
the reply information generation module is used for generating reply information corresponding to the current dialogue information according to the current dialogue label and the history dialogue label; the history dialogue tag is a dialogue tag matched with the history dialogue information of the target object in the conversation process.
According to a third aspect of an embodiment of the present application, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the method of any of the first aspects above.
According to a fourth aspect of embodiments of the present application, there is provided a computer readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, cause the electronic device to perform the method of any of the first aspects of embodiments of the present application.
According to a fifth aspect of embodiments of the present application, there is provided a computer program product comprising computer instructions which, when executed by a processor, cause the computer to perform the method of any of the first aspects of embodiments of the present application.
The technical scheme provided by the embodiment of the application at least has the following beneficial effects:
the content category of the target event described by the current dialogue information is determined, so that dialogue elements can be effectively extracted based on the content category, and the method is accurate and saves resources; and by setting the corresponding dialogue tag configuration information for the target event, the current dialogue information can be positioned to the matched current dialogue tag based on the dialogue element, so that the reply information corresponding to the current dialogue information is generated according to the current dialogue tag and the history dialogue tag, the generation of the reply information is more flexible, and the conversation efficiency is higher.
In addition, the reply information used to reply to the target object in the session also refers to the dialogue tag configuration information corresponding to the target event, and the dialogue tag configuration information differs from event to event, so the session flow is differentiated by event. The session flow can thus be targeted at, and effectively guide, the session of each event, which makes session processing applicable to a wider range of scenarios with greater flexibility, and makes the session processing of each event more efficient and accurate.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the principles of the application; they do not constitute an undue limitation on the application.
FIG. 1 is a schematic diagram of an application environment, shown in accordance with an exemplary embodiment.
Fig. 2 is a flow chart illustrating a session processing method according to an exemplary embodiment.
Fig. 3 is a flow architecture diagram illustrating a session processing method according to an example embodiment.
Fig. 4 is a schematic diagram of a session interface, shown according to an example embodiment.
FIG. 5 is a diagram illustrating a training flow of a content category classification model according to an exemplary embodiment.
FIG. 6 is a schematic diagram illustrating dialog element extraction based on a dialog element extraction model, according to an example embodiment.
Fig. 7 is a schematic diagram illustrating a preset session flow according to an exemplary embodiment.
Fig. 8 is a schematic diagram illustrating one dialog tag configuration information, according to an example embodiment.
FIG. 9 is a diagram illustrating a training architecture of a dialog tag predictive model, according to an example embodiment.
Fig. 10 is a block diagram of a session processing apparatus according to an exemplary embodiment.
Fig. 11 is a block diagram of an electronic device for session handling, according to an example embodiment.
Fig. 12 is a block diagram of an electronic device for session handling, shown in accordance with an exemplary embodiment.
Detailed Description
Various exemplary embodiments, features and aspects of the application will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are set forth in the following description in order to provide a better illustration of the application. It will be understood by those skilled in the art that the present application may be practiced without some of these specific details. In some instances, well known methods, procedures, components, and circuits have not been described in detail so as not to obscure the present application.
Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. Artificial intelligence software technology mainly comprises computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning and other directions.
In recent years, with research and progress of artificial intelligence technology, the artificial intelligence technology is widely applied in a plurality of fields, and the scheme provided by the embodiment of the application relates to natural language processing technology, machine learning/deep learning technology and the like, and is specifically described by the following embodiments.
Referring to fig. 1, fig. 1 is a schematic diagram of an application system according to an embodiment of the application. The application system can be used for the session processing method of the present application. As shown in fig. 1, the application system may include at least a server 01 and a terminal 02.
In the embodiment of the present application, the server 01 may be used for session processing, where the server 01 may include an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Network, content distribution networks), and basic cloud computing services such as big data and artificial intelligence platforms.
In the embodiment of the present application, the terminal 02 may be used for information presentation of a dialogue between two parties in a dialogue process and for interaction with the server 01 in a dialogue process, where the two parties in the dialogue may include a target object and a question-answering robot. The terminal 02 may include a smart phone, a desktop computer, a tablet computer, a notebook computer, a smart speaker, a digital assistant, an augmented reality (augmented reality, AR)/Virtual Reality (VR) device, a smart wearable device, or other type of physical device. The physical device may also include software, such as an application, running in the physical device. The operating system running on the terminal 02 in the embodiment of the present application may include, but is not limited to, an android system, an IOS system, linux, windows, and the like.
In addition, it should be noted that fig. 1 is only one application environment of the session processing method provided by the present application. For example, session processing and information presentation during the session may be performed by the terminal 02.
In the embodiment of the present disclosure, the terminal 02 and the server 01 may be directly or indirectly connected through a wired or wireless communication method, which is not limited to the present disclosure.
It should be noted that, in the specific embodiment of the present application, related data of a user is referred to, and when the following embodiments of the present application are applied to specific products or technologies, user permission or consent is required to be obtained, and the collection, use and processing of related data are required to comply with related laws and regulations and standards of related countries and regions.
Fig. 2 is a flow chart illustrating a session processing method according to an exemplary embodiment. As shown in fig. 2, the following steps may be included.
S201, responding to the current dialogue information of the target object in the conversation process, and determining the content category of the target event described by the current dialogue information.
In the embodiment of the present specification, the session process may refer to a session between the target object and the question-answering robot, through which the associated information of the target event may be obtained. For example, as shown in fig. 3, during a session, the target object may input dialogue information, and content category identification may be performed on the input dialogue information (the current dialogue information) to identify the content category of the target event that it describes. Further, dialogue element extraction may be performed on the current dialogue information based on the content category, so that the extracted dialogue elements can be matched in the dialogue tag configuration information corresponding to the target event to obtain the current dialogue tag matched with the current dialogue information. Historical dialogue information can also be extracted from dialogue state management, so that a personalized dialogue decision (i.e., the selection of a manner of generating the reply information) can be made based on the historical dialogue tag corresponding to the historical dialogue information and the current dialogue tag matched with the current dialogue information; the manner of generating the reply information is described below. In this way, the conversation is continuously conducted to acquire the associated information of the target event.
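As an illustrative aid only (not part of the original disclosure), the per-turn flow described above can be sketched in Python as follows. All names here (DialogState, handle_turn, the lambda components) are hypothetical stand-ins for the modules in fig. 3: content category identification, category-guided element extraction, matching in the dialogue tag configuration information, and the personalized dialogue decision.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DialogState:
    """Minimal dialogue state management: tags matched so far and structured information."""
    history_tags: List[str] = field(default_factory=list)
    structured_info: Dict[str, str] = field(default_factory=dict)


def handle_turn(
    text: str,
    state: DialogState,
    classify: Callable[[str], str],                   # content category identification
    extract: Callable[[str, str], List[str]],         # category-guided dialogue element extraction
    match_tag: Callable[[List[str]], str],            # matching in the dialogue tag configuration
    decide_reply: Callable[[str, List[str]], str],    # reply from current + historical tags
) -> str:
    category = classify(text)
    elements = extract(text, category)
    current_tag = match_tag(elements)
    reply = decide_reply(current_tag, state.history_tags)
    state.history_tags.append(current_tag)            # update dialogue state management
    state.structured_info[current_tag] = text         # placeholder for structured conversion
    return reply


# Toy usage with placeholder components; the concrete models are described below.
state = DialogState()
print(handle_turn(
    "I went to XX scenic spot at 10 o'clock",
    state,
    classify=lambda t: "time/scenic spot",
    extract=lambda t, c: ["10 o'clock", "XX scenic spot"],
    match_tag=lambda els: "visit time",
    decide_reply=lambda cur, hist: "Which vehicle did you take?" if cur not in hist else "Got it.",
))
```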
The association information of the target event may be preset information that needs to be acquired for the target event. The target event may refer to a preset event, such as a statistical event, a streaming event, etc., which is not limited in the present application. Taking statistics of events as an example, for example, statistics of play situations of all sceneries in a certain period may be called sceneries play statistics. The related information required to be acquired for the scenic spot playing statistical event can be set according to actual needs, and can include, but is not limited to, object attribute information, scenic spot name, time of playing at the scenic spot, traffic mode, and the like.
In one example, the session process may be presented through a session interface, such as that shown in FIG. 4. The target object may enter the session interface to talk to the question-answering robot in the session interface, e.g., to talk continuously under the guidance of the question-answering robot. Wherein the target object may refer to a user participating in the session.
Illustratively, referring to fig. 4, 41 in fig. 4 may refer to the avatar of the question-answering robot, and 42 in fig. 4 may refer to the avatar of the target object. 43 in fig. 4 may refer to the current dialogue information of the target object in the session, that is, the dialogue information most recently input by the target object that has not yet been replied to by the question-answering robot, i.e., the dialogue information to be replied to by the question-answering robot.
Note that the dialogue information may be text information, audio information, or image information, which is not limited in this disclosure. Accordingly, information recognition may be performed using matched text, audio, or image recognition techniques for subsequent processing; or can be converted into text information for subsequent processing.
In the embodiment of the present specification, a content category set of dialog descriptions may be set in advance for a target event. The content category set may include a plurality of preset content categories, and the preset content categories may be used to characterize content categories of the target event described by the dialogue information. Based on this, the identified content category may be one of a plurality of preset content categories in the set of content categories. As one example, the content category may correspond to a category of information that needs to be acquired, such as a content category of an object attribute, time, and the like; or may also include positive, negative, etc. content categories that may be used to describe whether the current dialog information is positive or negative of the question-answering robot.
In practical application, in response to the current dialogue information of the target object in the session process, the current dialogue information needs to be replied, and based on the current dialogue information, the content category corresponding to the current dialogue information can be determined first. For example, a content category set may be obtained, so that a preset content category matched with the current dialogue information in the content category set may be determined, and the matched preset content category is determined as the content category corresponding to the current dialogue information.
As one example, a content category classification model may be used to perform content category classification on the current dialogue information, to obtain the content category corresponding to the current dialogue information.
For example, where the current dialogue information is in text form, the content category classification model may be obtained by fine-tuning a pre-trained natural language processing model (i.e., a pre-trained language characterization model). For example, a BERT (Bidirectional Encoder Representations from Transformers) model may be used, with a classification layer added on top of the BERT model for fine-tuning on the content category classification task. For example, referring to fig. 5, the token sequence corresponding to a sample sentence may be obtained, a [CLS] symbol prepended and a [SEP] symbol appended as the input of the BERT model; the vector output at the [CLS] position in the output of the BERT model is used as the input of the classification layer, and the output of the classification layer is the predicted content category. For example, for the sample sentence "I XXXX-10 points XX-XX go to XX sight", the corresponding predicted content categories may be categories such as time and scenic spot name.
Further, loss information can be obtained according to content category labels and predicted content categories corresponding to sample sentences, so that parameters of a classification layer can be adjusted by using a gradient descent method according to the loss information until iteration conditions are met. The corresponding classification layer and BERT model when the iteration condition is satisfied can be used as the content classification model. Wherein the loss information may be calculated based on a cross entropy loss function, as the application is not limited in this regard. The content category label may belong to a content category set, i.e. the content category label is annotated for the sample sentence using a preset content category in the content category set.
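A minimal sketch of such a fine-tuned content category classifier, assuming the PyTorch and Hugging Face transformers libraries and the bert-base-chinese checkpoint (all of which are illustrative choices, not stated in the original). A single-label head with cross-entropy loss is shown, only the classification layer is updated, consistent with the description above, and the example category set is hypothetical.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

CATEGORIES = ["time", "scenic spot name", "object attribute", "affirmative", "negative"]  # example set

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")
classifier = nn.Linear(encoder.config.hidden_size, len(CATEGORIES))   # added classification layer

optimizer = torch.optim.SGD(classifier.parameters(), lr=1e-3)         # gradient descent on the classifier only
loss_fn = nn.CrossEntropyLoss()


def train_step(sentence: str, label_idx: int) -> float:
    # The tokenizer prepends [CLS] and appends [SEP] to the token sequence.
    batch = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():                                              # BERT is kept frozen in this sketch
        cls_vec = encoder(**batch).last_hidden_state[:, 0]             # vector output at the [CLS] position
    logits = classifier(cls_vec)
    loss = loss_fn(logits, torch.tensor([label_idx]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def predict(sentence: str) -> str:
    batch = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        cls_vec = encoder(**batch).last_hidden_state[:, 0]
    return CATEGORIES[int(classifier(cls_vec).argmax(dim=-1))]
```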
And S203, performing dialogue element extraction processing on the current dialogue information based on the content category to obtain dialogue elements matched with the content category.
In one possible implementation, a plurality of dialog elements may be extracted from the current dialog information, where a dialog element may refer to a word segment in the current dialog information. Further, a dialog element matching the content category may be screened from the plurality of dialog elements. As an example, the plurality of dialog elements may be extracted from the current dialog information using a pre-trained natural language processing model, which is not limited by the present application.
In another possible implementation manner, the dialog element extraction process of the current dialog information can be guided by using the content category, so that the dialog element matched with the content category is obtained, and the dialog element is extracted more accurately. Based on this, the extracting the dialog element from the current dialog information based on the content category to obtain the dialog element matched with the content category may include:
inputting the content category and the current dialogue information into a dialogue element extraction model, and performing dialogue element extraction processing to obtain dialogue elements with matched content categories;
the dialogue element extraction model can be obtained by training a preset neural network based on dialogue training data, wherein each dialogue training data comprises sample sentence information, sample content categories and labeled element labels; the sample content category may belong to a set of content categories. The training process may calculate the loss using a cross entropy loss function and minimize the loss using a random gradient descent algorithm, resulting in a dialog element extraction model.
By way of example, the structure of the dialogue element extraction model may include a BERT model, a BiLSTM (Bi-directional Long Short-Term Memory) model, and a CRF (Conditional Random Field) model. BERT serves as a feature extractor that extracts context-aware text information and forms feature representations; its output is a context-dependent word vector for each word segment. The output of BERT may then be fed as the input sequence features of the BiLSTM layer, which models the input sequence with a bi-directional LSTM to further capture the contextual dependencies of the tokens in the sequence and generate a richer feature representation. Finally, the CRF evaluates the output features of the BiLSTM, judges their dependency relationship with the vector characterization of the content category, and outputs the dialogue elements matched with the content category, that is, the word segments matched with the content category.
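A sketch of one possible BERT + BiLSTM + CRF extractor along these lines. The way the content category conditions the extraction is not fully specified above; here a learned category embedding is concatenated to each token vector before the BiLSTM, which is only one plausible reading. The CRF negative log-likelihood is used as the training objective, as is usual for a CRF head; the pytorch-crf package and all sizes and names are assumptions.

```python
import torch
import torch.nn as nn
from torchcrf import CRF                    # pytorch-crf package (assumed)
from transformers import BertModel


class DialogElementExtractor(nn.Module):
    def __init__(self, num_bio_tags: int, num_categories: int,
                 cat_dim: int = 32, lstm_hidden: int = 128):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")   # context-aware token vectors
        self.cat_embed = nn.Embedding(num_categories, cat_dim)       # content category characterization
        self.bilstm = nn.LSTM(self.bert.config.hidden_size + cat_dim, lstm_hidden,
                              batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * lstm_hidden, num_bio_tags)         # per-token emission scores
        self.crf = CRF(num_bio_tags, batch_first=True)               # models label dependencies in the sequence

    def _emissions(self, input_ids, attention_mask, category_ids):
        tokens = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        cat = self.cat_embed(category_ids).unsqueeze(1).expand(-1, tokens.size(1), -1)
        feats, _ = self.bilstm(torch.cat([tokens, cat], dim=-1))     # richer contextual features
        return self.emit(feats)

    def loss(self, input_ids, attention_mask, category_ids, bio_labels):
        emissions = self._emissions(input_ids, attention_mask, category_ids)
        return -self.crf(emissions, bio_labels, mask=attention_mask.bool(), reduction="mean")

    def decode(self, input_ids, attention_mask, category_ids):
        emissions = self._emissions(input_ids, attention_mask, category_ids)
        # Best BIO tag sequence per sentence; spans tagged B/I form the extracted dialogue elements.
        return self.crf.decode(emissions, mask=attention_mask.bool())
```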
Alternatively, the content category may also be identified by a dialog element extraction model, such that the content category may not be used as an input to the dialog element extraction model, such as shown in fig. 6. In this case, the dialogue element extraction model may be obtained by training a preset neural network based on training sample data, where each training sample data may include sample sentence information, and labeled element tags and content category tags; the content category labels may belong to a set of content categories. The training process may calculate the loss using a preset loss function and minimize the loss using a random gradient descent algorithm, resulting in a dialog element extraction model. The preset loss function may be a cross entropy loss function, which is not limited in the present application.
S205, matching processing is carried out in dialogue label configuration information corresponding to the target event by using dialogue elements, and a current dialogue label matched with the current dialogue information is obtained.
In this embodiment of the present disclosure, the current dialog tag may be one of dialog tags preset in the dialog tag configuration information, and the dialog tag may be used to characterize a category of information to be acquired, for example, object attribute information of the target object.
In one possible implementation manner, the semantics of the dialogue element and the semantics of each preset dialogue tag in the dialogue tag configuration information can be matched to obtain a dialogue tag with matched semantics; so that the obtained semantically matched dialog tag can be used as the current dialog tag matched with the current dialog information.
In another possible implementation, the step S205 may include:
inputting the dialogue element into a dialogue tag prediction model, and performing dialogue tag prediction processing to obtain dialogue element characterization information;
and matching the dialogue element characterization information with the embedded characterization information of each dialogue tag in the dialogue tag configuration information to obtain a matched dialogue tag serving as a current dialogue tag. The embedded characterization information of each dialog tag may refer to an embedded vector of each dialog tag, and may be obtained by performing vector characterization on each dialog tag in advance based on a pre-trained natural language model. Illustratively, the pre-trained natural language model may be a BERT model.
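The matching itself might look like the following sketch, in which a stand-in pre-trained BERT encoder produces the dialogue element characterization and cosine similarity against the pre-computed tag embeddings selects the current dialogue tag. The tag names, the similarity measure and the threshold are illustrative assumptions; in the disclosure the element characterization would come from the trained dialogue tag prediction model described next.

```python
from typing import Optional

import torch
import torch.nn.functional as F
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")        # stand-in for the trained prediction model


def embed(text: str) -> torch.Tensor:
    with torch.no_grad():
        out = encoder(**tokenizer(text, return_tensors="pt"))
    return F.normalize(out.last_hidden_state[:, 0], dim=-1).squeeze(0)   # L2-normalised [CLS] vector


# Embedded characterization information of each dialogue tag, computed in advance.
DIALOG_TAGS = ["object attribute", "scenic spot name", "visit time", "vehicle"]
TAG_EMBEDDINGS = torch.stack([embed(tag) for tag in DIALOG_TAGS])


def match_dialog_tag(dialog_element: str, threshold: float = 0.6) -> Optional[str]:
    sims = TAG_EMBEDDINGS @ embed(dialog_element)    # cosine similarity to every tag embedding
    best = int(sims.argmax())
    return DIALOG_TAGS[best] if float(sims[best]) >= threshold else None


print(match_dialog_tag("subway"))                    # expected to land near the "vehicle" tag
```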
In one example, the training step of the dialog tag predictive model may include:
acquiring training sample pairs, each training sample pair comprising a sample word segment and a reference word segment; there may be a plurality of training sample pairs, including positive training sample pairs and negative training sample pairs. In a positive training sample pair, the sample word segment is semantically consistent with the reference word segment; in a negative training sample pair, the sample word segment and the reference word segment are semantically different. For example, a reference word segment may refer to a relatively standard expression of the corresponding semantics, while a sample word segment may be either a standard or a non-standard expression of those semantics; for instance, the reference word segment may be the standard term "driver" while the sample word segment is a colloquial expression with the same meaning.
Further, referring to fig. 7, a preset double-tower model may be used for training to obtain a dialog tag prediction model. For example, the sample word segmentation may be input into a first semantic model in a preset double-tower model, and semantic feature extraction processing is performed to obtain a first semantic feature;
inputting the reference word into a second semantic model in a preset double-tower model, and extracting semantic features to obtain second semantic features; illustratively, the first semantic model and the second semantic model may be pre-trained BERT models, which the present application is not limited to.
Performing loss processing on the first semantic features and the second semantic features based on contrastive learning to obtain loss information; for example, using a contrastive learning technique based on an InfoNCE loss function, similar representation features are pulled closer together and dissimilar representation features are pushed apart. The application is not limited to the loss function used in contrastive learning.
Training the first semantic model according to the loss information until the training iteration condition is met, and taking the first semantic model corresponding to the training iteration condition as a dialog label prediction model. The training iteration condition may include, but is not limited to, a training iteration number threshold, a loss threshold, and the like, which is not limited by the present application.
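A compact sketch of this dual-tower contrastive training step. To keep it self-contained, a toy bag-of-embeddings encoder stands in for the pre-trained BERT towers mentioned above; the InfoNCE objective uses the other pairs in the batch as negatives, pulling matched (sample, reference) pairs together and pushing the rest apart, and only the first tower is updated and kept as the dialogue tag prediction model. Dimensions, learning rate and temperature are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Tower(nn.Module):
    """Toy sentence encoder standing in for a pre-trained BERT tower."""
    def __init__(self, vocab_size: int = 5000, dim: int = 128):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)          # mean of token embeddings

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.embed(token_ids), dim=-1)


first_tower = Tower()      # encodes sample word segments (trained, kept as the prediction model)
second_tower = Tower()     # encodes reference word segments (frozen in this sketch)
optimizer = torch.optim.SGD(first_tower.parameters(), lr=1e-2)


def info_nce_step(sample_ids: torch.Tensor, reference_ids: torch.Tensor,
                  temperature: float = 0.07) -> float:
    q = first_tower(sample_ids)                                # first semantic features
    with torch.no_grad():
        k = second_tower(reference_ids)                        # second semantic features
    logits = q @ k.T / temperature                             # similarity of every sample to every reference
    labels = torch.arange(q.size(0))                           # the i-th reference is the i-th sample's positive
    loss = F.cross_entropy(logits, labels)                     # InfoNCE: pull positives closer, push negatives apart
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Toy batch of 4 (sample, reference) pairs given as token-id sequences.
print(info_nce_step(torch.randint(0, 5000, (4, 6)), torch.randint(0, 5000, (4, 6))))
```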
S207, generating reply information corresponding to the current dialogue information according to the current dialogue tag and the history dialogue tag.
In this embodiment of the present disclosure, the history session tag may be a session tag that matches the history session information of the target object in the session process, and the history session tag may also be obtained by using the same obtaining manner as the current session tag, that is, when the history session information is replied, session elements extracted from the history session information may be used to match in the session tag configuration information to obtain the history session tag. The history dialog tag may belong to a dialog tag preset in the dialog tag configuration information.
In one possible implementation, the dialogue tag configuration information may include a dialogue tag set corresponding to each of a plurality of dialogue states; the plurality of dialogue states may be dialogue states in a preset session flow corresponding to the target event, and a dialogue state may be used to characterize a session stage in that preset session flow, so as to guide the questioning of the question-answering robot, allowing the session flow of the target event to be executed in an orderly manner and the associated information of the target event to be acquired more efficiently and comprehensively. For example, the obtained associated information may be stored under the matched dialogue tags; for each target object, when every dialogue tag is filled with its corresponding structured information, the session in which the target object participates for the target event may be considered complete.
In one example, the preset session flow may include a plurality of dialog states arranged in sequence, which may be used to divide the session process into a plurality of session phases, so that the session process may be performed in stages, so that the session may be more accurate and efficient. For example, the session process may be divided according to the information category of the associated information that needs to be acquired by the target event to obtain the plurality of session states, for example, one session state is set to correspond to one information category, that is, one session stage corresponding to one session state is used to acquire information under one information category. Referring to fig. 8, for example, the preset session flow may include 4 dialog states arranged in sequence, such as dialog state 1 through dialog state 4 from left to right. For example, when setting to obtain the associated information of the travel event, the information category of the associated information to be obtained may be divided to obtain a plurality of corresponding information obtaining phases, i.e., session phases, such as session state 1 may be a session state of basic attribute information, session state 2 may be a session state of the scenic spot, session state 3 may be a session state of using the vehicle, and session state 4 may be a session state of travel round trip time. The application is not limited to this, and the setting can be performed according to the target event and the related information of the target event which is acquired in actual need.
As an example, referring to fig. 9, the dialogue tag configuration information may include a dialogue tag set corresponding to each of a plurality of dialogue states, such as dialogue states 1 through 4 corresponding to dialogue tag sets 1 through 4, respectively. One or more dialogue tags can be configured under each dialogue tag set, such as the n dialogue tags per set in fig. 9: dialogue tags 11 to 1n, dialogue tags 21 to 2n, dialogue tags 31 to 3n, and dialogue tags 41 to 4n; n may be a positive integer greater than or equal to 1. The number of dialogue tags configured under each dialogue tag set may be the same or different, which is not limited by the present application.
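One possible concrete shape for such configuration information, shown for a travel-type target event. This is only an illustrative data layout; the state names, tag names and the dictionary format are assumptions and not taken from the original disclosure.

```python
DIALOG_TAG_CONFIG = {
    "event": "trip information collection",
    "states": [   # preset session flow: dialogue states arranged in sequence
        {"state": "basic attribute information", "tags": ["name", "age", "occupation"]},
        {"state": "scenic spot",                 "tags": ["scenic spot name", "ticket type"]},
        {"state": "vehicle",                     "tags": ["vehicle type", "departure station"]},
        {"state": "travel round-trip time",      "tags": ["departure time", "return time"]},
    ],
}


def tags_for_state(config: dict, state_name: str) -> list:
    """Return the dialogue tag set configured under a given dialogue state."""
    for entry in config["states"]:
        if entry["state"] == state_name:
            return entry["tags"]
    return []


print(tags_for_state(DIALOG_TAG_CONFIG, "vehicle"))   # ['vehicle type', 'departure station']
```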
In an alternative embodiment, in a case where the dialogue tag configuration information is set to include a plurality of dialogue states, the method may further include: acquiring the current dialogue state in which the current dialogue information is located, for example from dialogue state management. The current dialogue state may be one of the plurality of dialogue states.
Accordingly, generating the reply information corresponding to the current dialogue information according to the current dialogue tag and the historical dialogue tag may be replaced by: in the case that the current dialogue tag belongs to the dialogue tag set corresponding to the current dialogue state, generating the reply information corresponding to the current dialogue information according to the current dialogue tag and the historical dialogue tag. Optionally, in the case that the current dialogue tag does not belong to the dialogue tag set corresponding to the current dialogue state, this indicates that the current dialogue information of the target object does not fit the current dialogue state, i.e. the target object may not have replied to the question as asked, so the target object may be prompted to reply again, the question may be skipped, and so on. By taking the current dialogue state into account, the accuracy and efficiency of the question-answering robot's replies can be improved.
In an alternative embodiment, the session processing method may further include:
converting the current dialogue information into structural information under the current dialogue label;
obtaining the structural information of the target object under the history dialogue label;
and splicing the structured information under the current dialogue label and the structured information under the history dialogue label to construct the label information of the target object under the target event.
In practical application, in the constructed tag information of the target object under the target event, each dialogue tag can be filled with its corresponding structured information, and the structured information under a dialogue tag is in a preset information format, so that the information format under the same dialogue tag is unified, which facilitates storage, statistics, and the like. For example, taking fig. 9 as an example, the tag information of target object 1 under the target event may be as follows:
Title: tag information of target object 1 under the target event
Dialogue tag 11: structured information 11;
Dialogue tag 12: structured information 12;
……
Dialogue tag 4n: structured information 4n.
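A small sketch of how the structured information under the current dialogue tag might be spliced with that under the historical dialogue tags to build a record of the form shown above. The conversion and formatting rules are illustrative; the description only requires a unified preset format per tag.

```python
from typing import Dict


def build_tag_information(target_object: str, target_event: str,
                          history_info: Dict[str, str],
                          current_tag: str, current_structured: str) -> str:
    record = dict(history_info)                 # structured information under the historical dialogue tags
    record[current_tag] = current_structured    # spliced with the current dialogue tag's structured information
    lines = [f"Title: tag information of {target_object} under {target_event}"]
    lines += [f"{tag}: {info};" for tag, info in record.items()]
    return "\n".join(lines)


print(build_tag_information(
    "target object 1", "trip information collection",
    {"scenic spot name": "XX scenic spot", "vehicle type": "subway"},
    "departure time", "2023-10-01 10:00",
))
```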
The content category of the target event described by the current dialogue information is determined, so that dialogue elements can be effectively extracted based on the content category, and the method is accurate and saves resources; and by setting the corresponding dialogue tag configuration information for the target event, the current dialogue information can be positioned to the matched current dialogue tag based on the dialogue element, so that the reply information corresponding to the current dialogue information is generated according to the current dialogue tag and the history dialogue tag, the generation of the reply information is more flexible, and the conversation efficiency is higher.
In addition, the reply information used to reply to the target object in the session also refers to the dialogue tag configuration information corresponding to the target event, and the dialogue tag configuration information differs from event to event, so the session flow is differentiated by event. The session flow can thus be targeted at, and effectively guide, the session of each event, which makes session processing applicable to a wider range of scenarios with greater flexibility, and makes the session processing of each event more efficient and accurate.
In one possible implementation manner, the step of generating the reply message corresponding to the current session information according to the current session tag and the historical session tag may include:
and acquiring a reply template set configured under the current dialogue label. In practical application, a reply template set under each dialog label may be preset, and the reply template set may include at least one reply template.
Further, the reply templates in the reply template set can be screened based on the historical dialogue tag to obtain the reply information. In one example, in the case that the historical dialogue tag is different from the current dialogue tag, none of the reply templates in the reply template set configured under the current dialogue tag has yet been used in the session to ask for the corresponding information, so any reply template can be selected from the reply template set and the reply information generated based on that reply template. For example, the reply template may be used directly as the reply information, or the reply template may be re-expressed, in combination with the basic attribute information of the target object, in a description manner matched with that basic attribute information to obtain the reply information. The description manner here may include, but is not limited to, presentation style, adding tone words, and the like. Alternatively, the reply template may be supplemented based on the dialogue elements in the current dialogue information, so that the reply information is more complete and fluent. For example, if the current dialogue information is "I am an XXX" and the reply template is "Where is the address of your work place?", the reply template may be supplemented based on XXX: if the work place of an XXX is commonly referred to as AAA, the generated reply information may be "Where is the address of the AAA where you work?". The application is not limited in this regard.
Or, in the case that the history dialog tag is the same as the current dialog tag, that is, the reply template in the reply template set configured under the current dialog tag may have been used, in order to avoid repeated acquisition of the association information, the history question information of the question-answering robot for the history dialog tag in the session process, for example, the question information of the question-answering robot corresponding to the history dialog information may be acquired. Based on the method, the reply templates with different semantics from the historical questioning information can be screened out from the reply template set, namely, the reply templates in the same historical dialogue labels are found out and excluded. Thus, reply information can be generated based on screening reply templates different from the historical questioning information. For example, the different reply templates may be directly used as reply information, or the description mode matched with the basic attribute information of the target object is used to perform expression conversion on the different reply templates to obtain reply information, and the specific generation mode may refer to the corresponding content and is not described herein.
Optionally, if a reply template with a semantic difference from the historical questioning information is not screened from the reply template set, it may be stated that reply templates in the reply template set are all used, so that a target dialog tag set corresponding to the current dialog state may be obtained. Further, a target dialog tag may be screened from the set of target dialog tags; the target dialog tag may be a dialog tag in the set of target dialog tags that is different from the current dialog tag and the historical dialog tag; and generating reply information corresponding to the current dialogue information based on the target dialogue tag. For example, any reply template may be screened from the set of reply templates configured under the target dialog tag, and reply information may be generated based on the any reply template. For example, any reply template selected can be directly used as reply information, or the description mode matched with the basic attribute information is combined with the basic attribute information of the target object to perform expression conversion on any reply template selected to obtain the reply information, and the specific generation mode can be referred to the corresponding content and is not described herein.
Or if the target dialogue labels are not screened from the target dialogue label set, the completion of information acquisition under the current dialogue state is indicated, so that a preset dialogue flow corresponding to the target event can be acquired, wherein the preset dialogue flow comprises a plurality of dialogue states which are arranged in sequence. And then the next dialog state of the current dialog state in the sequence arrangement can be determined; and generating reply information corresponding to the current dialogue information based on the dialogue tag set corresponding to the next dialogue state. For example, any reply template may be screened from the reply template set configured under any dialog tag based on any dialog tag in the dialog tag set corresponding to the next dialog state, and reply information may be generated based on any reply template. For example, any reply template selected can be directly used as reply information, or the description mode matched with the basic attribute information is combined with the basic attribute information of the target object to perform expression conversion on any reply template selected to obtain the reply information, and the specific generation mode can be referred to the corresponding content and is not described herein.
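The decision cascade described in the last few paragraphs can be summarised by the following sketch for a single turn. The plain string comparison used for "already asked", the random template choice and the data layout are illustrative simplifications of the semantic screening and personalised re-expression described above.

```python
import random
from typing import Dict, List, Optional


def decide_reply(current_tag: str,
                 history_tags: List[str],
                 history_questions: List[str],
                 current_state: str,
                 states_in_order: List[str],
                 tag_sets: Dict[str, List[str]],       # dialogue state -> dialogue tag set
                 templates: Dict[str, List[str]]       # dialogue tag -> reply template set
                 ) -> Optional[str]:
    # Case 1: the current tag has not been asked about yet in this session.
    if current_tag not in history_tags:
        return random.choice(templates[current_tag])

    # Case 2: the tag was used before; prefer a template unlike the historical questions.
    unused = [t for t in templates[current_tag] if t not in history_questions]
    if unused:
        return random.choice(unused)

    # Case 3: all templates under the tag are spent; pick another tag of the current
    # dialogue state that differs from the current and historical tags.
    remaining = [t for t in tag_sets[current_state]
                 if t != current_tag and t not in history_tags]
    if remaining:
        return random.choice(templates[remaining[0]])

    # Case 4: the current state is exhausted; advance to the next dialogue state in
    # the preset session flow and ask from its tag set.
    idx = states_in_order.index(current_state)
    if idx + 1 < len(states_in_order):
        next_state = states_in_order[idx + 1]
        return random.choice(templates[tag_sets[next_state][0]])
    return None   # preset session flow finished
```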
Fig. 10 is a block diagram of a session processing apparatus according to an exemplary embodiment. Referring to fig. 10, the apparatus may include:
A content category determining module 1001, configured to determine, in response to current dialogue information of a target object in a session, a content category of a target event described by the current dialogue information;
a dialog element extraction module 1003, configured to perform dialog element extraction on the current dialog information based on the content category, to obtain a dialog element that matches the content category;
a current dialogue tag obtaining module 1005, configured to use the dialogue element to match in dialogue tag configuration information corresponding to the target event, so as to obtain a current dialogue tag matched with the current dialogue information;
the reply information generating module 1007 is configured to generate reply information corresponding to the current session information according to the current session tag and the historical session tag; the history dialogue tag is a dialogue tag matched with the history dialogue information of the target object in the conversation process.
The content category of the target event described by the current dialogue information is determined, so that dialogue elements can be effectively extracted based on the content category, and the method is accurate and saves resources; and by setting the corresponding dialogue tag configuration information for the target event, the current dialogue information can be positioned to the matched current dialogue tag based on the dialogue element, so that the reply information corresponding to the current dialogue information is generated according to the current dialogue tag and the history dialogue tag, the generation of the reply information is more flexible, and the conversation efficiency is higher.
In addition, the reply information used to reply to the target object in the session also refers to the dialogue tag configuration information corresponding to the target event, and the dialogue tag configuration information differs from event to event, so the session flow is differentiated by event. The session flow can thus be targeted at, and effectively guide, the session of each event, which makes session processing applicable to a wider range of scenarios with greater flexibility, and makes the session processing of each event more efficient and accurate.
In one possible implementation manner, the dialog tag configuration information includes dialog tag sets corresponding to each of a plurality of dialog states; the plurality of dialogue states are dialogue states in a preset conversation process corresponding to the target event, and the dialogue states are used for representing conversation phases in the preset conversation process; the apparatus may further include:
the current dialogue state acquisition module is used for acquiring a current dialogue state in which the current dialogue information is located, wherein the current dialogue state is one of the plurality of dialogue states;
correspondingly, the reply information generating module 1007 is further configured to generate, when the current session tag belongs to the session tag set corresponding to the current session state, reply information corresponding to the current session information according to the current session tag and the historical session tag.
In one possible implementation, the reply information generating module 1007 may include:
the reply template set acquisition unit is used for acquiring a reply template set configured under the current dialogue label;
and the reply acquisition unit is used for screening reply templates in the reply template set based on the history dialogue labels to obtain the reply information.
In one possible implementation manner, the reply acquiring unit may include:
the first reply acquisition subunit is used for screening any reply template from the reply template set under the condition that the history dialogue label is different from the current dialogue label; generating the reply information based on any reply template;
the second reply acquisition subunit is used for acquiring the history question information of the question-answering robot aiming at the history dialogue tag in the conversation process under the condition that the history dialogue tag is the same as the current dialogue tag; and screening out a reply template with the meaning different from that of the historical questioning information from the reply template set, and generating reply information based on the screened out reply template with the meaning different from that of the historical questioning information.
In a possible implementation manner, the second reply obtaining subunit is further configured to obtain a target dialog tag set corresponding to the current dialog state if a reply template with a semantic different from that of the history question information is not screened from the reply template set; screening target dialogue labels from the target dialogue label set; the target dialogue tag is a dialogue tag which is different from the current dialogue tag and the history dialogue tag in the target dialogue tag set; and generating the reply information corresponding to the current dialogue information based on the target dialogue tag.
In a possible implementation manner, the second reply obtaining subunit is further configured to obtain a preset session flow corresponding to the target event if the target session label is not selected from the target session label set, where the preset session flow includes a plurality of session states arranged in sequence; determining a next dialog state of the current dialog state in the ordering; and generating the reply information corresponding to the current dialogue information based on the dialogue tag set corresponding to the next dialogue state.
In one possible implementation, the current session tag obtaining module 1005 may include:
the dialogue label prediction unit is used for inputting the dialogue element into a dialogue label prediction model, and performing dialogue label prediction processing to obtain dialogue element characterization information;
the current dialogue tag obtaining unit is used for matching the dialogue element characterization information with the embedded characterization information of each dialogue tag in the dialogue tag configuration information to obtain a matched dialogue tag as a current dialogue tag.
In one possible implementation, the dialog element extraction module 1003 may include:
the dialogue element extraction unit is used for inputting the content category and the current dialogue information into the dialogue element extraction model, and performing dialogue element extraction processing to obtain dialogue elements matched with the content category;
The dialogue element extraction model is obtained by training a preset neural network based on dialogue training data, and each dialogue training data comprises sample sentence information, sample content categories and labeled element labels.
In one possible implementation, the session processing apparatus may further include:
the structured information conversion module is used for converting the current dialogue information into structured information under the current dialogue label;
the structured information acquisition module is used for acquiring structured information of the target object under the history dialogue label;
and the label information construction module is used for splicing the structured information under the current dialogue label and the structured information under the history dialogue label to construct the label information of the target object under the target event.
In one possible implementation, the session processing apparatus may further include:
the training sample pair acquisition module is used for acquiring training sample pairs, and each training sample pair comprises a sample word segmentation and a reference word segmentation;
the first semantic feature extraction module is used for inputting the sample word segmentation into a first semantic model in a preset double-tower model, and carrying out semantic feature extraction processing to obtain first semantic features;
The second semantic feature extraction module is used for inputting the reference word segmentation into a second semantic model in a preset double-tower model, and carrying out semantic feature extraction processing to obtain second semantic features;
the loss information acquisition module is used for carrying out loss processing on the first semantic features and the second semantic features based on contrast learning to obtain loss information;
the training module is used for training the first semantic model according to the loss information until the training iteration condition is met, and the first semantic model corresponding to the training iteration condition is used as the dialog label prediction model.
The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
Fig. 11 is a block diagram illustrating an electronic device for session processing, which may be a terminal, according to an exemplary embodiment, and an internal structure diagram thereof may be as shown in fig. 11. The electronic device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the electronic device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of session handling. The display screen of the electronic equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the electronic equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 11 is merely a block diagram of a portion of the structure associated with the present inventive arrangements and is not limiting of the electronic device to which the present inventive arrangements are applied, and that a particular electronic device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
Fig. 12 is a block diagram of an electronic device for session processing, which may be a server, and an internal structure diagram thereof may be as shown in fig. 12, which is shown in accordance with an exemplary embodiment. The electronic device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the electronic device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of session handling.
Those skilled in the art will appreciate that the structure shown in Fig. 12 is merely a block diagram of the part of the structure related to the solution of the present application and does not limit the electronic device to which the solution of the present application is applied; a specific electronic device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In an exemplary embodiment, an electronic device is also provided, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the session processing method in the embodiments of the present application.
In an exemplary embodiment, a computer-readable storage medium is also provided. When the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the session processing method in the embodiments of the application. The computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program product is also provided, comprising instructions which, when run on a computer, cause the computer to perform the session processing method in the embodiments of the application.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when the program is executed, it may include the flows of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include Read-Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM), among others.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (13)

1. A session processing method, comprising:
in response to current dialogue information of a target object in a conversation process, determining a content category of a target event described by the current dialogue information;
extracting a dialogue element from the current dialogue information based on the content category to obtain a dialogue element matched with the content category;
performing matching in dialogue tag configuration information corresponding to the target event by using the dialogue element, to obtain a current dialogue tag matched with the current dialogue information;
and generating reply information corresponding to the current dialogue information according to the current dialogue tag and a historical dialogue tag, wherein the historical dialogue tag is a dialogue tag matched with historical dialogue information of the target object in the conversation process.
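As a purely illustrative, non-limiting sketch of how the four steps of claim 1 could fit together, the following Python fragment strings them into one turn handler; every function, keyword table, and tag name below is a hypothetical placeholder rather than the claimed implementation.

```python
# Hypothetical end-to-end sketch of the four steps of claim 1; all names,
# keyword rules, and tags are illustrative placeholders.
from typing import Optional

CATEGORY_KEYWORDS = {"refund": "after_sale", "password": "account_issue"}

TAG_CONFIG = {  # dialogue tag configuration information for one target event
    "after_sale": {"order cancelled": "refund_requested"},
    "account_issue": {"wrong password": "password_reset"},
}

def classify_category(text: str) -> str:
    # Step 1: determine the content category of the target event described by the turn.
    return next((c for k, c in CATEGORY_KEYWORDS.items() if k in text.lower()), "other")

def extract_element(text: str, category: str) -> str:
    # Step 2: extract a dialogue element matching the category (a trivial stand-in
    # for the learned extraction model of claim 8).
    return text.lower()

def match_tag(element: str, category: str) -> Optional[str]:
    # Step 3: match the element against the dialogue tag configuration information.
    for phrase, tag in TAG_CONFIG.get(category, {}).items():
        if phrase in element:
            return tag
    return None

def generate_reply(current_tag: Optional[str], historical_tag: Optional[str]) -> str:
    # Step 4: generate the reply from the current tag and the historical tag.
    if current_tag is None:
        return "Could you describe the problem in more detail?"
    if current_tag == historical_tag:
        return f"Just to confirm, is this still about {current_tag}?"
    return f"I see this concerns {current_tag}; let me help with that."

text = "My order cancelled but I want a refund"
category = classify_category(text)
print(generate_reply(match_tag(extract_element(text, category), category), None))
```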
2. The method of claim 1, wherein the dialogue tag configuration information comprises a dialogue tag set corresponding to each of a plurality of dialogue states; the plurality of dialogue states are dialogue states in a preset conversation process corresponding to the target event, and each dialogue state is used for representing a conversation phase in the preset conversation process; the method further comprises:
acquiring a current dialogue state in which the current dialogue information is located, wherein the current dialogue state is one of the plurality of dialogue states;
wherein the generating reply information corresponding to the current dialogue information according to the current dialogue tag and the historical dialogue tag comprises:
and in a case that the current dialogue tag belongs to the dialogue tag set corresponding to the current dialogue state, generating the reply information corresponding to the current dialogue information according to the current dialogue tag and the historical dialogue tag.
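A minimal, non-limiting sketch of the claim 2 check, with invented state names and tag sets: a reply is generated only when the current dialogue tag falls inside the tag set configured for the current dialogue state.

```python
# Hypothetical sketch of the claim 2 condition; states and tag sets are invented.
STATE_TAG_SETS = {
    "collect_problem": {"refund_requested", "password_reset"},
    "confirm_solution": {"solution_accepted", "solution_rejected"},
}

def tag_allowed_in_state(current_tag: str, current_state: str) -> bool:
    """True only when the current dialogue tag belongs to the dialogue tag set
    corresponding to the current dialogue state."""
    return current_tag in STATE_TAG_SETS.get(current_state, set())

# Only when this check passes would the reply be generated from the current
# dialogue tag and the historical dialogue tag, as in claim 1.
print(tag_allowed_in_state("refund_requested", "collect_problem"))  # True
```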
3. The method of claim 1, wherein the generating reply information corresponding to the current dialogue information according to the current dialogue tag and the historical dialogue tag comprises:
acquiring a reply template set configured under the current dialogue tag;
and screening reply templates in the reply template set based on the historical dialogue tag to obtain the reply information.
4. The method of claim 3, wherein the screening reply templates in the reply template set based on the historical dialogue tag to obtain the reply information comprises:
in a case that the historical dialogue tag is different from the current dialogue tag, screening any reply template from the reply template set, and generating the reply information based on the screened reply template;
or, in a case that the historical dialogue tag is the same as the current dialogue tag, acquiring historical question information asked by the question-answering robot for the historical dialogue tag in the conversation process, screening, from the reply template set, a reply template whose semantics are different from those of the historical question information, and generating the reply information based on the screened reply template.
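The template screening of claims 3 and 4 could be sketched as below; the templates, tags, and the string-equality test standing in for a semantic comparison are simplified assumptions (a real system would compare sentence embeddings rather than raw strings).

```python
# Hypothetical sketch of claims 3-4; templates and tags are invented, and exact
# string equality stands in for the semantic comparison with historical questions.
import random
from typing import Optional

REPLY_TEMPLATES = {  # reply template set configured under each dialogue tag
    "refund_requested": [
        "When was the order cancelled?",
        "Could you share the order number?",
    ],
}

def screen_template(current_tag: str,
                    historical_tag: Optional[str],
                    historical_question: Optional[str]) -> Optional[str]:
    templates = REPLY_TEMPLATES.get(current_tag, [])
    if not templates:
        return None
    if historical_tag != current_tag:
        return random.choice(templates)        # any template under the current tag
    # Same tag as before: avoid repeating the question already asked for it.
    candidates = [t for t in templates if t != historical_question]
    return random.choice(candidates) if candidates else None

print(screen_template("refund_requested", "refund_requested",
                      "When was the order cancelled?"))
```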
5. The method according to claim 4, wherein the method further comprises:
if no reply template whose semantics are different from those of the historical question information is screened from the reply template set, acquiring a target dialogue tag set corresponding to the current dialogue state;
screening a target dialogue tag from the target dialogue tag set, wherein the target dialogue tag is a dialogue tag in the target dialogue tag set that is different from both the current dialogue tag and the historical dialogue tag;
and generating the reply information corresponding to the current dialogue information based on the target dialogue tag.
6. The method of claim 5, wherein the method further comprises:
if no target dialogue tag is screened from the target dialogue tag set, acquiring a preset dialogue flow corresponding to the target event, wherein the preset dialogue flow comprises a plurality of dialogue states arranged in sequence;
determining a next dialogue state following the current dialogue state in the sequence;
and generating the reply information corresponding to the current dialogue information based on the dialogue tag set corresponding to the next dialogue state.
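The two fallback steps of claims 5 and 6 — first trying an unused tag from the current state's tag set, then advancing to the next dialogue state in the preset flow — are sketched below with an invented flow and invented tag sets.

```python
# Hypothetical sketch of the fallbacks in claims 5-6; the flow and tag sets are invented.
from typing import Optional

PRESET_DIALOGUE_FLOW = ["collect_problem", "confirm_solution", "close_session"]
STATE_TAG_SETS = {
    "collect_problem": ["refund_requested", "password_reset"],
    "confirm_solution": ["solution_accepted", "solution_rejected"],
    "close_session": ["goodbye"],
}

def fallback_tag(current_state: str, current_tag: str,
                 historical_tags: set) -> Optional[str]:
    # Claim 5: a target tag differing from the current and historical tags.
    for tag in STATE_TAG_SETS.get(current_state, []):
        if tag != current_tag and tag not in historical_tags:
            return tag
    return None

def next_state_tags(current_state: str) -> list:
    # Claim 6: when no target tag remains, reply from the next state's tag set.
    idx = PRESET_DIALOGUE_FLOW.index(current_state)
    if idx + 1 >= len(PRESET_DIALOGUE_FLOW):
        return []
    return STATE_TAG_SETS[PRESET_DIALOGUE_FLOW[idx + 1]]

print(fallback_tag("collect_problem", "refund_requested", {"password_reset"}))  # None
print(next_state_tags("collect_problem"))  # tags of "confirm_solution"
```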
7. The method of claim 1, wherein the performing matching in the dialogue tag configuration information corresponding to the target event by using the dialogue element, to obtain the current dialogue tag matched with the current dialogue information, comprises:
inputting the dialogue element into a dialogue tag prediction model, and performing dialogue tag prediction processing to obtain dialogue element characterization information;
and matching the dialogue element characterization information with embedded characterization information of each dialogue tag in the dialogue tag configuration information, and taking the matched dialogue tag as the current dialogue tag.
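A non-limiting sketch of the matching step in claim 7, where cosine similarity pairs the dialogue element characterization with the embedded characterization of each configured tag; the random vectors merely stand in for model outputs.

```python
# Hypothetical sketch of claim 7's matching step; embeddings are random stand-ins
# for the characterization information produced by the dialogue tag prediction model.
import numpy as np

rng = np.random.default_rng(0)
TAG_EMBEDDINGS = {  # embedded characterization information of each dialogue tag
    "refund_requested": rng.standard_normal(64),
    "password_reset": rng.standard_normal(64),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_tag_by_embedding(element_repr: np.ndarray) -> str:
    # The tag whose embedding is most similar to the element characterization
    # is taken as the current dialogue tag.
    return max(TAG_EMBEDDINGS, key=lambda t: cosine(element_repr, TAG_EMBEDDINGS[t]))

element_repr = rng.standard_normal(64)  # would come from the prediction model
print(match_tag_by_embedding(element_repr))
```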
8. The method of claim 1, wherein the extracting a dialogue element from the current dialogue information based on the content category to obtain the dialogue element matched with the content category comprises:
inputting the content category and the current dialogue information into a dialogue element extraction model, and performing dialogue element extraction processing to obtain dialogue elements matched with the content category;
wherein the dialogue element extraction model is obtained by training a preset neural network based on dialogue training data, and each piece of dialogue training data comprises sample sentence information, a sample content category, and annotated element labels.
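One plausible (but entirely hypothetical) shape for the dialogue training data of claim 8 is shown below, together with a trivial keyword-overlap stand-in for the trained extraction model; the fields and examples are invented.

```python
# Hypothetical illustration of the dialogue training data in claim 8; fields,
# sentences, and element labels are invented examples.
from dataclasses import dataclass

@dataclass
class DialogueTrainingExample:
    sample_sentence: str        # sample sentence information
    sample_category: str        # sample content category
    element_labels: list        # annotated element labels for the sentence

training_data = [
    DialogueTrainingExample(
        sample_sentence="My order 1234 was cancelled yesterday",
        sample_category="after_sale",
        element_labels=["order_id", "cancel_time"],
    ),
]

def extract_elements(category: str, sentence: str) -> list:
    # Trivial stand-in for the trained dialogue element extraction model: reuse the
    # labels of a same-category training sentence that shares words with the input.
    words = set(sentence.lower().split())
    for ex in training_data:
        if ex.sample_category == category and words & set(ex.sample_sentence.lower().split()):
            return ex.element_labels
    return []

print(extract_elements("after_sale", "my order was cancelled"))  # ['order_id', 'cancel_time']
```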
9. The method according to claim 1, wherein the method further comprises:
converting the current dialogue information into structured information under the current dialogue tag;
acquiring structured information of the target object under the historical dialogue tag;
and splicing the structured information under the current dialogue tag with the structured information under the historical dialogue tag to construct tag information of the target object under the target event.
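A minimal, non-limiting sketch of the splicing in claim 9; the structured fields and values are invented examples.

```python
# Hypothetical sketch of claim 9; structured fields and values are invented.
def build_tag_information(current_tag: str, current_structured: dict,
                          historical_structured: dict) -> dict:
    """Splice the structured information under the current dialogue tag with the
    structured information under the historical dialogue tags to form the target
    object's tag information under the target event."""
    tag_information = dict(historical_structured)       # keep the historical part
    tag_information[current_tag] = current_structured   # append the current turn
    return tag_information

historical = {"password_reset": {"account": "user42"}}
current = {"order_id": "1234", "cancel_time": "yesterday"}
print(build_tag_information("refund_requested", current, historical))
```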
10. The method of claim 7, wherein the method further comprises:
acquiring training sample pairs, wherein each training sample pair comprises a sample word segment and a reference word segment;
inputting the sample word segment into a first semantic model of a preset dual-tower model, and performing semantic feature extraction to obtain first semantic features;
inputting the reference word segment into a second semantic model of the preset dual-tower model, and performing semantic feature extraction to obtain second semantic features;
performing loss processing on the first semantic features and the second semantic features based on contrastive learning to obtain loss information;
and training the first semantic model according to the loss information until a training iteration condition is met, and taking the first semantic model obtained when the training iteration condition is met as the dialogue tag prediction model.
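A non-limiting sketch of the dual-tower training loop of claim 10; the encoder architecture, the batch of toy token ids, the temperature, and the fixed-step stopping condition are all assumptions introduced for illustration.

```python
# Hypothetical sketch of claim 10's dual-tower training; sizes, data, and the
# stopping condition are invented for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    """Minimal stand-in for one semantic model of the preset dual-tower model."""
    def __init__(self, vocab: int = 1000, dim: int = 64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab, dim)   # bag-of-tokens encoder
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.proj(self.embed(token_ids))

first_model, second_model = Tower(), Tower()        # first / second semantic models
optimizer = torch.optim.Adam(first_model.parameters(), lr=1e-3)

sample_ids = torch.randint(0, 1000, (32, 8))        # toy ids for sample word segments
reference_ids = torch.randint(0, 1000, (32, 8))     # toy ids for reference word segments

for step in range(100):                             # stand-in training iteration condition
    first_feats = F.normalize(first_model(sample_ids), dim=-1)
    second_feats = F.normalize(second_model(reference_ids), dim=-1)
    logits = first_feats @ second_feats.T / 0.05    # in-batch contrastive logits
    loss = F.cross_entropy(logits, torch.arange(32))  # loss information
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The first semantic model obtained when the condition is met would serve as
# the dialogue tag prediction model.
```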
11. A session processing apparatus, comprising:
the content category determining module is used for responding to the current dialogue information of the target object in the conversation process and determining the content category of the target event described by the current dialogue information;
the dialogue element extraction module is used for extracting dialogue elements from the current dialogue information based on the content category to obtain dialogue elements matched with the content category;
the current dialogue tag acquisition module is used for performing matching in dialogue tag configuration information corresponding to the target event by using the dialogue element to obtain a current dialogue tag matched with the current dialogue information;
the reply information generation module is used for generating reply information corresponding to the current dialogue information according to the current dialogue tag and a historical dialogue tag, wherein the historical dialogue tag is a dialogue tag matched with historical dialogue information of the target object in the conversation process.
12. An electronic device, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the session processing method of any one of claims 1 to 10.
13. A computer-readable storage medium, characterized in that instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the session processing method according to any one of claims 1 to 10.
CN202311101662.3A 2023-08-29 2023-08-29 Session processing method and device, electronic equipment and storage medium Pending CN117216206A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311101662.3A CN117216206A (en) 2023-08-29 2023-08-29 Session processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311101662.3A CN117216206A (en) 2023-08-29 2023-08-29 Session processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117216206A true CN117216206A (en) 2023-12-12

Family

ID=89038034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311101662.3A Pending CN117216206A (en) 2023-08-29 2023-08-29 Session processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117216206A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118228740A (en) * 2024-05-24 2024-06-21 北京中关村科金技术有限公司 Session information processing method, device, equipment, storage medium and product
CN118228740B (en) * 2024-05-24 2024-07-26 北京中关村科金技术有限公司 Session information processing method, device, equipment, storage medium and product

Similar Documents

Publication Publication Date Title
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN110009059B (en) Method and apparatus for generating a model
CN112533051A (en) Bullet screen information display method and device, computer equipment and storage medium
CN112633003A (en) Address recognition method and device, computer equipment and storage medium
CN112966106A (en) Text emotion recognition method, device and equipment and storage medium
CN111028007B (en) User portrait information prompting method, device and system
CN111223476B (en) Method and device for extracting voice feature vector, computer equipment and storage medium
CN108304376B (en) Text vector determination method and device, storage medium and electronic device
US11036996B2 (en) Method and apparatus for determining (raw) video materials for news
CN111368066B (en) Method, apparatus and computer readable storage medium for obtaining dialogue abstract
CN117251552B (en) Dialogue processing method and device based on large language model and electronic equipment
CN114220461A (en) Customer service call guiding method, device, equipment and storage medium
CN112632244A (en) Man-machine conversation optimization method and device, computer equipment and storage medium
CN112951233A (en) Voice question and answer method and device, electronic equipment and readable storage medium
CN117216206A (en) Session processing method and device, electronic equipment and storage medium
CN113535925A (en) Voice broadcasting method, device, equipment and storage medium
CN117132923A (en) Video classification method, device, electronic equipment and storage medium
CN107967304A (en) Session interaction processing method, device and electronic equipment
CN114676705B (en) Dialogue relation processing method, computer and readable storage medium
CN116913278B (en) Voice processing method, device, equipment and storage medium
CN111477212A (en) Content recognition, model training and data processing method, system and equipment
CN113609390A (en) Information analysis method and device, electronic equipment and computer readable storage medium
CN111680514B (en) Information processing and model training method, device, equipment and storage medium
CN116166858A (en) Information recommendation method, device, equipment and storage medium based on artificial intelligence
CN111554300B (en) Audio data processing method, device, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication