CN114282549A

CN114282549A - Method and device for identifying root relation between information, electronic equipment and storage medium

Info

Publication number: CN114282549A
Application number: CN202110904251.2A
Authority: CN
Inventors: 杨振; 张笃振; 孟凡东
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-08-06
Filing date: 2021-08-06
Publication date: 2022-04-05

Abstract

The application relates to the technical field of computers and discloses a method and a device for identifying a root cause relationship between information, an electronic device, and a storage medium. The method comprises: obtaining, with a trained target information recognition model, the emotion statement vector corresponding to each historical statement; establishing a graph network with the target information recognition model and updating the emotion statement vector of each historical statement based on the graph network; and obtaining a root cause relationship recognition result for each historical statement based on the updated emotion statement vectors. In this way, the emotion expressed by the historical statements and the emotional influence caused by the different trigger parties of those statements are considered from a global perspective, so that the recognition result fuses both kinds of information and the emotional root cause of the sentence to be replied can be effectively recognized in the interactive session context.

Description

Method and device for identifying root relation between information, electronic equipment and storage medium

Technical Field

The application relates to the technical field of computers, and discloses a method and a device for identifying a root cause relationship between information, electronic equipment and a storage medium.

Background

With the development of intelligent technologies, intelligent devices have been able to interact with real objects in a variety of intelligent interaction ways.

Generally, in an intelligent interaction scenario, a smart device can reply to feedback information in a targeted manner in response to input information triggered by a real object.

However, in the related art, the smart device can only reply to the input information currently triggered by the real object. The reply is therefore detached from the context of the interactive session: the intrinsic meaning expressed by the input information cannot be determined, the reply stays at the literal meaning of the words, and the input information of the real object cannot be answered effectively.

Disclosure of Invention

The embodiment of the application provides a method and a device for identifying a root cause relationship between information, electronic equipment and a storage medium, which are used for solving the problems that the intrinsic meaning expressed by input information cannot be analyzed in the context of interactive conversation and only the literal meaning of characters can be analyzed.

In a first aspect, a method for identifying a root cause relationship between information is provided, including:

obtaining a sentence to be replied input by a target object, and obtaining each history sentence interacted with the target object in a specified history period when determining that the sentence to be replied contains an emotional semantic element;

obtaining emotion statement vectors corresponding to the historical statements by adopting a trained target information identification model, wherein each emotion statement vector is determined based on a statement vector corresponding to the corresponding historical statement and an attention result between preset emotion semantic labels;

establishing, by adopting the target information identification model, a graph network for representing the connection relation between any one historical statement and each historical statement, and respectively updating the emotion statement vectors corresponding to the historical statements on the basis of the graph network and the trigger parties of the historical statements;

and respectively obtaining, by adopting the target information recognition model and based on each updated emotion statement vector, root cause relationship recognition results between each corresponding historical statement and the statement to be replied.

Optionally, the obtaining, by using the trained target information recognition model, an emotion statement vector corresponding to each of the historical statements includes:

sequencing each historical statement according to the input time associated with each historical statement, generating statement vectors corresponding to each historical statement based on each sequenced historical statement by adopting a trained target information recognition model, and acquiring each emotion semantic label vector, wherein each emotion semantic label vector is obtained by adjusting a randomly initialized emotion semantic label in the training process of the target information recognition model;

adopting the target information identification model, and respectively executing the following operations aiming at each historical statement:

generating an emotional attention vector corresponding to one historical statement based on each emotional semantic tag vector and a statement vector corresponding to the historical statement, wherein the emotional attention vector is used for representing an attention result of the historical statement corresponding to each emotional semantic tag;

and splicing the statement vector of the historical statement and the corresponding emotion attention vector to obtain the emotion statement vector corresponding to the historical statement.
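The attention-and-splice step above can be sketched as follows. This is a minimal illustration, not the model's actual computation: the scaled dot-product scoring and the names `emotion_attention` and `emotion_statement_vector` are assumptions, since the text only specifies that an attention result over the emotion semantic label vectors is spliced onto the statement vector.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def emotion_attention(statement_vec, label_vecs):
    """Attend from one statement vector over all emotion label vectors.

    Assumption: scaled dot-product scores; the source does not fix the scoring function.
    """
    scale = math.sqrt(len(statement_vec))
    scores = [sum(s * l for s, l in zip(statement_vec, label)) / scale
              for label in label_vecs]
    weights = softmax(scores)
    dim = len(label_vecs[0])
    # Weighted sum of the label vectors gives the emotion attention vector.
    return [sum(w * label[i] for w, label in zip(weights, label_vecs))
            for i in range(dim)]


def emotion_statement_vector(statement_vec, label_vecs):
    """Splice (concatenate) the statement vector with its emotion attention vector."""
    return list(statement_vec) + emotion_attention(statement_vec, label_vecs)
```

With a statement vector of dimension d and label vectors of dimension d', the spliced emotion statement vector has dimension d + d'.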

Optionally, the generating, by using the trained target information recognition model, a statement vector corresponding to each historical statement based on the sorted historical statements includes any one of the following operations:

for each historical statement, the following operations are respectively executed: after a start identifier and an end identifier are respectively added at the start position and the end position of a historical statement, a Bidirectional Encoder Representations from Transformers (BERT) network in the target information recognition model is adopted, and the output vector generated for the start identifier of the historical statement is used as the statement vector corresponding to the historical statement;

adopting a trained target information recognition model, and aiming at each historical statement, respectively executing the following operations: generating a statement vector corresponding to one historical statement based on the historical statement by adopting a long short-term memory (LSTM) network in the target information identification model.

Optionally, the generating an emotion attention vector corresponding to one history statement based on each emotion semantic tag vector and a statement vector corresponding to the history statement includes any one of the following operations:

determining, by adopting an attention mechanism, an attention matrix of the one historical statement with respect to each emotion semantic label according to each obtained emotion semantic label vector and the statement vector corresponding to the one historical statement, and determining the emotion attention vector corresponding to the one historical statement based on the attention matrix and each emotion semantic label vector;

by adopting a multi-head attention mechanism, a plurality of emotion attention sub-vectors corresponding to one historical statement are obtained in parallel by multiple heads, the obtained emotion attention sub-vectors are combined into an emotion attention vector, and when each emotion attention sub-vector is determined, the following operations are executed: according to the obtained emotion semantic label vectors and a statement vector corresponding to a history statement, determining an attention matrix of the history statement corresponding to the emotion semantic labels, and determining an emotion attention sub-vector corresponding to the history statement based on the attention matrix, the emotion semantic label vectors and configured parameters.
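The multi-head variant can be sketched in the same spirit. Here the "configured parameters" are assumed to be per-head query, key and value projection matrices, and the sub-vectors are assumed to be combined by concatenation; both choices, and all function names, are illustrative assumptions rather than details given in the text.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_attention(statement_vec, label_vecs, w_query, w_key, w_value):
    """One head: project, score against every label, return the weighted sum of values."""
    query = matvec(w_query, statement_vec)
    keys = [matvec(w_key, label) for label in label_vecs]
    values = [matvec(w_value, label) for label in label_vecs]
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]


def multi_head_emotion_attention(statement_vec, label_vecs, heads):
    """heads: list of (w_query, w_key, w_value) tuples, one per head (hypothetical layout)."""
    combined = []
    for w_query, w_key, w_value in heads:
        # Each head yields one emotion attention sub-vector; sub-vectors are concatenated.
        combined.extend(head_attention(statement_vec, label_vecs, w_query, w_key, w_value))
    return combined
```

In a real model the heads would be evaluated in parallel as batched tensor operations; the loop here only makes the per-head computation explicit.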

Optionally, the establishing a graph network for characterizing connection relationships between the historical statements and the historical statements includes:

respectively generating corresponding nodes aiming at the historical sentences, and respectively executing the following operations aiming at the nodes: establishing a connecting edge between one node and each node;

and generating a graph network for representing the connection relation between any one historical statement and each historical statement according to each node and each connection edge established for each node.
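As a rough illustration, the graph construction described above amounts to a fully connected graph over the historical statements. The adjacency-list representation and the name `build_statement_graph` are assumptions; self-loops are included because an edge is established between one node and each node.

```python
def build_statement_graph(num_statements):
    """Return an adjacency list in which every node is connected to every node.

    Node i stands for the i-th historical statement (in input order).
    Assumption: "each node" includes the node itself, so self-loops are kept.
    """
    return {node: list(range(num_statements)) for node in range(num_statements)}
```

For n historical statements this yields n nodes and n * n directed connections, so every statement can exchange information with every other statement in a single update.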

Optionally, the updating, based on the graph network and the trigger of each historical statement, the emotion statement vector corresponding to each historical statement respectively includes:

in the graph network, respectively determining each node corresponding to each historical statement, determining a trigger corresponding to each historical statement, and determining an emotion statement vector corresponding to each statement; the trigger party of each historical statement is the target object, or is an interactive object of the target object;

for each node, respectively executing the following operations:

determining each adjacent node which has a connection relation with one node in the graph network, and classifying each adjacent node according to a trigger party corresponding to each adjacent node to obtain a first class adjacent node which is pertinently configured with a first parameter set and a second class adjacent node which is pertinently configured with a second parameter set;

according to each type of adjacent nodes, the following operations are respectively executed: determining a target parameter set corresponding to a type of adjacent nodes in the first parameter set and the second parameter set, respectively determining edge weights between each adjacent node and the one node according to the target parameter set, an emotion statement vector corresponding to the one node and emotion statement vectors corresponding to the adjacent nodes in the type of adjacent nodes, and obtaining a weighting result of the emotion statement vectors corresponding to the adjacent nodes based on the obtained edge weights and the target parameter set;

and determining the updated emotion statement vector of the node according to the weighting result corresponding to each adjacent node in each adjacent node.
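The trigger-party-aware update above can be sketched as follows. Several details are assumptions made only for illustration: each parameter set is reduced to a single projection matrix, edge weights are a softmax over dot products within each class of adjacent nodes, and the per-class weighting results are summed. The names `update_node`, `w_first` and `w_second` are hypothetical.

```python
import math


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def update_node(node, vectors, triggers, w_first, w_second, target="target"):
    """Update one node's emotion statement vector on a fully connected graph.

    vectors:  emotion statement vector of every node
    triggers: trigger party of every node ("target" or anything else)
    w_first / w_second: projection matrices standing in for the two parameter sets
    """
    # Classify adjacent nodes by the party that triggered the statement.
    first_class = [j for j, t in enumerate(triggers) if t == target]
    second_class = [j for j, t in enumerate(triggers) if t != target]

    updated = [0.0] * len(w_first)
    for neighbours, weight_matrix in ((first_class, w_first), (second_class, w_second)):
        if not neighbours:
            continue
        centre = matvec(weight_matrix, vectors[node])
        projected = [matvec(weight_matrix, vectors[j]) for j in neighbours]
        # Edge weights between the node and each adjacent node of this class.
        edge_weights = softmax([sum(c * p for c, p in zip(centre, proj))
                                for proj in projected])
        for alpha, proj in zip(edge_weights, projected):
            updated = [u + alpha * p for u, p in zip(updated, proj)]
    return updated
```

Keeping a separate parameter set per trigger party lets statements from the target object and from its interactive object contribute to a node's update in different ways.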

Optionally, the obtaining, by using the target information recognition model, root cause relationship recognition results between each corresponding history statement and the statement to be replied based on each updated emotion statement vector respectively includes:

respectively determining relative position vectors between the statements to be replied and the historical statements by adopting the target information identification model, wherein the relative position vectors are used for representing the statement input proximity between the historical statements and the statements to be replied;

and respectively determining, by adopting the target information recognition model, root cause relationship recognition results between each historical statement and the statement to be replied based on each updated emotion statement vector, each relative position vector and the statement vector corresponding to each historical statement.
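One simple way to picture this final step is a classifier over the three vectors. The sketch below assumes the vectors are concatenated and passed through a single linear layer with a sigmoid; the text does not specify the classifier, so `root_cause_probability`, `weights` and `bias` are purely illustrative.

```python
import math


def root_cause_probability(updated_vec, position_vec, statement_vec, weights, bias=0.0):
    """Score one historical statement as a root cause of the statement to be replied.

    Assumption: features are the concatenation of the three vectors, scored by a
    single linear layer followed by a sigmoid.
    """
    features = list(updated_vec) + list(position_vec) + list(statement_vec)
    if len(weights) != len(features):
        raise ValueError("weights must match the concatenated feature length")
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

A historical statement whose probability exceeds a chosen threshold (for example 0.5, also an assumption) would then be reported as having a root cause relationship with the statement to be replied.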

Optionally, the determining the relative position vectors between the to-be-replied statement and the historical statements respectively includes:

numbering the statements in the order in which they were input, taking the difference between the sequence number corresponding to the sentence to be replied and the sequence number corresponding to each historical sentence as the relative position information between the sentence to be replied and the corresponding historical sentence, and acquiring, for each piece of relative position information, the relative position vector obtained by adjusting a randomly initialized vector during training of the target information recognition model;

or,

numbering the statements in the order in which they were input, taking the difference between the sequence number corresponding to the sentence to be replied and the sequence number corresponding to each historical sentence as the relative position information between the sentence to be replied and the corresponding historical sentence, and acquiring, for each piece of relative position information, the corresponding initial relative position vector obtained by adjusting a randomly initialized vector during training of the target information recognition model;

for each initial relative position vector, performing the following operations: determining the piece of relative position information corresponding to the initial relative position vector, obtaining position weights between that piece of relative position information and each piece of obtained relative position information, and performing vector weighted fusion on the corresponding initial relative position vectors based on the obtained position weights to obtain the corresponding relative position vector, wherein the position weights are determined in advance by adopting a radial basis function (RBF) kernel algorithm based on the differences among the pieces of relative position information.
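The second alternative can be sketched as below. The Gaussian form of the RBF kernel, the bandwidth `sigma`, and the normalization of the position weights before fusion are assumptions; the text states only that the weights come from an RBF kernel over differences between pieces of relative position information.

```python
import math


def relative_positions(num_history):
    """Sequence-number difference between the sentence to be replied and each historical sentence.

    Historical sentences are numbered 1..n in input order; the sentence to be replied is n + 1.
    """
    target_index = num_history + 1
    return [target_index - index for index in range(1, num_history + 1)]


def rbf_weight(a, b, sigma=1.0):
    """Gaussian RBF kernel on two relative positions (sigma is a hypothetical bandwidth)."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def fuse_position_vectors(positions, initial_vectors, sigma=1.0):
    """Fuse each initial relative position vector with the others, weighted by the RBF kernel."""
    dim = len(initial_vectors[0])
    fused = []
    for position in positions:
        weights = [rbf_weight(position, other, sigma) for other in positions]
        total = sum(weights)  # normalization is an assumption
        fused.append([sum(w * vec[k] for w, vec in zip(weights, initial_vectors)) / total
                      for k in range(dim)])
    return fused
```

Because the kernel decays with the distance between positions, nearby relative positions end up with similar fused vectors, which smooths the position representation for offsets that are rare in training.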

Optionally, the determining that the sentence to be replied includes the emotion semantic element includes any one of the following operations:

respectively acquiring a keyword set corresponding to each preset emotion semantic tag, performing semantic analysis on a to-be-replied sentence input by a target object based on each acquired keyword set, and determining that the to-be-replied sentence input by the target object comprises a corresponding emotion semantic element when the to-be-replied sentence is successfully matched with at least one keyword;

performing emotion analysis on the to-be-replied sentence input by the target object by adopting a preset emotion analysis model to obtain a classification result of each preset emotion semantic label, and determining that the to-be-replied sentence contains corresponding emotion semantic elements based on the classification result.
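The keyword-based alternative above can be illustrated with a small sketch. The keyword sets, the label names, and the use of case-insensitive substring matching are all hypothetical; a real system would use its own keyword sets per preset emotion semantic label.

```python
def detect_emotion_labels(sentence, keyword_sets):
    """Return the emotion semantic labels whose keyword set matches the sentence.

    keyword_sets: mapping from label name to a collection of keywords (hypothetical data).
    An empty result means no emotion semantic element was found.
    """
    text = sentence.lower()
    return [label for label, keywords in keyword_sets.items()
            if any(keyword in text for keyword in keywords)]


def contains_emotion_element(sentence, keyword_sets):
    """True when the sentence matches at least one keyword."""
    return bool(detect_emotion_labels(sentence, keyword_sets))
```

Only when `contains_emotion_element` is true would the historical statements be fetched and the root cause recognition be run; otherwise the sentence is treated as neutral.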

Optionally, the obtaining of each history statement interacted with the target object in the specified history period includes any one of the following operations:

determining a time period between the starting time associated with the current conversation and the current time as a historical time period, and acquiring each historical statement interacted with the target object in the specified historical time period;

acquiring historical conversation information with the target object, and, when it is determined that the time interval between the start time associated with the current conversation and the end time associated with one historical conversation does not exceed a set time threshold, determining the time period between the start time of that historical conversation and the current time as the historical time period, and acquiring each historical statement interacted with the target object in the specified historical time period.
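The two ways of choosing the historical period can be sketched together. Times are plain numbers (for example seconds), and the names `history_period_start`, `select_history` and `max_gap`, as well as the default threshold, are assumptions for illustration.

```python
def history_period_start(current_start, previous_session=None, max_gap=1800.0):
    """Return the start of the historical period.

    previous_session: optional (start, end) pair of one historical conversation.
    If the gap between its end and the start of the current conversation does not
    exceed max_gap, the period is extended back to that conversation's start.
    """
    if previous_session is not None:
        previous_start, previous_end = previous_session
        if current_start - previous_end <= max_gap:
            return previous_start
    return current_start


def select_history(statements, period_start, now):
    """statements: list of (timestamp, text); keep those inside [period_start, now]."""
    return [text for timestamp, text in statements if period_start <= timestamp <= now]
```

Extending the period to a recent earlier conversation lets the model look for an emotional root cause that predates the current session.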

Optionally, the target information recognition model is trained in the following manner:

acquiring a training sample set, wherein one training sample comprises at least one sample statement, the at least one sample statement comprises the sample interactive statements within a set historical period that is determined based on a target sample statement, each sample statement is labeled with real root cause relationship information between the target sample statement and that sample statement, and the target sample statement comprises an emotion semantic element;

performing multiple rounds of iterative training on an information recognition model to be trained by adopting the training sample set until a preset convergence condition is met, and taking the information recognition model output in the last round as a target information recognition model, wherein the following operations are executed in the process of one round of iterative training:

inputting each sample statement included in the acquired training sample into the information identification model to obtain predicted root cause relationship information corresponding to each sample statement, wherein the predicted root cause relationship information corresponding to each sample statement represents the probability that the corresponding sample statement is the root cause of the corresponding emotional semantic element;

and determining a corresponding loss value based on the obtained predicted root cause relationship information and the corresponding real root cause relationship information by adopting a cross entropy loss function, and carrying out parameter adjustment on the information identification model based on the loss value.
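The loss computation in one training round can be sketched as a binary cross entropy over the sample statements. Treating the real root cause relationship information as 0/1 labels and averaging over statements are assumptions; the function name and the clipping constant `eps` are illustrative.

```python
import math


def cross_entropy_loss(predicted, labels, eps=1e-12):
    """Mean binary cross entropy between predicted probabilities and 0/1 labels.

    predicted: probability that each sample statement is a root cause
    labels:    1 if the statement is labeled as a root cause, else 0
    """
    total = 0.0
    for probability, label in zip(predicted, labels):
        # Clip to avoid log(0) for saturated predictions.
        probability = min(max(probability, eps), 1.0 - eps)
        total -= label * math.log(probability) + (1 - label) * math.log(1.0 - probability)
    return total / len(predicted)
```

The resulting loss value would drive the parameter adjustment, typically through gradient descent in a deep learning framework; that optimization step is outside this sketch.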

In a second aspect, an apparatus for identifying a root cause relationship between information is provided, including:

an obtaining unit, configured to obtain a sentence to be replied input by a target object, and to obtain each history sentence interacted with the target object in a specified history period when determining that the sentence to be replied contains an emotional semantic element;

the determining unit is used for acquiring emotion statement vectors corresponding to the historical statements by adopting the trained target information identification model, wherein each emotion statement vector is determined based on the statement vector corresponding to the corresponding historical statement and the attention result between the preset emotion semantic labels;

the establishing unit is used for establishing, by adopting the target information identification model, a graph network for representing the connection relation between any one historical statement and each historical statement, and respectively updating the emotion statement vectors corresponding to each historical statement on the basis of the graph network and the trigger party of each historical statement;

and the identification unit is used for respectively obtaining, by adopting the target information identification model, root cause relationship identification results between each corresponding historical statement and the statement to be replied based on each updated emotion statement vector.

Optionally, when obtaining the emotion statement vectors corresponding to the history statements respectively by using the trained target information recognition model, the determining unit is configured to:

Optionally, when the trained target information recognition model is used to generate the statement vector corresponding to each historical statement based on the sorted historical statements, the determining unit is configured to perform any one of the following operations:

Optionally, when the emotion attention vector corresponding to one history statement is generated based on each emotion semantic tag vector and the statement vector corresponding to the history statement, the determining unit is configured to perform any one of the following operations:

Optionally, when the graph network for characterizing the connection relationship between each historical statement and each historical statement is established, the establishing unit is configured to:

Optionally, when the emotion statement vectors corresponding to the respective historical statements are updated based on the graph network and the respective triggers of the respective historical statements, the establishing unit is configured to:

for each node, respectively executing the following operations:

Optionally, when the target information recognition model is adopted and root cause relationship recognition results between each corresponding history statement and the statement to be replied are respectively obtained based on each updated emotion statement vector, the recognition unit is configured to:

Optionally, when the relative position vectors between the to-be-replied statement and the historical statements are respectively determined, the identifying unit is configured to:

or,

Optionally, when it is determined that the to-be-replied sentence includes an emotion semantic element, the obtaining unit is configured to perform any one of the following operations:

Optionally, when obtaining each history statement interacted with the target object in the specified history period, the obtaining unit is configured to perform any one of the following operations:

Optionally, the apparatus further includes a training unit, where the training unit is configured to train to obtain a target information recognition model by using the following method:

In a third aspect, an electronic device is provided, which includes a processor and a memory, where the memory stores program codes, and when the program codes are executed by the processor, the processor is caused to execute the steps of the method for identifying root cause relationships between information provided in the embodiment of the present application.

In a fourth aspect, a computer-readable storage medium is provided, which includes program code for causing an electronic device to perform the steps of the method for identifying root cause relationships between information provided by the embodiments of the present application when the program code runs on the electronic device.

The beneficial effect of this application is as follows:

The embodiments of the application provide a method and a device for identifying a root cause relationship between information, an electronic device, and a storage medium. With the target information recognition model, an attention calculation is performed between each historical statement and each emotion semantic label, so that the degree to which each historical statement tends toward the emotion expressed by each label can be determined and emotion statement vectors containing the emotion distribution of the historical statements are obtained. The target information recognition model then establishes a graph network representing the connection relation between any one historical statement and each historical statement, and a graph attention mechanism aggregates, within the graph network, the emotion statement vectors associated with different trigger parties so as to fuse the emotional influence caused by the different trigger parties.

Therefore, the emotion expressed by the historical sentences and the emotional influence caused by the different trigger parties of those sentences can be considered from a global perspective, and the obtained recognition result fuses both. The emotional root cause of the sentence to be replied can thus be effectively recognized in the interactive session context, which improves the ability to perceive the emotion of the real object, improves the effectiveness and pertinence of sentence replies, and helps improve the interactive experience of the real object.

Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1a is a schematic diagram of an application scenario in an embodiment of the present application;

FIG. 1b is a schematic diagram of an operable page for sentence input in the embodiment of the present application;

FIG. 1c is a schematic diagram of an interaction page in an embodiment of the present application;

FIG. 2a is a schematic diagram of a model architecture of an information recognition model to be trained in an embodiment of the present application;

FIG. 2b is a schematic diagram of a first processing network employing a BERT network in an embodiment of the present application;

FIG. 3a is a schematic flowchart of a process of training an information recognition model to be trained in an embodiment of the present application;

FIG. 3b is a schematic flowchart of a round of iterative training process in an embodiment of the present application;

FIG. 4a is a schematic diagram illustrating a process of identifying root cause relationships between information in an embodiment of the present application;

FIG. 4b is a diagram illustrating generation of an emotion statement vector in the embodiment of the present application;

FIG. 4c is a schematic flow chart of the processing device obtaining the emotional attention sub-vector in the embodiment of the present application;

FIG. 4d is a schematic diagram of a graph network generated in an embodiment of the present application;

FIG. 4e is a schematic flow chart illustrating updating of each node in the graph network according to the embodiment of the present application;

FIG. 4f is a schematic flow chart illustrating the process of determining a root cause relationship between a history statement and a to-be-replied statement according to an embodiment of the present application;

FIG. 5 is a schematic logical structure diagram of an apparatus for identifying root cause relationships between information according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a hardware component of an electronic device to which an embodiment of the present application is applied;

FIG. 7 is a schematic structural diagram of a computing device in an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the technical solutions of the present application. All other embodiments obtained by a person skilled in the art without any inventive step based on the embodiments described in the present application are within the scope of the protection of the present application.

The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.

Some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.

Emotion semantic label: an emotion semantic label classifies the different emotion types that an individual capable of emotion perception can perceive or express. In the embodiments of the application, when the target information recognition model is trained on data from the Recognizing Emotion Cause in Conversations (RECCON) dataset, the emotion labels defined in the RECCON dataset can be used; alternatively, when the target information recognition model is trained on a custom corpus, the emotion labels can be customized.

Emotion semantic element: an emotion semantic element is abstract emotional content that can be expressed in a sentence or in other types of information. In the embodiments of the application, it refers to content in a sentence that, when the sentence is analyzed, helps determine the emotional tendency and the corresponding emotion semantic label; when no emotion semantic element can be identified in a sentence, the sentence is considered neutral.

Root cause relationship: the root cause relationship represents the cause of the emotion semantic element contained in the sentence to be replied, that is, the relationship in which the content of an earlier historical sentence gives rise to the emotion semantic element contained in the later sentence to be replied. When an earlier historical sentence is determined to be the cause of the emotion semantic element contained in the sentence to be replied, that historical sentence can be taken as the root cause of the corresponding emotion, and a root cause relationship is considered to exist between the historical sentence and the sentence to be replied.

Embodiments of the present application relate to Artificial Intelligence (AI) and machine learning techniques, and are designed based on natural language processing techniques and Machine Learning (ML) in AI.

Artificial intelligence is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.

Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Research in this field therefore involves natural language, that is, the language people use every day, and is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering by robots, knowledge graphs, and the like.

Machine Learning (ML) is an interdisciplinary field that involves probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental way to give computers intelligence, and is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

The target information recognition model used in the method for identifying a root cause relationship between information provided in the embodiments of the present application is a machine learning model. It belongs to the technical field of machine learning and can be obtained by training with machine learning techniques.

The following briefly introduces the design concept of the embodiments of the present application:

In the related art, in an intelligent interaction scenario, an intelligent device can only reply in a targeted manner based on the to-be-replied statement currently triggered by the real object. As a result, when the real object inputs a to-be-replied statement that expresses emotion, the device cannot determine, from a global perspective over the history statements of the conversation, why the emotion in the to-be-replied statement arose, and cannot accurately perceive the reason for the real object's emotional change in the context of the interactive conversation. It therefore cannot reply effectively to the to-be-replied statement, which degrades the interactive experience of the real object.

In view of this, in the embodiment of the present application, the target information recognition model computes, for each history statement, attention over each emotion semantic tag, so that the degree to which each history statement tends toward the emotion expressed by each emotion semantic tag can be determined and an emotion statement vector containing the emotion distribution of the history statement is obtained. The target information recognition model then establishes a graph network representing the connection relationships among the history statements, and a graph attention mechanism is used to control the aggregation of the emotion statement vectors associated with different triggers in the graph network, so as to fuse the emotional influences caused by different triggers.

The preferred embodiments of the present application are described below in conjunction with the drawings of the specification. It should be understood that the preferred embodiments described herein are for purposes of illustration and explanation only and are not intended to limit the present application, and that the features of the embodiments and examples of the present application may be combined with each other without conflict.

Fig. 1a is a schematic view of an application scenario in the embodiment of the present application. The application scenario includes two terminal devices 110 and one processing device 120, where 1101 is an operable page on the terminal device 110, and the terminal device 110 and the processing device 120 can communicate with each other through a communication network.

In an alternative embodiment, the communication network is a wired network or a wireless network. The terminal device 110 and the processing device 120 may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.

In the embodiment of the present application, the terminal device 110 is an electronic device used by a target object, and the electronic device may be a computer device having a certain computing capability and running instant messaging software and a website or social software and a website, such as a personal computer, a mobile phone, a tablet computer, a notebook, a desktop computer, a smart watch, an e-book reader, and the like, but is not limited thereto.

The processing device 120 may be an electronic device with specific processing capability, such as a computer, a notebook, or a desktop. It may also be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), and big data and artificial intelligence platforms.

The terminal device 110 and the processing device 120 are connected directly or indirectly by wired or wireless communication, and the application is not limited herein. The terminal device 110 is installed with an application related to intelligent interaction with a processing device, and presents an operable page 1101, the application related to the embodiment of the present application may be software, or a client such as a web page or an applet, and the processing device 120 may be a backend server corresponding to the software, or the web page or the applet, or another server capable of obtaining backend data.

In this embodiment of the application, when identifying the root cause relationship between information, the processing device 120 may receive a to-be-replied sentence initiated in the terminal device 110, and further, when determining that the to-be-replied sentence includes an emotional semantic element, obtain each history sentence, analyze the root cause relationship between each history sentence and the to-be-replied sentence, and determine, based on the obtained root cause relationship, a reason that the to-be-replied sentence includes the emotional semantic element.

Referring to fig. 1b, which is a schematic diagram of an operable page for sentence input in the embodiment of the present application, the operable page 1101 on the terminal device 110 includes at least an area for displaying interactive sentences and an area in which the target object inputs sentences.

In some possible embodiments, the present application can be applied to an intelligent interaction scenario, for example a man-machine conversation scenario. The to-be-replied sentence input by the target object is analyzed, and in the process of generating the reply sentence, the history sentences that have an emotional influence on the to-be-replied sentence are perceived globally, so that the ability to understand the emotion of the target object is improved and an effective reply sentence can be generated. Referring to fig. 1c, which is a schematic diagram of an interactive page in the embodiment of the present application, based on the interactive content illustrated in fig. 1c, the processing device can recognize that the to-be-replied sentence currently input by the target object, "Wow! It is snowing so heavily, shall we go out for a walk?", expresses the emotion "happy". Through its processing, the processing device can further determine that the emotion cause of the to-be-replied sentence is the previously input history sentence "I like winter".

In some possible embodiments, the application can be applied to a dialogue translation scene, so that the emotion understanding capability of a dialogue translation system is improved, and translation content which is more in line with an interaction context is generated.

In a possible application scenario, in order to reduce the communication delay, the processing device 120 may be deployed in each area, or in order to balance the load, different processing devices 120 may respectively serve the areas corresponding to the terminal devices 110. The plurality of processing devices 120 enable sharing of data by a blockchain, and the plurality of processing devices 120 correspond to a data sharing system composed of a plurality of servers. For example, a terminal device 110 is located at site a and is communicatively coupled to a processing device 120, and another terminal device 110 is located at site b and is communicatively coupled to another processing device 120.

Each processing device 120 in the data sharing system has a node identifier corresponding to the processing device 120, and each processing device 120 in the data sharing system may store the node identifiers of other processing devices 120 in the data sharing system, so that the generated block is broadcast to other processing devices 120 in the data sharing system according to the node identifiers of other processing devices 120. Each processing device 120 may maintain a node identification list as shown in the following table, and store the name of the processing device 120 and the node identification in the node identification list. The node identifier may be an Internet Protocol (IP) address and any other information that can be used to identify the node, and table 1 only illustrates the IP address as an example.

TABLE 1

Node name	Node identification
Node 1	117.114.151.174
Node 2	117.116.189.145
…	…
Node N	119.123.789.258

The following describes an identification process of root relationship between pieces of information in the embodiment of the present application with reference to the accompanying drawings, where the identification process of root relationship between pieces of information in the embodiment of the present application is applicable to the processing device 120 shown in fig. 1 a:

In the embodiment of the application, when the processing device determines that the to-be-replied sentence input by the target object includes an emotion semantic element, it uses the trained target information recognition model to recognize, based on the obtained history sentences, the root cause relationship recognition result between each history sentence and the to-be-replied sentence.

It should be noted that the model architecture of the information recognition model does not change before and after training; only the parameters of the information recognition model change. Therefore, for the information recognition model to be trained and the target information recognition model obtained after training, only the model architecture of the information recognition model to be trained is described schematically below.

Referring to fig. 2a, which is a schematic diagram of a model architecture of an information recognition model to be trained in the embodiment of the present application, an algorithm model architecture of the information recognition model to be trained is described below with reference to fig. 2 a:

In the embodiment of the present application, the constructed information recognition model to be trained is composed of a first processing network, a second processing network, a third processing network, and a fourth processing network, wherein:

1. The first processing network is used to receive the input history statement sequence, encode each history statement according to its position in the history statement sequence, and generate a statement vector corresponding to each history statement.

In the embodiment of the present application, the first processing network may be a Bidirectional Encoder Representations from Transformers (BERT) network, or the first processing network may be a Long Short-Term Memory (LSTM) network.

In the following, the first processing network is schematically illustrated by using a BERT network as an example only:

Referring to fig. 2b, which is a schematic diagram of a first processing network that employs a BERT network in the embodiment of the present application, the processing device adds the keyword "[CLS]" at the start position of each history statement and the keyword "[SEP]" at the end position of each history statement, and then uses the BERT network to encode each history statement, obtaining a vector form that can represent the context and position relationship of each history statement. In the illustration of fig. 2b, the dashed and solid lines in the architecture of the information recognition model schematically represent historical statements corresponding to the same trigger.

It should be noted that, in the embodiment of the present application, when an information recognition model to be trained is built, a pre-trained BERT network may be directly obtained and used as a first processing network in a target recognition model, so that fine tuning is performed in a subsequent model training process.

In a specific implementation, the history statements are sorted in the order in which they were input in the interactive session to obtain a history statement sequence of the form L(U_t) = (U_1, U_2, …, U_i, …, U_{t-1}, U_t). For each history statement in L(U_t) input into the BERT network, the following operation is performed: the history statement is extended into a history statement that starts with "[CLS]" and ends with "[SEP]".

Furthermore, alternating segment embedding vectors are used to distinguish different history statements in the history statement sequence. For example, history statements with odd sequence numbers are assigned a preset segment embedding vector E_A, and history statements with even sequence numbers are assigned a preset segment embedding vector E_B, where E_A and E_B are used to mark history statements at different positions in the history statement sequence. The input representation of the word at each position in the history statement sequence is then determined, where one history statement includes at least one word, and the input representation of a word is obtained by fusing its word embedding vector, segment embedding vector, and position embedding vector. Further, the BERT network produces an output representation for the word at each position, and the output representation corresponding to "[CLS]" in each history statement is used as the statement vector of that history statement. The statement vector determined from "[CLS]" fuses the semantic information of every word in the corresponding history statement.

It should be noted that in the embodiment of the present application, each of the history statements input together is extended with "[CLS]" and "[SEP]", so the output representation generated for a "[CLS]" can represent the semantic information of the history statement that begins at that "[CLS]". Meanwhile, since the input representation of each word in a history statement is obtained using the position embedding vector of that word, the generated output representation incorporates the positional relationship of the different history statements, that is, the order in which they were input. The obtained statement vector can therefore represent the contextual position relationship between the corresponding history statement and the other history statements.
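The input construction described above can be sketched as follows. This is a minimal illustration only, not the BERT tokenizer: whitespace splitting stands in for real tokenization, the function names `build_inputs` and `input_representation` are hypothetical, and fusing the three embeddings by element-wise addition is an assumption borrowed from the standard BERT design.

```python
def build_inputs(history_statements):
    """Wrap each statement in [CLS]/[SEP] and assign alternating segment ids."""
    tokens, segment_ids, cls_positions = [], [], []
    for index, statement in enumerate(history_statements):
        # Statements are numbered from 1: odd -> segment 0 (E_A), even -> segment 1 (E_B).
        segment = 0 if (index + 1) % 2 == 1 else 1
        # Whitespace splitting is a stand-in for a real tokenizer (assumption).
        words = ["[CLS]"] + statement.split() + ["[SEP]"]
        cls_positions.append(len(tokens))  # where this statement's [CLS] sits
        tokens.extend(words)
        segment_ids.extend([segment] * len(words))
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids, cls_positions


def input_representation(word_vec, segment_vec, position_vec):
    """Fuse word, segment and position embeddings by element-wise addition (assumed)."""
    return [w + s + p for w, s, p in zip(word_vec, segment_vec, position_vec)]


tokens, segment_ids, position_ids, cls_positions = build_inputs(
    ["I like winter", "Me too", "Wow it is snowing"]
)
```

The entries of `cls_positions` mark the positions whose encoder outputs would be read out as the statement vectors.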

2. The second processing network may be an Emotion Attention Network (EAN). Based on the statement vector corresponding to each history statement and the emotion semantic tag vectors, which are obtained in advance by processing each emotion semantic tag with a random initialization algorithm, it determines the emotional tendency of each history statement toward each emotion semantic tag. This characterizes the emotion distribution of each history statement and yields an emotion attention vector for each history statement; the emotion attention vector is then spliced with the statement vector to obtain an emotion statement vector.

Through the processing of the second processing network, each history statement can be given a representation that is rich in emotional expression. In the initially established second processing network, the emotion semantic tag vectors corresponding to the emotion semantic tags are initialized with a random initialization algorithm. The initialized emotion semantic tag vectors may be stored in a preset emotion semantic tag embedding lookup table, and the emotion semantic tag vectors in this lookup table can be learned and adjusted during the training of the information recognition model.
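As a rough illustration of the second processing network, the sketch below builds a randomly initialized tag lookup table, scores one statement vector against every emotion semantic tag vector, and splices the resulting emotion attention vector onto the statement vector. The dot-product scoring, the softmax normalization, and all names (`init_label_table`, `emotion_statement_vector`) are assumptions made for this example; the passage above does not fix the exact attention formula.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_label_table(labels, dim, seed=0):
    """Randomly initialized emotion semantic tag embedding lookup table."""
    rng = random.Random(seed)
    return {label: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for label in labels}


def emotion_statement_vector(statement_vec, label_table):
    """Return (statement vector spliced with emotion attention vector, tag weights)."""
    labels = list(label_table)
    # Dot-product attention score of the statement against each tag (assumed form).
    scores = [sum(a * b for a, b in zip(statement_vec, label_table[l])) for l in labels]
    weights = softmax(scores)  # emotion distribution of the statement
    dim = len(statement_vec)
    attention_vec = [
        sum(w * label_table[l][k] for w, l in zip(weights, labels)) for k in range(dim)
    ]
    return statement_vec + attention_vec, dict(zip(labels, weights))


table = init_label_table(["happy", "sad", "angry"], dim=4)
vector, distribution = emotion_statement_vector([0.1, 0.2, 0.3, 0.4], table)
```

In a trained model the table entries would be learned parameters rather than fixed random values.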

3. The third processing network may be a Relation-aware Graph Attention Network (RGAT). It is used to characterize the different trigger relationships between the history statements and, by means of the RGAT mechanism, to determine the emotional influence produced by different triggers based on the emotion statement vectors corresponding to the history statements, where a trigger is the interactive object that sent a history statement.

4. The fourth processing network is used to predict the root cause relationship. It obtains the corresponding root cause relationship recognition result for each history statement from the statement vector corresponding to that history statement and the updated emotion statement vector produced by the third processing network.

In the embodiment of the present application, the content of each processing network in the information identification model will be described in detail in the subsequent process, which is not described herein again.
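To make the idea of a graph over history statements with trigger-dependent relations more concrete, the following sketch builds a typed edge list. The edge rule used here (each statement is connected to itself and to every earlier statement) and the relation names `same_trigger` and `different_trigger` are assumptions for illustration only; the passage above states only that the graph connects history statements and distinguishes triggers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: int    # index of the statement the edge comes from
    target: int    # index of the statement being updated
    relation: str  # hypothetical relation label


def build_graph(triggers):
    """Build typed edges from each statement to itself and all later statements."""
    edges = []
    for target in range(len(triggers)):
        for source in range(target + 1):
            same = triggers[source] == triggers[target]
            relation = "same_trigger" if same else "different_trigger"
            edges.append(Edge(source, target, relation))
    return edges


# Three history statements sent alternately by two triggers.
edges = build_graph(["user", "bot", "user"])
```

A relation-aware attention layer could then use a separate set of parameters per relation label when aggregating neighbor vectors.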

Further, the processing device trains the information recognition model to be trained to obtain a trained target information recognition model.

Referring to fig. 3a, a schematic flowchart of a process of training an information recognition model to be trained in the embodiment of the present application is shown, and a description is given below with reference to fig. 3a of the process of training an information recognition model to be trained.

Step 301: the processing device obtains a set of training samples.

In this embodiment of the application, the processing device may generate the training sample set based on the RECCON data set. One training sample includes at least one sample statement and the labeled real root cause relationship information between the target sample statement and each of the at least one sample statement. The at least one sample statement includes the target sample statement, which contains an emotion semantic element, and the sample interactive statements, which are determined based on the target sample statement and fall within a set history period; the target sample statement and the sample interactive statements are collectively referred to as sample statements.

Specifically, the processing device may, according to actual processing needs, determine a target sample statement containing an emotion semantic element in the RECCON data set, and determine the sample interactive statements exchanged before the target sample statement within the set history period.

It should be noted that the set history period is a past period of time that ends at the input time associated with the target sample statement. In a specific implementation, the set history period may be limited to the period from the start time of the current session to the time at which the target sample statement is sent, which indicates that the obtained sample interactive statements and the target sample statement belong to the same session and thereby ensures the relevance between the obtained sample statements. Alternatively, a duration for the set history period may be determined according to actual processing requirements, and the period of that duration ending at the input time associated with the target sample statement is taken as the set history period.

Therefore, in consideration of the influence of interaction time on the sentences to be replied, the sample sentences in the set historical time period are obtained for training, and the number of the sample sentences in one training sample is not particularly limited, so that the number of input sentences is not limited in a target information recognition model obtained by subsequent training, and the expandability of the model is improved.

Step 302: The processing device uses the training sample set to perform multiple rounds of iterative training on the information recognition model to be trained until a preset convergence condition is met, and takes the information recognition model output in the last round as the target information recognition model.

Specifically, the processing device performs multiple rounds of iterative training on the information recognition model to be trained based on the obtained training sample set until a preset convergence condition is met, where the preset convergence condition may be either of the following: the number of training rounds of the information recognition model reaches a first set threshold; or the number of consecutive times that the loss value of the information recognition model is lower than a second set threshold reaches a specified threshold. The values of the first set threshold, the second set threshold and the specified threshold are set according to actual processing requirements and are not specifically limited in this application.
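The two convergence conditions can be expressed as a small stopping check. This is a sketch under assumed names: `max_rounds` plays the role of the first set threshold, `loss_threshold` the second set threshold, and `patience` the specified threshold; none of these identifiers come from the application itself.

```python
def should_stop(round_index, loss_history, max_rounds, loss_threshold, patience):
    """Return True when either preset convergence condition holds."""
    # Condition 1: the number of training rounds reaches the first set threshold.
    if round_index >= max_rounds:
        return True
    # Condition 2: count how many most recent losses are consecutively
    # below the second set threshold.
    streak = 0
    for loss in reversed(loss_history):
        if loss < loss_threshold:
            streak += 1
        else:
            break
    return streak >= patience


# Example: three consecutive low losses with patience=3 stops training early.
stop = should_stop(2, [0.5, 0.05, 0.04, 0.03], max_rounds=10,
                   loss_threshold=0.1, patience=3)
```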

Referring to fig. 3b, which is a schematic flow chart of a round of iterative training process in an embodiment of the present application, the following describes operations performed by a processing device in the round of iterative training process with reference to fig. 3 b:

Step 3021: The processing device inputs each sample statement included in the obtained training sample into the information recognition model to be trained to obtain the predicted root cause relationship information corresponding to each sample statement.

In the embodiment of the application, a training sample consists of a target sample sentence containing emotion semantic elements, various sample interactive sentences determined according to the target sample sentence, and real root relation information between the labeled target sample sentence and the various sample sentences. In an actual training process, the processing device may input only each sample sentence in one training sample in a training process according to an actual configuration requirement, or input each sample sentence in a plurality of training samples in a training process, which is not limited herein. Meanwhile, when a plurality of training samples are used for training in one training process, the same processing operation may be executed in parallel for each training sample, so in the following description of the present application, the training process of the information recognition model is described by taking only one training sample as an example for training in one training process.

The processing equipment adopts an information identification model, and respectively generates corresponding predicted root relation information based on each sample statement in a training sample, wherein the predicted root relation information corresponding to each sample statement is used for representing the probability that the corresponding sample statement is the root of the emotional semantic element in the target sample statement. The process of specifically generating root cause relationship information will be described in detail in the subsequent flow, and will not be described herein again.

Step 3022: The processing device uses a cross-entropy loss function to determine a loss value based on the obtained predicted root cause relationship information and the corresponding real root cause relationship information, and adjusts the parameters of the information recognition model based on the loss value.

After obtaining the pieces of predicted root cause relationship information output by the information recognition model for one training sample in one training pass, the processing device uses the cross-entropy loss function to determine the loss value of the information recognition model based on the difference between each piece of predicted root cause relationship information and the corresponding real root cause relationship information, and adjusts the parameters of the information recognition model based on the determined loss value.
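A minimal sketch of the loss computation follows. It assumes the binary form of cross entropy, in which each sample statement has a predicted probability of being the root cause and a 0/1 real label, and it averages over the statements of one training sample; the averaging and the function name `cross_entropy_loss` are choices made for this example.

```python
import math


def cross_entropy_loss(predicted, labels, eps=1e-12):
    """Mean binary cross entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(predicted, labels):
        # Clamp to avoid log(0) for probabilities of exactly 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(predicted)


# Predictions for three sample statements; only the second is a real root cause.
loss = cross_entropy_loss([0.2, 0.9, 0.1], [0, 1, 0])
```

A lower loss indicates that the predicted probabilities agree more closely with the labeled root cause relationships, which is the signal used to adjust the model parameters.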

In this way, the processing device may perform iterative training for multiple times by using the iterative training flow illustrated in fig. 3b, and then obtain the trained target information recognition model, so that in the subsequent processing process, the root relationship between information can be recognized based on the obtained target information recognition model.

Referring to fig. 4a, which is a schematic diagram illustrating a process of identifying a root relationship between information in an embodiment of the present application, a process of performing identification of a root relationship between information will be described with reference to fig. 4 a:

step 401: the processing equipment obtains the sentences to be replied input by the target object, and obtains each history sentence interacted with the target object in a specified history period when determining that the sentences to be replied contain the emotional semantic elements.

In the embodiment of the application, in the process of sentence interaction between the processing equipment and the target object, the processing equipment executes analysis operation on the sentence to be replied every time one sentence to be replied input by the target object is obtained so as to determine whether the sentence to be replied contains the emotion semantic element.

Specifically, the processing device may determine whether the to-be-replied sentence includes an emotion semantic element by using any one of the following manners:

Mode 1: the processing device determines that the to-be-replied sentence contains a corresponding emotion semantic element according to whether the keywords associated with the emotion semantic tags match the to-be-replied sentence.

Specifically, the processing device respectively acquires a keyword set corresponding to each preset emotion semantic tag, performs semantic analysis on the to-be-replied sentence input by the target object based on each acquired keyword set, and determines that the to-be-replied sentence input by the target object includes a corresponding emotion semantic element when it is determined that the to-be-replied sentence is successfully matched with at least one keyword.

It should be noted that, in the embodiment of the present application, there is a corresponding relationship between the emotion semantic elements and the emotion semantic tags, the emotion semantic elements are bases for measuring emotional tendencies of the to-be-replied sentences, and the to-be-replied sentences including the emotion semantic elements may be assigned with corresponding emotion semantic tags.

Mode 2: the processing device uses a preset emotion analysis model to determine the emotion semantic tag corresponding to the to-be-replied sentence, and thereby determines that the to-be-replied sentence contains the emotion semantic element corresponding to that emotion semantic tag.

Specifically, the processing device may perform emotion analysis on the to-be-replied sentence input by the target object by using a preset emotion analysis model to obtain a classification result for each preset emotion semantic tag, and determine that the to-be-replied sentence includes corresponding emotion semantic elements based on the classification result, where performing emotion analysis on the sentence by using the emotion analysis model is a conventional technique in the art and will not be described herein.

Therefore, by performing emotion analysis on the sentence to be replied, the emotion semantic elements contained in the sentence to be replied can be determined, and a basis is provided for determining whether the sentence to be replied has an emotional tendency.
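The keyword-based mode described above can be sketched as below. Case-insensitive substring matching is a simplification of the semantic analysis mentioned in the text, and the keyword sets and the function name `detect_emotion_tags` are hypothetical examples rather than values taken from the application.

```python
def detect_emotion_tags(sentence, keyword_sets):
    """Return the emotion semantic tags whose keyword set matches the sentence."""
    text = sentence.lower()
    return [
        tag
        for tag, keywords in keyword_sets.items()
        # A tag matches when at least one of its keywords occurs in the sentence.
        if any(keyword.lower() in text for keyword in keywords)
    ]


# Hypothetical keyword sets, one per preset emotion semantic tag.
KEYWORD_SETS = {
    "happy": {"wow", "great"},
    "sad": {"sigh", "unfortunately"},
}

tags = detect_emotion_tags("Wow! It is snowing heavily", KEYWORD_SETS)
contains_emotion = bool(tags)  # True when any emotion semantic element is found
```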

Further, when determining that the to-be-replied sentence input by the target object contains an emotion semantic element, the processing device obtains each history sentence interacted with the target object within the set history period.

Specifically, when the processing device determines each history statement interacted with the target object within the set history time period, the processing device may adopt an operation defined by any one of the following processing manners:

Processing mode a: the processing device specifies the history period according to the start time and duration of the current session, and obtains each history statement interacted with the target object in the specified history period.

Specifically, when the processing device determines that the currently obtained sentence to be replied input by the target object contains the emotion semantic element, the processing device determines a time period between the starting time of the current conversation and the current time as a history time period, and acquires each history sentence interacted with the target object in the specified history time period.

It should be noted that, in the embodiment of the present application, the determined set history period is a closed interval, and the current time refers to the time at which the to-be-replied sentence is received. Since the application aims, once the to-be-replied sentence is determined to contain an emotion semantic element, to identify among the history sentences generated in the current session the root cause that leads the to-be-replied sentence to contain that element, the delay between the target object sending the to-be-replied sentence and the processing device receiving it may be ignored.

For example, assuming that the starting time of the current conversation is 13:10:44 in the human-computer interaction process between the processing device and the target object, the current time of the sentence to be replied is obtained, that is, the input time associated with the sentence to be replied is 13:14:26, and 13:10:44-13:14:26 is taken as a certain set history period, then each history sentence in the set history period refers to each history sentence with the associated input time in the range of 13:10:44-13:14:26, and 13:10:44-13:14:26 can be regarded as a time closed interval.

Processing mode b: the processing device specifies the history period according to how close the current session is to a history session, and obtains each history statement interacted with the target object in the specified history period.

The processing equipment determines historical conversation information between the processing equipment and the target object after receiving the to-be-replied sentences which are sent by the target object and contain the emotional semantic elements, determines a time period between the starting time of a historical conversation and the current time as a historical time period when the time interval between the starting time of the current conversation and the ending time of the historical conversation does not exceed a set time threshold, and acquires each historical sentence interacted with the target object in the specified historical time period.

Specifically, considering that a history sentence representing a root cause of an emotional semantic element in a sentence to be replied may exist in a history conversation which is very close to the current conversation interval time, in the implementation of the present application, a history interaction record with a target object may be obtained, a time interval between the end time of an adjacent history conversation and the start time of the current conversation is determined, and when the time interval between the end time of the adjacent history conversation and the start time of the current conversation does not exceed a set time threshold, a time period between the start time of the adjacent history conversation and the current time of receiving the sentence to be replied may be designated as a history period, and all interactive sentences generated by the history conversation and the current conversation may be used as the respective history sentences interacted with the target object in the designated history period.

In this way, by means of the specified historical time period, the time range in which the root cause statement of the to-be-replied statement is located is determined, and meanwhile, each historical statement which may have root cause relation with the to-be-replied statement is defined.
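The selection of history statements described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the session records (dicts with `start`, `end` and `statements`), the function name `select_history` and the one-minute threshold are all assumptions introduced for the example.

```python
from datetime import datetime, timedelta

def select_history(current, previous, now, threshold):
    """Return the history statements in the designated history period.

    `current` and `previous` are hypothetical session records: dicts with
    "start", "end" and "statements", a list of (input_time, trigger, text).
    Passing previous=None corresponds to using the current session only.
    """
    start = current["start"]
    statements = list(current["statements"])
    # Mode b: extend the period back to the adjacent historical session when
    # the gap between the two sessions does not exceed the threshold.
    if previous is not None and current["start"] - previous["end"] <= threshold:
        start = previous["start"]
        statements = list(previous["statements"]) + statements
    # The period [start, now] is a closed interval, so both ends are included.
    selected = [s for s in statements if start <= s[0] <= now]
    # Order the statements by their associated input time.
    return sorted(selected, key=lambda s: s[0])

def at(hour, minute, second):
    return datetime(2021, 8, 6, hour, minute, second)

previous = {"start": at(13, 5, 0), "end": at(13, 9, 50),
            "statements": [(at(13, 5, 10), "A", "earlier remark")]}
current = {"start": at(13, 10, 44), "end": None,
           "statements": [(at(13, 10, 44), "B", "hello"),
                          (at(13, 14, 26), "A", "sentence to be replied")]}

now = at(13, 14, 26)
mode_a = select_history(current, None, now, timedelta(minutes=1))
mode_b = select_history(current, previous, now, timedelta(minutes=1))
print(len(mode_a), len(mode_b))
```

Because the interval is closed, the statement input at 13:10:44 and the to-be-replied sentence input at 13:14:26 are both included.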

Step 402: the processing device uses the trained target information identification model to obtain the emotion statement vectors corresponding to the historical statements.

In the embodiment of the application, after obtaining each history statement, the processing device sorts each history statement according to the sequence of the history statement in the conversation process, wherein the sequence of the history statement in the conversation process is determined according to the input time associated with the history statement.

It should be noted that, for each obtained history statement, the trigger corresponding to the history statement may be a target object, or may be an interactive object of the target object, and thus the input time associated with the history statement may specifically refer to the time when the corresponding history statement is input into the session by the corresponding trigger.

In specific implementation, the processing device sequences the historical sentences according to input time associated with the historical sentences, generates sentence vectors corresponding to the historical sentences based on the sequenced historical sentences by adopting a trained target information recognition model, and acquires emotion semantic label vectors, wherein each emotion semantic label vector is obtained by adjusting a randomly initialized emotion semantic label in the training process of the target information recognition model.

In the embodiment of the application, an emotion semantic label is denoted $e_k$, and each $e_k$ has a corresponding emotion semantic label vector; the correspondence between $e_k$ and its vector is recorded in an embedded lookup table.

The embedded lookup table records the emotion semantic labels and the corresponding emotion semantic label vectors. The processing device generates an initial emotion semantic label vector for each emotion semantic label by random initialization, adjusts the initial emotion semantic label vectors during the training of the target information identification model, and, after the target information identification model is obtained, stores the emotion semantic label vector obtained for each emotion semantic label in the embedded lookup table. There are $|E|$ emotion semantic labels in total, indexed $k = 1, \dots, |E|$, and the emotion semantic label vectors as a whole are expressed as $X^e \in \mathbb{R}^{|E| \times d_h}$, that is, a real matrix of dimension $|E| \times d_h$ in which each emotion semantic label vector has dimension $1 \times d_h$.
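As an illustration of the lookup table, the sketch below builds a randomly initialized table and stacks its rows into an $|E| \times d_h$ matrix. The label names, the Gaussian initialization and the function name `init_label_table` are assumptions made for the example; the adjustment of the vectors during training is not shown.

```python
import random

def init_label_table(labels, d_h, seed=0):
    """Map each emotion semantic label to a randomly initialized vector."""
    rng = random.Random(seed)
    return {label: [rng.gauss(0.0, 0.02) for _ in range(d_h)] for label in labels}

labels = ["happy", "sad", "angry", "neutral"]   # hypothetical label set E
table = init_label_table(labels, d_h=8)

# Stack the label vectors row by row to obtain the |E| x d_h matrix X^e.
X_e = [table[label] for label in labels]
print(len(X_e), len(X_e[0]))
```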

In the embodiment of the present application, in consideration of that the configured first network in the target information identification model may be a BERT network or an LSTM network, when generating statement vectors corresponding to respective historical statements, there are the following two processing manners:

1. Obtaining statement vectors corresponding to the historical statements based on a BERT network.

The processing device respectively executes the following operations for each history statement: after initial identification information and termination identification information are respectively added at the initial position and the termination position of a history statement, an output vector generated corresponding to the initial identification information in the history statement is used as a statement vector corresponding to the history statement by adopting a BERT network in a target information recognition model.

In the embodiment of the present application, start identification information "[CLS]" is added at the start position of each history statement and end identification information "[SEP]" is added at the end position of each history statement. Following the processing method of the BERT network in the related art, the input representation of the word at each position is obtained from the word embedding vector, the segment embedding vector and the position embedding vector corresponding to that word; the output representation of the word at each position is then obtained through the BERT network, and the output representation corresponding to "[CLS]" in each history statement is used as the statement vector of that history statement.

2. Obtaining statement vectors corresponding to the historical statements based on the LSTM network.

Specifically, the processing device may adopt the trained target information recognition model, and perform the following operations for each history statement: and generating a statement vector corresponding to the historical statement based on the historical statement by adopting an LSTM network in the target information identification model.

Generating the statement vector corresponding to a statement based on an LSTM model is a conventional technique in the art and is not described herein again.

Further, after the processing device obtains statement vectors corresponding to the historical statements and emotion semantic label vectors, the processing device obtains emotion statement vectors corresponding to the historical statements by using the target information identification model.

In specific implementation, the following steps may be adopted to determine the emotion statement vectors corresponding to the history statements:

Referring to fig. 4b, which is a schematic diagram of generating emotion statement vectors in the embodiment of the present application, the process of generating the emotion statement vector for each history statement is described below. In actual processing, the target information recognition model processes all history statements synchronously: the statement vector matrix formed by the statement vectors of the history statements is handled as a whole in the subsequent formulas, so that the results for all history statements are calculated together. For convenience of description, the following takes the generation of the emotion statement vector for one history statement as an example:

step 4021: and the processing equipment generates an emotional attention vector corresponding to the historical statement based on each emotional semantic tag vector and the statement vector corresponding to the historical statement.

Specifically, when step 4021 is executed, the following two calculation methods may be adopted to calculate an emotional attention vector corresponding to one history sentence, where the emotional attention vector is used to represent an attention result corresponding to each emotional semantic tag in one history sentence:

Calculation mode one: the processing device uses an attention mechanism to perform dot-product attention calculation between the statement vector and each emotion semantic label vector, obtaining an emotion attention vector that represents the attention result of the history statement for each emotion semantic label.

The processing equipment adopts an attention mechanism, determines an attention matrix of each emotion element label corresponding to a history sentence according to each obtained emotion semantic label vector and a sentence vector corresponding to the history sentence, and determines an emotion attention vector corresponding to the history sentence based on the attention matrix and each emotion semantic label vector.

In specific implementation, the following formula is used to perform dot-product attention calculation on the statement vector and each emotion semantic label vector to obtain the emotion attention vector $H^e$:

$$H^e = \mathrm{attention}(Q, K, V) = \alpha V$$

where Q, K and V are the inputs of the attention mechanism: Q denotes the query, and K and V denote the key-value pair. For the dot-product attention calculation, $Q = H^u$ and $K = V = X^e$. $H^u$ is the matrix formed by the statement vectors, with dimension $t \times d_h$, where $t$ is the total number of input history statements and $d_h$ is the dimension of the generated statement vectors; $X^e$ is the matrix formed by the emotion semantic label vectors, with dimension $|E| \times d_h$; and $\alpha \in \mathbb{R}^{t \times |E|}$ is the resulting attention matrix of each history statement over the emotion semantic labels, which represents the emotion distribution corresponding to each history statement. An emotion attention vector characterizing the attention result of a history statement for each emotion semantic label can thus be determined from the above formula.

In this way, by calculating the emotion attention vectors of the historical sentences for the respective emotion semantic tags, the emotion distribution of the historical sentences can be effectively represented by means of the emotion attention vectors, and compared with the processing of a multi-head attention mechanism, the processing process is relatively simple, and the processing pressure of the target information identification model can be reduced.
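The dot-product attention of calculation mode one can be sketched in plain Python as follows. The text does not spell out how the attention matrix α is normalized, so the row-wise softmax over the unscaled scores $QK^T$ used here is an assumption, as are the helper names and the toy values.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Multiply matrix a (n x k) by matrix b (k x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def emotion_attention(H_u, X_e):
    """Q = H_u (t x d_h), K = V = X_e (|E| x d_h); returns alpha and H_e."""
    scores = matmul(H_u, transpose(X_e))      # t x |E|
    alpha = [softmax(row) for row in scores]  # assumed row-wise softmax
    H_e = matmul(alpha, X_e)                  # t x d_h
    return alpha, H_e

# Toy inputs: t = 2 history statements, d_h = 3, |E| = 4 labels.
H_u = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
X_e = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
alpha, H_e = emotion_attention(H_u, X_e)
print(len(alpha), len(alpha[0]), len(H_e), len(H_e[0]))
```

Each row of `alpha` sums to 1 and can be read as the emotion distribution of one history statement.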

Calculation mode two: a multi-head attention mechanism is used to obtain a plurality of emotion attention sub-vectors corresponding to the history statement, and the obtained emotion attention sub-vectors are merged into an emotion attention vector.

In the embodiment of the application, the processing device adopts a multi-head attention mechanism, multiple heads obtain multiple emotion attention sub-vectors corresponding to one history statement in parallel, and the obtained multiple emotion attention sub-vectors are combined into an emotion attention vector.

Referring to fig. 4c, which is a schematic flowchart illustrating a process of obtaining an emotional attention sub-vector by a processing device according to an embodiment of the present application, a process of determining each emotional attention sub-vector is described below with reference to fig. 4 c:

step 40211: and the processing equipment determines an attention matrix of each emotion semantic label corresponding to a history statement according to each obtained emotion semantic label vector and a statement vector corresponding to the history statement.

Specifically, after obtaining each emotion semantic label vector and a sentence vector corresponding to the historical sentence, the processing device may determine the attention matrix of each emotion semantic label corresponding to the historical sentence by using the following formula.

where $\alpha_1$ represents the attention matrix of each history statement over the emotion semantic labels; Q, K and V are the inputs of the attention mechanism, with Q denoting the query and K and V the key-value pair; for the dot-product attention calculation, $Q = H^u$ and $K = V = X^e$; $H^u$ is the matrix formed by the statement vectors, with dimension $t \times d_h$, where $t$ is the total number of input history statements and $d_h$ is the dimension of a statement vector; $X^e$ is the matrix formed by the emotion semantic label vectors, with dimension $|E| \times d_h$; the weight parameters in the formula are learned during the training of the target information recognition model; and $m$ is the number of parallel heads. The attention matrix of the history statement over the emotion semantic labels can thus be determined from this formula.

Step 40212: and the processing equipment determines an emotional attention sub-vector corresponding to the historical statement based on the attention matrix, each emotional semantic tag vector and the configured parameters.

After the processing equipment determines the attention matrix, the following formula is adopted, and the emotion attention sub-vector corresponding to the historical statement is determined based on each emotion semantic label vector, the attention matrix and the configured parameters:

where $\mathrm{head}_j$ denotes an emotion attention sub-vector; the weight parameters in the formula are learned during the training of the target information recognition model and are specific to each head; $j = 1, 2, 3, \dots, m$ indexes the heads of the multi-head attention mechanism; $\alpha_1$ is the attention matrix determined in step 40211; and Q, K and V are defined as in step 40211 and are not described herein again. The emotion attention sub-vector corresponding to one history statement can thus be determined from this formula.

Further, the processing device combines the obtained emotion attention sub-vectors by using the following formula to obtain the emotion attention vector corresponding to the history statement:

$$H^e = \mathrm{concat}(\mathrm{head}_1, \dots, \mathrm{head}_m) + H^u$$

where $H^e$ denotes the emotion attention vectors, concat denotes merging the emotion attention sub-vectors, $m$ is the number of heads in the multi-head attention mechanism, and $H^u$ is the statement vector matrix formed by the statement vectors of the history statements. The emotion attention vector corresponding to one history statement can thus be determined from this formula.

In this way, when determining the emotional attention vector through the multi-head attention mechanism, the potential emotional distribution in the historical sentences can be captured better in parallel, so that more accurate emotional attention vectors can be generated better corresponding to the historical sentences.
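The multi-head variant can be sketched as below. The per-head formulas are not reproduced in the text, so this sketch assumes the standard form: each head projects Q, K and V with its own matrices of shape $d_h \times d_h/m$, applies a scaled softmax, and the heads are concatenated before $H^u$ is added as in the merging formula. Random matrices stand in for the learned parameters, and all names are hypothetical.

```python
import math
import random

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def multi_head_emotion_attention(H_u, X_e, m, rng):
    d_h = len(H_u[0])
    d_k = d_h // m                      # assumes d_h is divisible by m
    heads = []
    for _ in range(m):
        # Stand-ins for the per-head parameters learned during training.
        W_q, W_k, W_v = (rand_matrix(d_h, d_k, rng) for _ in range(3))
        Q, K, V = matmul(H_u, W_q), matmul(X_e, W_k), matmul(X_e, W_v)
        K_t = [list(col) for col in zip(*K)]
        scores = matmul(Q, K_t)         # t x |E|
        alpha = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
        heads.append(matmul(alpha, V))  # t x d_k
    H_e = []
    for i, h_u in enumerate(H_u):
        merged = [v for head in heads for v in head[i]]    # concat(head_1..head_m)
        H_e.append([a + b for a, b in zip(merged, h_u)])   # + H^u
    return H_e

rng = random.Random(0)
H_u = rand_matrix(2, 4, rng)   # t = 2, d_h = 4
X_e = rand_matrix(3, 4, rng)   # |E| = 3
H_e = multi_head_emotion_attention(H_u, X_e, m=2, rng=rng)
print(len(H_e), len(H_e[0]))
```

Concatenating $m$ heads of width $d_h/m$ restores the width $d_h$, which is what makes the addition of $H^u$ well defined.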

Step 4022: and the processing equipment splices the statement vector of one history statement and the corresponding emotion attention vector to obtain an emotion statement vector corresponding to the history statement.

Specifically, after determining the emotion attention vector corresponding to the history statement, the processing device may splice the statement vector corresponding to the history statement and the emotion attention vector corresponding to the history statement to obtain an emotion statement vector corresponding to the history statement, where each emotion statement vector is determined based on the statement vector corresponding to the corresponding history statement and the attention result between preset emotion semantic tags.

The specific splicing mode is shown as the following formula:

$$H = [H^u ; H^e]$$

where $H$ is the matrix formed by the emotion statement vectors corresponding to the history statements; $H^u$ is the matrix formed by the statement vectors corresponding to the history statements, with dimension $t \times d_h$; and $H^e$ is the matrix formed by the emotion attention vectors corresponding to the history statements, with dimension $t \times d_h$. The emotion statement vector corresponding to each history statement can thus be determined from the above formula.
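A small sketch of the splicing step, under the assumption that the splice is a per-statement concatenation along the feature dimension: with $H^u$ and $H^e$ both of dimension $t \times d_h$, the result then has dimension $t \times 2d_h$. The function name and values are illustrative only.

```python
def splice(H_u, H_e):
    """Concatenate each statement vector with its emotion attention vector."""
    return [h_u + h_e for h_u, h_e in zip(H_u, H_e)]

H_u = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # t = 3, d_h = 2
H_e = [[1.0, 1.1], [1.2, 1.3], [1.4, 1.5]]
H = splice(H_u, H_e)
print(len(H), len(H[0]))
```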

In this way, by means of the emotion attention network (the second processing network) in the target information recognition model, interaction from history statements to emotion distributions can be established, and the multi-head attention mechanism helps capture a plurality of potential emotion distributions, so that richer and more comprehensive emotion information can be obtained.

Step 403: the processing equipment adopts a target information identification model, establishes a graph network for representing the connection relation between each historical statement and each historical statement, and updates the emotional statement vector corresponding to each historical statement respectively based on the graph network and the trigger party of each historical statement.

When executing step 403, the processing device uses the target information recognition model to organize the history statements into a graph network.

It should be noted that, in this embodiment of the application, before the processing device inputs each historical statement including the statement to be replied into the target information identification model, indication information capable of indicating a trigger corresponding to each historical statement is respectively established, and based on the indication information, a historical statement set belonging to the same trigger in each historical statement can be determined, so that when a graph network is generated by subsequently using a third processing network in the target information identification model, the indication information can be acquired, and the trigger corresponding to each historical statement is determined.

Specifically, the processing device generates corresponding nodes for each history statement, and performs the following operations for each node: and establishing a connecting edge between one node and each node. And generating a graph network for representing the connection relation between any one historical statement and each historical statement according to each node and each connection edge established for each node.

It should be noted that, for a to-be-replied sentence input by the target object that contains an emotional semantic element, the history sentence having the root cause relationship may be the to-be-replied sentence itself. Therefore, when generating the graph network, the present application establishes connection edges between all nodes and, both to keep the operation uniform across nodes and to cover the case where the root cause sentence of the to-be-replied sentence is that sentence itself, also establishes a self-loop edge for each node in the graph network.

Referring to fig. 4d, which is a schematic diagram of a graph network generated in the embodiment of the present application, assume that 5 history statements are input to the target information recognition model and that, sorted by associated input time, they are u1, u2, u3, u4 and u5. In the graph network generated by the processing device using the target information recognition model, the node corresponding to history statement u1 is v1 and is initialized with the emotion statement vector h1 of u1; the node corresponding to history statement u2 is v2 and is initialized with the emotion statement vector h2 of u2; and so on. Here hi is the emotion statement vector of history statement ui obtained in step 402, and in fig. 4d i takes the values 1, 2, 3, 4 and 5. In the graph network illustrated in fig. 4d, each initialized node has a self-loop edge, and a connection edge exists between every pair of nodes.

In the embodiment of the present application, it is considered that, during session interaction, a change in the emotion of the target object may be influenced by history statements input by the target object itself and may also be influenced by the interactive object interacting with the target object. Therefore, relationship types are defined for the history statements according to their triggers, namely the internal (Intra) relationship type, which characterizes the influence of history statements input by the target object itself, and the interaction (Inter) relationship type, which characterizes the influence of history statements input by the interactive object of the target object.

Further, through the target information identification model, the processing device determines, in the graph network, the node corresponding to each history statement, the trigger corresponding to each history statement and the emotion statement vector corresponding to each history statement, where the trigger of each history statement is the target object or an interactive object of the target object, and then updates the emotion statement vectors corresponding to the history statements.

In specific implementation, referring to fig. 4e, which is a schematic flowchart illustrating a process of updating each node in the graph network in the embodiment of the present application, the following describes in detail an updating process of each node in the graph network with reference to fig. 4 e:

step 4031: the processing equipment determines each adjacent node which has a connection relation with one node in the graph network, classifies each adjacent node according to a trigger party corresponding to each adjacent node, and obtains a first type adjacent node which is pertinently configured with a first parameter set and a second type adjacent node which is pertinently configured with a second parameter set.

Specifically, the processing device determines a node, which is assumed to be the node v1, according to a connection relationship between nodes in the graph network, determines each adjacent node associated with the node v1, classifies each adjacent node according to a trigger corresponding to each adjacent node, obtains a first type of adjacent node corresponding to the node v1 and having the same trigger, obtains a second type of adjacent node corresponding to the node v1 and having different triggers, and configures corresponding parameter sets for the first type of adjacent node and the second type of adjacent node.

For example, with continued reference to fig. 4d, nodes drawn with dashed lines represent statements input by trigger A, and nodes drawn with solid lines represent statements input by trigger B. Then, for node v1, initialized as h1, the first type of adjacent nodes is {node 1, node 3, node 5} and the second type of adjacent nodes is {node 2, node 4}.
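The graph construction and the classification of adjacent nodes in this example can be sketched as follows. The adjacency representation and function names are assumptions; the trigger assignment (A for u1, u3, u5 and B for u2, u4) is the one implied by the two sets in the example.

```python
def build_graph(n):
    """Fully connected graph with self-loops: every node is adjacent to all nodes."""
    return {i: list(range(n)) for i in range(n)}

def classify_neighbors(i, adjacency, triggers):
    """Split the adjacent nodes of node i by whether they share its trigger."""
    first = [j for j in adjacency[i] if triggers[j] == triggers[i]]    # same trigger
    second = [j for j in adjacency[i] if triggers[j] != triggers[i]]   # other trigger
    return first, second

triggers = ["A", "B", "A", "B", "A"]   # triggers of u1..u5
adjacency = build_graph(len(triggers))
first, second = classify_neighbors(0, adjacency, triggers)

# Report with 1-based node numbers, as in fig. 4d.
print([j + 1 for j in first], [j + 1 for j in second])   # [1, 3, 5] [2, 4]
```

Node 1 appears in its own first-type set because of the self-loop edge.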

Step 4032: the processing device respectively executes the following operations according to each type of adjacent nodes: determining a target parameter set corresponding to one type of adjacent nodes in the first parameter set and the second parameter set, respectively determining the edge weight between each adjacent node and one node according to the target parameter set, the emotion statement vector corresponding to one node and the emotion statement vector corresponding to each adjacent node in one type of adjacent nodes, and obtaining the weighting result of the emotion statement vector corresponding to each adjacent node based on each obtained edge weight and the target parameter set.

In specific implementation, the processing device obtains the weighted result of the emotion statement vectors corresponding to the adjacent nodes of the specified relationship type by using the target information identification model and adopting the following formula.

where $\alpha_{ijr}$ represents the edge weight between node $i$, corresponding to $h_i$, and node $j$, corresponding to $h_j$; $h_j$ is the emotion statement vector of node $j$; $j$ is any adjacent node in the set of adjacent nodes of relation type $r$; $N^r(i)$ is the set of adjacent nodes of node $i$ with relation type $r$; the two parameters in the formula are, respectively, a weight matrix and a vector under relation type $r$; the relation type $r$ is determined according to the trigger corresponding to each node and takes two values, namely adjacent nodes belonging to the same trigger as the current node and adjacent nodes belonging to a trigger different from the current node; LRL denotes the LeakyReLU activation function; and $h_{ir}$ denotes, for node $i$, the weighted result of the emotion statement vectors corresponding to the adjacent nodes in the set determined by relation type $r$.

It should be noted that, in the embodiment of the present application, a multi-head graph attention mechanism may also be selectively used to aggregate adjacent nodes of the same relationship type.

Step 4033: the processing device determines the updated emotion statement vector of the node according to the weighted results corresponding to each type of adjacent nodes.

In specific implementation, the processing device may determine the updated emotion statement vector of the node based on the weighted result of each neighboring node in each type of neighboring nodes by using the following formula.

where $r$ denotes a defined relation type, and $h_{ir}$ denotes the weighted result obtained by weighting the emotion statement vectors of the adjacent nodes of the corresponding type under relation type $r$.

In this way, for a node, by aggregating emotion statement vectors of adjacent nodes of different types, updating of the node is realized, so that the relationship perception between different triggers can be realized by means of a graph attention mechanism, the relationship of the trigger corresponding to a history statement is represented, interaction from the history statement to the history statement is executed, and the emotion influence generated by the trigger is modeled by performing attention calculation on history statement representation with rich emotion, namely the emotion statement vectors.
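Steps 4031 to 4033 can be sketched as a relation-aware graph attention update. The formulas themselves are not reproduced in the text, so the sketch assumes the usual graph attention form: for relation type $r$, with an assumed weight matrix `W` and vector `a`, the edge weight is a softmax over the adjacent nodes of LeakyReLU applied to `a` dotted with the concatenation of `W h_i` and `W h_j`, the weighted result is the sum of `alpha * W h_j`, and the updated vector is the sum of the weighted results over the two relation types. Random parameters stand in for the trained ones.

```python
import math
import random

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def update_node(i, H, triggers, params):
    """Aggregate the adjacent nodes of node i per relation type, then sum."""
    updated = [0.0] * len(params["intra"]["W"])
    for rel in ("intra", "inter"):
        W, a = params[rel]["W"], params[rel]["a"]
        same = rel == "intra"
        # Fully connected graph with self-loops: every node is a candidate.
        neighbors = [j for j in range(len(H)) if (triggers[j] == triggers[i]) == same]
        if not neighbors:
            continue
        Wh_i = matvec(W, H[i])
        Wh = [matvec(W, H[j]) for j in neighbors]
        scores = [leaky_relu(sum(x * y for x, y in zip(a, Wh_i + Wh_j))) for Wh_j in Wh]
        alpha = softmax(scores)                      # edge weights for this relation
        for weight, Wh_j in zip(alpha, Wh):
            updated = [u + weight * v for u, v in zip(updated, Wh_j)]
    return updated

rng = random.Random(0)
d = 4   # dimension of an emotion statement vector in this toy example

def rand_params():
    return {"W": [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(d)],
            "a": [rng.uniform(-0.5, 0.5) for _ in range(2 * d)]}

params = {"intra": rand_params(), "inter": rand_params()}
triggers = ["A", "B", "A", "B", "A"]
H = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in triggers]
G = [update_node(i, H, triggers, params) for i in range(len(H))]
print(len(G), len(G[0]))
```

Keeping a separate parameter set per relation type is what lets the update treat statements from the same trigger differently from statements of the other trigger.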

Step 404: and the processing equipment adopts a target information identification model, and respectively obtains root relation identification results between each corresponding historical statement and the statement to be replied based on each updated emotional statement vector.

Referring to fig. 4f, which is a schematic flow chart illustrating determining a root cause relationship between a history statement and a statement to be replied in the embodiment of the present application, specific operations performed when step 404 is executed are described below with reference to fig. 4 f:

step 4041: and the processing equipment adopts a target information identification model to respectively determine the relative position vector between the statement to be replied and each historical statement.

In the embodiment of the application, considering that in the recon data set, the history statement corresponding to the emotional reason of the target statement is usually the history statement input adjacent to the target statement, in the application, the relative position between the history statements is used as a basis for root cause relationship identification, and a target information identification model is adopted to respectively determine the relative position vector between the statement to be replied and each history statement, wherein the relative position vector is used for representing the proximity degree of the statement input between the history statement and the statement to be replied.

When the relative position between the history statement and the statement to be replied is specifically determined, the following processing mode can be adopted to determine the relative position vector:

Mode one: based on the sequence numbers of the sorted history sentences, the relative position between each history sentence and the to-be-replied sentence is determined, and a relative position vector is generated for each relative position.

The processing device sorts the history sentences in input order and takes the difference between the sequence number corresponding to the to-be-replied sentence and the sequence number corresponding to each history sentence as the relative position information between the to-be-replied sentence and that history sentence. Each piece of relative position information is given a randomly initialized vector, which is adjusted during training; the relative position vectors are the vectors obtained when the training of the target information recognition model is completed.

For example, assume that there are currently P history sentences, of which P-1 were input before the to-be-replied sentence. Taking the to-be-replied sentence as the reference, the relative position of each history sentence is $p \in \{-(P-1), \dots, -1, 0\}$. A random initialization algorithm is then used to generate an initial relative position vector for each relative position, the initial relative position vectors are adjusted synchronously during the training of the target information identification model, and the adjusted relative position vectors are obtained when the target information identification model is obtained.
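Mode one can be sketched as below. The relative position is computed as the sequence number of the history sentence minus that of the to-be-replied sentence, which yields the range from -(P-1) to 0 given in the example; the function names, the vector dimension and the Gaussian initialization are assumptions, and the adjustment during training is not shown.

```python
import random

def relative_positions(num_sentences):
    """Relative position of each history sentence, with the last one as reference."""
    t = num_sentences                      # sequence number of the sentence to be replied
    return [i - t for i in range(1, num_sentences + 1)]

def init_position_table(positions, d_p, seed=0):
    """Randomly initialized relative position vectors, one per relative position."""
    rng = random.Random(seed)
    return {p: [rng.gauss(0.0, 0.02) for _ in range(d_p)] for p in positions}

positions = relative_positions(5)
position_table = init_position_table(positions, d_p=4)
print(positions)   # [-4, -3, -2, -1, 0]
```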

Mode two: the relative position vectors corresponding to the relative positions are adjusted using a radial basis function (RBF) kernel.

Specifically, the processing device sorts the historical sentences according to the arrangement sequence of the input historical sentences, respectively uses the difference between the sequence numbers corresponding to the sentences to be replied and the sequence numbers corresponding to the historical sentences as the relative position information between the sentences to be replied and the corresponding historical sentences, and obtains the corresponding initial relative position vectors after the target information recognition model is trained and the relative position information after the random initialization is adjusted.

Further, the processing device performs the following operations for each initial relative position vector: determining the relative position information corresponding to the initial relative position vector, obtaining the position weight between that relative position information and each piece of relative position information, and performing a weighted fusion of the corresponding initial relative position vectors based on the obtained position weights to obtain the corresponding relative position vector, where the position weights are determined in advance by a radial basis function (RBF) kernel based on the difference between the pieces of relative position information.

In particular implementations, the processing device may employ the following formula to model, for each relative position $p$, the interaction between the relative position $p$ and the other relative positions, so that the relative position vector $r_p$ obtained in mode one is improved by integrating the embeddings of the other relative positions:

where $q \in \{-(P-1), \dots, -1, 0\}$ is one of the possible relative positions; the variance of the RBF kernel is set according to actual configuration requirements and is taken as 1 here; the weight $K_p(q)$ is computed by the RBF kernel and characterizes the influence of relative position $q$ on relative position $p$, its value being fixed and not adjusted along with the training of the target information recognition model; and $r'_p$ is the corrected relative position vector corresponding to relative position $p$.
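The RBF-based correction can be sketched as follows. The exact kernel expression is not reproduced in the text, so the Gaussian form `exp(-(p - q)**2 / (2 * variance))` and the unnormalized weighted sum over all relative positions are assumptions; only the fixed variance of 1 and the fact that the weights are not trained come from the description.

```python
import math

def rbf_weight(p, q, variance=1.0):
    """Fixed influence weight K_p(q) of relative position q on relative position p."""
    return math.exp(-((p - q) ** 2) / (2.0 * variance))

def smooth_positions(table, variance=1.0):
    """Corrected vector r'_p as an RBF-weighted fusion of all initial vectors r_q."""
    smoothed = {}
    for p in table:
        acc = [0.0] * len(table[p])
        for q, r_q in table.items():
            k = rbf_weight(p, q, variance)
            acc = [a + k * v for a, v in zip(acc, r_q)]
        smoothed[p] = acc
    return smoothed

# Toy initial relative position vectors for P = 3 sentences, d_p = 2.
initial = {-2: [1.0, 0.0], -1: [0.0, 1.0], 0: [1.0, 1.0]}
corrected = smooth_positions(initial)
print(sorted(corrected))
```

Nearby relative positions receive larger weights, so the corrected vectors of adjacent positions are pulled toward each other.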

In this way, the positional relationship among the historical sentences is expressed through the relative position vectors, so that more relevant influencing factors are taken into account when the target information recognition model performs recognition.
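The kernel-weighted fusion described above can be illustrated with a minimal sketch. The formula itself is not reproduced in this text, so the sketch assumes a Gaussian form exp(-(p-q)²/variance) for K_p(q) and an unnormalized weighted sum; the names `rbf_weight` and `fuse_position_vectors` are hypothetical and not taken from the application.

```python
import math

def rbf_weight(p, q, variance=1.0):
    # Assumed Gaussian form of the fixed weight K_p(q); not trained.
    return math.exp(-((p - q) ** 2) / variance)

def fuse_position_vectors(initial_vectors, variance=1.0):
    """Map each relative position p to a fused vector r'_p.

    initial_vectors: dict from relative position q to its initial vector r_q.
    Each r'_p is the sum over q of K_p(q) * r_q (assumed, unnormalized).
    """
    fused = {}
    for p, r_p in initial_vectors.items():
        acc = [0.0] * len(r_p)
        for q, r_q in initial_vectors.items():
            w = rbf_weight(p, q, variance)
            for k, value in enumerate(r_q):
                acc[k] += w * value
        fused[p] = acc
    return fused

# Relative positions for P = 3 historical sentences: -2, -1, 0.
initial = {-2: [1.0, 0.0], -1: [0.0, 1.0], 0: [1.0, 1.0]}
fused = fuse_position_vectors(initial)
```

With a variance of 1, the weight drops quickly as |p - q| grows, so each fused vector is dominated by its own initial vector and its nearest neighbours.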

Step 4042: the processing device uses the target information recognition model to determine the root cause relationship recognition results between each historical statement and the statement to be replied, based on each updated emotion statement vector, each relative position vector, and the statement vector corresponding to each historical statement.

In specific implementation, the processing device adopts a target information identification model, and determines root relation identification results between each history statement and the statement to be replied respectively based on the following formula.

Specifically, through the fourth processing network of the target information recognition model, the processing device, for each historical statement u_i, concatenates the statement vector corresponding to u_i with the updated emotion statement vector g_i output by the third processing network of the target information recognition model, to obtain the final representation of u_i used for root cause relationship recognition. The relative position vector is also applied to this final representation to fuse in the relative position information. Finally, a fully connected network classifies the concatenated vector, according to the following formula:

where the output is the probability that utterance u_i contains the emotional cause of the emotion semantic element in the sentence to be replied; r_{i-t} denotes the relative position vector corresponding to the relative position between the contextual utterance u_i and the sentence to be replied u_t; the remaining parameters, including b₂, are learnable parameters of the fully connected network; and d_f and d_p denote the dimension of the intermediate hidden layer of the fully connected network and the dimension of the relative position vector, respectively.

In this way, by means of the target information identification model, the emotion expressed by the historical sentences and the emotion influence caused by different triggers in the historical sentences can be considered at the global view angle, so that the emotion information in the historical sentences and the emotion influence caused by the historical sentences from different triggers are fused in the obtained identification result.
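As a rough illustration of the classification step, the following sketch concatenates the three vectors and passes them through a two-layer fully connected network. The formula is not reproduced in this text, so the ReLU hidden layer, the sigmoid output, the use of concatenation for the relative position vector, and all names (`root_cause_probability`, `W1`, `b1`, `w2`, `b2`) are assumptions made for illustration.

```python
import math

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def root_cause_probability(h_i, g_i, r_it, W1, b1, w2, b2):
    """Return an assumed probability that u_i is a root cause.

    h_i: statement vector, g_i: updated emotion statement vector,
    r_it: relative position vector between u_i and u_t.
    """
    x = h_i + g_i + r_it  # list concatenation of the three vectors
    pre = [a + b for a, b in zip(matvec(W1, x), b1)]
    hidden = [max(0.0, z) for z in pre]  # ReLU hidden layer (assumed)
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid output (assumed)

# Toy parameters, chosen only to make the arithmetic easy to follow.
W1 = [[1.0, 1.0, 1.0], [-1.0, 0.0, 0.0]]
b1 = [0.0, 0.0]
w2 = [1.0, 1.0]
b2 = 0.0
prob = root_cause_probability([1.0], [0.5], [0.0], W1, b1, w2, b2)
```

In this toy case the hidden layer is [1.5, 0.0], so the logit is 1.5 and the returned probability is the sigmoid of 1.5.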

Further, the processing device can obtain a root cause information recognition result output by the target information recognition model for each historical statement, wherein the root cause information recognition result can represent the probability that each historical statement is the root cause of the emotional semantic elements in the to-be-replied statement.

In the embodiment of the application, a corresponding result threshold value can be set for the obtained root cause information identification result, the historical statement of which the root cause information identification result exceeds the result threshold value is used as the historical statement of which the root cause relation exists with the emotion semantic element in the statement to be replied, and the content of replying the statement to be replied is determined based on the screened historical statement.

For example, suppose that, based on the to-be-replied sentence input by the target object, the sentence is determined to contain the "worry" emotional element, so that it can be considered to express worry. To trace this worry within the current conversation, the interactive statements exchanged before the sentence to be replied are acquired; these interactive statements, together with the sentence to be replied, are collectively called the historical statements. The historical statements are sorted according to their input order in the current conversation and input into the target information recognition model. Assuming that the result threshold configured in advance for the root cause information recognition result is 0.5, if the root cause information recognition results output by the target information recognition model exceed the result threshold at two positions, the two historical statements at those positions are taken as the historical statements having a root cause relationship with the sentence to be replied.
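The threshold screening in this example can be sketched as follows. The function name `select_root_cause_statements`, the sample statements, and the probabilities are invented for illustration; only the threshold of 0.5 comes from the example above.

```python
def select_root_cause_statements(history, probabilities, threshold=0.5):
    """Keep the (position, statement) pairs whose score exceeds the threshold."""
    return [
        (index, statement)
        for index, (statement, prob) in enumerate(zip(history, probabilities))
        if prob > threshold
    ]

# Illustrative conversation, in input order; the last item is the
# statement to be replied. The scores are made up.
history = [
    "Hi, how are you?",
    "My pet XX died last week.",
    "I was not with it when it died.",
    "I keep worrying about it.",
]
probabilities = [0.08, 0.91, 0.77, 0.42]
selected = select_root_cause_statements(history, probabilities)
```

Here two positions exceed the threshold, so two historical statements are kept as root causes of the emotion in the statement to be replied.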

For another example, suppose that the historical statements determined to have a root cause relationship with the sentence to be replied indicate that "the pet XX of the target object has died" and "the target object was not with the pet when it died". When a reply is generated, a corresponding comforting sentence addressing these causes may then be added, for example, "XX would not want you to be too sad after it has gone."

Therefore, the emotion root of the sentence to be replied can be effectively recognized in the interactive session context, the emotion perception capability of the real object is improved, the effectiveness and pertinence of sentence reply are improved, and meanwhile, the interactive experience of the real object can be assisted to be improved.

Based on the technical solution provided by the application, the applicant carried out a comparative test between the target information recognition model provided by the application and a model in the related art, using a public conversational emotion cause recognition dataset. The results are shown in Table 2, where the model proposed by Poria et al. in the related art serves as the baseline. With macro-averaged F1 (macro F1) as the evaluation metric, the model proposed by the application scores 79.30 against 77.06 for the baseline, an improvement of 2.24 points.

TABLE 2

Model	macro F1
Model in the related art (baseline)	77.06
Target information recognition model	79.30
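For readers unfamiliar with the metric in Table 2, macro F1 is the unweighted mean of the per-class F1 scores. The following sketch computes it for a small made-up set of labels; the function names and the sample data are illustrative and unrelated to the applicant's test.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score of one class, treating it as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes that appear."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)

# 1 = root cause, 0 = not a root cause (made-up labels).
y_true = [1, 0, 0, 1]
y_pred = [1, 0, 1, 1]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally, macro F1 does not let the majority "not a root cause" class dominate the score.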

Based on the same inventive concept, the embodiment of the application also provides a device for identifying the root cause relationship between information. Referring to fig. 5, a schematic structural diagram of an apparatus 500 for identifying root cause relationships between information listed in the embodiment of the present application is shown, which may include:

an obtaining unit 501, configured to obtain a to-be-replied statement input by a target object, and obtain each history statement interacted with the target object within a specified history time period when it is determined that the to-be-replied statement includes an emotion semantic element;

a determining unit 502, configured to obtain, by using the trained target information recognition model, emotion statement vectors corresponding to the respective historical statements, where each emotion statement vector is determined based on a statement vector corresponding to a corresponding historical statement and an attention result between preset emotion semantic tags;

the establishing unit 503 is configured to establish a graph network for representing a connection relationship between each historical statement and each historical statement by using the target information identification model, and update the corresponding emotion statement vector of each historical statement based on the graph network and the trigger of each historical statement;

an identifying unit 504, configured to obtain, by using the target information identification model, root cause relationship identification results between each corresponding history statement and the statement to be replied based on each updated emotion statement vector.

Optionally, when obtaining the emotion statement vectors corresponding to the history statements respectively by using the trained target information recognition model, the determining unit 502 is configured to:

sequencing each historical statement according to input time associated with each historical statement, generating statement vectors corresponding to each historical statement by adopting a trained target information recognition model based on each sequenced historical statement, and acquiring each emotion semantic label vector, wherein each emotion semantic label vector is obtained by adjusting a randomly initialized emotion semantic label in the training process of the target information recognition model;

adopting a target information identification model, and respectively executing the following operations aiming at each historical statement:

generating an emotional attention vector corresponding to a historical statement based on each emotional semantic tag vector and a statement vector corresponding to the historical statement, wherein the emotional attention vector is used for representing an attention result of the historical statement corresponding to each emotional semantic tag;

and splicing the statement vector of one historical statement and the corresponding emotion attention vector to obtain an emotion statement vector corresponding to one historical statement.
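The two operations above (attention over the emotion semantic tag vectors, then splicing) can be sketched as follows. The scoring function is not specified in this passage, so the sketch assumes dot-product scores normalized with softmax; the names `softmax` and `emotion_statement_vector` are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def emotion_statement_vector(h, tag_vectors):
    """Return (emotion statement vector, attention weights).

    h: statement vector of one historical statement.
    tag_vectors: one vector per emotion semantic tag.
    """
    # Attention result of the statement with respect to each tag (assumed dot product).
    scores = [sum(a * b for a, b in zip(h, e)) for e in tag_vectors]
    weights = softmax(scores)
    dim = len(tag_vectors[0])
    # Emotion attention vector: attention-weighted sum of the tag vectors.
    attention = [sum(w * e[k] for w, e in zip(weights, tag_vectors)) for k in range(dim)]
    # Splice the statement vector with its emotion attention vector.
    return h + attention, weights

h = [1.0, 0.0]
tag_vectors = [[1.0, 0.0], [0.0, 1.0]]
vector, weights = emotion_statement_vector(h, tag_vectors)
```

The spliced vector has the dimension of the statement vector plus the dimension of the tag vectors, and the tag most aligned with the statement receives the largest weight.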

Optionally, when a trained target information recognition model is used and a sentence vector corresponding to each historical sentence is generated based on each sorted historical sentence, the determining unit 502 is configured to perform any one of the following operations:

for each history statement, the following operations are respectively executed: after start identification information and end identification information are added at the start position and the end position of a historical statement, respectively, the Bidirectional Encoder Representations from Transformers (BERT) network in the target information recognition model is used, and the output vector generated for the start identification information is taken as the statement vector corresponding to the historical statement;

adopting a trained target information recognition model, and aiming at each historical statement, respectively executing the following operations: generating a statement vector corresponding to the historical statement based on the historical statement by using a long short-term memory (LSTM) network in the target information recognition model.

Optionally, when generating an emotion attention vector corresponding to one history statement based on each emotion semantic tag vector and a statement vector corresponding to one history statement, the determining unit 502 is configured to perform any one of the following operations:

determining an attention matrix of each emotion element label corresponding to a history sentence according to each obtained emotion semantic label vector and a sentence vector corresponding to the history sentence by adopting an attention mechanism, and determining an emotion attention vector corresponding to the history sentence based on the attention matrix and each emotion semantic label vector;

by adopting a multi-head attention mechanism, a plurality of emotion attention sub-vectors corresponding to one historical statement are obtained in parallel by multiple heads, the obtained emotion attention sub-vectors are combined into an emotion attention vector, and when each emotion attention sub-vector is determined, the following operations are executed: and determining an attention matrix of the historical statement corresponding to each emotion semantic label according to each obtained emotion semantic label vector and a statement vector corresponding to the historical statement, and determining an emotion attention sub-vector corresponding to the historical statement based on the attention matrix, each emotion semantic label vector and configured parameters.

Optionally, when a graph network for characterizing connection relationships between the historical statements and the historical statements is established, the establishing unit 503 is configured to:

Optionally, when updating the emotion statement vectors corresponding to the history statements respectively based on the graph network and the respective triggers of the history statements, the establishing unit 503 is configured to:

in a graph network, respectively determining each node corresponding to each historical statement, determining each trigger corresponding to each historical statement, and determining each emotional statement vector corresponding to each statement; the trigger of each history statement is a target object, or an interactive object of the target object;

for each node, the following operations are respectively executed:

determining each adjacent node which has a connection relation with one node in the graph network, and classifying each adjacent node according to a trigger party corresponding to each adjacent node to obtain a first type adjacent node which is pertinently configured with a first parameter set and a second type adjacent node which is pertinently configured with a second parameter set;

according to each type of adjacent nodes, the following operations are respectively executed: determining a target parameter set corresponding to one type of adjacent nodes in the first parameter set and the second parameter set, respectively determining the edge weight between each adjacent node and one node according to the target parameter set, the emotion statement vector corresponding to one node and the emotion statement vector corresponding to each adjacent node in one type of adjacent nodes, and obtaining the weighting result of the emotion statement vector corresponding to each adjacent node based on each obtained edge weight and the target parameter set;

and determining an updated emotion statement vector of a node according to the weighting result corresponding to each adjacent node in each adjacent node.
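The node update above can be illustrated with a small sketch in which neighbours are grouped by trigger and each group uses its own parameter set. The exact computation is not given in this passage, so the scoring vector `a`, the transform matrix `W`, the softmax within each group, the summation across groups, and the final ReLU are all assumptions, and every name is hypothetical.

```python
import math

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def update_node(i, vectors, neighbors, triggers, params):
    """Return the updated emotion statement vector of node i.

    vectors: emotion statement vector of every node.
    neighbors: dict from node to its adjacent nodes in the graph network.
    triggers: trigger of every node, e.g. "target" or "other".
    params: one parameter set {"a": ..., "W": ...} per trigger type.
    """
    updated = [0.0] * len(vectors[i])
    for trigger, p in params.items():
        group = [j for j in neighbors[i] if triggers[j] == trigger]
        if not group:
            continue
        # Edge weights from the group's parameter set and both endpoint vectors.
        scores = [sum(a * x for a, x in zip(p["a"], vectors[i] + vectors[j])) for j in group]
        for w, j in zip(softmax(scores), group):
            message = matvec(p["W"], vectors[j])
            for k, value in enumerate(message):
                updated[k] += w * value
    return [max(0.0, x) for x in updated]

vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {2: [0, 1, 2]}
triggers = ["target", "other", "target"]
params = {
    "target": {"a": [0.0] * 4, "W": [[1.0, 0.0], [0.0, 1.0]]},
    "other": {"a": [0.0] * 4, "W": [[2.0, 0.0], [0.0, 2.0]]},
}
new_vector = update_node(2, vectors, neighbors, triggers, params)
```

Keeping a separate parameter set per trigger lets statements from the target object and from its interactive object influence a node differently.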

Optionally, when a target information identification model is adopted and root cause relationship identification results between each corresponding history statement and the statement to be replied are respectively obtained based on each updated emotion statement vector, the identification unit 504 is configured to:

respectively determining relative position vectors between the statements to be replied and the historical statements by adopting a target information identification model, wherein the relative position vectors are used for representing the input proximity degree of the statements between the historical statements and the statements to be replied;

and adopting a target recognition model, and respectively determining root relation recognition results between each historical statement and the statement to be replied based on each updated emotional statement vector, each relative position vector and the statement vector corresponding to each historical statement.

Optionally, when determining the relative position vector between the to-be-replied statement and each history statement, the identifying unit 504 is configured to:

sequencing according to the input sequence of each historical statement, taking the difference between the sequence number corresponding to the statement to be replied and the sequence number corresponding to each historical statement as the relative position information between the statement to be replied and each corresponding historical statement, and acquiring each relative position vector obtained by adjusting each piece of relative position information after random initialization processing when the training of the target information recognition model is completed;

or,

sorting the historical statements according to their input order, taking the difference between the sequence number corresponding to the statement to be replied and the sequence number corresponding to each historical statement as the relative position information between the statement to be replied and the corresponding historical statement, and obtaining the corresponding initial relative position vectors, which result from adjusting each piece of randomly initialized relative position information by the time training of the target information recognition model is completed;

for each initial relative position vector, the following operations are performed: determining the relative position information corresponding to the initial relative position vector, obtaining the position weight between that relative position information and each piece of obtained relative position information, and performing weighted vector fusion of the corresponding initial relative position vectors based on the obtained position weights to obtain the corresponding relative position vector, wherein the position weights are determined in advance by a radial basis function (RBF) kernel based on the differences between the pieces of relative position information.

Optionally, when determining that the to-be-replied sentence includes an emotion semantic element, the obtaining unit 501 is configured to perform any one of the following operations:

respectively acquiring a keyword set corresponding to each preset emotion semantic tag, performing semantic analysis on a to-be-replied sentence input by a target object based on each acquired keyword set, and determining that the to-be-replied sentence input by the target object contains corresponding emotion semantic elements when the to-be-replied sentence is successfully matched with at least one keyword;

and carrying out emotion analysis on the to-be-replied sentence input by the target object by adopting a preset emotion analysis model to obtain a classification result of each preset emotion semantic label, and determining that the to-be-replied sentence contains corresponding emotion semantic elements based on the classification result.
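The first of the two options above, keyword matching, can be sketched as follows. The keyword sets, the tag names, and the function name `detect_emotion_elements` are invented for illustration, and simple substring matching stands in for the semantic analysis mentioned in the text.

```python
def detect_emotion_elements(sentence, keyword_sets):
    """Return the emotion semantic tags whose keyword set matches the sentence."""
    text = sentence.lower()
    return [
        tag
        for tag, keywords in keyword_sets.items()
        if any(keyword in text for keyword in keywords)
    ]

# Hypothetical keyword sets, one per preset emotion semantic tag.
keyword_sets = {
    "sadness": {"sad", "miss", "cry"},
    "joy": {"happy", "glad"},
}
matched = detect_emotion_elements("I miss XX so much", keyword_sets)
contains_emotion = len(matched) > 0
```

A sentence is treated as containing an emotion semantic element as soon as at least one keyword matches.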

Optionally, when obtaining each history statement interacted with the target object in the specified history period, the obtaining unit 501 is configured to perform any one of the following operations:

historical conversation information with the target object, and when the time interval between the starting time of the current conversation and the ending time of the historical conversation is determined not to exceed a set time threshold, the time period between the starting time of the historical conversation and the current time is determined as a historical time period, and each historical statement interacted with the target object in the specified historical time period is obtained.

Optionally, the apparatus further includes a training unit 505, where the training unit 505 is configured to train to obtain the target information recognition model by using the following method:

acquiring a training sample set, wherein one training sample comprises at least one sample statement, the at least one sample statement comprises a sample interactive statement determined based on a target sample statement, the sample interactive statement is set in a historical period, the sample interactive statement comprises real root cause relation information between a labeled target sample statement and the at least one sample statement, and the target sample statement comprises an emotion semantic element;

performing multiple rounds of iterative training on an information recognition model to be trained by adopting a training sample set until a preset convergence condition is met, and taking the information recognition model output in the last round as a target information recognition model, wherein the following operations are executed in the process of one round of iterative training:

inputting each sample statement included in the obtained training sample into an information identification model to obtain predicted root cause relationship information corresponding to each sample statement, wherein the predicted root cause relationship information corresponding to each sample statement represents the probability that the corresponding sample statement is the root cause of the corresponding emotional semantic element;

and determining corresponding loss values based on the obtained predicted root relation information and the corresponding real root relation information by adopting a cross entropy loss function, and carrying out parameter adjustment on the information identification model based on the loss values.
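The loss computation in the last step can be illustrated with a minimal sketch. It assumes the binary form of the cross entropy loss, since each sample statement receives a single probability of being a root cause; the function name `binary_cross_entropy` and the sample values are illustrative.

```python
import math

def binary_cross_entropy(predicted, labels, eps=1e-12):
    """Mean binary cross entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(predicted, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(predicted)

# Real root cause labels for three sample statements (made up).
labels = [1, 0, 1]
loss_good = binary_cross_entropy([0.9, 0.1, 0.8], labels)
loss_bad = binary_cross_entropy([0.4, 0.6, 0.3], labels)
```

Predictions closer to the labeled root cause relationship give a smaller loss, which is the signal used to adjust the model parameters in each training round.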

Having described the method and apparatus for identifying root cause relationships between information according to an exemplary embodiment of the present application, an electronic device according to another exemplary embodiment of the present application is described next.

As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."

Based on the same inventive concept as the method embodiment described above, an electronic device is further provided in the embodiment of the present application, referring to fig. 6, which is a schematic diagram of a hardware composition structure of an electronic device to which the embodiment of the present application is applied, and the electronic device 600 may at least include a processor 601 and a memory 602. The memory 602 stores program codes, and when the program codes are executed by the processor 601, the processor 601 executes any one of the above steps of the method for identifying a root relationship between information.

In some possible implementations, a computing device according to the present application may include at least one processor and at least one memory, wherein the memory stores program code which, when executed by the processor, causes the processor to perform the steps of the method for identifying a root cause relationship between information according to the various exemplary embodiments described above in this specification. For example, the processor may perform the steps shown in fig. 4a.

A computing device 700 according to this embodiment of the present application is described below with reference to fig. 7. As shown in fig. 7, computing device 700 is embodied in the form of a general purpose computing device. Components of computing device 700 may include, but are not limited to: the at least one processing unit 701, the at least one memory unit 702, and a bus 703 that couples various system components including the memory unit 702 and the processing unit 701.

Bus 703 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.

The storage unit 702 may include readable media in the form of volatile memory, such as Random Access Memory (RAM)7021 and/or cache storage unit 7022, and may further include Read Only Memory (ROM) 7023.

Storage unit 702 may also include a program/utility 7025 having a set (at least one) of program modules 7024, such program modules 7024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

The computing apparatus 700 may also communicate with one or more external devices 704 (e.g., keyboard, pointing device, etc.), with one or more devices that enable objects to interact with the computing apparatus 700, and/or with any devices (e.g., router, modem, etc.) that enable the computing apparatus 700 to communicate with one or more other computing apparatuses. Such communication may occur via input/output (I/O) interfaces 705. Moreover, the computing device 700 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 706. As shown, the network adapter 706 communicates with the other modules for the computing device 700 over a bus 703. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with computing device 700, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

Based on the same inventive concept as the above method embodiments, various aspects of the method for identifying a root cause relationship between information provided by the present application may also be implemented in the form of a program product, which includes program code for causing an electronic device to perform the steps of the method for identifying a root cause relationship between information according to the various exemplary embodiments described above in this specification when the program product runs on the electronic device; for example, the electronic device may perform the steps shown in fig. 4a.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A method for identifying a root cause relationship between information is characterized by comprising the following steps:

2. The method of claim 1, wherein the obtaining, by using the trained target information recognition model, emotion sentence vectors corresponding to the respective history sentences comprises:

3. The method of claim 2, wherein the generating, by using the trained target information recognition model, a sentence vector corresponding to each of the historical sentences based on the sorted historical sentences comprises any one of the following operations:

4. The method of claim 2, wherein generating an emotional attention vector corresponding to one historical sentence based on each emotional semantic tag vector and the sentence vector corresponding to the one historical sentence comprises any one of the following operations:

5. The method of claim 1, wherein said establishing a graph network for characterizing connection relationships between said respective historical statements and said respective historical statements comprises:

6. The method of claim 5, wherein the updating the emotional statement vector corresponding to each of the historical statements based on the graph network and the trigger of each of the historical statements respectively comprises:

for each node, respectively executing the following operations:

7. The method according to any one of claims 1 to 6, wherein the obtaining, by using the target information recognition model, root cause relationship recognition results between corresponding history statements and the statements to be replied based on the updated emotion statement vectors respectively includes:

8. The method of claim 7, wherein said separately determining a relative position vector between the statement to be replied to and the respective historical statements comprises:

or,

9. The method according to any one of claims 1 to 6, wherein the determining that the sentence to be replied contains an emotional semantic element comprises any one of the following operations:

10. The method of any one of claims 1-6, wherein the obtaining each history statement interacted with the target object within the specified history period comprises any one of:

11. The method of any one of claims 1-6, wherein the target information recognition model is trained by:

12. An apparatus for identifying a root cause relationship between information, comprising:

13. The apparatus of claim 12, wherein when obtaining the emotion sentence vector corresponding to each of the historical sentences using the trained target information recognition model, the determining unit is configured to:

14. An electronic device, comprising a processor and a memory, wherein the memory stores program code which, when executed by the processor, causes the processor to carry out the steps of the method of any of claims 1 to 11.

15. A computer-readable storage medium, characterized in that it comprises program code for causing an electronic device to carry out the steps of the method according to any one of claims 1 to 11 when said program code is run on said electronic device.