CN111931507A - Method, apparatus, medium, and device for acquiring a tuple set for implementing a session - Google Patents

Method, apparatus, medium, and device for acquiring a tuple set for implementing a session Download PDF

Info

Publication number
CN111931507A
CN111931507A (application CN202010849539.XA)
Authority
CN
China
Prior art keywords
entity
conversation
relationship
statement
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010849539.XA
Other languages
Chinese (zh)
Inventor
王宏
王贺青
武晓飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beike Technology Co Ltd
Original Assignee
Beike Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beike Technology Co Ltd filed Critical Beike Technology Co Ltd
Priority to CN202010849539.XA priority Critical patent/CN111931507A/en
Publication of CN111931507A publication Critical patent/CN111931507A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0281Customer communication at a business location, e.g. providing product or service information, consulting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/16Real estate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Computational Linguistics (AREA)
  • Finance (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Accounting & Taxation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Primary Health Care (AREA)
  • Human Resources & Organizations (AREA)
  • Human Computer Interaction (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

A method, apparatus, medium, and electronic device for obtaining a tuple set used to implement a conversation are disclosed. The method includes the following steps: acquiring a conversation statement of a first conversation party in a first conversation; identifying a first entity, a second entity, and an entity relationship in the conversation statement, where the entity relationship represents the relationship between the first entity and the second entity; performing a combination check on the first entity, the second entity, and the entity relationship; and, if the combination check passes, storing the first entity, the second entity, and the entity relationship as a tuple in a tuple set. The tuple set is used as follows: during a second conversation, based on the current conversation statement of the second conversation party, a corresponding tuple is selected from the tuple set, and the current conversation statement of the first conversation party is formed according to the selected tuple. The technical solution provided by the disclosure helps realize efficient, high-quality conversations with users and improves the user's conversation experience.

Description

Method, apparatus, medium, and device for acquiring a tuple set for implementing a session
Technical Field
The present disclosure relates to computer technology, and in particular, to a method for obtaining a tuple set for implementing a conversation, an apparatus for obtaining a tuple set for implementing a conversation, a storage medium, and an electronic device.
Background
In many fields, workers acting as one conversation party need to converse with large numbers of users to solve the users' problems or meet their needs. For example, in the real estate domain, contacting a large number of users is one of the main parts of a real estate agent's daily work: by conversing with various users, the agent provides each user with the housing they need. To improve the efficiency and quality of such conversations, the conversation party can be assisted by proactively providing corresponding conversation statements for the conversation party to refer to.
In addition, human-machine interaction has the advantage of being able to converse with users at any time, so it is widely applied in fields such as customer service.
Whether the conversation with the user is realized in an assisted manner or through human-machine interaction, how to conduct the conversation efficiently and with high quality is a significant technical problem.
Disclosure of Invention
The present disclosure is proposed to solve the above technical problem. Embodiments of the disclosure provide a method for acquiring a tuple set for implementing a conversation, an apparatus for acquiring a tuple set for implementing a conversation, a storage medium, and an electronic device.
According to an aspect of the embodiments of the present disclosure, there is provided a method for obtaining a tuple set for implementing a conversation, the method including: acquiring a conversation statement of a first conversation party in a first conversation; identifying a first entity, a second entity, and an entity relationship in the conversation statement, where the entity relationship represents the relationship between the first entity and the second entity; performing a combination check on the first entity, the second entity, and the entity relationship; and, if the combination check passes, storing the first entity, the second entity, and the entity relationship as a tuple in a tuple set. The tuple set is used as follows: during a second conversation, based on the current conversation statement of the second conversation party, a corresponding tuple is selected from the tuple set, and the current conversation statement of the first conversation party is formed according to the selected tuple.
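The extract-check-store flow described above can be sketched as follows. The two "models" are trivial keyword stand-ins, and every name and rule here (identify_relation, identify_entities, the type lexicon, the rule table) is an illustrative assumption rather than something specified in the disclosure.

```python
# Minimal sketch of the claimed pipeline: extract a (first entity, second
# entity, relation) tuple from a conversation statement, run a combination
# check, and store the tuple on success.

tuple_set = set()

def identify_relation(statement):
    # Stand-in for the entity relationship recognition model.
    return "price_of" if "price" in statement else "located_in"

def identify_entities(statement, relation):
    # Stand-in for the relation-conditioned entity recognition model:
    # naively take the first and last tokens as the two entities.
    tokens = statement.rstrip("?.").split()
    return tokens[0], tokens[-1]

def check_combination(first, second, relation):
    # Stand-in combination check: a hypothetical rule table keyed on
    # entity type; unknown entities are typed "other".
    types = {"Unit-3A": "listing", "Riverside": "district"}
    rules = {("listing", "district", "located_in")}
    return (types.get(first, "other"), types.get(second, "other"),
            relation) in rules

def process(statement):
    relation = identify_relation(statement)
    first, second = identify_entities(statement, relation)
    if check_combination(first, second, relation):
        tuple_set.add((first, second, relation))

process("Unit-3A is in Riverside")   # passes the check, stored
process("the price is negotiable")   # fails the check, discarded
```

The second call illustrates why the check matters: a sentence that mentions a relation keyword but carries no well-typed entity pair contributes nothing to the tuple set.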
In an embodiment of the present disclosure, identifying the first entity, the second entity, and the entity relationship in the conversation statement includes: providing the conversation statement to an entity relationship recognition model, performing entity relationship recognition on the conversation statement through the model, and obtaining the entity relationship of the conversation statement from the model's output; and identifying the first entity and the second entity in the conversation statement according to the conversation statement and its entity relationship.
In yet another embodiment of the present disclosure, the training process of the entity relationship recognition model includes: providing a plurality of first conversation statement samples in the training set to the entity relationship recognition model, where each first conversation statement sample carries entity relationship labeling information; performing entity relationship recognition on each first conversation statement sample through the model and obtaining each sample's entity relationship from the model's output; computing a first loss from the entity relationship labeling information and the obtained entity relationships, and adjusting the network parameters of the model using the first loss; providing a plurality of second conversation statement samples in the training set to the model, where the second conversation statement samples carry no entity relationship labeling information; performing entity relationship recognition on each second conversation statement sample and obtaining each sample's entity relationship from the model's output; and acquiring a correction result for the entity relationship of each second conversation statement sample, forming first conversation statement samples according to the correction results, and storing them in the training set.
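The two-stage training this embodiment describes — a supervised pass on labeled samples, then pseudo-labeling of unlabeled samples whose corrected predictions are folded back into the training set — can be sketched like this. The keyword-scoring "model", the 0/1 loss, and the identity "correction" are all stand-ins for a real network, its loss function, and manual review.

```python
# Illustrative two-stage (supervised + self-training) loop for the
# relationship recognition model.

def predict_relation(model, statement):
    # Toy scorer: each relation scores by its keyword counts.
    scores = {rel: sum(statement.count(kw) for kw in kws)
              for rel, kws in model.items()}
    return max(scores, key=scores.get)

def loss(predicted, labeled_relation):
    # 0/1 loss standing in for a differentiable training loss.
    return 0.0 if predicted == labeled_relation else 1.0

model = {"price_of": ["price", "cost"], "located_in": ["in", "near"]}

labeled = [("What is the price of unit 3A?", "price_of"),
           ("The flat is in Riverside", "located_in")]
unlabeled = ["How much does unit 5B cost?"]

# Stage 1: supervised pass over labeled (first) samples.
total_loss = sum(loss(predict_relation(model, s), r) for s, r in labeled)
# (network parameters would be adjusted here using total_loss)

# Stage 2: pseudo-label the unlabeled (second) samples, obtain a
# correction for each prediction, and fold the corrected samples back
# into the labeled training set.
for s in unlabeled:
    predicted = predict_relation(model, s)
    corrected = predicted   # stand-in for manual correction
    labeled.append((s, corrected))
```

The point of stage 2 in the disclosure is data growth: every corrected unlabeled sample becomes a new labeled sample for later training rounds.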
In another embodiment of the present disclosure, identifying the first entity and the second entity in the conversation statement according to the conversation statement and its entity relationship includes: providing the conversation statement and its entity relationship to an entity recognition model, performing entity recognition on the conversation statement through the entity recognition model, and obtaining the first entity and the second entity in the conversation statement from the model's output.
In another embodiment of the present disclosure, the training process of the entity recognition model includes: providing a plurality of third conversation statement samples in the training set, together with their entity relationship labeling information, to the entity recognition model, where each third conversation statement sample carries first entity labeling information, second entity labeling information, and entity relationship labeling information; performing entity recognition on each third conversation statement sample and its entity relationship labeling information through the model, and obtaining the first entity and second entity of each sample from the model's output; computing a second loss from the first and second entity labeling information and the obtained first and second entities, and adjusting the network parameters of the entity recognition model using the second loss; providing a plurality of fourth conversation statement samples in the training set, together with their entity relationship labeling information, to the entity recognition model, where each fourth conversation statement sample carries entity relationship labeling information but no first or second entity labeling information; performing entity recognition on each fourth conversation statement sample and its entity relationship labeling information through the model, and obtaining the first entity and second entity of each sample from the model's output; and acquiring correction results for the first entity and second entity of each fourth conversation statement sample, forming third conversation statement samples according to the correction results, and storing them in the training set.
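The relation-conditioned entity recognition above takes a statement together with its relation and returns the two entity spans. A minimal sketch of that input/output contract, with invented regex patterns standing in for a trained sequence-labelling network (the relation names and patterns are assumptions, not rules from the disclosure):

```python
import re

# Hypothetical per-relation span patterns: (first entity, second entity).
PATTERNS = {
    "price_of": (r"unit \w+", r"\d+(?:,\d{3})* yuan"),
    "located_in": (r"unit \w+", r"\w+ district"),
}

def recognize_entities(statement, relation):
    # The relation selects which spans to look for, mirroring how the
    # entity recognition model is conditioned on the entity relationship.
    first_pat, second_pat = PATTERNS[relation]
    first = re.search(first_pat, statement)
    second = re.search(second_pat, statement)
    if first and second:
        return first.group(), second.group()
    return None

pair = recognize_entities("unit 5B rents for 4,500 yuan", "price_of")
```

Conditioning on the relation is what disambiguates sentences that mention several candidate entities: the same statement can yield different spans under "price_of" than under "located_in".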
In yet another embodiment of the present disclosure, performing the combination check on the first entity, the second entity, and the entity relationship includes: obtaining the type of the first entity and the type of the second entity; and, if the type of the first entity, the type of the second entity, and the entity relationship meet a preset combination rule, determining that the combination check of the first entity, the second entity, and the entity relationship passes.
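A combination check of this kind can be realized with an entity-type lexicon plus a table of permitted (first type, second type, relation) triples. Both tables below are invented examples to show the mechanism, not rules from the disclosure.

```python
# Hypothetical type lexicon and preset combination rules.
ENTITY_TYPES = {"unit 5B": "listing",
                "Riverside": "district",
                "4,500 yuan": "amount"}

COMBINATION_RULES = {
    ("listing", "district", "located_in"),
    ("listing", "amount", "price_of"),
}

def combination_check(first, second, relation):
    # Map both entities to their types, then test the typed triple
    # against the preset rule table.
    key = (ENTITY_TYPES.get(first), ENTITY_TYPES.get(second), relation)
    return key in COMBINATION_RULES

ok = combination_check("unit 5B", "4,500 yuan", "price_of")   # a listing has a price
bad = combination_check("unit 5B", "Riverside", "price_of")   # a district is not a price
```

Typing the entities before checking keeps the rule table small: rules are written once per type pair rather than once per entity pair.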
In yet another embodiment of the present disclosure, the method further includes: acquiring an alternative entity relationship in the conversation statement; and, if the combination check of the first entity, the second entity, and the entity relationship fails, but the type of the first entity, the type of the second entity, and the alternative entity relationship meet a preset combination rule, storing the first entity, the second entity, and the alternative entity relationship as a tuple in the tuple set.
In yet another embodiment of the present disclosure, acquiring the alternative entity relationship in the conversation statement includes: in the case where the entity relationship of the conversation statement is obtained using the entity relationship recognition model, acquiring the alternative entity relationship according to the confidence the model outputs for each candidate entity relationship of the conversation statement.
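One plausible reading of these two embodiments: when the top-confidence relation fails the combination check, fall back through the remaining relations in descending confidence order and keep the first one that passes. The confidence dictionary below stands in for the recognition model's per-relation scores, and the rule table is again an invented example.

```python
def pick_relation(confidences, first_type, second_type, rules):
    # Try relations in descending confidence; the highest-scoring one is
    # the primary relation, the rest are the alternative relations that
    # are considered only if the combination check fails.
    for relation, _ in sorted(confidences.items(),
                              key=lambda kv: kv[1], reverse=True):
        if (first_type, second_type, relation) in rules:
            return relation
    return None  # no relation yields a valid tuple

rules = {("listing", "district", "located_in")}
confidences = {"price_of": 0.55, "located_in": 0.40, "area_of": 0.05}

# "price_of" has the highest confidence but (listing, district, price_of)
# fails the check, so the alternative "located_in" is selected.
chosen = pick_relation(confidences, "listing", "district", rules)
```

This keeps recoverable tuples that a hard reject would discard, while the rule table still guards against storing ill-typed combinations.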
According to another aspect of the embodiments of the present disclosure, there is provided an apparatus for acquiring a tuple set for implementing a conversation, the apparatus including: a conversation statement acquisition module, configured to acquire a conversation statement of a first conversation party in a first conversation; an identification module, configured to identify a first entity, a second entity, and an entity relationship in the conversation statement, where the entity relationship represents the relationship between the first entity and the second entity; a combination check module, configured to perform a combination check on the first entity, the second entity, and the entity relationship; and a tuple storage module, configured to store the first entity, the second entity, and the entity relationship as a tuple in a tuple set if the combination check passes. The tuple set is used as follows: during a second conversation, based on the current conversation statement of the second conversation party, a corresponding tuple is selected from the tuple set, and the current conversation statement of the first conversation party is formed according to the selected tuple.
In an embodiment of the present disclosure, the identification module includes: an entity relationship recognition submodule, configured to provide the conversation statement to an entity relationship recognition model, perform entity relationship recognition on the conversation statement through the model, and obtain the entity relationship of the conversation statement from the model's output; and an entity recognition submodule, configured to identify the first entity and the second entity in the conversation statement according to the conversation statement and its entity relationship.
In yet another embodiment of the present disclosure, the apparatus further includes a first training module configured to: provide a plurality of first conversation statement samples in the training set to the entity relationship recognition model, where each first conversation statement sample carries entity relationship labeling information; perform entity relationship recognition on each first conversation statement sample through the model and obtain each sample's entity relationship from the model's output; compute a first loss from the entity relationship labeling information and the obtained entity relationships, and adjust the network parameters of the model using the first loss; provide a plurality of second conversation statement samples in the training set to the model, where the second conversation statement samples carry no entity relationship labeling information; perform entity relationship recognition on each second conversation statement sample and obtain each sample's entity relationship from the model's output; and acquire a correction result for the entity relationship of each second conversation statement sample, form first conversation statement samples according to the correction results, and store them in the training set.
In yet another embodiment of the present disclosure, the entity recognition submodule is further configured to: provide the conversation statement and its entity relationship to an entity recognition model, perform entity recognition on the conversation statement through the entity recognition model, and obtain the first entity and the second entity in the conversation statement from the model's output.
In yet another embodiment of the present disclosure, the apparatus further includes a second training module configured to: provide a plurality of third conversation statement samples in the training set, together with their entity relationship labeling information, to the entity recognition model, where each third conversation statement sample carries first entity labeling information, second entity labeling information, and entity relationship labeling information; perform entity recognition on each third conversation statement sample and its entity relationship labeling information through the model, and obtain the first entity and second entity of each sample from the model's output; compute a second loss from the first and second entity labeling information and the obtained first and second entities, and adjust the network parameters of the entity recognition model using the second loss; provide a plurality of fourth conversation statement samples in the training set, together with their entity relationship labeling information, to the entity recognition model, where each fourth conversation statement sample carries entity relationship labeling information but no first or second entity labeling information; perform entity recognition on each fourth conversation statement sample and its entity relationship labeling information through the model, and obtain the first entity and second entity of each sample from the model's output; and acquire correction results for the first entity and second entity of each fourth conversation statement sample, form third conversation statement samples according to the correction results, and store them in the training set.
In another embodiment of the present disclosure, the combination check module includes: a type acquisition submodule, configured to obtain the type of the first entity and the type of the second entity; and a combination check submodule, configured to judge whether the type of the first entity, the type of the second entity, and the entity relationship meet a preset combination rule, and if so, determine that the combination check of the first entity, the second entity, and the entity relationship passes.
In yet another embodiment of the present disclosure, the apparatus further includes an alternative relationship acquisition module, configured to acquire an alternative entity relationship in the conversation statement. The combination check submodule is further configured to: if the type of the first entity, the type of the second entity, and the entity relationship do not meet the preset combination rule, judge whether the type of the first entity, the type of the second entity, and the alternative entity relationship meet the preset combination rule, and if so, determine that the combination check of the first entity, the second entity, and the alternative entity relationship passes. The tuple storage module is further configured to store the first entity, the second entity, and the alternative entity relationship as a tuple in the tuple set when the combination check submodule determines that this combination check passes.
In yet another embodiment of the present disclosure, the alternative relationship acquisition module is further configured to: in the case where the entity relationship of the conversation statement is obtained using the entity relationship recognition model, acquire the alternative entity relationship according to the confidence the model outputs for each candidate entity relationship of the conversation statement.
According to yet another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the above method.
According to still another aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; the processor is used for reading the executable instructions from the memory and executing the instructions to realize the method.
Based on the method and apparatus for acquiring a tuple set for implementing a conversation provided by the embodiments of the present disclosure, the first entity, the second entity, and the entity relationship extracted from a conversation statement form a tuple, and the entity relationship clearly expresses the relationship between the first entity and the second entity, so the tuple describes the structure of a conversation statement well. During a conversation between the first conversation party and the second conversation party, a first entity and an entity relationship can usually be extracted from the second conversation party's conversation statement; a tuple matching the extracted first entity and entity relationship can then be conveniently retrieved, and the first conversation party's conversation statement can be formed from the matched tuple to carry the conversation forward. The technical solution provided by the disclosure therefore helps realize efficient, high-quality conversations with users and improves the user's conversation experience.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of one embodiment of a suitable scenario for use with the present disclosure;
FIG. 2 is a schematic diagram of yet another embodiment of a suitable scenario for the present disclosure;
FIG. 3 is a flowchart of an embodiment of a method for obtaining a set of tuples for implementing a session according to the present disclosure;
FIG. 4 is a flow diagram of one embodiment of obtaining entity relationships for conversational utterances according to the present disclosure;
FIG. 5 is a flow diagram of an embodiment of a training entity relationship recognition model of the present disclosure;
FIG. 6 is a flow diagram of one embodiment of obtaining a first entity and a second entity in a conversational sentence according to the present disclosure;
FIG. 7 is a flow diagram of one embodiment of training an entity recognition model according to the present disclosure;
FIG. 8 is a flow chart of an embodiment of a combinatorial check of the present disclosure;
FIG. 9 is a schematic structural diagram of an embodiment of an apparatus for acquiring a tuple set used to implement a session according to the present disclosure;
fig. 10 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those skilled in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and are not intended to imply any particular technical meaning or any necessary logical order between them.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more than two and "at least one" may refer to one, two or more than two.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.
In addition, the term "and/or" in the present disclosure merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" in the present disclosure generally indicates an "or" relationship between the preceding and following associated objects.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Embodiments of the present disclosure may be implemented in electronic devices such as terminal devices, computer systems, and servers, which are operational with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with an electronic device such as a terminal device, computer system, or server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment. In a distributed cloud computing environment, tasks may be performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Summary of the disclosure
In the process of implementing the present disclosure, the inventors found that the requirements, problems, and speaking manners of users are often diverse; the workers (such as house brokers) who need to converse with a large number of users daily through tools such as IM (Instant Messaging) are usually numerous, and their speaking manners and service levels also vary widely. In addition, in the application scenario of human-computer conversation, directly providing the user with standard conversation sentences formed from preset basic data is favorable for better realizing the human-computer conversation and improving the user's human-computer conversation experience.
Brief description of the drawings
One example of an application scenario for the techniques provided by the present disclosure to obtain a set of tuples to implement a session is shown in fig. 1.
In fig. 1, a human-machine conversation system is installed in the smart mobile phone 100 of the user 101. For example, the human-machine conversation system is installed in an APP (Application) in the smart mobile phone 100, and the APP may realize functions such as medical inquiry, house renting and selling, or ticketing. The following describes the application scenario by taking an APP that implements a house renting and selling function as an example.
When the user 101 has a house renting or selling demand, the user opens the APP in the smart mobile phone 100 and may input the demand in the APP by voice or text. Assuming that the user 101 currently inputs ". x. x.", the human-computer conversation system in the APP may extract a first entity and an entity relationship from ". x.", and search for a matching tuple in the tuple set based on the extracted first entity and entity relationship. The human-computer conversation system may then form a corresponding reply statement according to the first entity, second entity, and entity relationship in the matching tuple, and provide the reply statement to the user 101, thereby completing one turn of conversation. Through a plurality of rounds of conversation with the user 101, the human-machine conversation system in the APP can finally recommend houses meeting the requirements of the user 101.
In the field of real estate, one example of an application scenario for the technology provided by the present disclosure to obtain a set of tuples for implementing a conversation is shown in fig. 2.
In fig. 2, assume that there are n1 users and n2 property brokers, respectively user 200_1, user 200_2, … …, user 200_n1, and property broker 210_1, property broker 210_2, … …, property broker 210_n2; and assume that their terminal devices are terminal device 201_1, terminal device 201_2, … …, terminal device 201_n1, and terminal device 211_1, terminal device 211_2, … …, terminal device 211_n2, respectively. Each user can converse with a corresponding property broker through an Instant Messaging (IM) function in an APP or client program installed in the terminal device. Of course, the user may also access the website provided by the real estate service company through the browser in the user's terminal device, and converse with the corresponding property broker through the IM function in the corresponding webpage of the website.
Suppose that the user 200_1 has a demand for renting, buying, or selling a house. The user 200_1 can trigger the IM (Instant Messaging) function via the terminal device 201_1, thus opening a session with a property broker, such as the property broker 210_2.
During the session between the user 200_1 and the corresponding property broker 210_2, the IM function may extract a first entity and an entity relationship from the conversation sentence currently sent by the user 200_1, and search for a matching tuple in the tuple set based on the extracted first entity and entity relationship. The IM function may form a corresponding conversation sentence according to the first entity, second entity, and entity relationship in the matching tuple (for example, determining the corresponding components of the sentence according to the first entity, second entity, and entity relationship, and enriching each component to form a complete sentence), and provide the conversation sentence to the property broker 210_2; of course, the IM function may also directly provide the matching tuple to the property broker 210_2. The property broker 210_2 can issue a current conversation sentence to the user 200_1 with reference to the conversation sentence or matching tuple currently provided by the IM function, thereby completing one round of conversation. The user 200_1 and the property broker 210_2 can perform a plurality of rounds of conversation in the above manner, thereby completing the session.
Exemplary method
Fig. 3 is a flowchart illustrating an embodiment of a method for obtaining a tuple set for implementing a session according to the present disclosure. As shown in fig. 3, the method of this embodiment includes the steps of: s300, S301, S302, and S303. The following describes each step.
S300, obtaining a conversation statement of a first conversation party in the first conversation.
The first session in the present disclosure is typically a historical session between the first conversation party and the second conversation party, i.e., a session that ended before the current time.
The first conversation party in the present disclosure may refer to the party providing a service, such as a house broker, a merchandise retailer, or customer service. As another example, during a human-machine conversation, the first conversation party is typically the machine side.
The second conversation party in the present disclosure may refer to the party receiving the service, for example, a user who needs to rent, buy, or sell a house. As another example, during a human-machine conversation, the second conversation party is the user side conversing with the machine side.
A conversation sentence of the first conversation party in the first session may be acquired as follows: extract the utterance content of the first conversation party in one conversation turn from the first session, and split the utterance content into clauses, each clause being one conversation sentence. For example, if the utterance content of the first conversation party in one conversation turn contains multiple clauses, each clause is taken as one conversation sentence.
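The clause-splitting step above can be sketched as follows (a minimal illustration; the function name and the clause-ending punctuation set are assumptions, not part of the disclosure):

```python
import re

def split_into_sentences(utterance):
    """Split one turn's utterance content into clauses; each clause
    becomes one conversation sentence.  The clause-ending punctuation
    set (Chinese and ASCII) is an illustrative assumption."""
    clauses = re.split(r"[。．.!！?？;；\n]+", utterance)
    # Drop empty fragments left by trailing punctuation.
    return [c.strip() for c in clauses if c.strip()]

sentences = split_into_sentences("The cell is near two schools. There is also a park!")
# → ["The cell is near two schools", "There is also a park"]
```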
S301, identifying a first entity, a second entity and an entity relation in the conversation statement.
The first entity and the second entity in the present disclosure are both entity words, and are typically two different forms of entity words. For example, the first entity may be a subject entity, and the second entity may be an object entity. The subject entity may refer to the core entity (e.g., the subject) in the conversation sentence. The object entity may refer to an entity (e.g., an object) in the conversation sentence that is related to the core entity. The entity relationship in the present disclosure is used to represent the relationship between the first entity and the second entity. For example, an entity relationship may be information representing that the second entity is an attribute of the first entity. As another example, an entity relationship may be information representing that the second entity is the geographic location of the first entity. The first entity, the second entity, and the entity relationship will usually differ according to the actual application scenario. The present disclosure does not limit the concrete representation of the first entity, the second entity, and the entity relationship.
The present disclosure may utilize a previously successfully trained model to identify a first entity, a second entity, and an entity relationship in a conversational sentence.
S302, performing combined verification on the first entity, the second entity and the entity relationship.
The first entity, the second entity, and the entity relationship in the present disclosure are combined to form a triple, and the triple should conform to a preset combination rule. The present disclosure may perform combination verification on the first entity, the second entity, and the entity relationship using preset combination rules, which can be set according to the specific requirements of the actual application scenario. For example, the combination rules may include rules set for allowed combinations, and may also include rules set for forbidden combinations, and the like.
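A minimal sketch of such a combination check follows; the rule contents and entity-type names are invented placeholders, since the disclosure does not fix a rule format:

```python
# Hypothetical rule tables: which (first-entity type, relationship,
# second-entity type) combinations are allowed or explicitly forbidden.
ALLOWED = {
    ("community", "school information", "school"),
    ("community", "park information", "park"),
}
FORBIDDEN = {
    ("school", "park information", "community"),
}

def check_combination(first_type, relation, second_type):
    """Return True if the (first entity, relationship, second entity)
    triple passes the preset combination rules."""
    triple = (first_type, relation, second_type)
    if triple in FORBIDDEN:
        return False
    return triple in ALLOWED

# Only triples passing the check are stored in the tuple set (S303).
tuple_set = set()
if check_combination("community", "school information", "school"):
    tuple_set.add(("X community", "school information", "Y primary school"))
```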
And S303, if the combination check passes, storing the first entity, the second entity and the entity relationship as a multi-tuple in a multi-tuple set.
The tuple in the present disclosure is at least a triple. For example, if the combination check passes, the present disclosure may store the first entity, the second entity, and the entity relationship as one triple in a triple set. For another example, if the combination check passes, the present disclosure may form a higher-order tuple from the first entity, the second entity, the entity relationship, and at least one other element, and store the tuple in the tuple set.
The tuple in the present disclosure is used to form conversation sentences of the first conversation party during a second session. That is, the tuple set in the present disclosure is the basic data for implementing a session; by forming this basic data, the present disclosure can assist the first conversation party in completing its session with the second conversation party, and can realize a human-machine conversation with the second conversation party. Specifically, during the second session, a corresponding tuple is selected from the tuple set based on the current conversation sentence of the second conversation party, and the current conversation sentence of the first conversation party is formed according to the selected tuple. The second session in the present disclosure may refer to the current session between the first and second conversation parties. That is, the present disclosure may utilize historical sessions to form the basic data for implementing sessions, so that during a session between the first and second conversation parties, the first conversation party can complete its session with the second conversation party based on the basic data.
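Under the assumption that stored tuples are matched by their first entity and entity relationship, selecting a tuple and turning it into a conversation sentence could be sketched as follows (all entity names and the reply template are hypothetical illustrations):

```python
# A toy tuple set of (first entity, entity relationship, second entity).
TUPLES = [
    ("X community", "school information", "A primary school and B primary school"),
    ("X community", "park information", "C park"),
]

def find_matching_tuple(first_entity, relation):
    """Look up a stored tuple by the first entity and entity relationship
    extracted from the second party's current conversation sentence."""
    for subj, rel, obj in TUPLES:
        if subj == first_entity and rel == relation:
            return (subj, rel, obj)
    return None

def form_reply(matched):
    """Enrich the tuple's components into a complete sentence; this
    template is an illustrative assumption, not the disclosed method."""
    subj, rel, obj = matched
    return f"Regarding {rel} for {subj}: {obj}."

m = find_matching_tuple("X community", "park information")
reply = form_reply(m)  # "Regarding park information for X community: C park."
```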
In the present disclosure, the first entity, the second entity, and the entity relationship extracted from a conversation sentence form a tuple. Because the entity relationship clearly expresses the relationship between the first entity and the second entity, the tuple can better describe the structure of the conversation sentence. Therefore, during a session between the first and second conversation parties, the first entity and the entity relationship can usually be extracted from the conversation sentence of the second conversation party, the matching tuple can be conveniently obtained using the extracted first entity and entity relationship, and the conversation sentence of the first conversation party can be conveniently formed from the matching tuple, thereby completing the session between the two parties. The technical solution provided by the present disclosure is thus favorable for realizing high-efficiency, high-quality sessions with users and improving the users' session experience.
In one optional example, the process of identifying the first entity, the second entity, and the entity relationship in a conversation sentence may be: first perform entity relationship recognition processing on the conversation sentence (for example, using a corresponding model) to obtain the entity relationship of the conversation sentence, and then perform entity recognition processing on the conversation sentence (for example, using a corresponding model) according to the obtained entity relationship, to obtain the first entity and the second entity in the conversation sentence.
By first obtaining the entity relationship of the conversation sentence and then using the entity relationship to obtain the first entity and the second entity, the present disclosure avoids the adverse influence that the flexibility and variety of entities in conversation sentences would otherwise have on entity recognition, so that the first entity and the second entity in the conversation sentence can be conveniently and accurately located.
In one optional example, the present disclosure may utilize an entity relationship recognition model to recognize entity relationships of conversational utterances. One example of identifying entity relationships for conversational utterances by the present disclosure is shown in FIG. 4.
In fig. 4, S400, the utterance is provided to the entity relationship recognition model.
Optionally, the conversation sentence in the present disclosure may be a conversation sentence in vector form. For example, a word segmentation tool is first used to segment the conversation sentence in natural language form to obtain all segmented words in the sentence; the present disclosure may then obtain a word vector for each segmented word (for example, using a Word2vec model), and use all the word vectors to form the conversation sentence provided to the entity relationship recognition model.
Optionally, a word vector in the present disclosure may be represented by a multidimensional real-number vector, for example, a 128-dimensional or 200-dimensional real-number vector. A word vector in the present disclosure may represent a single character (e.g., "on", "good", "bad", etc.) or a word (e.g., "house", "decoration", etc.).
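As a hedged sketch of this vectorization step, the toy lookup table below stands in for a trained Word2vec model (the vocabulary, random vectors, and zero-vector policy for unknown words are all illustrative assumptions):

```python
import random

random.seed(0)
DIM = 128  # per the disclosure, e.g. a 128- or 200-dimensional real vector

# Toy embedding table standing in for a trained Word2vec model; in
# practice these vectors would come from a model trained on a corpus.
vocab = ["house", "decoration", "park", "school"]
embeddings = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in vocab}

def sentence_to_vectors(segmented_words):
    """Map each segmented word to its word vector; unknown words get a
    zero vector (an illustrative out-of-vocabulary policy)."""
    zero = [0.0] * DIM
    return [embeddings.get(w, zero) for w in segmented_words]

vecs = sentence_to_vectors(["house", "decoration"])
```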
Optionally, the entity relationship recognition model in the present disclosure may adopt a classification-based model such as a BERT (Bidirectional Encoder Representations from Transformers) model or a FastText model.
S401, entity relation recognition processing is carried out on the input conversation sentences through the entity relation recognition model.
Optionally, the entity relationship recognition processing performed by the entity relationship recognition model in the present disclosure on the input conversational sentence may be regarded as classification processing on the conversational sentence based on the category of the entity relationship.
S402, obtaining the entity relation of the input conversation statement according to the output of the entity relation recognition model.
Optionally, a plurality of entity relationships are preset in the present disclosure, and the entity relationship recognition model may output, for the input conversation statement, a confidence level for each preset entity relationship. For example, if n entity relationships are preset, the entity relationship recognition model outputs n confidence levels for the input conversation statement. The entity relationship with the highest confidence level may be taken as the entity relationship of the input conversation statement. For example, assuming that the conversation statement is ". times.primary schools near the cell are. times.primary schools and. times.primary schools", the entity relationship obtained by the entity relationship recognition model may be "school information". For another example, assuming that the conversation statement is ". star. park is still around the cell", the entity relationship obtained by the entity relationship recognition model may be "park information".
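The highest-confidence selection described above reduces to an argmax over the model's n output confidences; a minimal sketch (the relationship names and scores are invented examples, not model output):

```python
def pick_entity_relationship(confidences):
    """Given the model's confidence for each of the n preset entity
    relationships, return the relationship with the highest confidence."""
    return max(confidences, key=confidences.get)

# Hypothetical model output for a school-related conversation statement:
scores = {"school information": 0.91, "park information": 0.05, "price information": 0.04}
relation = pick_entity_relationship(scores)  # "school information"
```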
It is usually difficult to obtain the entity relationship of a conversation sentence by rule matching or the like; even where rule matching succeeds, it is prone to failing to obtain the entity relationship flexibly and accurately, and the process of setting the rules is cumbersome. By using an entity relationship recognition model (such as a BERT model), the present disclosure can conveniently and accurately obtain the entity relationship of a conversation sentence.
In an alternative example, an example of a training process of the entity relationship recognition model in the present disclosure is shown in fig. 5.
In fig. 5, S500, a plurality of first sentence samples in the training set are provided to the entity relationship recognition model respectively.
Optionally, the present disclosure may read a certain number of first sentence samples from the training set according to preset batch processing parameters. The first sentence samples in the training set are provided with entity relationship annotation information. In the case that n different entity relationships are preset, the entity relationship annotation information of a first sentence sample should indicate that the entity relationship of the sample is one of the n types. In the initial stage of training, the first sentence samples having entity relationship annotation information may account for only a small proportion of all conversation sentence samples in the training set. The present disclosure may gradually increase the number of first sentence samples in the training set during training of the entity relationship recognition model.
S501, the entity relationship recognition model respectively carries out entity relationship recognition processing on each first sentence sample, and the entity relationship of each first sentence sample is obtained according to the output of the entity relationship recognition model.
Optionally, assuming that n entity relationships are preset in the present disclosure, the entity relationship recognition model may output n confidence levels for each input first sentence sample. For any first sentence sample, the disclosure may use an entity relationship corresponding to the highest confidence coefficient of the n confidence coefficients of the first sentence sample as the entity relationship of the first sentence sample.
S502, according to the entity relation labeling information of each first sentence sample and the obtained entity relation of each first sentence sample, loss calculation is executed to obtain a first loss calculation result, and the network parameters of the entity relation recognition model are adjusted by using the first loss calculation result.
Optionally, the present disclosure may perform loss calculation on the entity relationship annotation information of each first sentence sample and the obtained entity relationship of each first sentence sample using a corresponding loss function, and back-propagate the loss calculation result through the entity relationship recognition model to adjust its network parameters. Network parameters include, but are not limited to: weight matrices, etc.
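The disclosure does not name a specific loss function; a common choice for this n-way relationship classification, assumed here purely for illustration, is cross-entropy between the model's confidence distribution and the annotated relationship:

```python
import math

def cross_entropy_loss(predicted_confidences, true_index):
    """Cross-entropy between the model's confidences over the n preset
    entity relationships and the sample's annotated relationship index."""
    eps = 1e-12  # guard against log(0)
    return -math.log(predicted_confidences[true_index] + eps)

def batch_loss(batch):
    """Average loss over a batch of (confidences, annotated index) pairs;
    in a real framework this result would be back-propagated to adjust
    the network parameters (e.g. the weight matrices)."""
    return sum(cross_entropy_loss(p, t) for p, t in batch) / len(batch)

loss = batch_loss([([0.7, 0.2, 0.1], 0), ([0.1, 0.8, 0.1], 1)])
```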
Alternatively, the present disclosure may proceed directly to S503 after performing S500-S502. The present disclosure may also proceed to S503 after performing S500-S502 repeatedly a plurality of times.
And S503, respectively providing the plurality of second conversation statement samples in the training set to the entity relationship recognition model.
Optionally, the second conversation statement samples in the present disclosure are not provided with entity relationship annotation information, and may be conversation statement samples without any annotation information. The number of second conversation statement samples in the training set gradually decreases during training of the entity relationship recognition model; after training is completed, the training set may no longer contain any second conversation statement samples.
S504, entity relationship recognition processing is carried out on each second conversation statement sample through the entity relationship recognition model, and the entity relationship of each second conversation statement sample is obtained according to the output of the entity relationship recognition model.
Optionally, assuming that n entity relationships are preset in the present disclosure, the entity relationship identification model may output n confidence levels for each input second conversation statement sample. For any second conversational sentence sample, the disclosure may use an entity relationship corresponding to the highest confidence coefficient of the n confidence coefficients of the second conversational sentence sample as the entity relationship of the second conversational sentence sample.
And S505, acquiring the correction result of the entity relationship of each second conversation sentence sample, forming a first conversation sentence sample according to the correction result of the entity relationship, and storing the first conversation sentence sample in a training set.
Optionally, the present disclosure may provide the second conversation statement samples and their entity relationships to an annotation platform, and correct, through the annotation platform, the entity relationships of the second conversation statement samples inferred by the entity relationship recognition model. The present disclosure can set entity relationship annotation information for a second conversation statement sample according to the corrected entity relationship, thereby converting it into a first sentence sample with entity relationship annotation information. The second conversation statement samples in the training set are gradually consumed during training; as the entity relationship recognition model is continuously trained, the entity relationships it infers usually become more and more accurate, so fewer and fewer entity relationships need to be corrected by the annotation platform.
S506, judging whether the entity relationship recognition model has been trained successfully; if the model has been trained successfully, go to S507; if not, return to S500.
Optionally, the present disclosure may determine whether the entity relationship recognition model has been trained successfully using the first sentence samples in a test set. For example, if the accuracy of the entity relationships inferred by the model for the first sentence samples in the test set reaches a predetermined requirement, the model is considered trained successfully; otherwise, it is considered not yet trained successfully.
In addition, when it is determined in S506 that the entity relationship recognition model has not been trained successfully, it may be further determined whether the total number of first sentence samples used from the training set in the current training has reached a predetermined number. If so, the process may go to S507 instead of returning to S500, stopping the training process; in this case, the entity relationship recognition model has not been trained successfully this time. If the total number has not reached the predetermined number, return to S500.
And S507, finishing the training process of the entity relationship recognition model.
In the present disclosure, during training, the entity relationship recognition model performs entity relationship recognition processing on the second conversation statement samples in the training set, and the recognition results are corrected (for example, submitted to an annotation platform for correction); entity relationship annotation information is then set for the second conversation statement samples using the correction results. In this way, the first sentence samples in the training set can be gradually enriched even when the training set initially contains only a small number of them, which is favorable for reducing the annotation workload on conversation statement samples and improving the training efficiency of the entity relationship recognition model.
In one optional example, the present disclosure may utilize an entity recognition model to identify a first entity and a second entity in a conversational statement. One example of the present disclosure identifying a first entity and a second entity in a conversational sentence is shown in fig. 6.
In fig. 6, S600, the conversation sentence and the entity relationship of the conversation sentence are provided to the entity recognition model.
Optionally, the conversation sentence in the present disclosure may be a conversation sentence in vector form, as described above in S400, and will not be described in detail here. The entity relationship in the present disclosure may also be represented in vector form. The present disclosure may concatenate the conversation sentence and its entity relationship, and provide the concatenation result as input to the entity recognition model.
For example, assuming that the entity relationship of the conversation sentence is "school information", the present disclosure may concatenate the vector-form conversation sentence and "school information", and provide the concatenation result as input to the entity recognition model.
For another example, assuming that the conversation sentence is "× park around the cell" and the entity relationship of the conversation sentence is "park information", the present disclosure may concatenate the vector-form "× park around the cell" and "park information", and provide the concatenation result as input to the entity recognition model.
Optionally, the entity recognition model in the present disclosure may adopt a BERT model + CRF (Conditional Random Field) model, an LSTM (Long Short-Term Memory) + CRF model, an RNN (Recurrent Neural Network), a CNN (Convolutional Neural Network), or a CRF model. The present disclosure does not limit the specific representation of the entity recognition model.
S601, entity recognition processing is carried out on the conversation sentence through the entity recognition model.
Optionally, the entity recognition processing performed by the entity recognition model on the input conversation sentence and entity relationship may be regarded as two classification processes: one classifies entities as the first entity versus the second entity, and the other performs BIO (begin-inside-outside) classification for each word in the conversation sentence based on the input entity relationship. Here, B denotes the first character of the first entity or of the second entity, I denotes a character of the first entity or the second entity other than the first character, and O denotes a character belonging to neither the first entity nor the second entity.
S602, obtaining a first entity and a second entity in the conversation statement according to the output of the entity recognition model.
Optionally, the entity recognition model in the present disclosure may output, for each word in the conversation sentence, a confidence for B-sub, B-obj, I-sub, I-obj, and O respectively (where "sub" marks the first, i.e. subject, entity and "obj" marks the second, i.e. object, entity). The present disclosure can process the B-sub, B-obj, I-sub, I-obj, and O confidences of all words in the conversation sentence, and determine the first entity and the second entity in the conversation sentence according to the results.
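A minimal sketch of decoding these per-character label confidences into the two entities (a greedy per-character argmax rather than the full CRF path decode; the label scores and characters below are invented, and it is assumed for simplicity that each sentence has one first entity and one second entity):

```python
def decode_entities(chars, label_scores):
    """Pick the highest-scoring label (B-sub, I-sub, B-obj, I-obj, O) for
    each character, then stitch the B/I runs into the first (sub) and
    second (obj) entities.  A CRF layer would decode the best label path
    jointly; this greedy per-character argmax is a simplification."""
    labels = [max(scores, key=scores.get) for scores in label_scores]
    entities = {"sub": "", "obj": ""}
    for ch, lab in zip(chars, labels):
        if lab == "O":
            continue
        kind = lab.split("-")[1]  # "sub" or "obj"
        entities[kind] += ch
    return entities["sub"], entities["obj"]

# Hypothetical model output for a 4-character input:
scores = [
    {"B-sub": 0.9, "I-sub": 0.0, "B-obj": 0.0, "I-obj": 0.0, "O": 0.1},
    {"B-sub": 0.0, "I-sub": 0.8, "B-obj": 0.0, "I-obj": 0.0, "O": 0.2},
    {"B-sub": 0.0, "I-sub": 0.0, "B-obj": 0.0, "I-obj": 0.0, "O": 0.9},
    {"B-sub": 0.0, "I-sub": 0.0, "B-obj": 0.9, "I-obj": 0.0, "O": 0.1},
]
first, second = decode_entities(["X", "Y", " ", "Z"], scores)  # ("XY", "Z")
```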
In the previous example, assuming that the conversation sentence is ". times.primary schools near the cell are. times.primary schools and. times.primary schools" and the entity relationship of the conversation sentence is "school information", the first entity obtained by the entity recognition model of the present disclosure may be the cell, and the second entity may be the primary schools.
In the previous example, assuming that the conversation sentence is "× park around the cell" and the entity relationship of the conversation sentence is "park information", the first entity obtained by the entity recognition model of the present disclosure may be the cell, and the second entity may be the "× park".
If the first entity and the second entity in a conversation sentence are obtained by rule matching, they can usually be obtained only when the entities in the sentence completely match (i.e., strongly match) the content of the preset rules, which leads to a low recall rate for the first and second entities; for example, when a new entity not covered by the preset rules appears in a conversation sentence, that entity often cannot be recognized. By using an entity recognition model (such as a BERT model + CRF model), the present disclosure can conveniently and accurately obtain the first entity and the second entity of a conversation sentence.
In an alternative example, an example of a training process for an entity recognition model in the present disclosure is shown in FIG. 7.
S700, respectively providing the plurality of third conversation statement samples in the training set and entity relation marking information thereof to the entity recognition model.
Optionally, the present disclosure may read a certain number of third conversation statement samples from the training set according to preset batch processing parameters. The third conversation statement samples in the training set are provided with first entity annotation information, second entity annotation information, and entity relationship annotation information. In the initial stage of training, such fully annotated third conversation statement samples may account for only a small proportion of all conversation sentence samples in the training set. The present disclosure may gradually increase the number of third conversation statement samples in the training set during training of the entity recognition model.
Optionally, the first entity annotation information and the second entity annotation information in the present disclosure are generally not annotation information in BIO form; they generally only indicate which entity in the third conversation statement sample is the first entity and which entity is the second entity.
And S701, respectively carrying out entity identification processing on each third conversation statement sample and entity relation marking information thereof by the entity identification model, and obtaining a first entity and a second entity of each third conversation statement sample according to the output of the entity identification model.
Optionally, the entity recognition processing performed by the entity recognition model in the present disclosure on the input conversation statement and entity relationship may be regarded as two classification procedures: one classifies words as belonging to the first entity or the second entity, and the other performs, based on the input entity relationship, a BIO (begin-inside-outside) classification on each word in the conversation statement. Here, B denotes the first word of the first entity or of the second entity, I denotes a word of the first entity or the second entity other than its first word, and O denotes a word belonging to neither the first entity nor the second entity.
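The tag scheme described above can be illustrated with a short example; the English word sequence stands in for a Chinese conversation statement, and the helper is purely illustrative:

```python
# A hypothetical word sequence and its BIO tags: "sub" marks the first
# entity (subject), "obj" the second entity (object), O everything else.
words  = ["Happiness", "Cell", "is", "near", "Sunshine", "Park"]
labels = ["B-sub", "I-sub", "O", "O", "B-obj", "I-obj"]

def span_for_role(words, labels, role):
    """Collect the words whose tag ends with the given role ('sub' or 'obj')."""
    return " ".join(w for w, t in zip(words, labels) if t.endswith(role))

assert span_for_role(words, labels, "sub") == "Happiness Cell"
assert span_for_role(words, labels, "obj") == "Sunshine Park"
```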
Optionally, the entity recognition model in the present disclosure may output, for each word in a third conversation statement sample, a confidence for each of B-sub, B-obj, I-sub, I-obj, and O. For any third conversation statement sample, the present disclosure may perform calculation on the B-sub, B-obj, I-sub, I-obj, and O confidences of all words in the sample, and determine the first entity and the second entity in the sample according to the calculation result.
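One way to turn the five per-word confidences into the two entities is a per-word argmax; note that a CRF layer would instead decode the globally best tag path (Viterbi), so this greedy version is a simplification under assumed confidence values:

```python
TAGS = ["B-sub", "I-sub", "B-obj", "I-obj", "O"]

def decode_entities(words, confidences):
    """Pick the highest-confidence tag per word, then join the 'sub' words
    into the first entity and the 'obj' words into the second entity."""
    tags = [TAGS[row.index(max(row))] for row in confidences]
    first = " ".join(w for w, t in zip(words, tags) if t.endswith("sub"))
    second = " ".join(w for w, t in zip(words, tags) if t.endswith("obj"))
    return first, second

words = ["Happiness", "Cell", "near", "Sunshine", "Park"]
confidences = [
    [0.9, 0.0, 0.0, 0.0, 0.1],  # -> B-sub
    [0.1, 0.8, 0.0, 0.0, 0.1],  # -> I-sub
    [0.0, 0.0, 0.0, 0.0, 1.0],  # -> O
    [0.0, 0.0, 0.9, 0.0, 0.1],  # -> B-obj
    [0.0, 0.0, 0.1, 0.8, 0.1],  # -> I-obj
]
first, second = decode_entities(words, confidences)
assert first == "Happiness Cell" and second == "Sunshine Park"
```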
S702, according to the first entity marking information and the second entity marking information of each third conversation statement sample, and the obtained first entity and the second entity of each third conversation statement sample, loss calculation is executed to obtain a second loss calculation result, and the network parameters of the entity identification model are adjusted by using the second loss calculation result.
Optionally, since the first entity annotation information and the second entity annotation information in the present disclosure are generally not in BIO form, while the loss calculation is usually performed on the BIO-based confidences of the third conversation statement sample, the present disclosure may, for any third conversation statement sample, first set BIO-form annotation information for the sample according to its first entity annotation information and second entity annotation information. Then, using a corresponding loss function, the loss calculation is performed on the BIO-form annotation information set for each third conversation statement sample and the BIO-based confidences of each third conversation statement sample obtained in step S701, and the loss calculation result is back-propagated through the entity recognition model to adjust its network parameters. The network parameters of the entity recognition model include, but are not limited to: a weight matrix, and the like.
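The conversion from plain entity annotations to BIO-form labels can be sketched as follows; representing each entity as a (start, end) word span is an assumption for illustration, not the disclosure's storage format:

```python
def to_bio_labels(n_words, sub_span, obj_span):
    """Convert first/second entity annotations into per-word BIO labels.
    sub_span / obj_span are (start, end) word indices, end exclusive."""
    labels = ["O"] * n_words
    for (start, end), role in ((sub_span, "sub"), (obj_span, "obj")):
        for i in range(start, end):
            labels[i] = ("B-" if i == start else "I-") + role
    return labels

# A 5-word sample whose first entity is words 0-1 and second entity words 3-4:
assert to_bio_labels(5, (0, 2), (3, 5)) == ["B-sub", "I-sub", "O", "B-obj", "I-obj"]
```

The resulting labels can then be compared against the model's per-word BIO confidences with a standard classification loss (e.g., cross-entropy).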
And S703, respectively providing the plurality of fourth conversation statement samples in the training set and entity relationship marking information thereof to the entity recognition model.
Optionally, a fourth conversation statement sample in the present disclosure is provided with entity relationship annotation information but not with first entity annotation information or second entity annotation information. The number of fourth conversation statement samples in the training set gradually decreases during training of the entity recognition model, and after training is completed, the training set may no longer include any fourth conversation statement sample.
Optionally, after entity relationship annotation information is set for a second conversation statement sample through the process shown in fig. 5, if that sample is not provided with first entity annotation information and second entity annotation information, the sample with its entity relationship annotation information may be used as a fourth conversation statement sample. Likewise, if a first conversation statement sample is not provided with first entity annotation information and second entity annotation information, the first conversation statement sample with its entity relationship annotation information may also be used as a fourth conversation statement sample.
And S704, respectively carrying out entity identification processing on each fourth conversation statement sample and entity relationship marking information thereof by the entity identification model, and obtaining a first entity and a second entity of each fourth conversation statement sample according to the output of the entity identification model.
Optionally, the entity recognition model in the present disclosure may output, for each word in a fourth conversation statement sample, a confidence for each of B-sub, B-obj, I-sub, I-obj, and O. For any fourth conversation statement sample, the present disclosure may perform first-entity and second-entity confidence calculation on the B-sub, B-obj, I-sub, I-obj, and O confidences of all words in the sample, and determine the first entity and the second entity in the sample according to the calculation result.
S705, obtaining the correction results of the first entity and the second entity of each fourth conversational sentence sample, forming a third conversational sentence sample according to the correction results, and storing the third conversational sentence sample in a training set.
Optionally, the fourth conversation statement samples together with their inferred first and second entities may be provided to an annotation platform, where the first entity and the second entity inferred by the entity recognition model are corrected. The present disclosure may set first entity annotation information and second entity annotation information for a fourth conversation statement sample according to the corrected first entity and second entity, thereby converting it into a third conversation statement sample having entity relationship annotation information, first entity annotation information, and second entity annotation information. The fourth conversation statement samples in the training set may thus be gradually depleted during training of the entity recognition model.
S706, determining whether the entity recognition model is successfully trained; if so, go to S707; if not, return to S700.
Optionally, the present disclosure may determine whether the entity recognition model is successfully trained by using the third conversation statement samples in a test set. For example, if the accuracy of the first entity and the second entity inferred by the entity recognition model for the third conversation statement samples in the test set meets a predetermined requirement, the entity recognition model is considered successfully trained; otherwise, it is considered not successfully trained.
In addition, when it is determined in S706 that the entity recognition model is not successfully trained, it may be further determined whether the total number of third conversation statement samples used in the current training has reached a predetermined number. If it has, go to S707: the training process of the entity recognition model stops, and the entity recognition model is not successfully trained this time. If it has not, return to S700.
And S707, finishing the training process of the entity recognition model.
In the training process of the present disclosure, the entity recognition model performs first-entity and second-entity recognition processing on the fourth conversation statement samples in the training set, and the recognition results are corrected, for example, by submitting them to the annotation platform. The first entity annotation information and the second entity annotation information are then set for third conversation statement samples using the correction results, so that the third conversation statement samples in the training set can be gradually enriched even when the training set initially contains few of them. Eventually, all of the first, second, and fourth conversation statement samples in the training set become third conversation statement samples. In this way, the annotation workload for conversation statement samples can be reduced to a large extent, and the training efficiency of the entity recognition model can be improved.
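The overall loop of FIG. 7 can be sketched schematically; the stub model, the 0.95 accuracy threshold, the batch size, and the method names are all illustrative assumptions standing in for the real model and annotation platform:

```python
class StubModel:
    """A stand-in for the entity recognition model, used only to exercise the loop."""
    def __init__(self):
        self.accuracy = 0.0
    def fit(self, samples):
        self.accuracy = min(1.0, self.accuracy + 0.4)  # pretend accuracy improves
    def accuracy_on_test_set(self):
        return self.accuracy
    def predict(self, sample):
        return ("first entity", "second entity")

def bootstrap_training(model, third_samples, fourth_samples,
                       correct_fn, batch_size=32, max_rounds=10):
    """third_samples: fully annotated; fourth_samples: relation annotation only.
    correct_fn stands in for correction via the annotation platform."""
    for _ in range(max_rounds):
        model.fit(third_samples)                            # S700-S702
        if model.accuracy_on_test_set() >= 0.95:            # S706 (assumed threshold)
            break
        batch, fourth_samples = fourth_samples[:batch_size], fourth_samples[batch_size:]
        predictions = [model.predict(s) for s in batch]     # S703-S704
        corrected = [correct_fn(s, p) for s, p in zip(batch, predictions)]  # S705
        third_samples.extend(corrected)                     # enrich the labeled pool
    return model

model = bootstrap_training(StubModel(), [], list(range(64)), lambda s, p: (s, p))
assert model.accuracy_on_test_set() >= 0.95
```

The point of the sketch is the data flow: predictions on unlabeled fourth samples are corrected and migrate into the labeled third-sample pool, so the labeled set grows without annotating everything up front.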
In an optional example, the present disclosure may perform a combined check on the first entity, the second entity, and the entity relationship using a preset combination rule based on the types to which the entities belong. One specific example of the combined check of the present disclosure is shown in fig. 8 below.
In fig. 8, S800 acquires a type to which the first entity belongs and a type to which the second entity belongs.
Optionally, the types of the first entity and the second entity in the present disclosure may be set according to the actual situation of the application field, for example, in the real estate field, the types of the first entity and the second entity may include: time, geographic location, administrative divisions, schools, cells, buildings, houses, etc. The present disclosure is not limited thereto.
Optionally, the present disclosure may determine the type to which the first entity belongs and the type to which the second entity belongs according to corresponding rules.
Continuing the previous example, assume the conversation sentence is "there is a × primary school near × cell"; the first entity is "× cell" and the second entity is "× primary school", so the type to which the first entity belongs is "cell" and the type to which the second entity belongs is "school".
Similarly, assume the conversation sentence is "there is a × park around × cell"; the first entity is "× cell" and the second entity is "× park", so the type to which the first entity belongs is "cell" and the type to which the second entity belongs is "entertainment facility".
S801, judging whether the type of the first entity, the type of the second entity and the entity relationship meet a preset combination rule or not, and if so, going to S802; if the preset rule is not met, go to S803.
Optionally, the preset combination rules of the present disclosure are generally set according to the actual situation of the application field. For example, for the real estate domain, the preset combination rules may include: a combination rule of "cell, school, school information" and a combination rule of "cell, park, park information". In addition, the preset combination rules may include forbidden combination rules; for example, "cell, park, school information" may be used as a forbidden combination rule. The present disclosure does not limit the specific contents of the preset combination rules.
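Reading each rule as a (first entity type, second entity type, entity relationship) triple, the check of S801 can be sketched as a set-membership test; the rule contents below are illustrative, not an exhaustive rule set:

```python
# Hypothetical allowed combinations of (first type, second type, relationship).
ALLOWED_COMBINATIONS = {
    ("cell", "school", "school information"),
    ("cell", "park", "park information"),
}

def combination_check(first_type, second_type, relation):
    """Pass only when the type/relationship combination is an allowed one (S801)."""
    return (first_type, second_type, relation) in ALLOWED_COMBINATIONS

assert combination_check("cell", "school", "school information")
# The forbidden pairing of a park-type entity with school information fails:
assert not combination_check("cell", "park", "school information")
```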
S802, determining that the combination of the first entity, the second entity and the entity relationship passes the check, and storing the first entity, the second entity and the entity relationship as a triple in a triple set. To S806.
Continuing the previous examples, assume that the preset combination rules include "cell, school, school information" and "cell, park, park information"; then ("× cell", "× primary school", "school information") may form a triple, and ("× cell", "× park", "park information") may form a triple.
And S803, acquiring the alternative entity relationship in the conversation statement.
Optionally, the present disclosure may determine the alternative entity relationship in the conversation statement according to the confidence of each entity relationship in the conversation statement. For example, assuming that n entity relationships are preset and that the entity relationship of the conversation statement is obtained by using the entity relationship recognition model, the present disclosure may obtain n confidences from the output of the entity relationship recognition model: the entity relationship with the highest confidence is used as the entity relationship of the conversation statement, and the entity relationship with the second highest confidence may be used as the alternative entity relationship of the conversation statement. Alternatively, the entity relationship whose confidence is the second highest and also exceeds a predetermined confidence may be used as the alternative entity relationship. Furthermore, the present disclosure does not exclude the case where all entity relationships, other than the one with the highest confidence, whose confidences exceed the predetermined confidence are used as alternative entity relationships. The present disclosure does not limit the number of alternative entity relationships.
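Splitting the n confidences into the recognized relationship and the alternatives can be sketched as follows; the threshold value and relationship names are assumptions for illustration:

```python
def split_relations(relation_names, confidences, threshold=0.2):
    """Return (top relationship, alternatives above threshold, highest first)."""
    ranked = sorted(zip(confidences, relation_names), reverse=True)
    top = ranked[0][1]
    alternatives = [name for conf, name in ranked[1:] if conf > threshold]
    return top, alternatives

top, alts = split_relations(
    ["school information", "park information", "traffic information"],
    [0.15, 0.60, 0.25])
assert top == "park information"          # highest confidence: the relationship
assert alts == ["traffic information"]    # 0.25 > 0.2; 0.15 falls below threshold
```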
S804, judging whether the type of the first entity, the type of the second entity and the alternative entity relation accord with a preset combination rule or not, and if so, going to S805; if the preset rule is not met, go to S806.
Optionally, when there are multiple alternative entity relationships, the present disclosure may check them in descending order of confidence: once a combination of the type to which the first entity belongs, the type to which the second entity belongs, and an alternative entity relationship is determined to meet the preset combination rule, the remaining alternative entity relationships need not be checked.
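The ordering of S801 and S804 can be sketched as trying the recognized entity relationship first and then the alternatives in descending confidence order, stopping at the first combination that passes; the helper name and rule contents are hypothetical:

```python
# A single hypothetical allowed (first type, second type, relationship) combination.
ALLOWED = {("cell", "park", "park information")}

def first_passing_relation(first_type, second_type, relation, alternatives):
    """Return the first relationship whose combination passes the check, or None."""
    for rel in [relation] + list(alternatives):
        if (first_type, second_type, rel) in ALLOWED:
            return rel
    return None  # no combination passed; no triple is stored (S806)

# A misjudged top relationship is rescued by a passing alternative:
assert first_passing_relation(
    "cell", "park", "school information",
    ["park information", "traffic information"]) == "park information"
assert first_passing_relation("cell", "park", "school information", []) is None
```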
S805, determining that the combination of the first entity, the second entity and the alternative entity relation passes the check, and storing the first entity, the second entity and the alternative entity relation as a triple in a triple set.
And S806, ending the combined verification process.
Optionally, the disclosure may output a prompt message that the first entity, the second entity, the entity relationship, and the alternative entity relationship cannot form a triple.
By determining whether the type to which the first entity belongs, the type to which the second entity belongs, and the entity relationship meet the preset combination rule, the present disclosure facilitates conveniently and rapidly determining combinations that can form a triple. By further determining whether the type to which the first entity belongs, the type to which the second entity belongs, and the alternative entity relationship meet the preset combination rule, the present disclosure helps avoid cases where a misjudgment of the entity relationship by the entity relationship recognition model prevents a triple from being formed.
Exemplary devices
Fig. 9 is a schematic structural diagram of an embodiment of a device for acquiring a tuple used for implementing a session according to the present disclosure. The apparatus of this embodiment may be used to implement the method embodiments of the present disclosure described above.
As shown in fig. 9, the apparatus of the present embodiment mainly includes: a get conversation statement module 900, a recognition module 901, a combination check module 902 and a tuple storage module 903. Optionally, the apparatus may further comprise: at least one of a first training module 904, a second training module 905, and an obtain alternative relationship module 906.
The get conversation statement module 900 is configured to get a conversation statement of a first conversation party in a first conversation.
The identifying module 901 is used for identifying a first entity, a second entity and an entity relationship in a conversational sentence. The entity relationship is used for representing the relationship between the first entity and the second entity.
Optionally, the identifying module 901 may include: an identify entity relationship sub-module 9011 and an identify entity sub-module 9012. The identify entity relationship sub-module 9011 is configured to identify the entity relationship of the conversation statement. For example, the identify entity relationship sub-module 9011 may provide the conversation statement to an entity relationship recognition model, perform entity relationship recognition processing on the conversation statement via the entity relationship recognition model, and obtain the entity relationship of the conversation statement according to the output of the model. The identify entity sub-module 9012 is configured to identify the first entity and the second entity in the conversation statement according to the conversation statement and its entity relationship. For example, the identify entity sub-module 9012 may provide the conversation statement and its entity relationship to the entity recognition model, perform entity recognition processing on the conversation statement via the entity recognition model, and obtain the first entity and the second entity in the conversation statement according to the output of the model.
The combined check module 902 is configured to perform combined check on the first entity, the second entity, and the entity relationship.
Optionally, the combination check module 902 may include: a type sub-module 9021 and a combined check sub-module 9022 are obtained. The obtaining type sub-module 9021 is configured to obtain a type to which the first entity belongs and a type to which the second entity belongs. The combined check sub-module 9022 is configured to determine whether the type to which the first entity belongs, the type to which the second entity belongs, and the entity relationship meet a preset combination rule, and determine that the combined check of the first entity, the second entity, and the entity relationship passes if the type to which the first entity belongs, the type to which the second entity belongs, and the entity relationship meet the preset combination rule.
The tuple storage module 903 is configured to store the first entity, the second entity, and the entity relationship as a tuple in the tuple set if the combination check of the combination check module 902 passes. The tuple set in the present disclosure is used as follows: in the second conversation process, a corresponding tuple is selected based on the current conversation statement of the second conversation party, and the current conversation statement of the first conversation party is formed according to the selected tuple.
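How a stored tuple might serve the second conversation can be sketched very loosely; the disclosure does not specify the selection mechanism, so the substring matching and reply template below are purely illustrative placeholders:

```python
# Hypothetical stored tuples: (first entity, second entity, relationship).
TUPLES = [
    ("×cell", "×park", "park information"),
    ("×cell", "×primary school", "school information"),
]

def reply_from_tuples(sentence, tuples):
    """Pick a tuple whose first entity and relationship topic both appear in the
    second party's sentence, and form the first party's reply from it."""
    for first, second, relation in tuples:
        topic = relation.split()[0]  # e.g. "park" from "park information"
        if first in sentence and topic in sentence:
            return f"Near {first} there is {second}."
    return None

assert reply_from_tuples("Is there a park near ×cell?", TUPLES) == \
    "Near ×cell there is ×park."
```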
The first training module 904 is used for training the entity relationship recognition model. An example of the first training module 904 training the entity relationship recognition model may be:
first, the first training module 904 provides a plurality of first sentence samples in a training set to the entity relationship recognition model; and the first sentence sample is provided with entity relationship annotation information.
Next, the entity relationship recognition model performs entity relationship recognition processing on each first sentence sample, and the first training module 904 can obtain the entity relationship of each first sentence sample according to the output of the entity relationship recognition model.
Thirdly, the first training module 904 performs loss calculation according to the entity relationship labeling information of each first sentence sample and the obtained entity relationship of each first sentence sample, obtains a first loss calculation result, and adjusts the network parameters of the entity relationship recognition model by using the first loss calculation result.
Thereafter, the first training module 904 provides the plurality of second conversational sentence samples in the training set to the entity relationship recognition model, respectively. And the second conversation statement sample is not provided with entity relationship marking information.
Then, the entity relationship recognition model performs entity relationship recognition processing on each second conversational statement sample, and the first training module 904 obtains the entity relationship of each second conversational statement sample according to the output of the entity relationship recognition model.
Finally, the first training module 904 obtains the correction result of the entity relationship of each second conversational sentence sample, forms a first conversational sentence sample according to the correction result, and stores the first conversational sentence sample in the training set.
The second training module 905 is used for training the entity recognition model. An example of the training of the entity recognition model by the second training module 905 may be:
first, the second training module 905 provides the plurality of third session sentence samples in the training set and the entity relationship labeling information thereof to the entity identification model, respectively. The third conversation statement sample is provided with first entity marking information, second entity marking information and entity relation marking information.
Secondly, the entity recognition model performs entity recognition processing on each third conversational sentence sample and the entity relationship marking information thereof, and the second training module 905 obtains the first entity and the second entity of each third conversational sentence sample according to the output of the entity recognition model.
Thirdly, the second training module 905 performs loss calculation according to the first entity label information and the second entity label information of each third conversational sentence sample, and the obtained first entity and the second entity of each third conversational sentence sample, so as to obtain a second loss calculation result, and adjusts the network parameters of the entity identification model by using the second loss calculation result.
Then, the second training module 905 provides the plurality of fourth session sentence samples in the training set and the entity relationship labeling information thereof to the entity identification model respectively. And the fourth conversation statement sample is provided with entity relation marking information and is not provided with the first entity marking information and the second entity marking information.
Then, the entity recognition model performs entity recognition processing on each fourth conversational sentence sample and the entity relationship labeling information thereof, and the second training module 905 obtains the first entity and the second entity of each fourth conversational sentence sample according to the output of the entity recognition model.
Finally, the second training module 905 obtains the correction results of the first entity and the second entity of each fourth conversational sentence sample, forms a third conversational sentence sample according to the correction results, and stores the third conversational sentence sample in the training set.
The obtain alternative relationship module 906 is used to obtain alternative entity relationships in the conversational sentence. For example, in the case of obtaining the entity relationship of the conversational sentence by using the entity relationship identification model, the obtain alternative relationship module 906 may obtain the alternative entity relationship in the conversational sentence according to the confidence of each entity relationship in the conversational sentence output by the entity relationship identification model.
In the case that the apparatus of the present disclosure includes the obtain alternative relationship module 906, the combined check sub-module 9022 in the present disclosure is further configured to: if the type to which the first entity belongs, the type to which the second entity belongs, and the entity relationship do not meet the preset combination rule, determine whether the type to which the first entity belongs, the type to which the second entity belongs, and the alternative entity relationship meet the preset combination rule; if they do, the combined check sub-module 9022 determines that the combined check of the first entity, the second entity, and the alternative entity relationship passes. In this case, the tuple storage module 903 is further configured to, when the combined check sub-module 9022 determines that the combined check of the first entity, the second entity, and the alternative entity relationship passes, store the first entity, the second entity, and the alternative entity relationship as a tuple in the tuple set.
The modules, sub-modules, units and operations specifically executed by the units included in the apparatus of the present disclosure may be referred to in the description of the above method embodiments, and are not described in detail here.
Exemplary electronic device
An electronic device according to an embodiment of the present disclosure is described below with reference to fig. 10. FIG. 10 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure. As shown in fig. 10, the electronic device 101 includes one or more processors 1011 and memory 1012.
The processor 1011 may be a Central Processing Unit (CPU) or another form of processing unit having data processing capability and/or instruction execution capability, and may control other components in the electronic device 101 to perform desired functions.
Memory 1012 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory, for example, may include: random Access Memory (RAM) and/or cache memory (cache), etc. The nonvolatile memory, for example, may include: read Only Memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 1011 to implement the method of obtaining a tuple set for implementing a session and/or other desired functions of the various embodiments of the present disclosure described above. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 101 may further include: an input device 1013, an output device 1014, etc., which are interconnected by a bus system and/or other form of connection mechanism (not shown). Further, the input device 1013 may include, for example, a keyboard, a mouse, and the like. The output device 1014 can output various kinds of information to the outside. The output devices 1014 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for simplicity, only some of the components of the electronic device 101 relevant to the present disclosure are shown in fig. 10, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device 101 may include any other suitable components, depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the method of obtaining a tuple set for implementing a session according to various embodiments of the present disclosure described in the "exemplary methods" section of this specification above.
The computer program product may write program code for carrying out operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the method of obtaining a tuple set for implementing a session according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium may include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments. Note, however, that the advantages and effects mentioned in this disclosure are merely examples, not limitations, and should not be considered essential to the various embodiments. The specific details disclosed above are provided for illustration and ease of understanding only; the disclosure is not limited to those details.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the others, and for parts that are the same or similar across embodiments, the embodiments may refer to one another. Since the system embodiments substantially correspond to the method embodiments, their description is relatively brief; for relevant details, refer to the corresponding parts of the method embodiments.
The block diagrams of the devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that connections, arrangements, or configurations must be made in the manner shown. As those skilled in the art will appreciate, these devices, apparatuses, and systems may be connected, arranged, or configured in any manner. Words such as "including," "comprising," and "having" are open-ended terms meaning "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, "and/or," unless the context clearly dictates otherwise. The phrase "such as" means, and is used interchangeably with, "such as, but not limited to."
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (10)

1. A method of obtaining a set of tuples for implementing a session, comprising:
acquiring a conversation statement of a first conversation party in a first conversation;
identifying a first entity, a second entity, and an entity relationship in the conversation statement, wherein the entity relationship represents the relationship between the first entity and the second entity;
performing a combination check on the first entity, the second entity, and the entity relationship;
if the combination check passes, storing the first entity, the second entity, and the entity relationship as a tuple in a tuple set;
wherein the tuple set is used to: during a second conversation, select a corresponding tuple from the tuple set based on a current conversation statement of a second conversation party, and form a current conversation statement of the first conversation party according to the selected tuple.
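Claim 1's pipeline can be sketched as follows. This is a minimal illustration, not the patented implementation: `identify` is a toy pattern matcher standing in for the recognition models of claims 2-5, and all names (`COMBINATION_RULES`, `ENTITY_TYPES`, `add_tuple`) are hypothetical.

```python
# Toy sketch of the claimed pipeline: identify entities and a relation in a
# conversation statement, run a combination check, and store the tuple only
# if the check passes. All names here are illustrative assumptions.

COMBINATION_RULES = {
    # (type of first entity, type of second entity) -> allowed relations
    ("person", "city"): {"lives_in"},
    ("person", "company"): {"works_at"},
}

ENTITY_TYPES = {"Alice": "person", "Beijing": "city", "Acme": "company"}

def identify(sentence):
    """Toy stand-in for the entity/relation recognition models."""
    words = sentence.split()
    entities = [w for w in words if w in ENTITY_TYPES]
    relation = "lives_in" if "lives" in words else "works_at"
    return entities[0], entities[1], relation

def combination_check(e1, e2, relation):
    """Pass only if the entity types and relation satisfy a preset rule."""
    key = (ENTITY_TYPES[e1], ENTITY_TYPES[e2])
    return relation in COMBINATION_RULES.get(key, set())

def add_tuple(sentence, tuple_set):
    e1, e2, relation = identify(sentence)
    if combination_check(e1, e2, relation):  # only store validated tuples
        tuple_set.add((e1, relation, e2))
    return tuple_set

tuple_set = add_tuple("Alice lives in Beijing", set())
# tuple_set now contains ("Alice", "lives_in", "Beijing")
```

During the second conversation described in the claim, such a set would be queried to select a tuple matching the other party's current statement.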
2. The method of claim 1, wherein identifying the first entity, the second entity, and the entity relationship in the conversation statement comprises:
providing the conversation statement to an entity relationship recognition model, performing entity relationship recognition on the conversation statement through the model, and obtaining the entity relationship of the conversation statement from the model's output; and
identifying the first entity and the second entity in the conversation statement according to the conversation statement and its entity relationship.
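The two-stage recognition of claim 2 (relation first, then entities conditioned on that relation) can be sketched as below. Both models are toy lambdas standing in for the patent's networks; every name is an illustrative assumption.

```python
# Sketch of claim 2's two-stage recognition: predict the entity relationship
# first, then identify the entities conditioned on that relationship.

def recognize(sentence, relation_model, entity_model):
    relation = relation_model(sentence)        # stage 1: entity relationship
    e1, e2 = entity_model(sentence, relation)  # stage 2: entities, given relation
    return e1, e2, relation

# Illustrative stand-ins for the two models:
relation_model = lambda s: "lives_in" if "lives" in s else "works_at"
entity_model = lambda s, r: tuple(w for w in s.split() if w[0].isupper())

e1, e2, rel = recognize("Alice lives in Beijing", relation_model, entity_model)
# e1 == "Alice", e2 == "Beijing", rel == "lives_in"
```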
3. The method of claim 2, wherein the training process of the entity relationship recognition model comprises:
providing a plurality of first conversation statement samples in a training set to the entity relationship recognition model, wherein each first conversation statement sample carries entity relationship annotation information;
performing entity relationship recognition on each first conversation statement sample through the model and obtaining the entity relationship of each sample from the model's output;
computing a first loss from the entity relationship annotation information and the obtained entity relationship of each first conversation statement sample, and adjusting the network parameters of the entity relationship recognition model using the first loss;
providing a plurality of second conversation statement samples in the training set to the entity relationship recognition model, wherein the second conversation statement samples carry no entity relationship annotation information;
performing entity relationship recognition on each second conversation statement sample through the model and obtaining the entity relationship of each sample from the model's output; and
obtaining a correction result for the entity relationship of each second conversation statement sample, forming a first conversation statement sample from the correction result, and storing it in the training set.
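The training scheme of claim 3 is essentially semi-supervised self-training: a supervised pass over annotated samples, then prediction on unannotated samples whose corrected results are folded back into the training set. A minimal sketch with dummy components (the model, loss, and all names are illustrative, not the patent's networks):

```python
# Hedged sketch of the semi-supervised loop in claim 3.

class DummyRelationModel:
    """Stand-in for the entity relationship recognition network."""
    def __init__(self):
        self.updates = 0
    def __call__(self, sentence):
        return "lives_in" if "lives" in sentence else "works_at"
    def update(self, loss):
        self.updates += 1  # hypothetical parameter adjustment

def train_relation_model(model, labelled, unlabelled, correct):
    # First pass: loss against the annotation information, then a parameter update.
    for sentence, gold in labelled:
        loss = 0.0 if model(sentence) == gold else 1.0
        model.update(loss)
    # Second pass: predict on unannotated samples; corrected predictions
    # become new annotated samples stored in the training set.
    for sentence in unlabelled:
        labelled.append((sentence, correct(sentence, model(sentence))))
    return labelled

model = DummyRelationModel()
training_set = train_relation_model(
    model,
    labelled=[("Alice lives in Beijing", "lives_in")],
    unlabelled=["Bob works at Acme"],
    correct=lambda s, pred: pred,  # annotator accepts the prediction here
)
# training_set now holds two annotated samples
```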
4. The method of any one of claims 2 to 3, wherein identifying the first entity and the second entity in the conversation statement according to the conversation statement and its entity relationship comprises:
providing the conversation statement and its entity relationship to an entity recognition model, performing entity recognition on the conversation statement through the entity recognition model, and obtaining the first entity and the second entity in the conversation statement from the model's output.
5. The method of claim 4, wherein the training process of the entity recognition model comprises:
providing a plurality of third conversation statement samples in a training set, together with their entity relationship annotation information, to the entity recognition model, wherein each third conversation statement sample carries first entity annotation information, second entity annotation information, and entity relationship annotation information;
performing entity recognition on each third conversation statement sample and its entity relationship annotation information through the model, and obtaining the first entity and the second entity of each sample from the model's output;
computing a second loss from the first and second entity annotation information and the obtained first and second entities of each third conversation statement sample, and adjusting the network parameters of the entity recognition model using the second loss;
providing a plurality of fourth conversation statement samples in the training set, together with their entity relationship annotation information, to the entity recognition model, wherein each fourth conversation statement sample carries entity relationship annotation information but no first or second entity annotation information;
performing entity recognition on each fourth conversation statement sample and its entity relationship annotation information through the model, and obtaining the first entity and the second entity of each sample from the model's output; and
obtaining correction results for the first entity and the second entity of each fourth conversation statement sample, forming a third conversation statement sample from the correction results, and storing it in the training set.
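Claim 5's entity-model training mirrors the scheme of claim 3, with the entity relationship annotation fed in as an extra input. A hedged sketch with dummy components; all names are illustrative:

```python
# Sketch of claim 5: training a relation-conditioned entity recognition model
# on fully annotated "third" samples, then growing the training set from
# corrected predictions on relation-only "fourth" samples.

class DummyEntityModel:
    """Stand-in for the entity recognition network (relation-conditioned)."""
    def __call__(self, sentence, relation):
        caps = [w for w in sentence.split() if w[0].isupper()]
        return caps[0], caps[1]
    def update(self, loss):
        pass  # hypothetical parameter adjustment

def train_entity_model(model, labelled, unlabelled, correct):
    # Supervised pass over third-sample triples: (sentence, relation, entities).
    for sentence, relation, gold in labelled:
        loss = 0.0 if model(sentence, relation) == gold else 1.0
        model.update(loss)
    # Pass over fourth samples (relation annotated, entities not): corrected
    # predictions become new third samples stored in the training set.
    for sentence, relation in unlabelled:
        pred = model(sentence, relation)
        labelled.append((sentence, relation, correct(sentence, pred)))
    return labelled

training_set = train_entity_model(
    DummyEntityModel(),
    labelled=[("Alice lives in Beijing", "lives_in", ("Alice", "Beijing"))],
    unlabelled=[("Bob works at Acme", "works_at")],
    correct=lambda s, pred: pred,  # annotator accepts the prediction here
)
```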
6. The method of any one of claims 1-5, wherein performing the combination check on the first entity, the second entity, and the entity relationship comprises:
obtaining the type of the first entity and the type of the second entity; and
determining that the combination check passes if the type of the first entity, the type of the second entity, and the entity relationship satisfy a preset combination rule.
7. The method of claim 6, further comprising:
obtaining an alternative entity relationship in the conversation statement; and
if the first entity, the second entity, and the entity relationship fail the combination check, but the type of the first entity, the type of the second entity, and the alternative entity relationship satisfy the preset combination rule, storing the first entity, the second entity, and the alternative entity relationship as a tuple in the tuple set.
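The fallback of claim 7 can be sketched as follows: when the primary entity relationship fails the combination check, the alternative relationship from the same sentence is tried against the same preset rules. The rule table and all names are illustrative assumptions.

```python
# Sketch of claim 7's fallback to an alternative entity relationship.

def store_with_fallback(e1, e2, relation, alt_relation, types, rules, tuple_set):
    def passes(rel):
        return rel in rules.get((types[e1], types[e2]), set())
    if passes(relation):
        tuple_set.add((e1, relation, e2))
    elif alt_relation is not None and passes(alt_relation):
        tuple_set.add((e1, alt_relation, e2))  # store the fallback tuple
    return tuple_set

types = {"Alice": "person", "Acme": "company"}
rules = {("person", "company"): {"works_at"}}  # preset combination rule
result = store_with_fallback("Alice", "Acme", "lives_in", "works_at",
                             types, rules, set())
# "lives_in" fails the check, so the alternative "works_at" is stored
```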
8. An apparatus for obtaining a tuple set for implementing a session, the apparatus comprising:
a conversation statement acquisition module, configured to acquire a conversation statement of a first conversation party in a first conversation;
a recognition module, configured to identify a first entity, a second entity, and an entity relationship in the conversation statement, wherein the entity relationship represents the relationship between the first entity and the second entity;
a combination check module, configured to perform a combination check on the first entity, the second entity, and the entity relationship;
a tuple storage module, configured to store the first entity, the second entity, and the entity relationship as a tuple in a tuple set if the combination check passes;
wherein the tuple set is used to: during a second conversation, select a corresponding tuple from the tuple set based on a current conversation statement of a second conversation party, and form a current conversation statement of the first conversation party according to the selected tuple.
9. A computer-readable storage medium storing a computer program for performing the method of any one of claims 1-7.
10. An electronic device, the electronic device comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to read the executable instructions from the memory and execute them to implement the method of any one of claims 1-7.
CN202010849539.XA 2020-08-21 2020-08-21 Method, apparatus, medium, and device for acquiring a tuple set for implementing a session Pending CN111931507A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010849539.XA CN111931507A (en) 2020-08-21 2020-08-21 Method, apparatus, medium, and device for acquiring a tuple set for implementing a session

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010849539.XA CN111931507A (en) 2020-08-21 2020-08-21 Method, apparatus, medium, and device for acquiring a tuple set for implementing a session

Publications (1)

Publication Number Publication Date
CN111931507A true CN111931507A (en) 2020-11-13

Family

ID=73304973

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010849539.XA Pending CN111931507A (en) 2020-08-21 2020-08-21 Method, apparatus, medium, and device for acquiring a tuple set for implementing a session

Country Status (1)

Country Link
CN (1) CN111931507A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573028A (en) * 2015-01-14 2015-04-29 百度在线网络技术(北京)有限公司 Intelligent question-answer implementing method and system
CN107818164A (en) * 2017-11-02 2018-03-20 东北师范大学 A kind of intelligent answer method and its system
CN108427707A (en) * 2018-01-23 2018-08-21 深圳市阿西莫夫科技有限公司 Nan-machine interrogation's method, apparatus, computer equipment and storage medium
CN110032633A (en) * 2019-04-17 2019-07-19 腾讯科技(深圳)有限公司 More wheel dialog process method, apparatus and equipment
CN110147451A (en) * 2019-05-10 2019-08-20 北京云知声信息技术有限公司 A kind of session command understanding method of knowledge based map
CN110838368A (en) * 2019-11-19 2020-02-25 广州西思数字科技有限公司 Robot active inquiry method based on traditional Chinese medicine clinical knowledge graph
CN111008272A (en) * 2019-12-04 2020-04-14 深圳市新国都金服技术有限公司 Knowledge graph-based question and answer method and device, computer equipment and storage medium
CN111143394A (en) * 2019-11-20 2020-05-12 泰康保险集团股份有限公司 Knowledge data processing method, knowledge data processing device, knowledge data processing medium and electronic equipment
CN111159385A (en) * 2019-12-31 2020-05-15 南京烽火星空通信发展有限公司 Template-free universal intelligent question-answering method based on dynamic knowledge graph
CN111191016A (en) * 2019-12-27 2020-05-22 车智互联(北京)科技有限公司 Multi-turn conversation processing method and device and computing equipment
US20200177527A1 (en) * 2018-11-30 2020-06-04 International Business Machines Corporation Reusing entities in automated task-based multi-round conversation
CN111324691A (en) * 2020-01-06 2020-06-23 大连民族大学 Intelligent question-answering method for minority nationality field based on knowledge graph

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880406A (en) * 2022-05-05 2022-08-09 国网智能电网研究院有限公司 Data management method and device
CN115186092A (en) * 2022-07-11 2022-10-14 贝壳找房(北京)科技有限公司 Online interaction processing method and apparatus, storage medium, and program product
CN115186092B (en) * 2022-07-11 2023-06-20 贝壳找房(北京)科技有限公司 Online interactive processing method and device, storage medium and program product

Similar Documents

Publication Publication Date Title
CN110187780B (en) Long text prediction method, long text prediction device, long text prediction equipment and storage medium
CN113282736B (en) Dialogue understanding and model training method, device, equipment and storage medium
CN111931507A (en) Method, apparatus, medium, and device for acquiring a tuple set for implementing a session
CN111125326A (en) Method, device, medium and electronic equipment for realizing man-machine conversation
CN111681087B (en) Information processing method, information processing device, computer readable storage medium and electronic equipment
CN111639162A (en) Information interaction method and device, electronic equipment and storage medium
CN111324698A (en) Deep learning method, evaluation viewpoint extraction method, device and system
CN111966805B (en) Method, device, medium and electronic equipment for assisting in realizing session
CN111753074B (en) Method, device, medium and electronic equipment for realizing session
CN113111658A (en) Method, device, equipment and storage medium for checking information
CN112669850A (en) Voice quality detection method and device, computer equipment and storage medium
WO2020199590A1 (en) Mood detection analysis method and related device
CN116204624A (en) Response method, response device, electronic equipment and storage medium
CN112926329B (en) Text generation method, device, equipment and computer readable storage medium
CN114519094A (en) Method and device for conversational recommendation based on random state and electronic equipment
CN114297380A (en) Data processing method, device, equipment and storage medium
CN109285559B (en) Role transition point detection method and device, storage medium and electronic equipment
CN112989046A (en) Real-time speech technology prejudging method, device, computer equipment and storage medium
CN110895924B (en) Method and device for reading document content aloud, electronic equipment and readable storage medium
CN112989805A (en) Text detection method, device, equipment and storage medium
CN110909166B (en) Method, apparatus, medium, and electronic device for improving session quality
CN112464081A (en) Project information matching method, device and storage medium
CN112101035B (en) Named entity identification method and device, readable storage medium and electronic equipment
KR20140043852A (en) Ordering system using speech recognition and ordering method thereof
CN111046146B (en) Method and device for generating information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination