CN112270169A - Dialogue role prediction method and device, electronic equipment and storage medium - Google Patents

Dialogue role prediction method and device, electronic equipment and storage medium

Info

Publication number
CN112270169A
Authority
CN
China
Prior art keywords
dialogue
text
context
text content
dialog
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011099233.3A
Other languages
Chinese (zh)
Other versions
CN112270169B (en)
Inventor
潘政林
白洁
王毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011099233.3A priority Critical patent/CN112270169B/en
Publication of CN112270169A publication Critical patent/CN112270169A/en
Application granted granted Critical
Publication of CN112270169B publication Critical patent/CN112270169B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning

Abstract

The application discloses a dialogue role prediction method and apparatus, an electronic device, and a storage medium, relating to artificial intelligence fields such as natural language processing, intelligent speech, and deep learning. The method includes the following steps: acquiring the context of a dialogue to be processed from the text in which the dialogue is located; acquiring a first label for each sentence of text content in the preceding context and/or the following context, where the first label is either non-dialogue or role information, and a role is the speaker of a dialogue; and predicting the role information of the dialogue according to the acquired context and first labels. By applying this scheme, the accuracy of the prediction result can be improved.

Description

Dialogue role prediction method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a dialogue role prediction method and apparatus, an electronic device, and a storage medium in the fields of natural language processing, intelligent speech, and deep learning.
Background
Multi-role voiced novels are attracting more and more attention in the market. Accordingly, the role information of each dialogue sentence in a novel, that is, which speaker says the sentence, needs to be annotated (that is, predicted).
Currently, the role information of a dialogue is usually predicted only from the context of the dialogue, and the accuracy of this approach is poor.
Disclosure of Invention
The present application provides a dialogue role prediction method and apparatus, an electronic device, and a storage medium.
A dialogue role prediction method, comprising:
acquiring the context of a dialogue to be processed from the text in which the dialogue is located;
acquiring a first label for each sentence of text content in the preceding context and/or the following context, wherein the first label is either non-dialogue or role information, and a role is the speaker of a dialogue;
and predicting the role information of the dialogue according to the context and the first labels.
A dialogue role prediction apparatus, comprising:
a first obtaining module, configured to acquire the context of a dialogue to be processed from the text in which the dialogue is located;
a second obtaining module, configured to obtain a first label for each sentence of text content in the preceding context and/or the following context, wherein the first label is either non-dialogue or role information, and a role is the speaker of a dialogue;
and a prediction module, configured to predict the role information of the dialogue according to the context and the first labels.
An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as described above.
A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method described above.
One embodiment of the above application has the following advantages or benefits: the role information of a dialogue can be predicted by combining the context of the dialogue with the first label of each sentence of text content in the preceding and/or following context, where the first label is either non-dialogue or role information, thereby improving the accuracy of the prediction result.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a flowchart of a first embodiment of the dialogue role prediction method according to the present application;
FIG. 2 is a flowchart of a second embodiment of the dialogue role prediction method according to the present application;
FIG. 3 is a schematic structural diagram of an embodiment of the dialogue role prediction apparatus 30 according to the present application;
FIG. 4 is a block diagram of an electronic device for the method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
In addition, it should be understood that the term "and/or" herein merely describes an association relationship between associated objects, meaning that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the preceding and following associated objects are in an "or" relationship.
FIG. 1 is a flowchart of a first embodiment of the dialogue role prediction method according to the present application. As shown in FIG. 1, the method includes the following steps.
In step 101, the context of a dialogue to be processed is obtained from the text in which the dialogue is located.
In step 102, a first label is obtained for each sentence of text content in the preceding context and/or the following context, where the first label is either non-dialogue or role information, and a role is the speaker of a dialogue.
In step 103, the role information of the dialogue is predicted according to the obtained context and the first labels.
It can be seen that, in the above method embodiment, the role information of the dialogue is predicted by combining the context of the dialogue with the first label of each sentence of text content in the preceding and/or following context, thereby improving the accuracy of the prediction result.
In practical applications, for a text to be processed, that is, the text in which the dialogues to be processed are located, the dialogues in the text can be traversed, and each traversed dialogue sentence is taken in turn as the dialogue to be processed.
The specific traversal order is not limited. For example, the traversal may proceed from beginning to end. Accordingly, the first label of each sentence of text content in the preceding context can be obtained: for any sentence of text content in the preceding context, if the text content is not a dialogue, its first label is non-dialogue; if the text content is a dialogue, its first label is the role information already predicted for that dialogue.
In addition, the dialogues in the text may be identified in the following ways: taking text content enclosed in quotation marks as a dialogue, and/or, for any sentence of text content, determining whether the text content is a dialogue by using a pre-trained classification model.
The two dialogue identification methods can be used independently or in combination; for example, for a sentence of text content enclosed in quotation marks, the classification model can further be used to confirm whether the text content is a dialogue, and this double check improves the accuracy of the identification result.
The above identification methods are for illustration only and do not limit the technical solution of the present application; in practical applications, any feasible implementation may be adopted. For example, symbols other than quotation marks may be used to mark dialogue. A minimal sketch of such an identification step is given below.
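For illustration only, the following sketch applies the quotation-mark rule and an optional second check with a classification model. It assumes the text has already been split into sentences; the function name, the regular expression, and the classifier callable (standing in for the pre-trained classification model) are assumptions, not part of the application.

```python
import re
from typing import Callable, List, Optional

# Matches text enclosed in Chinese or ASCII double quotation marks.
QUOTED = re.compile(r'“[^”]*”|"[^"]*"')

def identify_dialogues(sentences: List[str],
                       classifier: Optional[Callable[[str], bool]] = None) -> List[bool]:
    """Return one flag per sentence: True if the sentence is treated as a dialogue.

    Each sentence is first screened by the quotation-mark rule; if a classifier
    is supplied, it is applied as a second check (the double check described above).
    """
    flags = []
    for sentence in sentences:
        is_dialogue = bool(QUOTED.search(sentence))
        if is_dialogue and classifier is not None:
            is_dialogue = classifier(sentence)  # confirm with the pre-trained model
        flags.append(is_dialogue)
    return flags
```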
For the dialogue to be processed, its context may be obtained from the text in which the dialogue resides. How the context is obtained is likewise not limited. For example, the M sentences of text content before the dialogue and the N sentences of text content after the dialogue may be used as the preceding context and the following context of the dialogue, respectively, thereby obtaining the context of the dialogue, where M and N are positive integers whose values may be the same or different and may be determined according to actual needs. The preceding context, the dialogue, and the following context together form a continuous piece of text. A minimal sketch of this context-extraction step follows.
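For illustration only, the following assumes the text has already been split into a list of sentences; the function name and the default values of M and N are assumptions, not part of the application.

```python
from typing import List, Tuple

def get_context(sentences: List[str], dialogue_index: int,
                m: int = 4, n: int = 3) -> Tuple[List[str], List[str]]:
    """Return (preceding_context, following_context) for the dialogue at
    dialogue_index: the M sentences before it and the N sentences after it.
    M and N are left to be chosen according to actual needs."""
    preceding = sentences[max(0, dialogue_index - m):dialogue_index]
    following = sentences[dialogue_index + 1:dialogue_index + 1 + n]
    return preceding, following
```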
In addition to the context of the dialogue, the first label of each sentence of text content in the context can be obtained, where the first label is either non-dialogue or role information. That is, only the first labels of the sentences in the preceding context may be acquired, or the first labels of the sentences in both the preceding and the following context may be acquired. To distinguish them from the labels appearing later, the label of each sentence of text content is referred to as a first label.
As previously mentioned, assuming that the text is traversed from beginning to end, the first label of each sentence of text content in the preceding context can be obtained: for any sentence of text content in the preceding context, if the text content is not a dialogue, its first label is non-dialogue; if it is a dialogue, its first label is the role information already predicted for it.
Further, the role information of the dialogue can be predicted according to the acquired context and the first labels.
Specifically, input information including the context of the dialogue, the first labels, and the dialogue itself may be constructed, and the input information may be fed into a pre-trained role prediction model to obtain the predicted role information of the dialogue.
The specific form of the input information is not limited. For example, if the preceding context of the dialogue includes 4 sentences of text content, namely text content 1, text content 2, text content 3, and text content 4, the following context includes 3 sentences of text content, namely text content 5, text content 6, and text content 7, and the first labels of the 4 sentences in the preceding context have been obtained, then "text content 1 + first label of text content 1 + text content 2 + first label of text content 2 + text content 3 + first label of text content 3 + text content 4 + first label of text content 4 + dialogue + text content 5 + text content 6 + text content 7" may be used as the input information.
After the input information is obtained, it can be fed into the role prediction model to obtain the predicted role information of the dialogue. A minimal sketch of this construction step follows.
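For illustration only, the following sketch concatenates the context, first labels, and dialogue into one input string and feeds it to a role prediction model. The function names, the "+" separator, the interleaving of each preceding sentence with its label, and the callable model interface are assumptions, not part of the application.

```python
from typing import Callable, List

def build_input(preceding: List[str], first_labels: List[str],
                dialogue: str, following: List[str]) -> str:
    """Concatenate each preceding sentence with its first label, then the
    dialogue, then the following sentences, into a single input string."""
    parts: List[str] = []
    for sentence, label in zip(preceding, first_labels):
        parts.extend([sentence, label])
    parts.append(dialogue)
    parts.extend(following)
    return " + ".join(parts)

def predict_role(model: Callable[[str], str], preceding: List[str],
                 first_labels: List[str], dialogue: str,
                 following: List[str]) -> str:
    """Feed the constructed input information into a role prediction model
    (here simply a callable returning a role name) and return its prediction."""
    return model(build_input(preceding, first_labels, dialogue, following))
```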
Compared with the prior art, this method lets the model obtain more information: in fact, the information needed to determine the role of a dialogue sentence exists not only in the context itself but also in the first label of each sentence of text content in the context.
In addition, conventional methods perform particularly poorly when dialogues occur in succession, for example when one character speaks several dialogues in a row or several characters speak alternately, whereas the method described in the present application remains well suited to these situations.
As previously described, the role prediction model may be pre-trained. Specifically, training samples may be constructed, each corresponding to one dialogue sentence in a text and including the input information corresponding to that dialogue and a second label, where the second label is the role information of the dialogue; the role prediction model can then be obtained by training on these samples. A minimal sketch of this sample construction follows.
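For illustration only, one way such training samples might be represented; the class and function names are assumptions, and any supervised classification setup could then be trained on the resulting pairs.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrainingSample:
    """One training sample: the constructed input information for a dialogue
    and its second label (the ground-truth role of that dialogue)."""
    input_information: str
    second_label: str

def build_training_samples(dialogue_inputs: List[str],
                           true_roles: List[str]) -> List[TrainingSample]:
    """Pair each dialogue's input information with its ground-truth role."""
    return [TrainingSample(x, y) for x, y in zip(dialogue_inputs, true_roles)]
```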
Based on the above description, FIG. 2 is a flowchart of a second embodiment of the dialogue role prediction method according to the present application. As shown in FIG. 2, the method includes the following steps.
In step 201, the dialogs in the novel are traversed in a head-to-tail order.
In this embodiment, it is assumed that the text to be processed is a novel.
In addition, text content enclosed in quotation marks can be treated as a dialogue, and/or, for any sentence of text content, a pre-trained classification model can be used to determine whether the text content is a dialogue.
In step 202, the processing shown in steps 203 to 207 is performed for each traversed dialogue sentence.
In step 203, the context of the dialog is obtained.
For example, the M sentences of text content before the dialogue and the N sentences of text content after the dialogue can be used as the preceding context and the following context of the dialogue, respectively, so as to obtain the context of the dialogue, where M and N are positive integers whose values may be the same or different.
In step 204, the first label of each sentence of text content in the preceding context is obtained, where the first label is either non-dialogue or role information.
In step 205, input information including the context of the dialogue, the first labels, and the dialogue itself is constructed.
Assuming the preceding context of the dialogue includes 4 sentences of text content, namely text content 1, text content 2, text content 3, and text content 4, and the following context includes 3 sentences of text content, namely text content 5, text content 6, and text content 7, the input information may be "text content 1 + first label of text content 1 + text content 2 + first label of text content 2 + text content 3 + first label of text content 3 + text content 4 + first label of text content 4 + dialogue + text content 5 + text content 6 + text content 7".
In step 206, the input information is fed into a pre-trained role prediction model to obtain the predicted role information of the dialogue.
Training samples can be constructed in advance, each corresponding to one dialogue sentence in a text and including the input information corresponding to that dialogue and a second label, where the second label is the role information of the dialogue; the role prediction model can then be obtained by training on these samples.
In step 207, the dialogue is annotated with the predicted role information.
In step 208, it is determined whether there is a next dialogue; if so, the process returns to step 203 for the next dialogue; otherwise, step 209 is performed.
In step 209, the labeled novel is output, and the flow ends.
In this method embodiment, the dialogue role information is predicted by combining the dialogue context, the first labels, and so on, so that the accuracy of the prediction result is improved; the method is also fast and efficient, able to finish annotating thousands of novels in a few minutes, and is an industrial-grade dialogue role prediction scheme. Pulling the pieces together, the overall annotation loop can be sketched as shown below.
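For illustration only, the following sketch strings the hypothetical helpers from the earlier sketches (get_context, predict_role) into the traversal-and-annotation loop of FIG. 2. It assumes the sentence list and dialogue flags have already been produced and is not the application's implementation.

```python
from typing import Callable, List

def annotate_novel(sentences: List[str], is_dialogue: List[bool],
                   model: Callable[[str], str],
                   m: int = 4, n: int = 3) -> List[str]:
    """Traverse the dialogues from beginning to end; for each one, build its
    context and first labels, predict its role, and record the prediction so
    that it can serve as a first label for later dialogues."""
    first_labels = ["non-dialogue" if not flag else "" for flag in is_dialogue]
    for i, sentence in enumerate(sentences):
        if not is_dialogue[i]:
            continue
        preceding, following = get_context(sentences, i, m, n)
        labels = first_labels[max(0, i - m):i]
        role = predict_role(model, preceding, labels, sentence, following)
        first_labels[i] = role  # annotate; reused as a first label later on
    return first_labels
```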
It is noted that while, for simplicity of explanation, the foregoing method embodiments are described as a series of acts or a combination of acts, those skilled in the art will appreciate that the present application is not limited by the order of acts, as some steps may, in accordance with the present application, occur in other orders or concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules involved are not necessarily required by this application. In addition, for parts that are not described in detail in a certain embodiment, reference may be made to the relevant descriptions in other embodiments.
The above is a description of method embodiments, and the embodiments of the present application are further described below by way of apparatus embodiments.
FIG. 3 is a schematic structural diagram of an embodiment of the dialogue role prediction apparatus 30 according to the present application. As shown in FIG. 3, the apparatus includes: a first obtaining module 301, a second obtaining module 302, and a prediction module 303.
A first obtaining module 301, configured to obtain a context of a dialog from a text where the dialog to be processed is located.
A second obtaining module 302, configured to obtain the first label of each sentence of text content in the context, where the first label is either non-dialogue or role information, and a role is the speaker of a dialogue.
A prediction module 303, configured to predict the role information of the dialogue according to the obtained context and the first labels.
The first obtaining module 301 may traverse the dialogues in the text, taking each traversed dialogue sentence in turn as the dialogue to be processed.
The specific traversal order is not limited. For example, the traversal may proceed from beginning to end. Accordingly, the second obtaining module 302 may obtain the first label of each sentence of text content in the preceding context: for any sentence of text content in the preceding context, if the text content is not a dialogue, its first label is non-dialogue; if it is a dialogue, its first label is the role information already predicted for it.
In addition, the first obtaining module 301 may identify the dialogues in the text in the following ways: taking text content enclosed in quotation marks as a dialogue, and/or, for any sentence of text content, determining whether the text content is a dialogue by using a pre-trained classification model.
The two dialogue identification methods can be used separately or in combination; for example, for a sentence of text content enclosed in quotation marks, the classification model can further be used to confirm whether the text content is a dialogue.
For the dialogue to be processed, the first obtaining module 301 may obtain its context. For example, the M sentences of text content before the dialogue and the N sentences of text content after the dialogue can be used as the preceding context and the following context of the dialogue, respectively, so as to obtain the context of the dialogue, where M and N are positive integers whose values may be the same or different.
The second obtaining module 302 may further obtain the first label of each sentence of text content in the context, where the first label is either non-dialogue or role information. That is, the second obtaining module 302 may obtain only the first labels of the sentences in the preceding context, or may obtain the first labels of the sentences in both the preceding and the following context.
As described above, assuming that the text to be processed is traversed from beginning to end, the first label of each sentence of text content in the preceding context can be obtained.
Further, the prediction module 303 may predict the role information of the dialog according to the obtained context and the first tag.
Specifically, input information including the context of the dialogue, the first labels, and the dialogue itself may be constructed and fed into a pre-trained role prediction model to obtain the predicted role information of the dialogue.
The specific form of the input information is not limited. For example, if the preceding context of the dialogue includes 4 sentences of text content, namely text content 1, text content 2, text content 3, and text content 4, the following context includes 3 sentences of text content, namely text content 5, text content 6, and text content 7, and the first labels of the 4 sentences in the preceding context have been obtained, then "text content 1 + first label of text content 1 + text content 2 + first label of text content 2 + text content 3 + first label of text content 3 + text content 4 + first label of text content 4 + dialogue + text content 5 + text content 6 + text content 7" may be used as the input information.
As shown in FIG. 3, the apparatus may further include a preprocessing module 300, configured to construct training samples, each corresponding to one dialogue sentence in a text and including the input information corresponding to that dialogue and a second label, where the second label is the role information of the dialogue; the role prediction model is obtained by training on these samples.
For a specific work flow of the apparatus embodiment shown in fig. 3, reference is made to the related description in the foregoing method embodiment, and details are not repeated.
In summary, by adopting the scheme of the apparatus embodiment of the present application, the role information of a dialogue can be predicted by combining the context of the dialogue with the first label of each sentence of text content in the context, so that the accuracy of the prediction result is improved.
The scheme can be applied to the field of artificial intelligence, and particularly relates to the fields of natural language processing, intelligent voice and deep learning.
Artificial intelligence is the discipline that studies how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it spans both hardware and software technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technologies, and the like.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
FIG. 4 is a block diagram of an electronic device for the method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 4, the electronic apparatus includes: one or more processors Y01, a memory Y02, and interfaces for connecting the various components, including a high speed interface and a low speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information for a graphical user interface on an external input/output device (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 4, one processor Y01 is taken as an example.
Memory Y02 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the methods provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the methods provided herein.
Memory Y02 is provided as a non-transitory computer readable storage medium that can be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the methods of the embodiments of the present application. The processor Y01 executes various functional applications of the server and data processing, i.e., implements the method in the above-described method embodiments, by executing non-transitory software programs, instructions, and modules stored in the memory Y02.
The memory Y02 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Additionally, the memory Y02 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory Y02 may optionally include memory located remotely from processor Y01, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, blockchain networks, local area networks, mobile communication networks, and combinations thereof.
The electronic device may further include: an input device Y03 and an output device Y04. The processor Y01, the memory Y02, the input device Y03 and the output device Y04 may be connected by a bus or other means, and the bus connection is exemplified in fig. 4.
The input device Y03 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device, such as a touch screen, keypad, mouse, track pad, touch pad, pointer, one or more mouse buttons, track ball, joystick, or other input device. The output device Y04 may include a display device, an auxiliary lighting device, a tactile feedback device (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display, a light emitting diode display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific integrated circuits, computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable logic devices) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a cathode ray tube or a liquid crystal display monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks, wide area networks, blockchain networks, and the internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in the cloud computing service system and overcomes the defects of high management difficulty and weak service scalability in traditional physical hosts and VPS services.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved, and the present invention is not limited herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A dialogue role prediction method, comprising:
acquiring the context of a dialogue to be processed from the text in which the dialogue is located;
acquiring a first label for each sentence of text content in the preceding context and/or the following context, wherein the first label is either non-dialogue or role information, and a role is the speaker of a dialogue;
and predicting the role information of the dialogue according to the context and the first labels.
2. The method of claim 1, further comprising:
traversing the dialogues in the text, and taking each traversed dialogue sentence in turn as the dialogue to be processed.
3. The method of claim 2, wherein,
the traversing the dialogs in the text comprises: traversing the dialogues in the text from beginning to end;
the acquiring of the first label of each sentence of text content in the preceding context and/or the following context comprises: acquiring the first label of each sentence of text content in the preceding context.
4. The method of claim 1, further comprising:
taking text content enclosed in quotation marks as a dialogue;
and/or, for any sentence of text content, determining whether the text content is a dialogue by using a pre-trained classification model.
5. The method of claim 1, wherein the predicting the role information of the dialogue according to the context and the first labels comprises:
constructing input information comprising the context, the first labels, and the dialogue;
and inputting the input information into a pre-trained role prediction model to obtain the predicted role information of the dialogue.
6. The method of claim 5, further comprising:
constructing training samples, each corresponding to one dialogue sentence in a text and comprising the input information corresponding to the dialogue and a second label, wherein the second label is the role information of the dialogue;
and training with the training samples to obtain the role prediction model.
7. A dialogue role prediction apparatus, comprising:
a first obtaining module, configured to acquire the context of a dialogue to be processed from the text in which the dialogue is located;
a second obtaining module, configured to obtain a first label for each sentence of text content in the preceding context and/or the following context, wherein the first label is either non-dialogue or role information, and a role is the speaker of a dialogue;
and a prediction module, configured to predict the role information of the dialogue according to the context and the first labels.
8. The apparatus of claim 7, wherein,
the first obtaining module traverses the dialogues in the text, taking each traversed dialogue sentence in turn as the dialogue to be processed.
9. The apparatus of claim 8, wherein,
the first obtaining module traverses the dialogues in the text from beginning to end;
the second obtaining module obtains the first label of each sentence of text content in the preceding context.
10. The apparatus of claim 7, wherein,
the first obtaining module is further configured to use the text content surrounded by the quotation marks as a dialogue, and/or determine whether the text content is a dialogue by using a classification model obtained through pre-training for any sentence of the text content.
11. The apparatus of claim 7, wherein,
the prediction module constructs input information comprising the context, the first labels, and the dialogue, and inputs the input information into a pre-trained role prediction model to obtain the predicted role information of the dialogue.
12. The apparatus of claim 11, further comprising:
a preprocessing module, configured to construct training samples, each corresponding to one dialogue sentence in a text and comprising the input information corresponding to the dialogue and a second label, wherein the second label is the role information of the dialogue, the role prediction model being obtained by training with the training samples.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6.
CN202011099233.3A 2020-10-14 2020-10-14 Method and device for predicting dialogue roles, electronic equipment and storage medium Active CN112270169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011099233.3A CN112270169B (en) 2020-10-14 2020-10-14 Method and device for predicting dialogue roles, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011099233.3A CN112270169B (en) 2020-10-14 2020-10-14 Method and device for predicting dialogue roles, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112270169A true CN112270169A (en) 2021-01-26
CN112270169B CN112270169B (en) 2023-07-25

Family

ID=74337146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011099233.3A Active CN112270169B (en) 2020-10-14 2020-10-14 Method and device for predicting dialogue roles, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112270169B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112861509A (en) * 2021-02-08 2021-05-28 青牛智胜(深圳)科技有限公司 Role analysis method and system based on multi-head attention mechanism
CN112989822A (en) * 2021-04-16 2021-06-18 北京世纪好未来教育科技有限公司 Method, device, electronic equipment and storage medium for recognizing sentence categories in conversation

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080319735A1 (en) * 2007-06-22 2008-12-25 International Business Machines Corporation Systems and methods for automatic semantic role labeling of high morphological text for natural language processing applications
CN107766565A (en) * 2017-11-06 2018-03-06 广州杰赛科技股份有限公司 Conversational character differentiating method and system
CN107885723A (en) * 2017-11-03 2018-04-06 广州杰赛科技股份有限公司 Conversational character differentiating method and system
CN108109613A (en) * 2017-12-12 2018-06-01 苏州思必驰信息科技有限公司 For the audio training of Intelligent dialogue voice platform and recognition methods and electronic equipment
CN109036372A (en) * 2018-08-24 2018-12-18 科大讯飞股份有限公司 A kind of voice broadcast method, apparatus and system
CN109658916A (en) * 2018-12-19 2019-04-19 腾讯科技(深圳)有限公司 Phoneme synthesizing method, device, storage medium and computer equipment
CN110534131A (en) * 2019-08-30 2019-12-03 广州华多网络科技有限公司 A kind of audio frequency playing method and system
US20200007411A1 (en) * 2018-06-28 2020-01-02 International Business Machines Corporation Cognitive role-based policy assignment and user interface modification for mobile electronic devices
CN111128223A (en) * 2019-12-30 2020-05-08 科大讯飞股份有限公司 Text information-based auxiliary speaker separation method and related device
CN111145733A (en) * 2020-01-03 2020-05-12 深圳追一科技有限公司 Speech recognition method, speech recognition device, computer equipment and computer readable storage medium
CN111158630A (en) * 2019-12-25 2020-05-15 网易(杭州)网络有限公司 Play control method and device
US20200251089A1 (en) * 2019-02-05 2020-08-06 Electronic Arts Inc. Contextually generated computer speech
CN111524501A (en) * 2020-03-03 2020-08-11 北京声智科技有限公司 Voice playing method and device, computer equipment and computer readable storage medium
CN111651497A (en) * 2020-04-30 2020-09-11 北京大米科技有限公司 User label mining method and device, storage medium and electronic equipment
CN111767715A (en) * 2020-06-10 2020-10-13 北京奇艺世纪科技有限公司 Method, device, equipment and storage medium for person identification

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080319735A1 (en) * 2007-06-22 2008-12-25 International Business Machines Corporation Systems and methods for automatic semantic role labeling of high morphological text for natural language processing applications
CN107885723A (en) * 2017-11-03 2018-04-06 广州杰赛科技股份有限公司 Conversational character differentiating method and system
CN107766565A (en) * 2017-11-06 2018-03-06 广州杰赛科技股份有限公司 Conversational character differentiating method and system
CN108109613A (en) * 2017-12-12 2018-06-01 苏州思必驰信息科技有限公司 For the audio training of Intelligent dialogue voice platform and recognition methods and electronic equipment
US20200007411A1 (en) * 2018-06-28 2020-01-02 International Business Machines Corporation Cognitive role-based policy assignment and user interface modification for mobile electronic devices
CN109036372A (en) * 2018-08-24 2018-12-18 科大讯飞股份有限公司 A kind of voice broadcast method, apparatus and system
CN109658916A (en) * 2018-12-19 2019-04-19 腾讯科技(深圳)有限公司 Phoneme synthesizing method, device, storage medium and computer equipment
US20200251089A1 (en) * 2019-02-05 2020-08-06 Electronic Arts Inc. Contextually generated computer speech
CN110534131A (en) * 2019-08-30 2019-12-03 广州华多网络科技有限公司 A kind of audio frequency playing method and system
CN111158630A (en) * 2019-12-25 2020-05-15 网易(杭州)网络有限公司 Play control method and device
CN111128223A (en) * 2019-12-30 2020-05-08 科大讯飞股份有限公司 Text information-based auxiliary speaker separation method and related device
CN111145733A (en) * 2020-01-03 2020-05-12 深圳追一科技有限公司 Speech recognition method, speech recognition device, computer equipment and computer readable storage medium
CN111524501A (en) * 2020-03-03 2020-08-11 北京声智科技有限公司 Voice playing method and device, computer equipment and computer readable storage medium
CN111651497A (en) * 2020-04-30 2020-09-11 北京大米科技有限公司 User label mining method and device, storage medium and electronic equipment
CN111767715A (en) * 2020-06-10 2020-10-13 北京奇艺世纪科技有限公司 Method, device, equipment and storage medium for person identification

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
QICHUAN YANG: "MIDS: End-to-End Personalized Response Generation in Untrimmed Multi-Role Dialogue", 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) *
乐小虬; 杨崇俊; 于文洋: "Natural language spatial concept extraction based on spatial semantic roles", Journal of Wuhan University (Information Science Edition), no. 12 *
李松斌; 王玲芳; 王劲林: "Video segmentation method based on script and subtitle information", Computer Engineering, no. 15 *
赵文娟: "Construction of a network annotation model based on Chinese frame ontology", Office Automation, no. 19 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112861509A (en) * 2021-02-08 2021-05-28 青牛智胜(深圳)科技有限公司 Role analysis method and system based on multi-head attention mechanism
CN112989822A (en) * 2021-04-16 2021-06-18 北京世纪好未来教育科技有限公司 Method, device, electronic equipment and storage medium for recognizing sentence categories in conversation

Also Published As

Publication number Publication date
CN112270169B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN111428008B (en) Method, apparatus, device and storage medium for training a model
CN113836333A (en) Training method of image-text matching model, method and device for realizing image-text retrieval
CN111325020A (en) Event argument extraction method and device and electronic equipment
CN111709247A (en) Data set processing method and device, electronic equipment and storage medium
CN111783451A (en) Method and apparatus for enhancing text samples
CN111191428B (en) Comment information processing method and device, computer equipment and medium
CN112036509A (en) Method and apparatus for training image recognition models
CN112434492B (en) Text labeling method and device and electronic equipment
CN113220836B (en) Training method and device for sequence annotation model, electronic equipment and storage medium
CN112270168B (en) Method and device for predicting emotion style of dialogue, electronic equipment and storage medium
CN111859994A (en) Method, device and storage medium for obtaining machine translation model and translating text
CN112506949B (en) Method, device and storage medium for generating structured query language query statement
CN110674260B (en) Training method and device of semantic similarity model, electronic equipment and storage medium
CN111339759A (en) Method and device for training field element recognition model and electronic equipment
CN111339268A (en) Entity word recognition method and device
CN110807331A (en) Polyphone pronunciation prediction method and device and electronic equipment
CN111680517A (en) Method, apparatus, device and storage medium for training a model
CN113590776A (en) Text processing method and device based on knowledge graph, electronic equipment and medium
CN111127191A (en) Risk assessment method and device
CN111522944A (en) Method, apparatus, device and storage medium for outputting information
CN113836925A (en) Training method and device for pre-training language model, electronic equipment and storage medium
CN112270169B (en) Method and device for predicting dialogue roles, electronic equipment and storage medium
CN112541332A (en) Form information extraction method and device, electronic equipment and storage medium
CN112507103A (en) Task type dialogue and model training method, device, equipment and storage medium
CN112232089B (en) Pre-training method, device and storage medium of semantic representation model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant