CN112270169B - Method and device for predicting dialogue roles, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112270169B
Authority
CN
China
Prior art keywords
dialogue
text content
text
context
role
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011099233.3A
Other languages
Chinese (zh)
Other versions
CN112270169A (en)
Inventor
潘政林
白洁
王毅
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority claimed from application CN202011099233.3A
Publication of CN112270169A
Application granted
Publication of CN112270169B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G06F40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G06N20/20 - Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a method, an apparatus, an electronic device, and a storage medium for predicting dialogue characters, relating to artificial intelligence fields such as natural language processing, intelligent speech, and deep learning. The method may include the following steps: acquiring the context of a dialogue from the text in which the dialogue to be processed is located; acquiring a first tag for each sentence of text content in the preceding and/or following context, wherein the first tag is either non-dialogue or character information, a character being the speaker of a dialogue; and predicting the character information of the dialogue according to the acquired context and first tags. Applying this scheme improves the accuracy of the prediction results.

Description

Method and device for predicting dialogue roles, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method, an apparatus, an electronic device, and a storage medium for predicting a dialogue character in the fields of natural language processing, intelligent speech, and deep learning.
Background
Multi-character voiced (audio) novels are attracting growing interest in the market; accordingly, the character information of each sentence of dialogue in a novel, i.e., which speaker uttered it, needs to be annotated (i.e., predicted).
Currently, the character information of a dialogue is usually predicted from the context of the dialogue alone, and the accuracy of this approach is poor.
Disclosure of Invention
The application provides a method, a device, electronic equipment and a storage medium for predicting a dialogue role.
A method of dialogue character prediction, comprising:
acquiring the context of a dialogue from the text in which the dialogue to be processed is located;
acquiring a first tag for each sentence of text content in the preceding and/or following context, wherein the first tag is either non-dialogue or character information, a character being the speaker of a dialogue;
and predicting the character information of the dialogue according to the acquired context and first tags.
A dialogue character prediction apparatus, comprising:
a first acquisition module, configured to acquire the context of a dialogue from the text in which the dialogue to be processed is located;
a second acquisition module, configured to acquire a first tag for each sentence of text content in the preceding and/or following context, wherein the first tag is either non-dialogue or character information, a character being the speaker of a dialogue;
and a prediction module, configured to predict the character information of the dialogue according to the acquired context and first tags.
An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method as described above.
A computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
One embodiment of the above application has the following advantages or benefits: the character information of a dialogue can be predicted by combining the context of the dialogue with the first tag of each sentence of text content in the preceding and/or following context, where the first tag is either non-dialogue or character information, thereby improving the accuracy of the prediction results.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a flowchart of a first embodiment of a method for predicting a dialogue role according to the present application;
FIG. 2 is a flowchart of a second embodiment of the method for dialogue character prediction according to the present application;
fig. 3 is a schematic structural diagram of an embodiment of a dialogue role prediction device 30 according to the present application;
fig. 4 is a block diagram of an electronic device according to a method according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In addition, it should be understood that the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Fig. 1 is a flowchart of a first embodiment of a method for predicting a dialogue role according to the present application. As shown in fig. 1, the following detailed implementation is included.
In step 101, the context of the dialog is obtained from the text in which the dialog to be processed is located.
In step 102, a first tag of each sentence of text content in the preceding and/or following context is obtained, where the first tag is non-dialogue or character information and a character is the speaker of a dialogue.
In step 103, the character information of the dialogue is predicted according to the acquired context and first tags.
It can be seen that in the above method embodiment, the character information of a dialogue may be predicted by combining the context of the dialogue with the first tag of each sentence of text content in the preceding and/or following context, thereby improving the accuracy of the prediction result. In addition, the text can be in any form, such as a novel, news, or a script, so the method has universal applicability.
In practical application, for the text to be processed, i.e., the text in which the dialogue to be processed is located, the dialogues in the text may be traversed, and each traversed sentence of dialogue is taken in turn as the dialogue to be processed.
The particular order of traversal is not limited; for example, it may proceed from beginning to end. Accordingly, the first tag of each sentence of text content in the preceding context may be obtained. For any sentence of text content in the preceding context, if it is not dialogue, its first tag is non-dialogue; if it is dialogue, its first tag is the previously predicted character information of that dialogue.
In addition, the dialogues in the text may be identified in the following manner: taking text content enclosed in quotation marks as dialogue, and/or, for any sentence of text content, using a pre-trained classification model to determine whether it is dialogue.
The two recognition approaches can be used separately or in combination. For example, for a sentence of text content enclosed in quotation marks, a classification model may further be used to determine whether it is dialogue; this double recognition improves the accuracy of the recognition result.
The above ways of identifying dialogue are merely illustrative and are not intended to limit the technical solution of the present application; any feasible implementation may be adopted in practice. For example, the quotation marks may be replaced by other symbols that mark dialogue.
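The quotation-mark heuristic and the optional classifier double check described above can be sketched as follows. This is a minimal illustration, assuming Chinese or Western double quotes as delimiters and a caller-supplied `classifier` callable standing in for the pre-trained classification model; both are assumptions of this sketch, not details fixed by the text.

```python
import re

# Text enclosed in Chinese or Western double quotes; the surrounding text
# notes that other dialogue-marking symbols are equally possible.
_QUOTE_RE = re.compile(r'[“"](.+?)[”"]')

def is_dialogue(sentence: str, classifier=None) -> bool:
    """Return True if the sentence looks like dialogue.

    First applies the quote heuristic; if a classifier is supplied, it is
    used as a second check on quote-enclosed sentences (double recognition).
    """
    quoted = bool(_QUOTE_RE.search(sentence))
    if quoted and classifier is not None:
        return bool(classifier(sentence))
    return quoted
```

A classifier that vetoes a quoted sentence makes the combined check stricter, which is the point of the double recognition.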
For the dialogue to be processed, its context can be obtained from the text in which it is located. How the context is acquired is likewise not limited. For example, the M sentences of text content before the dialogue and the N sentences after it may be taken as its preceding and following context respectively, where M and N are positive integers that may be equal or different, with specific values determined according to actual needs.
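The context window just described can be sketched in a few lines; the function name and the default window sizes are illustrative assumptions.

```python
def get_context(sentences, idx, m=2, n=2):
    """Return (above, below) for the dialogue at position idx:
    the M sentences of text content before it and the N sentences after it.
    M and N are positive integers chosen according to actual needs."""
    above = sentences[max(0, idx - m):idx]
    below = sentences[idx + 1:idx + 1 + n]
    return above, below
```

Near the beginning or end of the text the window is simply truncated, which is one reasonable way to handle the boundary.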
In addition to the context of the dialogue, a first tag may be obtained for each sentence of text content in the preceding and/or following context, where the first tag is non-dialogue or character information. That is, first tags may be obtained only for the preceding context, only for the following context, or for both at the same time. To distinguish them from tags that appear later, the tags of each sentence of text content are referred to as first tags.
As previously described, assuming the text is traversed from beginning to end, the first tag of each sentence of text content in the preceding context may be obtained: if a sentence of text content is not dialogue, its first tag is non-dialogue; if it is dialogue, its first tag is its previously predicted character information.
Further, the character information of the dialogue can be predicted according to the acquired context and first tags.
Specifically, input information including the context of the dialogue, the first tags, and the dialogue itself may be constructed and input into a pre-trained character prediction model, thereby obtaining the predicted character information of the dialogue.
The specific form of the input information is not limited. For example, assume the preceding context contains 4 sentences of text content (text content 1 to text content 4), the following context contains 3 sentences (text content 5 to text content 7), and the first tags of the 4 preceding sentences have been acquired. Then "first tag of text content 1 + text content 1 + first tag of text content 2 + text content 2 + first tag of text content 3 + text content 3 + first tag of text content 4 + text content 4 + dialogue + text content 5 + text content 6 + text content 7" may be taken as the input information.
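The concatenation in the example above can be sketched as follows; the space separator and function name are assumptions of this sketch, since the text does not fix a concrete serialization.

```python
def build_input(above, above_tags, dialogue, below):
    """Interleave each preceding sentence with its first tag, then append
    the dialogue itself and the following sentences, mirroring the
    'tag1 + content1 + tag2 + content2 + ... + dialogue + ...' example."""
    parts = []
    for tag, sent in zip(above_tags, above):
        parts.append(tag)
        parts.append(sent)
    parts.append(dialogue)
    parts.extend(below)
    return " ".join(parts)
```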
After the input information is obtained, it can be input into the character prediction model to obtain the predicted character information of the dialogue.
Compared with the prior art, this method enables the model to obtain more information: the information needed to judge the character of a sentence of dialogue exists not only in the surrounding text but is also embedded in the first tags of each sentence of text content in the preceding and/or following context. Accordingly, when the character information of the current dialogue is predicted, the first tags can be combined in as well, improving the accuracy of the prediction result.
In addition, conventional methods perform particularly poorly on runs of consecutive dialogue, such as successive dialogues by one character or alternating dialogues among multiple characters, whereas this method remains well applicable in such cases.
As previously described, the character prediction model may be trained in advance. Specifically, training samples may be constructed, each corresponding to a sentence of dialogue in a text, where a training sample may include the input information corresponding to the dialogue and a second tag, the second tag being the character information of the dialogue; the character prediction model can then be obtained by training with these training samples.
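Training-sample construction can be sketched as below. The two-sentence context window, the `NON_DIALOGUE` placeholder tag, and the data layout are assumptions of this sketch; only the pairing of input information with a second tag (the speaker) comes from the text.

```python
def build_samples(annotated):
    """annotated: list of (sentence, role) pairs, where role is None for
    non-dialogue sentences. Returns (input_info, second_tag) training
    pairs, one per dialogue sentence."""
    samples = []
    for i, (sent, role) in enumerate(annotated):
        if role is None:
            continue  # only dialogue sentences become training samples
        parts = []
        for s, r in annotated[max(0, i - 2):i]:  # preceding context + tags
            parts.append(r if r is not None else "NON_DIALOGUE")
            parts.append(s)
        parts.append(sent)                        # the dialogue itself
        parts.extend(s for s, _ in annotated[i + 1:i + 3])  # following context
        samples.append((" ".join(parts), role))
    return samples
```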
Based on the above description, fig. 2 is a flowchart of a second embodiment of the method for dialogue character prediction according to the present application. As shown in fig. 2, it includes the following detailed implementation.
In step 201, the dialogues in the novel are traversed from beginning to end.
In this embodiment, it is assumed that the text to be processed is a novel.
In addition, text content enclosed in quotation marks may be taken as dialogue, and/or, for any sentence of text content, a pre-trained classification model may be used to determine whether it is dialogue.
In step 202, the processing shown in steps 203-207 is performed for each sentence of dialogue traversed.
In step 203, the context of the dialogue is obtained.
For example, the M sentences of text content before the dialogue and the N sentences after it may be taken as its preceding and following context respectively, where M and N are positive integers that may be equal or different.
In step 204, a first tag of each sentence of text content in the preceding context is obtained, where the first tag is non-dialogue or character information.
In step 205, input information including the context of the dialogue, the first tags, and the dialogue itself is constructed.
For example, assuming the preceding context contains 4 sentences of text content (text content 1 to text content 4) and the following context contains 3 sentences (text content 5 to text content 7), then "first tag of text content 1 + text content 1 + first tag of text content 2 + text content 2 + first tag of text content 3 + text content 3 + first tag of text content 4 + text content 4 + dialogue + text content 5 + text content 6 + text content 7" may be taken as the input information.
In step 206, the input information is input into the pre-trained character prediction model to obtain the predicted character information of the dialogue.
As described above, training samples may be constructed in advance, each corresponding to a sentence of dialogue in a text and including the input information corresponding to the dialogue and a second tag, i.e., the character information of the dialogue; the character prediction model is obtained by training with these samples.
In step 207, the dialogue is annotated with the predicted character information.
In step 208, it is determined whether there is a next dialogue; if so, step 203 is repeated for the next dialogue; otherwise, step 209 is performed.
In step 209, the annotated novel is output, ending the flow.
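Steps 201-209 can be put together in miniature as follows; `is_dialogue` and `predict_role` stand in for the dialogue recognizer and the trained character prediction model, and both, along with the window sizes and placeholder tag, are assumptions of this sketch.

```python
def annotate_novel(sentences, is_dialogue, predict_role, m=2, n=2):
    """Traverse the sentences from beginning to end; for each dialogue,
    build the input from the preceding context with its already-predicted
    first tags plus the following context, predict the speaker, and record
    it. Returns {sentence index: predicted character info}."""
    first_tags = {}  # indices of processed dialogues -> predicted role
    for i, sent in enumerate(sentences):
        if not is_dialogue(sent):
            continue
        parts = []
        for j in range(max(0, i - m), i):
            parts.append(first_tags.get(j, "NON_DIALOGUE"))
            parts.append(sentences[j])
        parts.append(sent)
        parts.extend(sentences[i + 1:i + 1 + n])
        first_tags[i] = predict_role(" ".join(parts))
    return first_tags
```

Because earlier predictions feed the first tags of later inputs, runs of consecutive dialogue are handled naturally, which matches the motivation given above.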
In this method embodiment, the context of a dialogue, the first tags, and so on can be combined to predict the character information of the dialogue, improving the accuracy of the prediction result. Moreover, the dialogue annotation of thousands of chapters can usually be completed in just a few minutes, making this a fast, efficient, and industrializable dialogue character prediction scheme.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may take other order or occur simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required in the present application. In addition, portions of one embodiment that are not described in detail may be referred to in the description of other embodiments.
The foregoing is a description of embodiments of the method, and the following further describes embodiments of the device.
Fig. 3 is a schematic structural diagram of an embodiment of the dialogue character prediction device 30 described in the present application. As shown in fig. 3, it includes: a first acquisition module 301, a second acquisition module 302, and a prediction module 303.
A first acquisition module 301, configured to acquire the context of a dialogue from the text in which the dialogue to be processed is located.
A second acquisition module 302, configured to acquire a first tag for each sentence of text content in the preceding and/or following context, where the first tag is non-dialogue or character information, and a character is the speaker of a dialogue.
And a prediction module 303, configured to predict the character information of the dialogue according to the acquired context and first tags.
The first acquisition module 301 may traverse the dialogues in the text, taking each traversed sentence of dialogue in turn as the dialogue to be processed.
The particular order of traversal is not limited; for example, it may proceed from beginning to end. Accordingly, the second acquisition module 302 may obtain the first tag of each sentence of text content in the preceding context: if a sentence of text content is not dialogue, its first tag is non-dialogue; if it is dialogue, its first tag is the previously predicted character information of that dialogue.
In addition, the first acquisition module 301 may identify the dialogues in the text in the following manner: taking text content enclosed in quotation marks as dialogue, and/or, for any sentence of text content, using a pre-trained classification model to determine whether it is dialogue.
The two identification approaches can be used separately or in combination; for example, for a sentence of text content enclosed in quotation marks, a classification model may further be used to determine whether it is dialogue.
For the dialogue to be processed, the first acquisition module 301 may acquire its context; for example, the M sentences of text content before the dialogue and the N sentences after it may be taken as its preceding and following context respectively, where M and N are positive integers that may be equal or different.
The second acquisition module 302 may also obtain a first tag for each sentence of text content in the preceding and/or following context, where the first tag is non-dialogue or character information; that is, it may obtain first tags only for the preceding context, only for the following context, or for both at the same time.
As described above, assuming the text to be processed is traversed from beginning to end, the first tag of each sentence of text content in the preceding context may be obtained.
Further, the prediction module 303 may predict the character information of the dialogue according to the acquired context and first tags.
Specifically, input information including the context of the dialogue, the first tags, and the dialogue itself may be constructed and input into the pre-trained character prediction model, thereby obtaining the predicted character information of the dialogue.
The specific form of the input information is not limited. For example, assuming the preceding context contains 4 sentences of text content (text content 1 to text content 4), the following context contains 3 sentences (text content 5 to text content 7), and the first tags of the 4 preceding sentences have been acquired, then "first tag of text content 1 + text content 1 + first tag of text content 2 + text content 2 + first tag of text content 3 + text content 3 + first tag of text content 4 + text content 4 + dialogue + text content 5 + text content 6 + text content 7" may be taken as the input information.
As shown in fig. 3, the apparatus may further include a preprocessing module 300, configured to construct training samples, each corresponding to a sentence of dialogue in a text and including the input information corresponding to the dialogue and a second tag (the character information of the dialogue), and to obtain the character prediction model by training with these samples.
The specific workflow of the embodiment of the apparatus shown in fig. 3 is referred to the related description in the foregoing method embodiment, and will not be repeated.
In summary, with the scheme of this apparatus embodiment, the context of a dialogue and the first tag of each sentence of text content in the preceding and/or following context can be combined to predict the character information of the dialogue, improving the accuracy of the prediction results.
The scheme can be applied to the field of artificial intelligence, and particularly relates to the fields of natural language processing, intelligent voice and deep learning.
Artificial intelligence is the discipline that studies how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning); it involves technologies at both the hardware and software levels. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
Fig. 4 is a block diagram of an electronic device for the method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 4, the electronic device includes: one or more processors Y01, a memory Y02, and interfaces for connecting the components, including high-speed and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a graphical user interface on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). In fig. 4, one processor Y01 is taken as an example.
The memory Y02 is a non-transitory computer readable storage medium provided in the present application. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the methods provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the methods provided herein.
The memory Y02 serves as a non-transitory computer readable storage medium, and may be used to store a non-transitory software program, a non-transitory computer executable program, and modules, such as program instructions/modules corresponding to the methods in the embodiments of the present application. The processor Y01 executes various functional applications of the server and data processing, i.e., implements the methods in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory Y02.
The memory Y02 may include a program storage area, which may store an operating system and at least one application program required for functions, and a data storage area, which may store data created according to the use of the electronic device, etc. In addition, the memory Y02 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory Y02 may optionally include memory located remotely from the processor Y01, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, blockchain networks, local area networks, mobile communication networks, and combinations thereof.
The electronic device may further include: an input device Y03 and an output device Y04. The processor Y01, memory Y02, input device Y03, and output device Y04 may be connected by a bus or otherwise, for example in fig. 4.
The input device Y03 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device, such as a touch screen, keypad, mouse, trackpad, touchpad, pointer stick, one or more mouse buttons, trackball, joystick, and like input devices. The output means Y04 may include a display device, an auxiliary lighting means, a tactile feedback means (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display, a light emitting diode display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific integrated circuitry, computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computing programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. The terms "machine-readable medium" and "computer-readable medium" as used herein refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable logic devices) for providing machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a cathode ray tube or a liquid crystal display monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks, wide area networks, blockchain networks, and the Internet.
The computer system may include a client and a server. The client and server are generally remote from each other and typically interact through a communication network. The client-server relationship arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS services.
It should be appreciated that, in the various flows shown above, steps may be reordered, added, or deleted. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (6)

1. A method of dialogue role prediction, comprising:
traversing the dialogues in a text in order from beginning to end, and taking each traversed dialogue in turn as the dialogue to be processed; and acquiring the context of the dialogue to be processed from the text;
acquiring a first label for each sentence of text content in the context, wherein the first label is either non-dialogue or role information, the role being the speaker of the dialogue; and
predicting role information of the dialogue to be processed according to the context and the first labels, comprising: constructing input information comprising the context, the first labels, and the dialogue to be processed, and inputting the input information into a role prediction model to obtain the predicted role information of the dialogue to be processed, wherein the role prediction model is trained using constructed training samples, each training sample corresponding to one sentence of dialogue in a text and comprising: the input information corresponding to that dialogue, and a second label, the second label being the role information of that dialogue.
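The input construction described in claim 1 can be illustrated with a minimal sketch. The separator tokens, label strings, and function name below are illustrative assumptions, not the patent's actual encoding; a real system would feed the resulting sequence to a trained role prediction model rather than simply printing it.

```python
# Hypothetical sketch of claim 1's input construction: each context sentence
# is paired with its first label ("non-dialogue" or a role name), then the
# dialogue to be processed is appended. The [SEP]/[TARGET] markers are
# assumed, not taken from the patent.

def build_model_input(context, first_labels, target_dialogue):
    """Concatenate labeled context sentences and the target dialogue."""
    if len(context) != len(first_labels):
        raise ValueError("each context sentence needs exactly one first label")
    parts = [f"[{label}] {sentence}"
             for sentence, label in zip(context, first_labels)]
    parts.append(f"[TARGET] {target_dialogue}")
    return " [SEP] ".join(parts)

context = ['Tom pushed the door open.', '"Anyone home?"']
first_labels = ["non-dialogue", "Tom"]
print(build_model_input(context, first_labels, '"Just me," came a voice.'))
```

In training, each such sequence would be paired with a second label naming the speaker of the target dialogue, forming one sample per dialogue sentence in the text.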
2. The method of claim 1, further comprising:
taking text content enclosed in quotation marks as dialogue;
and/or determining, for any sentence of text content, whether the text content is dialogue by using a pre-trained classification model.
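The quotation-mark branch of claim 2 can be sketched as a regular expression; the classifier-based alternative the claim mentions is not shown. The quote characters matched and the function name are assumptions for illustration only.

```python
import re

# Treat any span enclosed in curly double quotes or English straight quotes
# as a dialogue (the heuristic branch of claim 2; assumed quote inventory).
QUOTED = re.compile(r'“([^”]*)”|"([^"]*)"')

def extract_dialogues(text):
    """Return every quoted span in the text, each treated as one dialogue."""
    return [curly or straight for curly, straight in QUOTED.findall(text)]

sample = 'He smiled. “Long time no see,” she said. "Indeed," he replied.'
print(extract_dialogues(sample))  # ['Long time no see,', 'Indeed,']
```

A production system would likely combine both branches, using the heuristic where quotes are present and falling back to the classifier elsewhere.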
3. A dialogue role prediction apparatus comprising:
a first acquisition module, configured to traverse the dialogues in a text in order from beginning to end, take each traversed sentence of dialogue in turn as the dialogue to be processed, and acquire the context of the dialogue to be processed from the text;
a second acquisition module, configured to acquire a first label for each sentence of text content in the context, wherein the first label is either non-dialogue or role information, the role being the speaker of the dialogue; and
a prediction module, configured to predict role information of the dialogue to be processed according to the context and the first labels, including: constructing input information comprising the context, the first labels, and the dialogue to be processed, and inputting the input information into a role prediction model to obtain the predicted role information of the dialogue to be processed, wherein the role prediction model is trained using constructed training samples, each training sample corresponding to one sentence of dialogue in a text and comprising: the input information corresponding to that dialogue, and a second label, the second label being the role information of that dialogue.
4. The apparatus of claim 3, wherein
the first acquisition module is further configured to take text content enclosed in quotation marks as dialogue, and/or determine, for any sentence of text content, whether the text content is dialogue by using a pre-trained classification model.
5. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-2.
6. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-2.
CN202011099233.3A 2020-10-14 2020-10-14 Method and device for predicting dialogue roles, electronic equipment and storage medium Active CN112270169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011099233.3A CN112270169B (en) 2020-10-14 2020-10-14 Method and device for predicting dialogue roles, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011099233.3A CN112270169B (en) 2020-10-14 2020-10-14 Method and device for predicting dialogue roles, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112270169A CN112270169A (en) 2021-01-26
CN112270169B true CN112270169B (en) 2023-07-25

Family

ID=74337146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011099233.3A Active CN112270169B (en) 2020-10-14 2020-10-14 Method and device for predicting dialogue roles, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112270169B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112861509B (en) * 2021-02-08 2023-05-12 青牛智胜(深圳)科技有限公司 Role analysis method and system based on multi-head attention mechanism
CN112989822B (en) * 2021-04-16 2021-08-27 北京世纪好未来教育科技有限公司 Method, device, electronic equipment and storage medium for recognizing sentence categories in conversation

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766565A (en) * 2017-11-06 2018-03-06 广州杰赛科技股份有限公司 Conversational character differentiating method and system
CN107885723A (en) * 2017-11-03 2018-04-06 广州杰赛科技股份有限公司 Conversational character differentiating method and system
CN108109613A (en) * 2017-12-12 2018-06-01 苏州思必驰信息科技有限公司 For the audio training of Intelligent dialogue voice platform and recognition methods and electronic equipment
CN109036372A (en) * 2018-08-24 2018-12-18 科大讯飞股份有限公司 A kind of voice broadcast method, apparatus and system
CN109658916A (en) * 2018-12-19 2019-04-19 腾讯科技(深圳)有限公司 Phoneme synthesizing method, device, storage medium and computer equipment
CN110534131A (en) * 2019-08-30 2019-12-03 广州华多网络科技有限公司 A kind of audio frequency playing method and system
CN111128223A (en) * 2019-12-30 2020-05-08 科大讯飞股份有限公司 Text information-based auxiliary speaker separation method and related device
CN111145733A (en) * 2020-01-03 2020-05-12 深圳追一科技有限公司 Speech recognition method, speech recognition device, computer equipment and computer readable storage medium
CN111158630A (en) * 2019-12-25 2020-05-15 网易(杭州)网络有限公司 Play control method and device
CN111524501A (en) * 2020-03-03 2020-08-11 北京声智科技有限公司 Voice playing method and device, computer equipment and computer readable storage medium
CN111651497A (en) * 2020-04-30 2020-09-11 北京大米科技有限公司 User label mining method and device, storage medium and electronic equipment
CN111767715A (en) * 2020-06-10 2020-10-13 北京奇艺世纪科技有限公司 Method, device, equipment and storage medium for person identification

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8527262B2 (en) * 2007-06-22 2013-09-03 International Business Machines Corporation Systems and methods for automatic semantic role labeling of high morphological text for natural language processing applications
US20200007411A1 (en) * 2018-06-28 2020-01-02 International Business Machines Corporation Cognitive role-based policy assignment and user interface modification for mobile electronic devices
US20200251089A1 (en) * 2019-02-05 2020-08-06 Electronic Arts Inc. Contextually generated computer speech

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885723A (en) * 2017-11-03 2018-04-06 广州杰赛科技股份有限公司 Conversational character differentiating method and system
CN107766565A (en) * 2017-11-06 2018-03-06 广州杰赛科技股份有限公司 Conversational character differentiating method and system
CN108109613A (en) * 2017-12-12 2018-06-01 苏州思必驰信息科技有限公司 For the audio training of Intelligent dialogue voice platform and recognition methods and electronic equipment
CN109036372A (en) * 2018-08-24 2018-12-18 科大讯飞股份有限公司 A kind of voice broadcast method, apparatus and system
CN109658916A (en) * 2018-12-19 2019-04-19 腾讯科技(深圳)有限公司 Phoneme synthesizing method, device, storage medium and computer equipment
CN110534131A (en) * 2019-08-30 2019-12-03 广州华多网络科技有限公司 A kind of audio frequency playing method and system
CN111158630A (en) * 2019-12-25 2020-05-15 网易(杭州)网络有限公司 Play control method and device
CN111128223A (en) * 2019-12-30 2020-05-08 科大讯飞股份有限公司 Text information-based auxiliary speaker separation method and related device
CN111145733A (en) * 2020-01-03 2020-05-12 深圳追一科技有限公司 Speech recognition method, speech recognition device, computer equipment and computer readable storage medium
CN111524501A (en) * 2020-03-03 2020-08-11 北京声智科技有限公司 Voice playing method and device, computer equipment and computer readable storage medium
CN111651497A (en) * 2020-04-30 2020-09-11 北京大米科技有限公司 User label mining method and device, storage medium and electronic equipment
CN111767715A (en) * 2020-06-10 2020-10-13 北京奇艺世纪科技有限公司 Method, device, equipment and storage medium for person identification

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
MIDS: End-to-End Personalized Response Generation in Untrimmed Multi-Role Dialogue; Qichuan Yang; 2019 International Joint Conference on Neural Networks (IJCNN); full text *
Video segmentation method based on script and subtitle information; Li Songbin, Wang Lingfang, Wang Jinlin; Computer Engineering (Issue 15); full text *
Construction of a web annotation model based on the Chinese frame ontology; Zhao Wenjuan; Office Automation (Issue 19); full text *
Extraction of natural-language spatial concepts based on spatial semantic roles; Le Xiaoqiu, Yang Chongjun, Yu Wenyang; Geomatics and Information Science of Wuhan University (Issue 12); full text *

Also Published As

Publication number Publication date
CN112270169A (en) 2021-01-26

Similar Documents

Publication Publication Date Title
CN112560912B (en) Classification model training method and device, electronic equipment and storage medium
CN111859994B (en) Machine translation model acquisition and text translation method, device and storage medium
CN110674314B (en) Sentence recognition method and device
CN111241819B (en) Word vector generation method and device and electronic equipment
CN111709247A (en) Data set processing method and device, electronic equipment and storage medium
CN112270168B (en) Method and device for predicting emotion style of dialogue, electronic equipment and storage medium
CN111783451A (en) Method and apparatus for enhancing text samples
CN111325020A (en) Event argument extraction method and device and electronic equipment
CN111339268B (en) Entity word recognition method and device
CN112434492B (en) Text labeling method and device and electronic equipment
CN111127191B (en) Risk assessment method and risk assessment device
CN111079945B (en) End-to-end model training method and device
CN111709252B (en) Model improvement method and device based on pre-trained semantic model
CN111339759A (en) Method and device for training field element recognition model and electronic equipment
KR102630243B1 (en) method and device for predicting punctuation
CN111680517A (en) Method, apparatus, device and storage medium for training a model
CN112506949B (en) Method, device and storage medium for generating structured query language query statement
CN112270169B (en) Method and device for predicting dialogue roles, electronic equipment and storage medium
CN111522944A (en) Method, apparatus, device and storage medium for outputting information
CN113553414A (en) Intelligent dialogue method and device, electronic equipment and storage medium
CN111090991A (en) Scene error correction method and device, electronic equipment and storage medium
CN111177339A (en) Dialog generation method and device, electronic equipment and storage medium
CN111666771A (en) Semantic label extraction device, electronic equipment and readable storage medium of document
CN112559715B (en) Attitude identification method, device, equipment and storage medium
CN112232089B (en) Pre-training method, device and storage medium of semantic representation model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant