CN117149965A - Dialogue processing method, dialogue processing device, computer equipment and computer-readable storage medium

Info

Publication number: CN117149965A
Authority: CN (China)
Prior art keywords: dialogue, content, text, contents, user
Prior art date: 2023-08-03
Legal status: Pending
Application number: CN202310974721.1A
Other languages: Chinese (zh)
Inventors: 黄润乾, 张伟彬, 陈东鹏, 李亚桐
Current Assignee: Voiceai Technologies Co ltd
Original Assignee: Voiceai Technologies Co ltd
Application filed by: Voiceai Technologies Co ltd
Priority date (filing date): 2023-08-03
Publication date: 2023-12-01

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G06F 16/3331 Query processing
    • G06F 16/3332 Query translation
    • G06F 16/3334 Selection or weighting of terms from queries, including natural language queries
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates


Abstract

The application discloses a dialogue processing method and apparatus, a computer device, and a computer-readable storage medium. The dialogue processing method may include the following steps: acquiring at least two pieces of dialogue content from a user; parsing the at least two pieces of dialogue content to obtain corresponding text content; extracting dialogue keywords from the text content; associating the at least two pieces of dialogue content according to the dialogue keywords, and determining a target text and the user's emotional intent; and conducting a dialogue with the user based on the target text and the emotional intent. By associating the dialogue keywords across at least two pieces of dialogue content, the technical solution of the application improves the accuracy of recognizing the user's chat content and emotion, and conducting the dialogue according to the target text and emotional intent makes the human-machine dialogue more natural. The fluency and effectiveness of the human-machine dialogue are thereby improved.

Description

Dialogue processing method, dialogue processing device, computer equipment and computer readable storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a dialogue processing method, a dialogue processing apparatus, a computer device, and a computer-readable storage medium.
Background
With the continued development of artificial intelligence, fluent conversations between people and machines have become possible. In the related art, human-machine dialogue is implemented by recognizing the user's chat content during the conversation and determining the corresponding reply content. However, the accuracy with which users' chat content is currently recognized is low. As a result, the generated replies often fail to respond to the user effectively, and the human-machine dialogue feels unnatural. The low accuracy of chat-content recognition and the poor naturalness of the dialogue together lead to low fluency and a poor dialogue experience when people converse with machines.
Disclosure of Invention
Embodiments of the present application provide a dialogue processing method, a dialogue processing apparatus, a computer device, and a computer-readable storage medium, which can improve the fluency and effectiveness of human-machine dialogue.
In a first aspect, an embodiment of the present application discloses a method for processing a dialogue, where the method includes:
acquiring at least two pieces of dialogue content from a user;
parsing the at least two pieces of dialogue content to obtain corresponding text content;
extracting dialogue keywords from the text content;
associating the at least two pieces of dialogue content according to the dialogue keywords, and determining a target text and the emotional intent of the user;
and conducting a dialogue with the user according to the target text and the emotional intent.
In a second aspect, an embodiment of the present application discloses a dialogue processing apparatus, including:
an acquisition unit, configured to acquire at least two pieces of dialogue content from a user;
an identification unit, configured to parse the at least two pieces of dialogue content to obtain corresponding text content;
an extraction unit, configured to extract dialogue keywords from the text content;
a first determining unit, configured to associate the at least two pieces of dialogue content according to the dialogue keywords and determine a target text and the emotional intent of the user;
and a dialogue unit, configured to conduct a dialogue with the user according to the target text and the emotional intent.
In a third aspect, an embodiment of the present application discloses a computer device. The computer device includes a processor and a memory, the memory stores a computer program, and the processor invokes the computer program to implement the dialogue processing method described above.
In a fourth aspect, an embodiment of the present application discloses a readable storage medium storing program code that is invoked by a processor to implement the dialogue processing method described above.
In a fifth aspect, the application discloses a computer program product comprising computer program code which, when run by a processor, causes the dialogue processing method described above to be performed.
In the embodiments of the application, at least two pieces of dialogue content are acquired from a user; the at least two pieces of dialogue content are parsed to obtain corresponding text content; dialogue keywords are extracted from the text content; the at least two pieces of dialogue content are associated according to the dialogue keywords, and a target text and the user's emotional intent are determined; and a dialogue is conducted with the user according to the target text and the emotional intent. By associating the dialogue keywords across the pieces of dialogue content, the technical solution improves the accuracy of recognizing the user's chat content and emotion, and conducting the dialogue according to the target text and emotional intent makes the human-machine dialogue more natural. The fluency and effectiveness of the human-machine dialogue are thereby improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application, and that other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of an architecture of an interactive system disclosed in an embodiment of the present application;
FIG. 2 is a flow chart of a dialogue processing method disclosed in an embodiment of the application;
FIG. 3 is a schematic diagram of a scenario of dialog content disclosed in an embodiment of the present application;
FIG. 4 is a schematic diagram of a scenario of another dialog content disclosed in an embodiment of the present application;
FIG. 5 is a flow chart of another dialog processing method disclosed in an embodiment of the present application;
FIG. 6 is a schematic diagram of a dialogue processing device according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a computer device according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, and examples of the embodiments are illustrated in the accompanying drawings, in which identical or similar reference numerals denote identical or similar elements or elements having identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended only to explain the present application and are not to be construed as limiting it.
To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
Currently, when human-machine dialogue is implemented with a natural language model (for example, ChatGPT), the user's emotional intent is generally determined by recognizing the current passage or sentence of dialogue content input by the user, and the corresponding reply content is matched according to that emotional intent. However, in a human-machine conversation, one dialogue may comprise several passages or sentences of dialogue content, and the emotional intent inferred from a single passage or sentence often fails to express the user's conversational emotion and needs correctly. Accordingly, reply content matched against such an emotional intent cannot respond to the user's dialogue content effectively and naturally. The low accuracy of recognizing the user's chat content and the poor naturalness of the dialogue therefore lead to low fluency and a poor dialogue experience in human-machine conversation.
To solve the above problems, in the embodiments of the present application, at least two pieces of dialogue content are acquired from a user; the at least two pieces of dialogue content are parsed to obtain corresponding text content; dialogue keywords are extracted from the text content; the at least two pieces of dialogue content are associated according to the dialogue keywords, and a target text and the user's emotional intent are determined; and a dialogue is conducted with the user based on the target text and the emotional intent. By associating the dialogue keywords across the pieces of dialogue content, the accuracy of recognizing the user's chat content and emotion is improved, and conducting the dialogue according to the target text and emotional intent makes the human-machine dialogue more natural, thereby improving its fluency and effectiveness.
For a better understanding of the embodiments of the present application, a system architecture diagram is described below.
Referring to fig. 1, fig. 1 is a schematic diagram of a dialogue system 100 according to an embodiment of the application. As shown in fig. 1, the dialog system 100 may include a terminal device 110 and a server 120.
The terminal device 110 is installed with a client corresponding to the server 120. The terminal device 110 may be a mobile terminal or a PC (personal computer). The terminal device 110 may comprise an input device and an output device. The input device may include a keyboard and a mouse and may be used to receive the user's dialogue content. The output device may include a display screen and may be used to display the dialogue content entered by the user. The terminal device 110 may receive, through the client, the dialogue content the user enters via the input device; the terminal device 110 may also return the corresponding reply content to the output device through the client to realize the dialogue with the user.
The server 120 may be used to execute the method of the present application, and in particular to analyze dialogue content of different dialogue types with the corresponding analysis modules, so as to obtain the text content corresponding to the dialogue content. It may also be used to associate the dialogue keywords of the dialogue content to determine the user's emotional intent, and to realize the dialogue between the person and the machine. The server 120 may further include a storage module for storing the natural language processing model; the storage module may also store dialogue content of the picture, video, audio, and expression types, together with the text content corresponding to dialogue content of these types. The server 120 may be connected to the terminal device 110.
As an embodiment, the user may input dialogue content at the client via the input device of the terminal device 110. The server 120 analyzes the different types of dialogue content with the corresponding analysis modules, determines the target text and the emotional intent of the dialogue content, and determines the corresponding reply content according to the target text and the emotional intent. The reply content is returned to the client of the terminal device 110 and displayed through the output device of the terminal device 110.
It should be noted that the system architecture shown in fig. 1 is only an example and does not limit the technical solution disclosed in the embodiments of the present application. As the interactive system architecture evolves and new application scenarios appear, the technical solution disclosed in the embodiments of the present application remains applicable to similar technical problems. A minimal end-to-end sketch of the flow described above is given below.
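The following is a minimal, illustrative sketch of this server-side flow in Python. Every name here is an assumption made for illustration; each helper is a stub standing in for a module detailed in the steps below, not the patent's actual implementation.

# Illustrative sketch only: each helper is a stub for a module described later.
def parse_content(content):
    # Steps 202/503: per-type analysis module; assume text passes through.
    return str(content)

def extract_keywords(text):
    # Step 203: dialogue-keyword extraction; a naive whitespace split as a stub.
    return text.split()

def associate(texts, keywords):
    # Step 204: associate the pieces of dialogue content; returns
    # (target_text, emotional_intent). The intent is hard-coded in this stub.
    return " ".join(texts), "neutral"

def handle_dialogue(contents):
    texts = [parse_content(c) for c in contents]
    keywords = [kw for t in texts for kw in extract_keywords(t)]
    target_text, emotional_intent = associate(texts, keywords)
    # Step 205: the reply would be generated from both signals.
    return f"[reply conditioned on {target_text!r} with intent {emotional_intent!r}]"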
Referring to fig. 2, fig. 2 is a flow chart of a dialogue processing method according to an embodiment of the application. The dialog processing method may include the following steps.
201. At least two pieces of dialogue content of a user are acquired.
The application can be applied while a user conducts a human-machine dialogue with a natural language model (such as ChatGPT). The terminal device may acquire the user's dialogue content through the input device and send it to the server. After receiving the user's dialogue content, the server can obtain the at least two pieces of dialogue content sent by the user.
The user's dialogue content may be dialogue content (first reply content) that the user produces in response to initial chat content issued by the system (for example, greeting content triggered by a specific condition, or topic content about trending information). The first reply content is the user's reply to the system's initial chat content and is associated with it. The user's dialogue content may also be dialogue content (third reply content) that the user produces in response to second reply content (the system's reply content) generated by the system from the user's first reply content. The third reply content is the user's reply to the system's second reply content and is associated with it.
It should be noted that the above sequence of initial chat content (chat content actively issued by the system), first reply content (the user's first reply), second reply content (the system's second message), and third reply content (the user's second reply) describes a human-machine dialogue initiated by chat content the system sends actively. The first and third reply contents may each comprise at least two pieces of dialogue content (at least two dialogue turns), and correspondingly the second reply content may also comprise at least two pieces of dialogue content. The dialogue process described above is only an example of the method of the present application, and the number of replies in the human-machine dialogue is not specifically limited here. A conversation may end when the system no longer receives dialogue content from the user.
In some other embodiments, the user's dialogue content may also be initial chat content sent by the user to the system. The initial chat content sent by the user may express the user's view or carry a question; it is not specifically limited here.
It should also be noted that the user's dialogue content (whether sent actively or in reply to content issued by the system) may include several pieces of dialogue content (several dialogue turns). These pieces may be of the same dialogue type yet express different emotional intents of the user, and together they can exhibit the user's complex emotional intent.
In the related art, when recognizing a user's dialogue content, each of the several dialogue turns is typically analyzed separately, one by one, and the emotional intent corresponding to a single turn is determined. The emotional intent corresponding to one turn is not the same as the complex emotional intent carried by dialogue content spanning several turns. Generalizing the user's complex emotional intent from individual dialogue segments therefore yields low accuracy in recognizing the user's emotional intent.
To solve this technical problem, in this embodiment the several pieces of dialogue content of the user are acquired, the text content corresponding to them is associated contextually so as to determine the target text the user actually wants to express across the several pieces of dialogue content, and the user's emotional intent is determined from the associated target text. Determining the emotional intent from a target text obtained by associating the dialogue keywords of at least two of the user's dialogue turns improves the accuracy of the recognized emotional intent. Conducting the dialogue with the user according to that emotional intent accordingly improves the fluency and effectiveness of the human-machine dialogue.
Illustratively, the at least two pieces of dialogue content the user sends to the system may be:
[User 1]: I really like this architectural style!
[User 1]: But I don't know where this place is...
[User 1]: Do you know where it is?
In the above several pieces of dialogue content, each turn expresses a different emotional intent of the user. The embodiment of the application can associate the several pieces of dialogue content contextually and determine the text content and emotional intent the user wants to express, which improves the accuracy of recognizing the user's emotional intent and, in turn, the fluency of the human-machine dialogue.
In addition, the at least two pieces of dialogue content of the user may be of different dialogue types. The dialogue type may be a text type, a picture type, an audio type, a video type, or an expression type. When several pieces of dialogue content of different types are recognized in the related art, either the pieces that together characterize the user's several emotional intents are ignored, or each piece is recognized individually to determine the emotional intent corresponding to that single piece. Either way, the accuracy of the emotional intent determined from the user's several pieces of dialogue content is low.
To solve this technical problem, this embodiment can also associate several dialogue turns according to the several pieces of dialogue content of their corresponding dialogue types, and determine the user's emotional intent from the associated target text, improving the accuracy of recognizing the user's emotional intent. Conducting the dialogue with the user according to that emotional intent accordingly improves the fluency and effectiveness of the human-machine dialogue.
Illustratively, the at least two pieces of dialogue content of at least one type that the user sends to the system may also be:
[User 2]: I really like this architectural style!
[User 2]: (see FIG. 3, which is exemplary dialogue content of the picture type)
[User 2]: But I don't know where this place is...
[User 2]: (see FIG. 4, which is exemplary dialogue content of the expression type)
[User 2]: Do you know where it is?
Thus, after the dialogue types of the at least two pieces of dialogue content of at least one type sent by the user are identified, their corresponding dialogue types may be determined as:
[User 2]: [text]
[User 2]: [picture]
[User 2]: [text]
[User 2]: [emoticon]
[User 2]: [text]
It should be noted that each dialogue type may be associated with a corresponding analysis module. After the dialogue type of a piece of dialogue content is determined, the analysis module corresponding to that dialogue type is used to analyze the dialogue content and obtain the corresponding text content.
202. The at least two pieces of dialogue content are parsed to obtain the corresponding text content.
The user's dialogue content may include several different types. After acquiring the at least two pieces of dialogue content of the user, the server can identify the dialogue type of each piece and determine the corresponding text content with the analysis module corresponding to that dialogue type.
The user's dialogue content may be of the text type, or of the picture and/or audio and/or video and/or expression types. After at least two pieces of dialogue content of at least one type sent by the user are acquired, they may be converted into the corresponding text content.
It should be noted that each dialogue type is associated with a corresponding analysis module, and each analysis module executes the analysis procedure appropriate to its dialogue type. When dialogue content of various dialogue types appears in the user's dialogue content, the dialogue types of the at least two pieces of dialogue content may first be identified; each piece of dialogue content input by the user is then recognized with the analysis module corresponding to its dialogue type, yielding the text content corresponding to each piece.
In an embodiment, when dialogue content of the picture type appears in the user's dialogue content, the picture analysis module performs picture analysis on it: the picture information in the dialogue content is recognized, at least one target object in the picture information is determined, and the description text corresponding to the picture-type dialogue content is determined from the at least one target object and its state information, yielding the corresponding text content.
Illustratively, suppose the picture-type dialogue content shown in FIG. 3 appears in the user's dialogue content. Recognizing the picture information with a preset picture analysis module may determine that the picture contains several target objects, for example buildings, a waterside, trees, and people, as well as the state information of those target objects, for example ancient-style architecture, flowing water, lush trees, and a bustle of visitors. From the at least one target object recognized in the picture information and its corresponding state information, the description text of the picture can be determined; the corresponding text content may be: "This is a Chinese garden built on the waterside, with many ancient-style buildings such as pavilions and archways, and many tourists."
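As a hedged illustration only, composing the description text from detected (object, state) pairs might look like the sketch below; the detections are assumed to come from some vision model, and all names are invented for this example rather than taken from the patent.

# Illustrative only: compose a description text from detected objects and
# their state information, as in the garden example above.
def compose_description(detections):
    # detections: list of {"object": ..., "state": ...} dicts produced by a
    # hypothetical vision model.
    phrases = [f'{d["state"]} {d["object"]}' for d in detections]
    return "The picture shows " + ", ".join(phrases) + "."

text = compose_description([
    {"object": "buildings", "state": "ancient-style"},
    {"object": "trees", "state": "lush"},
    {"object": "visitors", "state": "many"},
])
# -> "The picture shows ancient-style buildings, lush trees, many visitors."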
In another embodiment, when dialogue content of the video type appears in the user's dialogue content, the video analysis module performs video analysis on it: the video information in the dialogue content is parsed, at least one target object in the video information and the corresponding action information of that target object are determined, and the description text of the video-type dialogue content is determined from the at least one target object and its action information, yielding the corresponding text content.
In another embodiment, when dialogue content of the audio type appears in the user's dialogue content, the audio analysis module performs audio analysis on it: the audio information in the dialogue content is parsed and converted into the corresponding text, yielding the text content corresponding to the audio-type dialogue content. Specifically, the audio information may be converted into the corresponding text by a speech recognition method implemented on the basis of an acoustic model.
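A minimal sketch of this audio branch is shown below. The open-source SpeechRecognition package is a stand-in chosen here, since the patent only specifies a speech recognition method based on an acoustic model and names no concrete library; the language code is likewise an assumption.

# Illustrative only: convert audio-type dialogue content into text.
import speech_recognition as sr

def audio_to_text(path):
    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole audio file
    # recognize_google() is one of several recognizers the package offers.
    return recognizer.recognize_google(audio, language="zh-CN")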
In another embodiment, when dialogue content of the expression type appears in the user's dialogue content, it may first be determined whether the expression is dynamic or static. If it is a dynamic expression, the video analysis module performs video analysis on it and determines the corresponding text content; if it is a static expression, the picture analysis module performs picture analysis on it and determines the corresponding text content.
Illustratively, suppose the static-expression dialogue content shown in FIG. 4 appears in the user's dialogue content. The expression information in the dialogue content is recognized with a preset picture analysis module, and the text content corresponding to the expression information is determined. The user emotion expressed by the static expression can then be determined by extracting the meaning expressed in that text content. For example: a kitten, crying, expressing sadness.
In this way, the at least two pieces of dialogue content of at least one type entered by the user can be converted into the corresponding text content.
Illustratively, after the at least two pieces of dialogue content of at least one type sent by the user are analyzed with the analysis modules corresponding to their dialogue types, the resulting text content may be:
[User 2]: I really like this architectural style!
[User 2]: This is a Chinese garden built on the waterside, with many ancient-style buildings such as pavilions and archways, and many tourists.
[User 2]: But I don't know where this place is...
[User 2]: A kitten, crying, expressing sadness.
[User 2]: Do you know where it is?
203. Dialogue keywords are extracted from the text content.
From the text content corresponding to the user's different types of dialogue content, the server can extract the dialogue keywords, and the user's emotional intent may be determined from the dialogue keywords in the text content. The text content may include several passages of text corresponding to the at least two pieces of dialogue content, and the passages may contain different dialogue keywords. Different dialogue keywords may express the same or different emotional intents of the user. Thus, several dialogue keywords can be extracted from the text content corresponding to the at least two pieces of dialogue content of at least one type, and the several emotional intents corresponding to the passages can be determined from the respective keywords.
Specifically, the dialogue keywords in the several passages of text may be determined according to the syntactic structure of the text. For example, when a passage has a subject-verb-object structure, the object may be extracted as a dialogue keyword; such object keywords can characterize the reference information across the user's passages of text. The predicate may also be extracted as a dialogue keyword; predicate keywords can characterize the user's emotion. A hedged sketch of such extraction follows the example below.
Illustratively, the dialogue keywords corresponding to the text content obtained from the at least two pieces of dialogue content of at least one type sent by the user may be:
[User 2]: architectural style
[User 2]: Chinese garden
[User 2]: place
[User 2]: sad
[User 2]: where
204. The at least two pieces of dialogue content are associated according to the dialogue keywords, and the target text and the user's emotional intent are determined.
The server can also associate the passages of text content effectively according to the dialogue keywords, integrating them to obtain the corresponding target text. According to the dialogue keywords, the text content corresponding to the user's at least two pieces of dialogue content of at least one type can be associated to obtain the associated target text. Specifically, this may be done by determining, from the dialogue keywords, whether reference information exists across the context of the text content. If clear reference information exists, the key information of the text content is extracted according to that reference information, and the key information in the passages is associated to obtain the target text.
The server may also determine the user's corresponding emotional intent from the target text, for example through semantic understanding of the target text. Alternatively, the user emotion among the dialogue keywords may be extracted according to the tone of the target text, and it may be determined whether that user emotion can serve as the user's emotional intent.
For example, in the text above, the reference information in the context may be "Chinese garden" for "architectural style" and "Liujiang Ancient Town" for "place". The target text obtained by associating the dialogue keywords may be: "I really like the Chinese-garden style, but I don't know where Liujiang Ancient Town is." The user's emotional intent determined from the target text may be: sad. A hedged sketch of this association step is given below.
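The sketch below illustrates, under stated assumptions, how the vaguer mention of each reference pair might be replaced by the more specific one to assemble the target text; the mapping is hard-coded for this example and is not the patent's implementation.

# Illustrative only: substitute vague mentions with the specific keyword of
# each reference pair, then join the turns into one target text.
def build_target_text(utterances, references):
    resolved = []
    for utterance in utterances:
        for vague, specific in references.items():
            utterance = utterance.replace(vague, specific)
        resolved.append(utterance)
    return " ".join(resolved)

target = build_target_text(
    ["I really like this architectural style!",
     "But I don't know where this place is."],
    {"this architectural style": "the Chinese-garden style",
     "this place": "Liujiang Ancient Town"},
)
# -> "I really like the Chinese-garden style! But I don't know where
#    Liujiang Ancient Town is."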
205. A dialog is conducted with the user based on the target text and the emotional intent.
After determining the target text of the user's dialogue and the user's emotional intent, the server may generate the corresponding reply content from both. The reply content may comprise several pieces of dialogue content of several types. The server can send the reply content to the terminal device on which the user client is installed, and the client displays it through the output device, thereby realizing the dialogue with the user. A hedged sketch of conditioning the reply on both signals follows.
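As an illustration only, the reply step might condition a language model on both the target text and the emotional intent, roughly as sketched here; call_llm is a hypothetical stand-in for whatever ChatGPT-style interface the system uses, not an API named by the patent.

# Illustrative only: build a prompt from the target text and the emotional
# intent, then delegate to a hypothetical language-model callable.
def build_reply_prompt(target_text, emotional_intent):
    return (
        "The user said: " + target_text + "\n"
        "The user's emotional intent is: " + emotional_intent + ".\n"
        "Write a natural, empathetic reply that also answers the question."
    )

def reply(target_text, emotional_intent, call_llm):
    # call_llm: hypothetical function mapping a prompt string to a reply string.
    return call_llm(build_reply_prompt(target_text, emotional_intent))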
In the method embodiment depicted in FIG. 2, at least two pieces of dialogue content are acquired from a user; the at least two pieces of dialogue content are parsed to obtain corresponding text content; dialogue keywords are extracted from the text content; the at least two pieces of dialogue content are associated according to the dialogue keywords, and a target text and the user's emotional intent are determined; and a dialogue is conducted with the user based on the target text and the emotional intent. By associating the dialogue keywords across the pieces of dialogue content, the accuracy of recognizing the user's chat content and emotion is improved, and conducting the dialogue according to the target text and emotional intent makes the human-machine dialogue more natural, thereby improving its fluency and effectiveness.
Referring to fig. 5, fig. 5 is a flow chart illustrating another dialogue processing method according to an embodiment of the application. The dialog processing method may include the following steps.
501. At least two pieces of dialogue content of a user are acquired.
For the specific implementation of step 501, refer to the description of step 201 in the embodiment corresponding to fig. 2; details are not repeated here.
502. The dialogue types corresponding to the at least two pieces of dialogue content are determined, each dialogue type being associated with a corresponding analysis module that executes the corresponding analysis procedure.
The user's at least two pieces of dialogue content may be of the same dialogue type or of different dialogue types; each piece of dialogue content has its corresponding dialogue type. It should be noted that the dialogue type may be a text type, a picture type, a video type, an expression type, or an audio type.
Thus, after the user's at least two pieces of dialogue content are acquired, the dialogue type of each piece can first be identified by the form identification module. Illustratively, a received piece of dialogue content may be denoted x and its identified dialogue type y, with S(x) = y, where y takes a value in {text, picture, video, emoticon, audio}. Determining the value of y determines the dialogue type of the dialogue content. A minimal sketch of this dispatch is given below.
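The sketch below shows one way the form-identification function S(x) = y might be realized; the heuristics (file extensions, a sticker flag) are illustrative assumptions, not the patent's method.

# Illustrative only: S(x) = y, mapping a piece of dialogue content to a type.
from enum import Enum

class DialogType(Enum):
    TEXT = "text"
    PICTURE = "picture"
    VIDEO = "video"
    EMOTICON = "emoticon"
    AUDIO = "audio"

PICTURE_EXT = (".jpg", ".jpeg", ".png")
VIDEO_EXT = (".mp4", ".avi", ".mov")
AUDIO_EXT = (".wav", ".mp3", ".m4a")

def identify_dialog_type(content, is_sticker=False):
    if is_sticker:                        # assumed flag supplied by the client
        return DialogType.EMOTICON
    lower = str(content).lower()
    if lower.endswith(PICTURE_EXT):
        return DialogType.PICTURE
    if lower.endswith(VIDEO_EXT):
        return DialogType.VIDEO
    if lower.endswith(AUDIO_EXT):
        return DialogType.AUDIO
    return DialogType.TEXT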
503. The at least two pieces of dialogue content are analyzed with the analysis modules corresponding to their dialogue types to obtain the corresponding text content.
Text-type dialogue content is the most common form in chat and dialogue and is the basic dialogue type. Text-type dialogue content can be handled through model-based semantic understanding; accordingly, dialogue content of other dialogue types can be converted into corresponding text-type content, and semantic understanding can then be performed on that text with a preset model, realizing the human-machine dialogue.
Picture-type or video-type dialogue content may supplement text-type dialogue content: the picture analysis module or video analysis module can be invoked to recognize the image or video information in the dialogue content and determine the corresponding description text. Expression-type dialogue content can express the user's emotion and can reinforce the emotion of the text-type dialogue content; the corresponding user emotion can be determined by extracting the emotional meaning of the expression. Audio-type dialogue content is the spoken form of text-type dialogue content and can be converted directly into the corresponding text content.
504. The text content is integrated with the text-type content in the dialogue content to obtain the integrated text content.
It should be noted that, after the at least two pieces of dialogue content are analyzed, there may be both dialogue content that was already of the text type and text content obtained from the analysis. When text content converted from dialogue content of the non-text types coexists with text-type dialogue content, the two need to be integrated to obtain the integrated text content.
505. Whether reference content exists in the context of the integrated text content is queried, the reference content being at least two keywords of the same category but different scopes.
Picture-type or video-type dialogue content may supplement text-type dialogue content. By determining the description text corresponding to the picture-type or video-type dialogue content, whether reference content exists in the context of the integrated text content can be determined from the subjects, objects, complements, and other elements of the description text's complete syntactic structure. Expression-type dialogue content can express the user's emotion and reinforce the emotion of the text; the dialogue keywords expressing the user's emotion can be determined by extracting the emotional meaning of the expression. Audio-type dialogue content can be converted directly into the corresponding text content, from which the dialogue keywords of the syntactic structure are extracted.
It should be noted that the reference content consists of keywords of the same category but different scopes. The reference content may include at least two keywords that correspond to each other, i.e., keywords in the same category with different ranges. In other words, the reference content may be a pair of superordinate and subordinate concepts, and the two keywords can reinforce each other's importance in the dialogue content. For example, a general place name and a more specific, instantiated place name have a reference relationship. In the context of the text content, reference content exists only as such corresponding pairs; a reference cannot exist alone. A hedged sketch of detecting such pairs is given below.
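Assuming a small category lexicon (purely illustrative; the patent does not specify how categories are obtained), detecting same-category, different-scope keyword pairs might look like this:

# Illustrative only: find "reference content" pairs among extracted keywords.
CATEGORY = {
    "place": {"place", "here", "Liujiang Ancient Town"},
    "style": {"architectural style", "Chinese garden"},
}

def find_reference_pairs(keywords):
    pairs = []
    for category, members in CATEGORY.items():
        hits = [k for k in keywords if k in members]
        if len(hits) >= 2:
            # same category, different scope: treat as a reference pair
            pairs.append((hits[0], hits[1]))
    return pairs

pairs = find_reference_pairs(["architectural style", "Chinese garden", "place"])
# -> [("architectural style", "Chinese garden")]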
506. If it exists, the dialogue keywords are extracted from the integrated text content.
If reference content exists, it is extracted from the integrated text content to obtain the corresponding at least two dialogue keywords. When corresponding reference content exists in the text content, it can be extracted as dialogue keywords; the at least two passages of text content in which the at least two dialogue keywords are located are associated, and the passages are integrated according to the keywords to obtain the target text.
507. The at least two pieces of dialogue content are associated according to the dialogue keywords, and the target text and the user's emotional intent are determined.
508. A dialogue is conducted with the user according to the target text and the emotional intent.
For the specific implementation of step 507 and step 508, refer to the description of step 204 and step 205 in the embodiment corresponding to fig. 2; details are not repeated here.
In the method embodiment depicted in FIG. 5, at least two pieces of dialogue content are acquired from a user; the at least two pieces of dialogue content are parsed to obtain corresponding text content; dialogue keywords are extracted from the text content; the at least two pieces of dialogue content are associated according to the dialogue keywords, and a target text and the user's emotional intent are determined; and a dialogue is conducted with the user based on the target text and the emotional intent. By associating the dialogue keywords across the pieces of dialogue content, the accuracy of recognizing the user's chat content and emotion is improved, and conducting the dialogue according to the target text and emotional intent makes the human-machine dialogue more natural, thereby improving its fluency and effectiveness.
It should be understood that the same or corresponding information in the different embodiments described above may be cross-referenced.
Referring to fig. 6, fig. 6 is a schematic diagram of a dialogue processing device 600 according to an embodiment of the application. As shown in fig. 6, the dialog processing device 600 may include:
An obtaining unit 601, configured to obtain at least two pieces of dialogue content from a user;
an identifying unit 602, configured to parse the at least two pieces of dialogue content to obtain corresponding text content;
an extracting unit 603, configured to extract dialogue keywords from the text content;
a first determining unit 604, configured to associate the at least two pieces of dialogue content according to the dialogue keywords and determine a target text and the emotional intent of the user;
and a dialogue unit 605, configured to conduct a dialogue with the user according to the target text and the emotional intent.
In some embodiments, the dialog processing device 600 may further include:
a second determining unit 606, configured to determine the dialogue types corresponding to the at least two pieces of dialogue content, each dialogue type being associated with a corresponding analysis module that executes the corresponding analysis procedure;
the identifying unit 602 is specifically configured to analyze the at least two pieces of dialogue content with the analysis modules corresponding to their dialogue types to obtain the corresponding text content.
In some embodiments, the identification unit 602 is specifically configured to:
if the dialogue type is the picture type, recognize the picture information in the dialogue content with a picture analysis module to determine at least one target object in the picture information;
and determine the text content corresponding to the dialogue content according to the at least one target object.
In some embodiments, the identification unit 602 is specifically configured to:
if the dialogue type is the video type, parse the video information in the dialogue content with a video analysis module to determine at least one target object in the video information and the action information corresponding to the at least one target object;
and determine the text content corresponding to the dialogue content according to the at least one target object and its corresponding action information.
In some embodiments, the identification unit 602 is specifically configured to:
if the dialogue type is the audio type, parse the audio information in the dialogue content with an audio analysis module to obtain the text content corresponding to the audio information.
In some embodiments, the dialog processing device 600 may further include:
an integrating unit 607, configured to integrate the text content with the text-type content in the dialogue content to obtain integrated text content;
the extracting unit 603 is specifically configured to:
query whether reference content exists in the context of the integrated text content, the reference content being at least two keywords of the same category but different scopes;
and, if it exists, extract the reference content from the integrated text content to obtain the corresponding at least two dialogue keywords.
In some embodiments, the first determining unit 604 is specifically configured to:
associate the at least two passages of text content in which the at least two dialogue keywords are located;
integrate the at least two passages of text content according to the at least two dialogue keywords to obtain the target text;
and determine the emotional intent of the user according to the target text.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the apparatus and modules described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments disclosed herein, the modules may be electrically, mechanically, or otherwise coupled to each other.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
As shown in fig. 7, the embodiment of the present application further discloses a schematic structural diagram of a computer device 700. The computer device 700 includes a processor 710 and a memory 720; the memory 720 stores computer program instructions which, when invoked by the processor 710, can perform the various method steps disclosed in the foregoing embodiments. It will be appreciated by those skilled in the art that the structure shown in the drawing does not limit the computer device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently. Wherein:
Processor 710 may include one or more processing cores. The processor 710 uses various interfaces and lines to connect the various parts of the computer device, and monitors the computer device as a whole by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 720 and invoking the data stored in the memory 720, thereby performing the various functions of the computer device and processing its data. Optionally, the processor 710 may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 710 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; the modem handles wireless communication. It will be appreciated that the modem may also not be integrated into the processor 710 and may instead be implemented by a separate communication chip.
The memory 720 may include random access memory (Random Access Memory, RAM) or read-only memory (Read-Only Memory, ROM). The memory 720 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 720 may include a program storage area and a data storage area. The program storage area may store instructions for implementing the operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, or an image playing function), instructions for implementing the various method embodiments described above, and the like. The data storage area may store data created by the computer device in use (such as a phonebook, audio and video data, or chat records), and the like. Accordingly, the memory 720 may also include a memory controller to provide the processor 710 with access to the memory 720.
Although not shown, the computer device 700 may further include a display unit and the like, which are not described here. Specifically, in this embodiment, the processor 710 of the computer device loads the executable files corresponding to the processes of one or more application programs into the memory 720 according to the instructions, and the processor 710 runs the application programs stored in the memory 720, thereby implementing the various method steps disclosed in the foregoing embodiments.
As shown in fig. 8, an embodiment of the present application further discloses a computer-readable storage medium 800. The computer-readable storage medium 800 stores computer program instructions 810 which can be invoked by a processor to perform the method described in the above embodiments.
The computer-readable storage medium may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium comprises a non-transitory computer-readable storage medium (Non-Transitory Computer-Readable Storage Medium). The computer-readable storage medium 800 has storage space for program code to perform any of the method steps described above. The program code can be read from or written into one or more computer program products, and may be compressed, for example, in a suitable form.
According to one aspect of the present application, a computer program product or computer program is disclosed, comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the methods disclosed in the various optional implementations of the above embodiments.
Although the present application has been described with reference to the preferred embodiments, it should be understood that the present application is not limited to those specific embodiments; numerous modifications, equivalents, and alternative embodiments may be made to the embodiments described above without departing from the spirit and scope of the present application.

Claims (10)

1. A dialogue processing method, the method comprising:
acquiring at least two pieces of dialogue content from a user;
parsing the at least two pieces of dialogue content to obtain corresponding text content;
extracting dialogue keywords from the text content;
associating the at least two pieces of dialogue content according to the dialogue keywords, and determining a target text and the emotional intent of the user;
and conducting a dialogue with the user according to the target text and the emotional intent.
2. The method of claim 1, wherein the method further comprises:
determining the dialogue types corresponding to the at least two pieces of dialogue content, each dialogue type being associated with a corresponding analysis module that executes the corresponding analysis procedure;
wherein the parsing the at least two pieces of dialogue content to obtain corresponding text content comprises:
analyzing the at least two pieces of dialogue content with the analysis modules corresponding to the dialogue types to obtain the corresponding text content.
3. The method of claim 2, wherein, in the case that the dialogue type is a picture type, the analyzing the at least two pieces of dialogue content with the analysis module corresponding to the dialogue type to obtain corresponding text content comprises:
recognizing picture information in the dialogue content with a picture analysis module to determine at least one target object in the picture information;
and determining the text content corresponding to the dialogue content according to the at least one target object.
4. The method of claim 2, wherein, in the case that the dialogue type is a video type, the analyzing the at least two pieces of dialogue content with the analysis module corresponding to the dialogue type to obtain corresponding text content comprises:
parsing video information in the dialogue content with a video analysis module to determine at least one target object in the video information and action information corresponding to the at least one target object;
and determining the text content corresponding to the dialogue content according to the at least one target object and its corresponding action information.
5. The method of claim 2, wherein, in the case that the dialogue type is an audio type, the analyzing the at least two pieces of dialogue content with the analysis module corresponding to the dialogue type to obtain corresponding text content comprises:
parsing audio information in the dialogue content with an audio analysis module to obtain the text content corresponding to the audio information.
6. The method of claim 1, wherein the method further comprises:
integrating the text content with the text-type content in the dialogue content to obtain integrated text content;
wherein the extracting dialogue keywords from the text content comprises:
querying whether reference content exists in the context of the integrated text content, the reference content being at least two keywords of the same category but different scopes;
and, if it exists, extracting the reference content from the integrated text content to obtain the corresponding at least two dialogue keywords.
7. The method of claim 6, wherein the associating the at least two pieces of dialogue content according to the dialogue keywords and determining a target text and the emotional intent of the user comprises:
associating the at least two passages of text content in which the at least two dialogue keywords are located;
integrating the at least two passages of text content according to the at least two dialogue keywords to obtain the target text;
and determining the emotional intent of the user according to the target text.
8. A dialogue processing apparatus, comprising:
an acquisition unit, configured to acquire at least two pieces of dialogue content from a user;
an identification unit, configured to parse the at least two pieces of dialogue content to obtain corresponding text content;
an extraction unit, configured to extract dialogue keywords from the text content;
a first determining unit, configured to associate the at least two pieces of dialogue content according to the dialogue keywords and determine a target text and the emotional intent of the user;
and a dialogue unit, configured to conduct a dialogue with the user according to the target text and the emotional intent.
9. A computer device comprising a memory and a processor, the memory storing a computer program, and the processor invoking the computer program to implement the method of any one of claims 1-7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program or computer instructions which, when executed by a processor, implement the method of any one of claims 1-7.


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination