CN112164394A - Information interaction method and device, storage medium and electronic equipment


Info

Publication number
CN112164394A
CN112164394A (application CN202010945782.1A)
Authority
CN
China
Prior art keywords
information, emotion, intention, responded, voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010945782.1A
Other languages
Chinese (zh)
Inventor
包梦蛟
陈欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN202010945782.1A
Publication of CN112164394A
Legal status: Withdrawn


Classifications

    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06N3/08 Computing arrangements based on biological models; neural networks; learning methods
    • G10L15/16 Speech classification or search using artificial neural networks
    • G10L15/26 Speech to text systems
    • G10L25/63 Speech or voice analysis techniques specially adapted for estimating an emotional state
    • G10L2015/223 Execution procedure of a spoken command

Abstract

This specification discloses an information interaction method and apparatus, a storage medium, and an electronic device. Interaction states for locating the progress of the interaction are preset according to business requirements; a response mode appropriate to the current interaction state is selected, according to the intention and emotion of the information to be responded, to feed back to the user, and the current interaction state is updated. In this way the interaction process is dynamically located within the preset interaction states, an appropriate response mode is generated in each interaction state according to the intention and emotion of the information to be responded, and purposeful responses are generated dynamically throughout a continuous interaction, thereby providing assistance for the business.

Description

Information interaction method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an information interaction method and apparatus, a storage medium, and an electronic device.
Background
With the progress of artificial intelligence technology, the early computer interaction programs, which were driven only by preset rules and keywords and did not understand the scene or meaning of information, have developed into chat robots: interactive tools that use technologies such as natural language processing and machine learning to respond to requests, provide various kinds of assistance to human life in an interactive manner, and release manpower from tedious work.
In the prior art, a chat robot generally generates a passive response after receiving information. From a functional point of view, chat robots include: chit-chat robots, which carry out anthropomorphic interaction in an open domain and are generally used to meet the user's need for communication; task-oriented robots, which execute a specific task, such as querying the weather or booking an air ticket, by accurately understanding the user's instruction and performing queries and the like; and question-answering robots, which answer questions posed by users using preset business resources, such as customer service robots and sales robots.
However, existing chat robots can only generate passive responses according to the semantics of the received information; they cannot respond continuously and purposefully according to business requirements in combination with the interaction state.
Disclosure of Invention
The present specification provides an information interaction method, an information interaction apparatus, a storage medium, and an electronic device, so as to partially solve the above problems in the prior art.
The technical scheme adopted by the specification is as follows:
the information interaction method provided by the specification comprises the following steps:
receiving information to be responded sent by a user;
recognizing the information to be responded by adopting a trained intention recognition model to obtain the intention of the information to be responded; recognizing the information to be responded by adopting a trained emotion recognition model to obtain the emotion of the information to be responded;
determining a current interaction state;
determining a response mode according to the intention, the emotion and the current interaction state;
updating the current interaction state according to the intention and the emotion; and generating response information according to the response mode, and feeding back the response information to the user.
Optionally, the information to be responded comprises voice information;
recognizing the information to be responded by adopting the trained intention recognition model to obtain the intention of the information to be responded, and specifically comprising the following steps:
recognizing the voice information by adopting a trained voice recognition model to obtain text information corresponding to the voice information;
and recognizing the text information by adopting the trained intention recognition model to obtain the intention of the text information, and taking the intention of the text information as the intention of the voice information corresponding to the text information.
Optionally, the information to be responded comprises voice information;
adopting the trained emotion recognition model to recognize the information to be responded, and obtaining the emotion of the information to be responded, wherein the method specifically comprises the following steps:
recognizing the voice information by adopting a trained voice emotion recognition model to obtain voice emotion of the voice information;
and determining the emotion of the voice information according to the human voice emotion.
Optionally, determining the emotion of the voice information according to the human voice emotion specifically includes:
recognizing the voice information by adopting a trained voice recognition model to obtain text information corresponding to the voice information;
recognizing the text information by adopting a trained text emotion recognition model to obtain the emotion of the text information, and taking the emotion of the text information as the text emotion of the voice information corresponding to the text information;
and determining the emotion of the voice information according to the human voice emotion of the voice information and the text emotion of the voice information.
Optionally, determining a response mode according to the intention, the emotion and the current interaction state specifically includes:
determining a preset response information template corresponding to the intention, the emotion and the current interaction state according to the intention, the emotion and the current interaction state;
the response information template comprises variable information and constant information;
generating response information according to the response mode, specifically comprising:
determining a variable type corresponding to variable information contained in the response information template;
determining a variable value of the variable information corresponding to the variable type according to the user information of the user and/or the updated current interaction state;
and generating response information according to the variable value and the constant information contained in the response information template.
Optionally, updating the current interaction state according to the intention and the emotion, specifically including:
determining preset interaction states and preset jump conditions among the interaction states;
determining a current interaction state and jump conditions met by the intention and the emotion in the current interaction state;
and updating the current interaction state to be the interaction state meeting the jump condition.
Optionally, the constant information includes pre-recorded voice information;
generating response information according to the variable value and the constant information contained in the response information template, specifically comprising:
generating voice information corresponding to the variable value by adopting a trained variable voice generation model according to the variable value of the variable information;
and splicing the generated voice information with the pre-recorded voice information to obtain response information.
This specification provides an information interaction device, including:
an information receiving module: receiving information to be responded sent by a user;
an information identification module: recognizing the information to be responded by adopting a trained intention recognition model to obtain the intention of the information to be responded; recognizing the information to be responded by adopting a trained emotion recognition model to obtain the emotion of the information to be responded;
a state determination module: determining a current interaction state;
a response mode module: determining a response mode according to the intention, the emotion and the current interaction state;
the state updating and response information module: updating the current interaction state according to the intention and the emotion; and generating response information according to the response mode, and feeding back the response information to the user.
The present specification provides a computer-readable storage medium, which stores a computer program, and the computer program realizes the above information interaction method when executed by a processor.
The present specification provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the information interaction method is implemented.
The technical scheme adopted by the specification can achieve the following beneficial effects:
In the information interaction method provided by this specification, interaction states for locating the progress of the interaction are preset according to business requirements; a response mode appropriate to the current interaction state is selected, according to the intention and emotion of the information to be responded, to feed back to the user, and the current interaction state is updated. In this way the interaction process is dynamically located within the preset interaction states, an appropriate response mode is generated in each interaction state according to the intention and emotion of the information to be responded, and purposeful responses are generated dynamically throughout a continuous interaction, thereby providing assistance for the business.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification and constitute a part of it, illustrate embodiments of the specification and, together with the description, serve to explain the specification without limiting it. In the drawings:
FIG. 1 is a schematic flow chart of an information interaction method in the present specification;
fig. 2A is a schematic flow chart illustrating a response determining method according to an embodiment of the present disclosure;
FIG. 2B is a flow chart illustrating a response determination method according to another embodiment of the present disclosure;
FIG. 3 is a schematic flow chart illustrating updating a current interaction state according to the present disclosure;
FIG. 4 is a schematic diagram illustrating a process for obtaining intent and emotion of a voice message;
FIG. 5 is a schematic diagram of an information interaction device provided in the present specification;
fig. 6 is a schematic diagram of an electronic device corresponding to fig. 1 provided in the present specification.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present disclosure clearer, the technical solutions of the present disclosure are described clearly and completely below with reference to specific embodiments and the accompanying drawings. It is to be understood that the described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this specification without creative effort fall within its protection scope.
At present, a chat robot for information interaction generally understands the information to be responded after receiving it from a user, generates a suitable response in a way that depends on the required function, and feeds the response back to the user. After identifying the semantics of the information to be responded, a task-oriented robot executes a corresponding action, such as opening a file on the device, querying the weather forecast online, or calling a specified person. A question-answering robot generally answers questions raised by a user during a business process by combining the information to be responded with preset business resources; for example, after a user asks about the stock of a commodity, an e-commerce customer service robot retrieves the stock quantity pre-stored in its database, generates reply information containing it, and feeds that back to the user. A chit-chat robot understands the semantics of the currently received information to be responded; in multi-round interaction it generally analyzes the current information together with previous information to obtain more accurate semantics, and selects corresponding reply information from a corpus to feed back to the user.
Generally, a chat robot can only respond passively to the information to be responded; that is, the interaction process is dominated by the information sent by the user, the robot cannot actively influence the interaction according to a business purpose, and the generated responses lack purposefulness, so the chat robot cannot be used to acquire information required by a business or to complete a specific business process.
The technical solutions provided by the embodiments of the present description are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of an information interaction method in this specification, which specifically includes the following steps:
s100: and receiving the information to be responded sent by the user.
Natural language is an important vehicle for interpersonal communication; with its long history, ease of understanding, and breadth of application, it carries most of the expression and exchange of information between people. Accordingly, robots used for information interaction commonly rely on natural language understanding technology to interact with users in an anthropomorphic manner through natural language.
Generally, interacting with a user through natural language involves three stages: receiving the information sent by the user, processing it, and generating response information to feed back to the user; multiple rounds of interaction may take place. The embodiments of this specification do not limit the number of rounds in one interaction process. The following description takes a single round as an example, which may be the only round of the interaction process or one round among multiple rounds.
The embodiments of this specification do not limit how the interaction is established: according to business requirements, it may be initiated by the user or initiated by the robot toward the user. After the interaction is established, information actively sent by the user serves as the information to be responded; when the single round is not the first round of a multi-round interaction, the information to be responded may also be the feedback generated by the user in response to the response information of the previous round.
The robot performing information interaction with the method shown in fig. 1 (i.e., the execution subject of the method) is distinct from the user's terminal device, and may be implemented by any intelligent device, such as a server or a personal computer (PC).
S102: recognizing the information to be responded with a trained intention recognition model to obtain its intention, and with a trained emotion recognition model to obtain its emotion.
In terms of speech act theory, an intention is semantics viewed from the perspective of behavior, i.e., the purpose that an expression is expected to achieve; in the embodiments of this specification, the intention is the intent tendency identified from the information to be responded. Emotion is information carried in natural language that reflects human feeling and attitude; here, it is the emotional tendency identified from the information to be responded.
The recognition models used to recognize the intention and emotion of the information to be responded are models trained separately for their respective recognition purposes, and may be various machine learning models such as neural networks. Training may proceed as follows: first select samples, then initialize the recognition model by assigning initial values to all of its model parameters. Input the samples into the corresponding model and determine its prediction (the intention of the sample for the intention recognition model, the emotion of the sample for the emotion recognition model). Determine each model's loss from its prediction and the label of the corresponding sample: the smaller the loss, the closer the prediction is to the label, and vice versa. Gradients can be determined from the loss, and the model parameters adjusted with a gradient descent algorithm. When the number of parameter adjustments reaches a preset threshold and/or the loss falls below a preset loss threshold, the resulting models are used as the trained intention recognition model and the trained emotion recognition model.
After the trained intention recognition model and emotion recognition model are obtained, they yield the intention recognition result and emotion recognition result of the information to be responded. Specifically, the intention recognition result may be the probability that the intention of the information to be responded belongs to each preset intention type, and the emotion recognition result may be the probability that its emotion belongs to each preset emotion type.
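To make the training procedure concrete, the following is a minimal sketch of such a training loop, assuming a simple PyTorch text classifier; the model architecture, hyperparameters, and thresholds are illustrative assumptions, not the concrete implementation of this specification.

```python
import torch
import torch.nn as nn

class IntentClassifier(nn.Module):
    """Toy intention classifier; an emotion classifier is trained the same way."""
    def __init__(self, vocab_size: int, num_types: int, embed_dim: int = 128):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)  # bag-of-tokens encoder
        self.fc = nn.Linear(embed_dim, num_types)

    def forward(self, token_ids, offsets):
        return self.fc(self.embed(token_ids, offsets))  # one logit per preset type

def train(model, batch, labels, max_steps: int = 1000, loss_threshold: float = 0.05):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.CrossEntropyLoss()          # smaller loss = closer to the label
    for _ in range(max_steps):                 # stop at a preset step threshold ...
        optimizer.zero_grad()
        loss = criterion(model(*batch), labels)
        loss.backward()                        # determine gradients from the loss
        optimizer.step()                       # gradient descent update
        if loss.item() < loss_threshold:       # ... and/or a preset loss threshold
            break
    return model

# After training, softmax over the logits yields the probability that the
# information to be responded belongs to each preset intention/emotion type:
# probs = torch.softmax(model(token_ids, offsets), dim=-1)
```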
S104: determining the current interaction state.
According to business requirements, a business process usually contains at least one preset business link; correspondingly, in the embodiments of this specification at least one interaction state is preset, each representing a business link. The current interaction state is the preset interaction state that represents the business stage of the ongoing interaction.
In one embodiment of this specification, one of the business links requires confirming the user's identity. In this link the preset interaction states include: a state in which the user identity is awaiting confirmation, a state in which the user identity corresponds to the preset identity information, and a state in which it does not, where the preset identity information is a preset correspondence between identity information and contact details. Before the user identity is confirmed, the current interaction state is the awaiting-confirmation state, and the robot feeds back the response information: "May I ask, are you Zhang San?" The information to be responded input by the user is then identified, and the current interaction state is updated according to the result: if the user answers "yes", the current interaction state is updated to the state in which the user identity corresponds to the preset identity information; if the user answers "no", it is updated to the state in which they do not correspond.
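As a sketch of this identity-confirmation link, the states and the update rule might look as follows; the state names and the yes/no intent labels are illustrative assumptions.

```python
from enum import Enum, auto

class IdentityState(Enum):
    AWAITING_CONFIRMATION = auto()  # "May I ask, are you Zhang San?" has been sent
    IDENTITY_MATCHES = auto()       # user identity corresponds to preset identity info
    IDENTITY_MISMATCH = auto()      # user identity does not correspond

def update_identity_state(state: IdentityState, intent: str) -> IdentityState:
    if state is IdentityState.AWAITING_CONFIRMATION:
        if intent == "affirm":      # user answered "yes"
            return IdentityState.IDENTITY_MATCHES
        if intent == "deny":        # user answered "no"
            return IdentityState.IDENTITY_MISMATCH
    return state                    # no jump condition satisfied; state unchanged
```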
S106: determining a response mode according to the intention, the emotion, and the current interaction state.
In this specification, each preset interaction state corresponds to at least one response mode, and the response mode is used for generating response information and feeding the response information back to a user. And when the current interaction state is a preset certain interaction state, the response mode corresponding to the preset interaction state is the response mode corresponding to the current interaction state. That is, if the current interaction state is updated, the response mode corresponding to the current interaction state is also updated accordingly.
For a specific interactive state, the response mode is determined according to the intention and the emotion of the information to be responded, that is, in one interactive state, the selected response mode is different for different intentions and emotions of the information to be responded.
Specifically, as shown in fig. 2A, an interaction state may correspond to a response mode table formed by at least one response mode. The table records all response modes in that interaction state, and a response mode is selected according to intention and emotion: the information to be responded is identified to obtain its intention type and emotion type, and the corresponding response mode is selected accordingly. For example, in state A, if the information to be responded is identified as emotion type 3 and intention type 2, response mode (3, 2) is selected, and response information is generated according to response mode (3, 2) and fed back to the user.
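A minimal sketch of such a response mode table, assuming integer emotion and intention type indices; all entries are illustrative.

```python
# One table per interaction state, indexed by (emotion type, intention type).
RESPONSE_MODE_TABLE = {
    "state_A": {
        (3, 2): "response_mode_3_2",  # emotion type 3, intention type 2 (the example above)
        (1, 1): "response_mode_1_1",
        # ... one entry for every (emotion, intention) pair preset for state A
    },
}

def select_response_mode(state: str, emotion_type: int, intent_type: int) -> str:
    return RESPONSE_MODE_TABLE[state][(emotion_type, intent_type)]
```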
In another embodiment of this specification, as shown in fig. 2B, an interaction state may correspond to a two-dimensional plane whose two coordinate axes are preset to represent intention and emotion respectively. A response mode is preset for every region of the plane in which a recognition result may fall. The intention recognition result and the emotion recognition result locate the information to be responded as a point on the plane, representing its intention and emotion; the response mode is selected according to these coordinates, and response information is generated from it and fed back to the user.
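A minimal sketch of this two-dimensional lookup, assuming the recognition results are scores in [0, 1] and the regions are axis-aligned rectangles; the boundaries and mode names are illustrative assumptions.

```python
REGIONS = [
    # (intent_min, intent_max, emotion_min, emotion_max, response_mode)
    (0.0, 0.5, 0.0, 0.5, "soothing_reply"),
    (0.0, 0.5, 0.5, 1.0, "clarifying_question"),
    (0.5, 1.0, 0.0, 0.5, "direct_answer"),
    (0.5, 1.0, 0.5, 1.0, "proactive_offer"),
]

def locate_response_mode(intent_score: float, emotion_score: float) -> str:
    # The recognition results are the point's coordinates on the plane.
    for i_min, i_max, e_min, e_max, mode in REGIONS:
        if i_min <= intent_score <= i_max and e_min <= emotion_score <= e_max:
            return mode                    # first region containing the point wins
    raise ValueError("recognition result outside all preset regions")
```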
S108: updating the current interaction state according to the intention and the emotion; generating response information according to the response mode, and feeding it back to the user.
In one embodiment, within a single round of interaction the current interaction state involves a present state and a next state: the interaction state before the update is the present state, and the interaction state selected by the update is the next state. Once the update is performed, the next state becomes the present state; that is, after the current interaction state is updated, what was the next state is now the present state.
In one embodiment, a finite state machine is used to represent the logic of the business links and to handle switching between states. Specifically, after the interaction state representing the current business link is determined as the present state of the single round, an interaction state is selected as the next state according to the obtained intention and emotion, and the current interaction state is updated to it.

Specifically, as shown in fig. 3, within a single round of interaction, state A is the interaction state before the update; when the intention and emotion satisfy condition c, the current interaction state is updated from state A to state D.

It can be seen that the current interaction state is not fixed: as the rounds of information interaction progress, the position of the ongoing interaction within the business links changes, so the current interaction state must be updated accordingly. This ensures that, throughout dynamic multi-round interaction, the business link of each round is represented by the current interaction state, and that the interaction state most consistent with the interaction, determined from the intention and emotion of the information to be responded, serves as the current interaction state.
In one embodiment, the response mode is a response policy determined from the current interaction state together with the intention and emotion of the information to be responded, and the response information fed back to the user is generated according to it.
In multi-round interaction carried by natural language, the information to be responded input by the user is often logically associated with the response information sent by the robot; for example, when the response information is a question, an answer to that question is usually received. Specifically, in the identity-confirmation business link, when the response information sent to the user is "May I ask, are you Zhang San?", the information to be responded received in the next round usually contains information that can determine the user's identity, such as "yes" or "no". On this basis, setting the response mode reasonably and generating appropriate response information generally yields information to be responded that meets the business requirement.
In one embodiment, the response modes are preset according to the business links and the business purpose the service is expected to achieve, such that user feedback meeting the business requirement is, probabilistically, more likely to be obtained; that is, the interaction has a greater probability of developing in the direction required by the business.
As can be seen from the method described in fig. 1, this specification provides an information interaction method that introduces a current interaction state to locate the interaction process: a response is generated according to the current interaction state and the intention and emotion of the information to be responded, and fed back to the user, while the current interaction state is updated as the interaction progresses. When the method is used to assist a business, the current interaction state always represents the business link of the ongoing interaction. Optionally, suitable response modes can be preset according to business requirements, steering the interaction toward the target business link and providing assistance for businesses with preset links and requirements. Because the method extracts features of the information to be responded along the two dimensions of intention and emotion, it identifies the user's expressed intention and fine-grained emotion more accurately, generates responses that better conform to interpersonal interaction habits, and makes the interaction system less perceptible to the user during the interaction.
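Putting the steps together, the following is a self-contained sketch of one round (S100 through S108); the stubbed recognizers and the single transition rule are stand-in assumptions for the trained models and the state machine described above.

```python
from dataclasses import dataclass

@dataclass
class Session:
    current_state: str = "A"

def recognize_intent(message: str) -> str:
    return "affirm" if "yes" in message.lower() else "other"  # stub for the trained model

def recognize_emotion(message: str) -> str:
    return "calm"                                             # stub for the trained model

def single_round(message: str, session: Session) -> str:
    intent = recognize_intent(message)          # S102: intention recognition
    emotion = recognize_emotion(message)        # S102: emotion recognition
    state = session.current_state               # S104: determine current interaction state
    mode = (state, intent, emotion)             # S106: key selecting the response mode
    if (state, intent) == ("A", "affirm"):      # S108: one illustrative jump condition
        session.current_state = "D"
    return f"[reply generated in mode {mode}]"  # S108: generate and feed back
```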
Further, actual information interaction carried by natural language usually does not stick to a single fixed information type; different types of information are used for convenience of expression. For example, when a more implicit expression is desired, poetry (text type) or music (audio type) may be used; in the Internet era, preset emoticons (image type) have become a popular culture and are often used to convey intentions and emotions. Because various information types are used to express intention and emotion in natural language communication, a method that understands multiple information types can understand the information to be responded more accurately than one limited to a single fixed type.
Based on this, the embodiments of this specification provide methods for identifying the intention or emotion of the information to be responded when its type varies, so as to understand the information to be responded more accurately.
In one embodiment, upon receiving information to be responded of the text type, the intention may be identified by recognizing the text with a trained text intention recognition model and taking the obtained intention of the text as the intention of the information to be responded. The emotion may be identified by recognizing the text with a trained text emotion recognition model and taking the obtained emotion of the text as the emotion of the information to be responded.
Similarly, when information to be responded of the audio type is received, the intention may be identified as follows. First, judge whether the audio is speech. If it is, a trained speech recognition model can be used to obtain the corresponding text, and a trained text intention recognition model identifies the intention of that text as the intention of the information to be responded. If it is not speech, the audio matching the user's audio is determined, as standard audio, according to the similarity between pre-stored audio and the user's audio, and the intention preset for the standard audio is taken as the intention of the user's audio, i.e., of the information to be responded. If no match is found among the pre-stored audio, a trained non-speech audio intention recognition model can identify the intention of the user's audio as the intention of the information to be responded.

The emotion may be identified analogously. First, judge whether the audio is speech. If it is, a trained speech recognition model can obtain the corresponding text and a trained text emotion recognition model can identify the emotion of the text as the emotion of the information to be responded; alternatively, a trained human voice emotion recognition model can identify the vocal emotion of the speech as the emotion of the information to be responded. As shown in fig. 4, the two can also be combined: the speech recognition model obtains the text and the text emotion recognition model identifies the text emotion; the human voice emotion recognition model identifies the vocal emotion; and the emotion of the information to be responded is determined from the text emotion and the vocal emotion together. If the audio is not speech, the matching standard audio is determined by similarity to the pre-stored audio, and the emotion preset for that standard audio is taken as the emotion of the information to be responded. If no match is found among the pre-stored audio, a trained non-speech audio recognition model can identify the emotion of the user's audio as the emotion of the information to be responded.
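The audio branch can be summarized in the following dispatch sketch; every helper is a stand-in assumption for a trained model, a voice activity detector, or the similarity lookup described below.

```python
def is_speech(audio) -> bool: ...          # assumed voice activity detector
def transcribe(audio) -> str: ...          # trained speech recognition model
def text_emotion(text: str) -> dict: ...   # trained text emotion recognition model
def voice_emotion(audio) -> dict: ...      # trained human-voice emotion model
def nonspeech_emotion(audio) -> dict: ...  # trained non-speech audio model
def audio_features(audio): ...             # see the feature sketch below
def similarity(a, b) -> float: ...         # see the feature sketch below

def fuse(a: dict, b: dict) -> dict:
    # One simple fusion rule for fig. 4: average per-emotion probabilities.
    return {k: (a[k] + b[k]) / 2 for k in a}

def recognize_audio_emotion(audio, library):
    """library: list of (features, preset_emotion) for pre-stored standard audio."""
    if is_speech(audio):
        return fuse(text_emotion(transcribe(audio)), voice_emotion(audio))
    feats = audio_features(audio)
    best = max(library, key=lambda item: similarity(feats, item[0]), default=None)
    if best is not None and similarity(feats, best[0]) >= 0.9:  # assumed threshold
        return best[1]                     # preset emotion of the matched standard audio
    return nonspeech_emotion(audio)        # trained non-speech fallback
```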
When the received information to be responded is audio, before identifying its intention and/or emotion with the above method, various audio clips can be labeled with their corresponding intention and/or emotion, and the audio together with these labels can be stored.
The similarity between pre-stored audio and the audio sent by the user may be a probability of similarity to the standard audio, obtained by identifying features of the audio-type information to be responded using computer technology. Specifically, the features may be numerical values representing characteristics such as frequency, amplitude, beat, zero-crossing rate, short-time energy, and Mel-frequency cepstral coefficients; the computer technology may be any of various techniques, including statistical methods, neural network methods, and template matching. Any existing technique may be used for both the features and the method, and this specification does not elaborate further.
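As one possible realization of such feature-based similarity, the following sketch computes MFCC, zero-crossing-rate, and energy features with librosa and compares them by cosine similarity; the feature set and sampling rate are illustrative choices.

```python
import numpy as np
import librosa

def audio_features(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=16000)                 # 16 kHz is an assumed rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    zcr = librosa.feature.zero_crossing_rate(y).mean()   # zero-crossing rate
    energy = float(np.mean(y ** 2))                      # crude short-time energy proxy
    return np.concatenate([mfcc, [zcr, energy]])

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity, read as a similarity score to the standard audio.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```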
Similarly, when information to be responded of the image type is received, the intention may be identified as follows: determine the image matching the user's image, as a standard image, according to the similarity between pre-stored images and the user's image, and take the intention preset for the standard image as the intention of the user's image, i.e., of the information to be responded. If no matching image is found among the pre-stored images, a trained image intention recognition model can identify the intention of the user's image as the intention of the information to be responded. The emotion is identified in the same way: the matching standard image is determined by similarity, and the emotion preset for it is taken as the emotion of the information to be responded; if no match is found, a trained image emotion recognition model identifies the emotion of the user's image as the emotion of the information to be responded.
When the received information to be responded is an image, before identifying its intention and/or emotion with the above method, various images can be labeled with their corresponding intention and/or emotion, and the images together with these labels can be stored.
In one embodiment, the similarity between pre-stored images and the image sent by the user may be a probability of similarity to the standard image, obtained by identifying features of the image-type information to be responded using computer technology. Specifically, the features may be numerical values representing the color, shape, gray scale, texture, and the like of the image; the computer technology may be any of various techniques, including statistical methods, syntactic recognition, neural network methods, template matching, and geometric transformation. Any existing technique may be used, and this specification does not elaborate further.
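Analogously, the sketch below matches images by a normalized gray-scale histogram using Pillow and numpy; the histogram feature and the intersection measure are one illustrative choice among the techniques listed above.

```python
import numpy as np
from PIL import Image

def gray_histogram(path: str, bins: int = 32) -> np.ndarray:
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float32)  # gray scale
    hist, _ = np.histogram(img, bins=bins, range=(0, 255))
    return hist / hist.sum()                 # normalized gray-scale histogram

def image_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.minimum(a, b).sum())     # histogram intersection, in [0, 1]
```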
In one embodiment, using the finite state machine to represent the business-link logic and to handle switching between states specifically includes: presetting jump conditions between the states, where a jump condition tests whether the recognized intention and emotion satisfy it, and the state jump is executed according to the result; presetting at least one response mode for each interaction state; and selecting one of the response modes corresponding to the current interaction state according to the intention and emotion, that response mode being used to generate the response information fed back to the user.
In one embodiment, as shown in fig. 3, the finite state machine includes states A, B, C, and D, with preset jump conditions between the states; when a jump condition of the current state is satisfied, the state transition is performed. For example, if the current interaction state before the update is state A, and the intention and emotion identified from the information to be responded satisfy condition c, the current interaction state is updated to state D.
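A minimal sketch of this state machine, with the transition table and one illustrative condition predicate; the A-to-D edge follows fig. 3, while the remaining edges and the intent/emotion labels are assumptions.

```python
TRANSITIONS = {
    # (current state, satisfied condition) -> next state
    ("A", "c"): "D",     # fig. 3: condition c jumps from state A to state D
    ("A", "a"): "B",     # illustrative additional edges
    ("B", "b"): "C",
}

def satisfied_condition(intent: str, emotion: str):
    # Map the recognized intention and emotion to the jump condition they satisfy.
    if intent == "affirm" and emotion == "calm":  # illustrative predicate for "c"
        return "c"
    return None

def next_state(state: str, intent: str, emotion: str) -> str:
    cond = satisfied_condition(intent, emotion)
    return TRANSITIONS.get((state, cond), state)  # no condition met: state unchanged
```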
In one embodiment, the response mode is a natural language template from which response information carried by natural language is generated. The response information template includes variable information and constant information: once the response mode is selected, the constant information does not vary between users, while the variable information differs for different users. After the response mode is determined, the constant information is fixed, and the variable information is then determined according to the user information and/or the current interaction state.
In one embodiment, when reminding the user that a booked service is about to expire, the response template may be: "The A service you booked will expire in B minutes; would you like to extend it?" Here the content A of the service booked by the user and the time B until its expiry can be determined from the user information.
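As a sketch, filling such a template from user information is straightforward; the field names are illustrative assumptions.

```python
TEMPLATE = ("The {service} you booked will expire in {minutes} minutes; "
            "would you like to extend it?")

def fill_template(user_info: dict) -> str:
    return TEMPLATE.format(
        service=user_info["booked_service"],  # variable A, from user information
        minutes=user_info["minutes_left"],    # variable B, from user information
    )

print(fill_template({"booked_service": "meeting room", "minutes_left": 10}))
# -> The meeting room you booked will expire in 10 minutes; would you like to extend it?
```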
The constant information includes pre-recorded voice information and may also include other pre-stored fixed voice information. For the variable information, a trained variable speech generation model may generate the voice corresponding to the variable value. Specifically, in the reservation-reminder embodiment above, after the variable values A and B are determined, the trained variable speech generation model generates voice output in natural language; the generation may use computer technologies such as neural networks. After the voice corresponding to the variable information is generated, it can be spliced with the pre-recorded voice of the constant information to obtain the response information.
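The splicing step might look like the following sketch, using pydub as one possible audio library; synthesize() and the file names are stand-in assumptions for the trained variable speech generation model and the pre-recorded constant segments.

```python
from pydub import AudioSegment

def synthesize(text: str) -> AudioSegment:
    ...  # stand-in for the trained variable speech generation model

def build_reply(variable_value: str) -> AudioSegment:
    head = AudioSegment.from_wav("prerecorded_head.wav")  # constant: "The ... you booked ..."
    tail = AudioSegment.from_wav("prerecorded_tail.wav")  # constant: "... extend it?"
    return head + synthesize(variable_value) + tail       # splicing = concatenation

# build_reply("ten minutes").export("reply.wav", format="wav")
```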
The above is only an exemplary method for generating voice information according to variable information, and those skilled in the art can understand that other manners may also be used to generate voice information corresponding to variable information, which is not limited in this specification.
Based on the same idea, the information interaction method provided by the embodiment of the present specification further provides a corresponding apparatus, a storage medium, and an electronic device.
Fig. 5 is a schematic structural diagram of an information interaction apparatus provided in an embodiment of this specification, where the apparatus includes:
the information receiving module 500 is configured to receive information to be responded, which is sent by a user;
an information identification module 502, configured to identify the information to be responded by using a trained intention identification model, and obtain an intention of the information to be responded; recognizing the information to be responded by adopting a trained emotion recognition model to obtain the emotion of the information to be responded;
a state determination module 504, configured to determine a current interaction state;
a response mode module 506, configured to determine a response mode according to the intention, the emotion, and the current interaction state;
a status update and response information module 508, configured to update the current interaction status according to the intention and the emotion; and generating response information according to the response mode, and feeding back the response information to the user.
Optionally, the information identifying module 502 specifically includes an intention identifying unit 502A and an emotion identifying unit 502B, where the intention identifying unit 502A is specifically configured to identify the information to be responded by using a trained intention identifying model, and obtain an intention of the information to be responded; the emotion recognition unit 502B is specifically configured to recognize the information to be responded by using the trained emotion recognition model, and obtain an emotion of the information to be responded.
Optionally, the state updating and response information module 508 in the information interaction device specifically includes a current interaction state updating unit 508A and a response information generating unit 508B, where the current interaction state updating unit 508A is specifically configured to update the current interaction state according to the intention and the emotion, and the response information generating unit 508B is specifically configured to generate response information according to the response mode, and feed back the response information to the user.
Optionally, the information to be responded comprises voice information; the information recognition module 502 is specifically configured to recognize the voice information by using a trained voice recognition model, and obtain text information corresponding to the voice information; and recognizing the text information by adopting the trained intention recognition model to obtain the intention of the text information, and taking the intention of the text information as the intention of the voice information corresponding to the text information.
Optionally, the information to be responded comprises voice information; the information recognition module 502 is specifically configured to recognize the voice information by using a trained voice emotion recognition model, and obtain voice emotion of the voice information; and determining the emotion of the voice information according to the human voice emotion.
Optionally, the information recognition module 502 is specifically configured to recognize the voice information by using a trained voice recognition model, and obtain text information corresponding to the voice information; recognizing the text information by adopting a trained text emotion recognition model to obtain the emotion of the text information, and taking the emotion of the text information as the text emotion of the voice information corresponding to the text information; and determining the emotion of the voice information according to the human voice emotion of the voice information and the text emotion of the voice information.
Optionally, the response mode module 506 is specifically configured to determine a preset response information template corresponding to the intention, the emotion, and the current interaction state according to the intention, the emotion, and the current interaction state; the response information template comprises variable information and constant information; generating response information according to the response mode, specifically comprising: determining a variable type corresponding to variable information contained in the response information template; determining a variable value of the variable information corresponding to the variable type according to the user information of the user and/or the updated current interaction state; and generating response information according to the variable value and the constant information contained in the response information template.
Optionally, the state updating and response information module 508 is specifically configured to determine a preset interaction state and a preset jump condition between the interaction states; determining a current interaction state and jump conditions met by the intention and the emotion in the current interaction state; and updating the current interaction state to be the interaction state meeting the jump condition.
Optionally, the constant information includes pre-recorded voice information; the state updating and response information module 508 is specifically configured to generate, according to a variable value of the variable information, speech information corresponding to the variable value by using a trained variable speech generation model; and splicing the generated voice information with the pre-recorded voice information to obtain response information.
Specifically, the information interaction device shown in fig. 5 may be a robot for performing information interaction with a user.
The present specification also provides a computer-readable storage medium storing a computer program, which can be used to execute the information interaction method provided in fig. 1.
This specification also provides a schematic block diagram of the electronic device shown in fig. 6. As shown in fig. 6, at the hardware level the electronic device includes a processor, an internal bus, a network interface, memory, and non-volatile storage, and may also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into memory and runs it to implement the information interaction method described in fig. 1. Of course, besides a software implementation, this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the processing flow is not limited to logic units and may also be hardware or logic devices.
In the 1990s, an improvement in a technology could be clearly distinguished as an improvement in hardware (e.g., in circuit structures such as diodes, transistors, and switches) or an improvement in software (an improvement in a method flow). As technology has advanced, however, many of today's method-flow improvements can be regarded as direct improvements in hardware circuit structure: designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Thus it cannot be said that an improvement in a method flow cannot be realized with physical hardware modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device: designers "integrate" a digital system onto a single PLD by programming it themselves, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is today mostly implemented with "logic compiler" software, analogous to the software compilers used in program development; the source code to be compiled must be written in a particular programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language), with VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog being the most commonly used at present. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can readily be obtained merely by lightly programming the method flow into an integrated circuit using one of the above hardware description languages.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the method steps can be logically programmed so that the controller implements the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for implementing various functions may also be regarded as structures within the hardware component. Or, the means for implementing various functions may even be regarded as both software modules for implementing the method and structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above device is described as being divided into various units by function, each described separately. Of course, when implementing the present specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims (10)

1. An information interaction method, comprising:
receiving information to be responded sent by a user;
recognizing the information to be responded by adopting a trained intention recognition model to obtain the intention of the information to be responded; recognizing the information to be responded by adopting a trained emotion recognition model to obtain the emotion of the information to be responded;
determining a current interaction state;
determining a response mode according to the intention, the emotion and the current interaction state;
updating the current interaction state according to the intention and the emotion; and generating response information according to the response mode, and feeding back the response information to the user.
2. The method of claim 1, wherein the information to be responded to includes voice information;
recognizing the information to be responded by adopting the trained intention recognition model to obtain the intention of the information to be responded, and specifically comprising the following steps:
recognizing the voice information by adopting a trained voice recognition model to obtain text information corresponding to the voice information;
and recognizing the text information by adopting the trained intention recognition model to obtain the intention of the text information, and taking the intention of the text information as the intention of the voice information corresponding to the text information.
3. The method of claim 1, wherein the information to be responded to includes voice information;
adopting the trained emotion recognition model to recognize the information to be responded, and obtaining the emotion of the information to be responded, wherein the method specifically comprises the following steps:
recognizing the voice information by adopting a trained voice emotion recognition model to obtain voice emotion of the voice information;
and determining the emotion of the voice information according to the human voice emotion.
4. The method of claim 3, wherein determining the emotion of the speech information from the human voice emotion specifically comprises:
recognizing the voice information by adopting a trained voice recognition model to obtain text information corresponding to the voice information;
recognizing the text information by adopting a trained text emotion recognition model to obtain the emotion of the text information, and taking the emotion of the text information as the text emotion of the voice information corresponding to the text information;
and determining the emotion of the voice information according to the human voice emotion of the voice information and the text emotion of the voice information.
5. The method according to any one of claims 1 to 4, wherein determining a response mode according to the intention, the emotion and the current interaction state specifically comprises:
determining a preset response information template corresponding to the intention, the emotion and the current interaction state according to the intention, the emotion and the current interaction state;
the response information template comprises variable information and constant information;
generating response information according to the response mode, specifically comprising:
determining a variable type corresponding to variable information contained in the response information template;
determining a variable value of the variable information corresponding to the variable type according to the user information of the user and/or the updated current interaction state;
and generating response information according to the variable value and the constant information contained in the response information template.
6. The method according to any one of claims 1 to 4, wherein updating the current interaction state according to the intent and the emotion specifically comprises:
determining preset interaction states and preset jump conditions among the interaction states;
determining a current interaction state and jump conditions met by the intention and the emotion in the current interaction state;
and updating the current interaction state to be the interaction state meeting the jump condition.
7. The method of claim 5, wherein the constant information comprises pre-recorded voice information;
generating response information according to the variable value and the constant information contained in the response information template, specifically comprising:
generating voice information corresponding to the variable value by adopting a trained variable voice generation model according to the variable value of the variable information;
and splicing the generated voice information with the pre-recorded voice information to obtain response information.
8. An information interaction apparatus, comprising:
an information receiving module, configured to receive information to be responded sent by a user;
an information identification module, configured to recognize the information to be responded by adopting a trained intention recognition model to obtain the intention of the information to be responded, and to recognize the information to be responded by adopting a trained emotion recognition model to obtain the emotion of the information to be responded;
a state determination module, configured to determine a current interaction state;
a response mode module, configured to determine a response mode according to the intention, the emotion and the current interaction state;
a state updating and response information module, configured to update the current interaction state according to the intention and the emotion, generate response information according to the response mode, and feed the response information back to the user.
9. A computer-readable storage medium, characterized in that the storage medium stores a computer program for executing the information interaction method of any one of the preceding claims 1 to 7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the method of any of claims 1-7.
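Purely as an illustration of the control flow in claims 1, 5 and 6 above, the following is a minimal Python sketch. The state names, jump conditions, template strings and stub outputs are invented for this example; the trained intention recognition and emotion recognition models are reduced to placeholder functions.

```python
# Stubs standing in for the trained models of claim 1 (outputs invented).
def recognize_intention(info: str) -> str:
    return "query_delivery_time"

def recognize_emotion(info: str) -> str:
    return "impatient"

# Preset interaction states and jump conditions (claim 6), as a table:
# (current state, intention, emotion) -> next interaction state.
JUMP_CONDITIONS = {
    ("greeting", "query_delivery_time", "impatient"): "soothing_answer",
    ("greeting", "query_delivery_time", "neutral"): "plain_answer",
}

# Response information templates (claim 5): "{eta}" is the variable
# information, the surrounding text is the constant information.
TEMPLATES = {
    "soothing_answer": "Sorry to keep you waiting; your order arrives in {eta}.",
    "plain_answer": "Your order is expected to arrive in {eta}.",
}

def interact(info, state, user_info):
    intention = recognize_intention(info)   # intention of the info to be responded
    emotion = recognize_emotion(info)       # emotion of the info to be responded
    # Update the current interaction state via the matching jump condition.
    state = JUMP_CONDITIONS.get((state, intention, emotion), state)
    # Fill the variable slot from user information / the updated state.
    template = TEMPLATES.get(state, "Could you say that again?")
    return template.format(eta=user_info.get("eta", "a moment")), state

reply, new_state = interact("Where is my order?", "greeting", {"eta": "10 minutes"})
print(new_state, "->", reply)
```

In the full method, the filled template would then be rendered as audio by the variable voice generation and splicing step of claim 7, as sketched earlier in the description.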
CN202010945782.1A 2020-09-10 2020-09-10 Information interaction method and device, storage medium and electronic equipment Withdrawn CN112164394A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010945782.1A CN112164394A (en) 2020-09-10 2020-09-10 Information interaction method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN112164394A true CN112164394A (en) 2021-01-01

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102324995A (en) * 2011-04-20 2012-01-18 铁道部运输局 Speech broadcasting method and system
CN105281856A (en) * 2015-11-11 2016-01-27 北京合智创展科技有限责任公司 Public broadcasting system and method
CN108227932A (en) * 2018-01-26 2018-06-29 上海智臻智能网络科技股份有限公司 Interaction is intended to determine method and device, computer equipment and storage medium
US20190295533A1 (en) * 2018-01-26 2019-09-26 Shanghai Xiaoi Robot Technology Co., Ltd. Intelligent interactive method and apparatus, computer device and computer readable storage medium
WO2019153522A1 (en) * 2018-02-09 2019-08-15 卫盈联信息技术(深圳)有限公司 Intelligent interaction method, electronic device, and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569712A (en) * 2021-07-23 2021-10-29 北京百度网讯科技有限公司 Information interaction method, device, equipment and storage medium
CN113569712B (en) * 2021-07-23 2023-11-14 北京百度网讯科技有限公司 Information interaction method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20210101)