CN110895658A - Information processing method and device and robot - Google Patents

Information processing method and device and robot

Info

Publication number
CN110895658A
CN110895658A (application CN201811068854.8A)
Authority
CN
China
Prior art keywords
voice
preset
information
voice information
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811068854.8A
Other languages
Chinese (zh)
Inventor
Zhang Long
Wen Kuangyu
Lian Yuanyuan
Song Dechao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gree Electric Appliances Inc of Zhuhai
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai
Priority to CN201811068854.8A
Publication of CN110895658A
Legal status: Pending (Current)


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Child & Adolescent Psychology (AREA)
  • Artificial Intelligence (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses an information processing method and device, and a robot. The method comprises: receiving voice information; analyzing the voice information using a first model to determine an emotion category of the voice information, wherein the first model is trained by machine learning using multiple sets of data, each set comprising voice information and the emotion category corresponding to that voice information; and playing a response voice matched with the voice information based on the emotion category. The invention solves the technical problem that current service robots and home robots can only execute actions according to user commands and therefore cannot produce appropriate voice output according to the user's current emotional state.

Description

Information processing method and device and robot
Technical Field
The invention relates to the field of neural network control, and in particular to an information processing method and device, and a robot.
Background
With the development of artificial intelligence, service robots and home robots have gradually entered people's lives. However, current service robots and home robots can only execute actions according to commands issued by users. In this mode the robot merely provides a service: it cannot produce appropriate voice output according to the user's current emotional state and thereby guide that state in a positive direction, which reduces the user's experience of using the robot.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the invention provide an information processing method and device, and a robot, so as to at least solve the technical problem that current service robots and home robots can only execute actions according to user commands and therefore cannot produce appropriate voice output according to the user's current emotional state.
According to an aspect of an embodiment of the present invention, there is provided an information processing method, including: receiving voice information; analyzing the voice information using a first model to determine an emotion category of the voice information, wherein the first model is trained by machine learning using multiple sets of data, each set comprising: the voice information and the emotion category corresponding to the voice information; and playing a response voice matched with the voice information based on the emotion category.
Optionally, before analyzing the voice information using the first model, the method further includes: determining the number of voice utterances and the voice energy distribution within a preset time period; and determining whether to play a preset voice according to the number and the voice energy distribution.
Optionally, determining whether to play the preset voice according to the number and the voice energy distribution includes: determining to play the preset voice if the number is larger than a first preset threshold and the voice energy distribution indicates that the pitch is higher than a second preset threshold.
Optionally, the method further includes: playing the preset voice in a preset tone when it is determined that the preset voice is to be played.
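The decision rule above (utterance count over a first preset threshold and, from the energy distribution, pitch over a second preset threshold) can be sketched as follows. This is an illustrative sketch, not the patented implementation; the function name and the default threshold values are assumptions.

```python
# Illustrative sketch of the preset-voice decision: play the preset voice
# only when both the number of utterances in the preset time period and
# the pitch derived from the voice energy distribution exceed their
# preset thresholds. Default values are assumed for illustration.

def should_play_preset_voice(utterance_count: int,
                             mean_pitch_hz: float,
                             count_threshold: int = 5,
                             pitch_threshold_hz: float = 250.0) -> bool:
    """Return True when both preset thresholds are exceeded."""
    return (utterance_count > count_threshold
            and mean_pitch_hz > pitch_threshold_hz)
```

Both conditions must hold; either a low utterance count or a low pitch alone keeps the robot silent.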
Optionally, playing the response voice matched with the voice information based on the emotion category includes: extracting the intonation and speed corresponding to the emotion category; and playing the response voice with that intonation and speed.
Optionally, the first model comprises a deep neural network WDE-LSTM.
According to another aspect of the embodiments of the present invention, there is also provided an information processing apparatus including: a receiving unit for receiving voice information; a processing unit, configured to analyze the voice information using a first model, and determine an emotion category of the voice information, where the first model is trained through machine learning using multiple sets of data, and each set of data in the multiple sets of data includes: the voice information and the emotion type corresponding to the voice information; and the control unit is used for playing the response voice matched with the voice information based on the emotion type.
Optionally, the processing unit is further configured to determine the number of voice messages and the voice energy distribution within a preset time period; and determining whether to play preset voice or not according to the number and the voice energy distribution.
Optionally, the processing unit is configured to determine whether to play the preset voice according to the number and the voice energy distribution by performing the following step: determining to play the preset voice if the number is larger than a first preset threshold and the voice energy distribution indicates that the pitch is higher than a second preset threshold.
Optionally, the control unit is further configured to play the preset voice in a preset tone when it is determined that the preset voice is to be played.
Optionally, the control unit is configured to play the response voice matched with the voice information based on the emotion category by performing the following steps: extracting intonation and speed corresponding to the emotion category; and playing the response voice according to the intonation and the speed.
According to another aspect of the embodiments of the present invention, there is also provided a robot including the information processing apparatus described above.
In the embodiments of the invention, voice information is received; the voice information is analyzed using a first model to determine its emotion category, where the first model is trained by machine learning using multiple sets of data, each set comprising voice information and the corresponding emotion category; and a response voice matched with the voice information is played based on the emotion category. By analyzing the voice issued by the user with the first model, outputting the emotion semantic category corresponding to that voice, and then producing a service voice adapted to the emotion category, the embodiments achieve the purpose of adjusting the voice output according to the user's emotion category. This improves the user's experience and makes the interaction pleasant, and solves the technical problem that current service robots and home robots can only execute actions according to user commands and therefore cannot produce appropriate voice output according to the user's current emotional state.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow diagram illustrating an alternative information processing method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an alternative information processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
In accordance with an embodiment of the present invention, a method embodiment of an information processing method is provided. It should be noted that the steps illustrated in the flowchart of the accompanying drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowchart, in some cases the steps may be performed in an order different from that described herein.
Fig. 1 is an information processing method according to an embodiment of the present invention, as shown in fig. 1, the method including the steps of:
step S102, receiving voice information.
And step S104, analyzing the voice information by using a first model, and determining the emotion type of the voice information.
Wherein the first model is trained by machine learning using multiple sets of data, each set comprising voice information and the emotion category corresponding to that voice information. The first model may include a deep neural network (WDE-LSTM).
And step S106, playing response voice matched with the voice information based on the emotion type.
Wherein playing the response voice matched with the voice information based on the emotion category comprises: extracting the intonation and speed corresponding to the emotion category; and playing the response voice with that intonation and speed.
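The two steps above (extracting the intonation and speed for an emotion category, then playing the response with them) can be sketched as a simple lookup. The category names and prosody values here are illustrative assumptions, not values from the patent.

```python
# Illustrative lookup from emotion category to response prosody
# (intonation and speaking rate). Categories and values are assumptions.

PROSODY = {
    "urgent": {"intonation": "relaxed", "rate": 0.9},   # soothe a tense user
    "calm":   {"intonation": "neutral", "rate": 1.0},
    "happy":  {"intonation": "bright",  "rate": 1.05},
}

def extract_prosody(emotion: str) -> dict:
    """Return the intonation and speed for an emotion category,
    falling back to a neutral setting for unknown categories."""
    return PROSODY.get(emotion, {"intonation": "neutral", "rate": 1.0})
```

A speech synthesizer would then render the response text with the returned intonation and rate.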
Through the above steps, the voice issued by the user is analyzed by the first model, the emotion semantic category corresponding to the voice is output, and a service voice adapted to that emotion category can then be produced based on the output category. This achieves the purpose of adjusting the voice output according to the user's emotion category, improves the user's experience, makes the interaction pleasant, and solves the technical problem that current service robots and home robots can only execute actions according to user commands and therefore cannot produce appropriate voice output according to the user's current emotional state.
Optionally, before analyzing the voice information using the first model, the method further includes: determining the number of voice utterances and the voice energy distribution within a preset time period; and determining whether to play a preset voice according to the number and the voice energy distribution.
Optionally, determining whether to play the preset voice according to the number and the voice energy distribution includes: determining to play the preset voice if the number is larger than a first preset threshold and the voice energy distribution indicates that the pitch is higher than a second preset threshold.
Optionally, the method further includes: playing the preset voice in a preset tone when it is determined that the preset voice is to be played.
In the method, the customer service robot receives a voice question from the user, determines the user's current emotional state using voice emotion classification based on weakly supervised deep learning, and, while giving the corresponding answer, adjusts the tone of the answer according to the user's current emotional state.
When determining the user's current emotional state, specifically, the voice issued by the user within the preset time period can be recognized by the voice training model, noise is filtered out, and the number of voice utterances and the voice energy distribution within the period are determined, so that voice emotion analysis can be performed on them. Meanwhile, a deep neural network (WDE-LSTM) is used: a sentence is fed into the input layer, a classification layer in the network performs emotion classification, and the output of the classification layer is the emotion semantic category corresponding to the sentence. A service voice adapted to that emotion category can then be produced based on the output category, so that the user feels pleasant and the user's experience is improved.
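The classification stage described above (a sentence representation passed through a classification layer whose output is the emotion semantic category) can be sketched as a softmax layer in plain Python. The recurrent WDE-LSTM feature extractor is omitted; the feature vector, weights, and category names are illustrative assumptions.

```python
import math

# Illustrative sketch of the final classification layer: a softmax over
# emotion categories applied to a feature vector (standing in for the
# WDE-LSTM's sentence representation, which is not implemented here).
# Category names and weights are assumptions.

EMOTIONS = ["calm", "happy", "urgent", "sad"]

def classify_emotion(features, weights, bias):
    """weights[k] is the weight vector for emotion k; returns the
    predicted emotion semantic category and the softmax probabilities."""
    logits = [sum(f * w for f, w in zip(features, col)) + b
              for col, b in zip(weights, bias)]
    m = max(logits)                        # subtract max for numeric stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return EMOTIONS[probs.index(max(probs))], probs
```

In a full implementation the `features` vector would be the LSTM's final hidden state for the input sentence; here it is supplied directly.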
Example 2
According to an embodiment of the present invention, an embodiment of an information processing apparatus is provided. Fig. 2 shows an information processing apparatus according to an embodiment of the present invention; as shown in Fig. 2, the apparatus includes:
a receiving unit 20 for receiving voice information;
a processing unit 22, configured to analyze the speech information using a first model, and determine an emotion category of the speech information, where the first model is trained through machine learning using multiple sets of data, and each set of data in the multiple sets of data includes: the voice information and the emotion type corresponding to the voice information;
and the control unit 24 is used for playing the response voice matched with the voice information based on the emotion category.
Optionally, the processing unit 22 is further configured to determine the number of voice messages and the voice energy distribution within a preset time period; and determining whether to play preset voice or not according to the number and the voice energy distribution.
Optionally, the processing unit 22 is configured to determine whether to play the preset voice according to the number and the voice energy distribution by performing the following step: determining to play the preset voice if the number is larger than a first preset threshold and the voice energy distribution indicates that the pitch is higher than a second preset threshold.
Optionally, the control unit 24 is further configured to play the preset voice in a preset tone when it is determined that the preset voice is to be played.
Optionally, the control unit 24 is configured to play the response voice matched with the voice information based on the emotion category by performing the following steps: extracting intonation and speed corresponding to the emotion category; and playing the response voice according to the intonation and the speed.
According to an embodiment of the present invention, there is also provided a robot including the information processing apparatus described above.
The robot in this embodiment may be, but is not limited to, a customer service robot. The customer service robot collects voice information issued by the user, analyzes it, and matches it against answers stored in a database.
When matching the answer information, before answering, noise is filtered out by the voice training model (namely the first model), and the number of voice utterances and the voice energy distribution within a preset time period are determined so that voice emotion analysis can be performed on them. For example, if the analysis shows that the user is speaking quickly and at a high pitch, it can be determined that the user is anxious; the customer service robot can then soothe the user before answering the question, for example with "Don't worry, we will resolve your concern", and answer in a relaxed tone, so that the user's nervous and anxious mood can be eased.
When matching the answer information, before answering, a deep neural network (WDE-LSTM) is used: the sentence is fed into the input layer, a classification layer in the network performs emotion classification, the output of the classification layer is the emotion semantic category corresponding to the sentence, and a service voice adapted to that emotion category is then produced. For example, when a child asks a question, the emotion semantic category of the child's sentence is obtained and the service voice for that category is played with an intonation and speed close to the child's own, with a slow speech rate that is easy to follow. Likewise, when an elderly person asks a question, the emotion semantic category of the sentence is obtained and the answer is played with an intonation and speed close to the elderly person's own, with a slow speech rate and a louder, higher tone, so that an elderly user with poor hearing can readily take in the answer. This makes the system more humane.
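The child and elderly examples above can be sketched as a playback-settings lookup; the group names and numeric values are illustrative assumptions, not values from the patent.

```python
# Illustrative playback settings for the speaker-adapted answers described
# above: slower speech for a child, slower and louder speech for an
# elderly user. Values are assumptions for illustration.

SETTINGS = {
    "child":   {"rate": 0.8, "volume": 1.0},  # slow, easy to follow
    "elderly": {"rate": 0.8, "volume": 1.3},  # slow and louder
    "adult":   {"rate": 1.0, "volume": 1.0},
}

def playback_settings(speaker_group: str) -> dict:
    """Return rate/volume for a speaker group, defaulting to adult."""
    return SETTINGS.get(speaker_group, SETTINGS["adult"])
```

A deployed system would infer the speaker group from the voice itself; here the group label is assumed to be given.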
By combining the above approaches, the user's emotion category can be analyzed and the voice output adjusted accordingly, improving the user's experience.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (12)

1. An information processing method characterized by comprising:
receiving voice information;
analyzing the voice information by using a first model, and determining an emotion category of the voice information, wherein the first model is trained by machine learning by using a plurality of groups of data, and each group of data in the plurality of groups of data comprises: the voice information and the emotion type corresponding to the voice information;
and playing response voice matched with the voice information based on the emotion category.
2. The method of claim 1, wherein prior to analyzing the speech information using the first model, the method further comprises:
determining the quantity of voice information and voice energy distribution in a preset time period;
and determining whether to play preset voice or not according to the number and the voice energy distribution.
3. The method of claim 2, wherein determining whether to play the preset voice according to the number and the voice energy distribution comprises:
and if the number is larger than a first preset threshold value and the voice energy distribution indicates that the pitch is higher than a second preset threshold value, determining to play the preset voice.
4. The method of claim 2, further comprising:
and playing the preset voice in a preset tone under the condition of determining to play the preset voice.
5. The method of claim 1, wherein playing the response speech matching the speech information based on the emotion classification comprises:
extracting intonation and speed corresponding to the emotion category;
and playing the response voice according to the intonation and the speed.
6. The method of any of claims 1 to 5, wherein the first model comprises a deep neural network (WDE-LSTM).
7. An information processing apparatus characterized by comprising:
a receiving unit for receiving voice information;
a processing unit, configured to analyze the voice information using a first model, and determine an emotion category of the voice information, where the first model is trained through machine learning using multiple sets of data, and each set of data in the multiple sets of data includes: the voice information and the emotion type corresponding to the voice information;
and the control unit is used for playing the response voice matched with the voice information based on the emotion type.
8. The apparatus of claim 7,
the processing unit is further used for determining the number of voice messages and the voice energy distribution in a preset time period; and determining whether to play preset voice or not according to the number and the voice energy distribution.
9. The apparatus of claim 8, wherein the processing unit is configured to determine whether to play the preset voice according to the number and the voice energy distribution by:
and if the number is larger than a first preset threshold value and the voice energy distribution indicates that the pitch is higher than a second preset threshold value, determining to play the preset voice.
10. The apparatus of claim 8,
the control unit is further configured to play the preset voice in a preset tone under the condition that the preset voice is determined to be played.
11. The device of claim 7, wherein the control unit is configured to play the response voice matching the voice information based on the emotion classification by:
extracting intonation and speed corresponding to the emotion category;
and playing the response voice according to the intonation and the speed.
12. A robot characterized by comprising the information processing apparatus according to any one of claims 7 to 11.
CN201811068854.8A 2018-09-13 2018-09-13 Information processing method and device and robot Pending CN110895658A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811068854.8A CN110895658A (en) 2018-09-13 2018-09-13 Information processing method and device and robot

Publications (1)

Publication Number Publication Date
CN110895658A true CN110895658A (en) 2020-03-20

Family

ID=69785553

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811068854.8A Pending CN110895658A (en) 2018-09-13 2018-09-13 Information processing method and device and robot

Country Status (1)

Country Link
CN (1) CN110895658A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103413113A * 2013-01-15 2013-11-27 Shanghai University Intelligent emotional interaction method for service robot
CN105260416A * 2015-09-25 2016-01-20 Baidu Online Network Technology (Beijing) Co., Ltd. Voice recognition based searching method and apparatus
CN106773923A * 2016-11-30 2017-05-31 Beijing Guangnian Wuxian Technology Co., Ltd. Multi-modal emotion data interaction method and apparatus for a robot
CN107452400A * 2017-07-24 2017-12-08 Meizu Technology Co., Ltd. (Zhuhai) Voice broadcast method and device, computer device and computer-readable storage medium
CN107464566A * 2017-09-21 2017-12-12 Baidu Online Network Technology (Beijing) Co., Ltd. Speech recognition method and device


Similar Documents

Publication Publication Date Title
US20240054117A1 (en) Artificial intelligence platform with improved conversational ability and personality development
JP6876752B2 (en) Response method and equipment
CN109147802B (en) Playing speed adjusting method and device
CN109960723A (en) A kind of interactive system and method for psychological robot
CN103236259A (en) Voice recognition processing and feedback system, voice response method
CN110444229A (en) Communication service method, device, computer equipment and storage medium based on speech recognition
US20230412995A1 (en) Advanced hearing prosthesis recipient habilitation and/or rehabilitation
CN106774845A (en) A kind of intelligent interactive method, device and terminal device
ter Heijden et al. Design and evaluation of a virtual reality exposure therapy system with automatic free speech interaction
JP2017207641A (en) Information processing device, control method and computer program thereof
Khouzaimi et al. Optimising turn-taking strategies with reinforcement learning
CN114341922A (en) Information processing system, information processing method, and program
CN111460094A (en) Method and device for optimizing audio splicing based on TTS (text to speech)
US20190362737A1 (en) Modifying voice data of a conversation to achieve a desired outcome
CN116030788B (en) Intelligent voice interaction method and device
Tu Learn to speak like a native: AI-powered chatbot simulating natural conversation for language tutoring
CN112634886A (en) Interaction method of intelligent equipment, server, computing equipment and storage medium
CN110895658A (en) Information processing method and device and robot
JP7418106B2 (en) Information processing device, information processing method and program
Fukumoto et al. Optimization of sound of autonomous sensory meridian response with interactive genetic algorithm
CN115965251A (en) Teaching evaluation method, teaching evaluation device, storage medium, and server
EP3514783A1 (en) Contextual language learning device, system and method
Agarwal et al. Towards MOOCs for lipreading: Using synthetic talking heads to train humans in lipreading at scale
CN109726267A (en) Story recommended method and device for Story machine
CN113053186A (en) Interaction method, interaction device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200320