CN110895658A - Information processing method and device and robot - Google Patents

Information processing method and device and robot

Info

Publication number
CN110895658A
CN110895658A (application CN201811068854.8A)
Authority
CN
China
Prior art keywords
voice
preset
information
voice information
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811068854.8A
Other languages
Chinese (zh)
Inventor
Zhang Long
Wen Kuangyu
Lian Yuanyuan
Song Dechao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gree Electric Appliances Inc of Zhuhai
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai
Priority to CN201811068854.8A
Publication of CN110895658A
Legal status: Pending (Current)


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Child & Adolescent Psychology (AREA)
  • Artificial Intelligence (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses an information processing method and device, and a robot. The method comprises: receiving voice information; analyzing the voice information using a first model to determine an emotion category of the voice information, wherein the first model is trained by machine learning using multiple sets of data, each set comprising voice information and the emotion category corresponding to that voice information; and playing a response voice matched with the voice information based on the emotion category. The invention solves the technical problem that current service robots and home robots can only execute actions according to user commands and therefore cannot produce appropriate voice output according to the user's current emotional state.

Description

Information processing method and device and robot
Technical Field
The invention relates to the field of neural network control, and in particular to an information processing method and device, and a robot.
Background
With the development of artificial intelligence, service robots and home robots have gradually entered people's lives. However, current service robots and home robots can only execute actions according to commands issued by users. In this mode the robot merely provides a service: it cannot produce appropriate voice output according to the user's current emotional state and thereby guide that state in a positive direction, which reduces the user's experience of using the robot.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the invention provide an information processing method and device, and a robot, so as to at least solve the technical problem that current service robots and home robots can only execute actions according to user commands and therefore cannot produce appropriate voice output according to the user's current emotional state.
According to an aspect of an embodiment of the present invention, there is provided an information processing method, including: receiving voice information; analyzing the voice information using a first model to determine an emotion category of the voice information, wherein the first model is trained by machine learning using multiple sets of data, each set comprising: the voice information and the emotion category corresponding to the voice information; and playing a response voice matched with the voice information based on the emotion category.
Optionally, before analyzing the voice information using the first model, the method further includes: determining the number of voice utterances and the voice energy distribution within a preset time period; and determining whether to play a preset voice according to the number and the voice energy distribution.
Optionally, determining whether to play the preset voice according to the number and the voice energy distribution includes: determining to play the preset voice if the number is larger than a first preset threshold and the voice energy distribution indicates that the pitch is higher than a second preset threshold.
Optionally, the method further includes: playing the preset voice in a preset tone when it is determined that the preset voice is to be played.
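The decision rule above (utterance count over a first preset threshold and, from the energy distribution, pitch over a second preset threshold) can be sketched as follows. This is an illustrative sketch, not the patented implementation; the function name and the default threshold values are assumptions.

```python
# Illustrative sketch of the preset-voice decision: play the preset voice
# only when both the number of utterances in the preset time period and
# the pitch derived from the voice energy distribution exceed their
# preset thresholds. Default values are assumed for illustration.

def should_play_preset_voice(utterance_count: int,
                             mean_pitch_hz: float,
                             count_threshold: int = 5,
                             pitch_threshold_hz: float = 250.0) -> bool:
    """Return True when both preset thresholds are exceeded."""
    return (utterance_count > count_threshold
            and mean_pitch_hz > pitch_threshold_hz)
```

Both conditions must hold; either a low utterance count or a low pitch alone keeps the robot silent.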
Optionally, playing the response voice matched with the voice information based on the emotion category includes: extracting the intonation and speed corresponding to the emotion category; and playing the response voice with that intonation and speed.
Optionally, the first model comprises a deep neural network WDE-LSTM.
According to another aspect of the embodiments of the present invention, there is also provided an information processing apparatus including: a receiving unit for receiving voice information; a processing unit, configured to analyze the voice information using a first model, and determine an emotion category of the voice information, where the first model is trained through machine learning using multiple sets of data, and each set of data in the multiple sets of data includes: the voice information and the emotion type corresponding to the voice information; and the control unit is used for playing the response voice matched with the voice information based on the emotion type.
Optionally, the processing unit is further configured to determine the number of voice messages and the voice energy distribution within a preset time period; and determining whether to play preset voice or not according to the number and the voice energy distribution.
Optionally, the processing unit is configured to determine whether to play the preset voice according to the number and the voice energy distribution by performing the following step: determining to play the preset voice if the number is larger than a first preset threshold and the voice energy distribution indicates that the pitch is higher than a second preset threshold.
Optionally, the control unit is further configured to play the preset voice in a preset tone when it is determined that the preset voice is to be played.
Optionally, the control unit is configured to play the response voice matched with the voice information based on the emotion category by performing the following steps: extracting intonation and speed corresponding to the emotion category; and playing the response voice according to the intonation and the speed.
According to another aspect of the embodiments of the present invention, there is also provided a robot including the information processing apparatus described above.
In the embodiments of the invention, voice information is received; the voice information is analyzed using a first model to determine its emotion category, where the first model is trained by machine learning using multiple sets of data, each set comprising voice information and the corresponding emotion category; and a response voice matched with the voice information is played based on the emotion category. By analyzing the voice issued by the user with the first model, outputting the emotion semantic category corresponding to that voice, and then producing a service voice adapted to the emotion category, the embodiments achieve the purpose of adjusting the voice output according to the user's emotion category. This improves the user's experience and makes the interaction pleasant, and solves the technical problem that current service robots and home robots can only execute actions according to user commands and therefore cannot produce appropriate voice output according to the user's current emotional state.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow diagram illustrating an alternative information processing method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an alternative information processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
In accordance with an embodiment of the present invention, a method embodiment of an information processing method is provided. It should be noted that the steps illustrated in the flowchart of the accompanying drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowchart, in some cases the steps may be performed in an order different from that described herein.
Fig. 1 is an information processing method according to an embodiment of the present invention, as shown in fig. 1, the method including the steps of:
step S102, receiving voice information.
And step S104, analyzing the voice information by using a first model, and determining the emotion type of the voice information.
Wherein the first model is trained by machine learning using multiple sets of data, each set comprising voice information and the emotion category corresponding to that voice information. The first model may include a deep neural network (WDE-LSTM).
And step S106, playing response voice matched with the voice information based on the emotion type.
Wherein playing the response voice matched with the voice information based on the emotion category comprises: extracting the intonation and speed corresponding to the emotion category; and playing the response voice with that intonation and speed.
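The two steps above (extracting the intonation and speed for an emotion category, then playing the response with them) can be sketched as a simple lookup. The category names and prosody values here are illustrative assumptions, not values from the patent.

```python
# Illustrative lookup from emotion category to response prosody
# (intonation and speaking rate). Categories and values are assumptions.

PROSODY = {
    "urgent": {"intonation": "relaxed", "rate": 0.9},   # soothe a tense user
    "calm":   {"intonation": "neutral", "rate": 1.0},
    "happy":  {"intonation": "bright",  "rate": 1.05},
}

def extract_prosody(emotion: str) -> dict:
    """Return the intonation and speed for an emotion category,
    falling back to a neutral setting for unknown categories."""
    return PROSODY.get(emotion, {"intonation": "neutral", "rate": 1.0})
```

A speech synthesizer would then render the response text with the returned intonation and rate.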
Through the above steps, the voice issued by the user is analyzed by the first model, the emotion semantic category corresponding to the voice is output, and a service voice adapted to that emotion category can then be produced based on the output category. This achieves the purpose of adjusting the voice output according to the user's emotion category, improves the user's experience, makes the interaction pleasant, and solves the technical problem that current service robots and home robots can only execute actions according to user commands and therefore cannot produce appropriate voice output according to the user's current emotional state.
Optionally, before analyzing the voice information using the first model, the method further includes: determining the number of voice utterances and the voice energy distribution within a preset time period; and determining whether to play a preset voice according to the number and the voice energy distribution.
Optionally, determining whether to play the preset voice according to the number and the voice energy distribution includes: determining to play the preset voice if the number is larger than a first preset threshold and the voice energy distribution indicates that the pitch is higher than a second preset threshold.
Optionally, the method further includes: playing the preset voice in a preset tone when it is determined that the preset voice is to be played.
In the method, the customer service robot receives a voice question from the user, determines the user's current emotional state using voice emotion classification based on weakly supervised deep learning, and, while giving the corresponding answer, adjusts the tone of the answer according to the user's current emotional state.
When determining the user's current emotional state, specifically, the voice issued by the user within the preset time period can be recognized by the voice training model, noise is filtered out, and the number of voice utterances and the voice energy distribution within the period are determined, so that voice emotion analysis can be performed on them. Meanwhile, a deep neural network (WDE-LSTM) is used: a sentence is fed into the input layer, a classification layer in the network performs emotion classification, and the output of the classification layer is the emotion semantic category corresponding to the sentence. A service voice adapted to that emotion category can then be produced based on the output category, so that the user feels pleasant and the user's experience is improved.
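The classification stage described above (a sentence representation passed through a classification layer whose output is the emotion semantic category) can be sketched as a softmax layer in plain Python. The recurrent WDE-LSTM feature extractor is omitted; the feature vector, weights, and category names are illustrative assumptions.

```python
import math

# Illustrative sketch of the final classification layer: a softmax over
# emotion categories applied to a feature vector (standing in for the
# WDE-LSTM's sentence representation, which is not implemented here).
# Category names and weights are assumptions.

EMOTIONS = ["calm", "happy", "urgent", "sad"]

def classify_emotion(features, weights, bias):
    """weights[k] is the weight vector for emotion k; returns the
    predicted emotion semantic category and the softmax probabilities."""
    logits = [sum(f * w for f, w in zip(features, col)) + b
              for col, b in zip(weights, bias)]
    m = max(logits)                        # subtract max for numeric stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return EMOTIONS[probs.index(max(probs))], probs
```

In a full implementation the `features` vector would be the LSTM's final hidden state for the input sentence; here it is supplied directly.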
Example 2
According to an embodiment of the present invention, an embodiment of an information processing apparatus is provided. Fig. 2 shows an information processing apparatus according to an embodiment of the present invention; as shown in Fig. 2, the apparatus includes:
a receiving unit 20 for receiving voice information;
a processing unit 22, configured to analyze the speech information using a first model, and determine an emotion category of the speech information, where the first model is trained through machine learning using multiple sets of data, and each set of data in the multiple sets of data includes: the voice information and the emotion type corresponding to the voice information;
and the control unit 24 is used for playing the response voice matched with the voice information based on the emotion category.
Optionally, the processing unit 22 is further configured to determine the number of voice messages and the voice energy distribution within a preset time period; and determining whether to play preset voice or not according to the number and the voice energy distribution.
Optionally, the processing unit 22 is configured to determine whether to play the preset voice according to the number and the voice energy distribution by performing the following step: determining to play the preset voice if the number is larger than a first preset threshold and the voice energy distribution indicates that the pitch is higher than a second preset threshold.
Optionally, the control unit 24 is further configured to play the preset voice in a preset tone when it is determined that the preset voice is to be played.
Optionally, the control unit 24 is configured to play the response voice matched with the voice information based on the emotion category by performing the following steps: extracting intonation and speed corresponding to the emotion category; and playing the response voice according to the intonation and the speed.
According to an embodiment of the present invention, there is also provided a robot including the information processing apparatus described above.
The robot in this embodiment may be, but is not limited to, a customer service robot. The customer service robot collects voice information issued by the user, analyzes it, and matches it against answers stored in a database.
When matching the answer information, before answering, noise is filtered out by the voice training model (namely the first model), and the number of voice utterances and the voice energy distribution within a preset time period are determined so that voice emotion analysis can be performed on them. For example, if the analysis shows that the user is speaking quickly and at a high pitch, it can be determined that the user is anxious; the customer service robot can then soothe the user before answering the question, for example with "Don't worry, we will resolve your concern", and answer in a relaxed tone, so that the user's nervous and anxious mood can be eased.
When matching the answer information, before answering, a deep neural network (WDE-LSTM) is used: the sentence is fed into the input layer, a classification layer in the network performs emotion classification, the output of the classification layer is the emotion semantic category corresponding to the sentence, and a service voice adapted to that emotion category is then produced. For example, when a child asks a question, the emotion semantic category of the child's sentence is obtained and the service voice for that category is played with an intonation and speed close to the child's own, with a slow speech rate that is easy to follow. Likewise, when an elderly person asks a question, the emotion semantic category of the sentence is obtained and the answer is played with an intonation and speed close to the elderly person's own, with a slow speech rate and a louder, higher tone, so that an elderly user with poor hearing can readily take in the answer. This makes the system more humane.
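The child and elderly examples above can be sketched as a playback-settings lookup; the group names and numeric values are illustrative assumptions, not values from the patent.

```python
# Illustrative playback settings for the speaker-adapted answers described
# above: slower speech for a child, slower and louder speech for an
# elderly user. Values are assumptions for illustration.

SETTINGS = {
    "child":   {"rate": 0.8, "volume": 1.0},  # slow, easy to follow
    "elderly": {"rate": 0.8, "volume": 1.3},  # slow and louder
    "adult":   {"rate": 1.0, "volume": 1.0},
}

def playback_settings(speaker_group: str) -> dict:
    """Return rate/volume for a speaker group, defaulting to adult."""
    return SETTINGS.get(speaker_group, SETTINGS["adult"])
```

A deployed system would infer the speaker group from the voice itself; here the group label is assumed to be given.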
By combining the above approaches, the user's emotion category can be analyzed and the voice output adjusted accordingly, improving the user's experience.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (12)

1. An information processing method characterized by comprising:
receiving voice information;
analyzing the voice information by using a first model, and determining an emotion category of the voice information, wherein the first model is trained by machine learning by using a plurality of groups of data, and each group of data in the plurality of groups of data comprises: the voice information and the emotion type corresponding to the voice information;
and playing response voice matched with the voice information based on the emotion category.
2. The method of claim 1, wherein prior to analyzing the speech information using the first model, the method further comprises:
determining the quantity of voice information and voice energy distribution in a preset time period;
and determining whether to play preset voice or not according to the number and the voice energy distribution.
3. The method of claim 2, wherein determining whether to play the preset voice according to the number and the voice energy distribution comprises:
and if the number is larger than a first preset threshold value and the voice energy distribution indicates that the pitch is higher than a second preset threshold value, determining to play the preset voice.
4. The method of claim 2, further comprising:
and playing the preset voice in a preset tone under the condition of determining to play the preset voice.
5. The method of claim 1, wherein playing the response speech matching the speech information based on the emotion classification comprises:
extracting intonation and speed corresponding to the emotion category;
and playing the response voice according to the intonation and the speed.
6. The method of any of claims 1 to 5, wherein the first model comprises a deep neural network (WDE-LSTM).
7. An information processing apparatus characterized by comprising:
a receiving unit for receiving voice information;
a processing unit, configured to analyze the voice information using a first model, and determine an emotion category of the voice information, where the first model is trained through machine learning using multiple sets of data, and each set of data in the multiple sets of data includes: the voice information and the emotion type corresponding to the voice information;
and the control unit is used for playing the response voice matched with the voice information based on the emotion type.
8. The apparatus of claim 7,
the processing unit is further used for determining the number of voice messages and the voice energy distribution in a preset time period; and determining whether to play preset voice or not according to the number and the voice energy distribution.
9. The apparatus of claim 8, wherein the processing unit is configured to determine whether to play the preset voice according to the number and the voice energy distribution by:
and if the number is larger than a first preset threshold value and the voice energy distribution indicates that the pitch is higher than a second preset threshold value, determining to play the preset voice.
10. The apparatus of claim 8,
the control unit is further configured to play the preset voice in a preset tone under the condition that the preset voice is determined to be played.
11. The device of claim 7, wherein the control unit is configured to play the response voice matching the voice information based on the emotion classification by:
extracting intonation and speed corresponding to the emotion category;
and playing the response voice according to the intonation and the speed.
12. A robot characterized by comprising the information processing apparatus according to any one of claims 7 to 11.
CN201811068854.8A 2018-09-13 2018-09-13 Information processing method and device and robot Pending CN110895658A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811068854.8A CN110895658A (en) 2018-09-13 2018-09-13 Information processing method and device and robot

Publications (1)

Publication Number Publication Date
CN110895658A true CN110895658A (en) 2020-03-20

Family

ID=69785553

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811068854.8A Pending CN110895658A (en) 2018-09-13 2018-09-13 Information processing method and device and robot

Country Status (1)

Country Link
CN (1) CN110895658A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103413113A * 2013-01-15 2013-11-27 Shanghai University Intelligent emotional interaction method for service robot
CN105260416A * 2015-09-25 2016-01-20 Baidu Online Network Technology (Beijing) Co., Ltd. Voice recognition based searching method and apparatus
CN106773923A * 2016-11-30 2017-05-31 Beijing Guangnian Wuxian Technology Co., Ltd. Multi-modal emotion data interaction method and apparatus for a robot
CN107452400A * 2017-07-24 2017-12-08 Meizu Technology Co., Ltd. (Zhuhai) Voice broadcast method and device, computer device and computer-readable storage medium
CN107464566A * 2017-09-21 2017-12-12 Baidu Online Network Technology (Beijing) Co., Ltd. Speech recognition method and device


Similar Documents

Publication Publication Date Title
US20240054117A1 (en) Artificial intelligence platform with improved conversational ability and personality development
JP6876752B2 (en) Response method and equipment
CN109147802B (en) Playing speed adjusting method and device
CN109960723A (en) A kind of interactive system and method for psychological robot
CN103236259A (en) Voice recognition processing and feedback system, voice response method
CN110444229A (en) Communication service method, device, computer equipment and storage medium based on speech recognition
US20230412995A1 (en) Advanced hearing prosthesis recipient habilitation and/or rehabilitation
CN106774845A (en) A kind of intelligent interactive method, device and terminal device
ter Heijden et al. Design and evaluation of a virtual reality exposure therapy system with automatic free speech interaction
JP2017207641A (en) Information processing device, control method and computer program thereof
Khouzaimi et al. Optimising turn-taking strategies with reinforcement learning
CN114341922A (en) Information processing system, information processing method, and program
CN111460094A (en) Method and device for optimizing audio splicing based on TTS (text to speech)
US20190362737A1 (en) Modifying voice data of a conversation to achieve a desired outcome
CN116030788B (en) Intelligent voice interaction method and device
Tu Learn to speak like a native: AI-powered chatbot simulating natural conversation for language tutoring
CN112634886A (en) Interaction method of intelligent equipment, server, computing equipment and storage medium
CN110895658A (en) Information processing method and device and robot
JP7418106B2 (en) Information processing device, information processing method and program
Fukumoto et al. Optimization of sound of autonomous sensory meridian response with interactive genetic algorithm
CN115965251A (en) Teaching evaluation method, teaching evaluation device, storage medium, and server
EP3514783A1 (en) Contextual language learning device, system and method
Agarwal et al. Towards MOOCs for lipreading: Using synthetic talking heads to train humans in lipreading at scale
CN109726267A (en) Story recommended method and device for Story machine
CN113053186A (en) Interaction method, interaction device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200320