CN110599999A - Data interaction method and device and robot - Google Patents
- Publication number
- CN110599999A (application CN201910876510.8A)
- Authority
- CN
- China
- Prior art keywords
- emotion
- information
- voice
- user
- feedback
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Abstract
The invention relates to a data interaction method, a data interaction device and a robot. Current information of a user is obtained first; the current information is input into a pre-constructed detection model for processing to obtain target emotion information corresponding to the current information; an emotion feedback sentence corresponding to the target emotion information is generated according to the target emotion information; the emotion feedback sentence is converted into emotion feedback voice by using a speech synthesis technology; and the emotion feedback voice is output so as to give the user feedback matched to the user's emotion. By adopting the technical scheme of the invention, the current emotion of the user can be judged by analyzing the current information of the user, and corresponding voice can be fed back to the user according to the current emotion of the user, for example comforting the user when the user is sad and soothing the user when the user is angry, which improves the practicability of the robot.
Description
Technical Field
The invention relates to the technical field of robots, in particular to a data interaction method and device and a robot.
Background
A robot is a machine that executes work automatically; it can accept human commands, run pre-arranged programs, and act according to principles formulated with artificial intelligence technology.
Nowadays, accompanying robots, a special application type built mainly on artificial intelligence, have become a new favorite in the market; most accompanying robots are designed for children, the elderly or pregnant women.
Most existing accompanying robots are question-and-answer chat robots: during accompaniment they only conduct question-and-answer chat with the user, and the feedback they give is the same regardless of the user's state. As a result, the accompanying effect on the user is poor and the practicability of the accompanying robot is reduced.
Disclosure of Invention
In view of the above, the present invention provides a data interaction method, an apparatus and a robot, so as to solve the problems that in the prior art, an accompanying effect of an accompanying robot on a user is poor, and the practicability of the accompanying robot is reduced.
In order to achieve the purpose, the invention adopts the following technical scheme:
a data interaction method, comprising:
acquiring current information of a user;
inputting the current information into a pre-constructed detection model for processing to obtain target emotion information corresponding to the current information;
generating an emotion feedback statement corresponding to the target emotion information according to the target emotion information;
converting the emotion feedback statement into emotion feedback voice by utilizing a voice synthesis technology;
and outputting the emotional feedback voice so as to feed back the user according to the emotion of the user.
Further, in the above method, the current information includes an emoticon;
the detection model comprises a picture emotion detection model;
the step of inputting the current information into a pre-constructed detection model for processing to obtain target emotion information corresponding to the current information includes:
and inputting the expression picture into the picture emotion detection model for processing to obtain picture emotion information corresponding to the expression picture as the target emotion information.
Further, in the method, the construction process of the image emotion detection model includes:
searching a neural network structure by using a neural network searching algorithm to determine a target neural network model;
acquiring a prestored expression picture sample set, wherein the expression picture sample set comprises expression picture samples and sample picture emotion information corresponding to the expression picture samples;
and according to a first preset iteration rule, performing iterative training on the target neural network model by using the expression picture sample and the sample picture emotion information to obtain a well-trained target neural network model as the picture emotion detection model.
Further, in the above method, before the searching the neural network structure by using the neural network search algorithm and determining the target neural network model, the method further includes:
acquiring face image information of a user;
the searching of the neural network structure by using the neural network searching algorithm to determine the target neural network model comprises the following steps:
and searching a neural network structure aiming at the face image information by utilizing the neural network searching algorithm, and determining the target neural network model corresponding to the face image information so as to realize personalized design aiming at a user.
Further, in the above method, the current information further includes voice information;
the detection model further comprises a speech emotion detection model:
the inputting the current information into a pre-constructed detection model for processing to obtain target emotion information corresponding to the current information further comprises:
and inputting the voice information into the voice emotion detection model for processing to obtain voice emotion information corresponding to the voice information as the target emotion information.
Further, in the method, after the obtaining the current information, the method further includes:
converting the voice information into text sentences by utilizing a pre-constructed mixed structure voice recognition model;
inputting the text sentences into a pre-constructed natural language processing model for processing to obtain reply sentences corresponding to the text sentences;
converting the reply sentence into reply voice by utilizing a voice synthesis technology;
and outputting the reply voice to enable the user to acquire the required information.
Further, in the method, the process of constructing the natural language processing model includes:
acquiring a pre-stored dialogue data set;
and according to a second preset iteration rule, carrying out iterative training on the BERT model added with the token-annotation mechanism by using the dialogue data set to obtain a trained and mature BERT model as the natural language processing model.
Further, the method described above further includes:
acquiring feedback information of the user, wherein the feedback information carries an adjustment object identifier;
if the adjustment object identification represents the adjustment of the picture emotion detection model, adjusting the picture emotion detection model according to the feedback information;
if the adjustment object identification represents the adjustment of the natural language processing model, adjusting the natural language processing model according to the feedback information;
and if the adjustment object identification represents the adjustment of the voice emotion detection model, adjusting the voice emotion detection model according to the feedback information.
The invention also provides a data interaction device, comprising:
the acquisition module is used for acquiring the current information of the user;
the processing module is used for inputting the current information into a pre-constructed detection model for processing to obtain target emotion information corresponding to the current information;
the generating module is used for generating an emotion feedback statement corresponding to the target emotion information according to the target emotion information;
the conversion module is used for converting the emotion feedback sentences into emotion feedback voices by utilizing a voice synthesis technology;
and the output module is used for outputting the emotion feedback voice so as to feed back the user according to the emotion of the user.
The present invention also provides a robot comprising: the device comprises a collecting device, a voice player, a processor and a memory;
the acquisition device, the voice player and the memory are all connected with the processor;
the acquisition device is used for acquiring the current information of a user and sending the current information to the processor;
the voice player is used for receiving and playing the voice output by the processor;
the memory is used for storing a computer program, and the computer program is at least used for executing the data interaction method;
the processor is used for calling and executing the computer program in the memory.
According to the data interaction method, the data interaction device and the robot, the current information of a user is obtained first; the current information is input into a pre-constructed detection model for processing to obtain target emotion information corresponding to the current information; an emotion feedback sentence corresponding to the target emotion information is generated according to the target emotion information; the emotion feedback sentence is converted into emotion feedback voice by using a speech synthesis technology; and the emotion feedback voice is output so as to give the user feedback matched to the user's emotion. By adopting the technical scheme of the invention, the current emotion of the user can be judged by analyzing the current information of the user, and corresponding voice can be fed back to the user according to the current emotion of the user, for example comforting the user when the user is sad and soothing the user when the user is angry, which enhances the accompanying effect of the robot on the user and improves the practicability of the robot.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of a first embodiment of a data interaction method of the present invention;
FIG. 2 is a flow chart of a second embodiment of the data interaction method of the present invention;
FIG. 3 is a block diagram of the hybrid-structured speech recognition model of FIG. 2;
FIG. 4 is a schematic structural diagram of a first data interaction device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a second data interaction device according to the present invention;
fig. 6 is a schematic structural diagram of an embodiment of the robot of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.
Fig. 1 is a flowchart of a first embodiment of a data interaction method according to the present invention. As shown in fig. 1, the data interaction method of the present embodiment specifically includes the following steps:
S101, acquiring current information of a user;
in this embodiment, first, current information of a user needs to be acquired, where the current information includes an emoticon of the user or voice information of the user. The robot adopting the data interaction method of the embodiment can actively scan the surrounding environment, take pictures of the facial expressions of the user and collect the voice information of the user.
S102, inputting the current information into a pre-constructed detection model for processing to obtain target emotion information corresponding to the current information;
Through the above steps, after the current information of the user is acquired, the current information is input into a pre-constructed detection model, and the current information is analyzed and processed by the detection model, so that target emotion information corresponding to the current information is output, wherein the target emotion information includes happiness, anger, sadness and the like.
S103, generating emotion feedback sentences corresponding to the target emotion information according to the target emotion information;
After the target emotion information corresponding to the current information is obtained through the above steps, the emotion feedback sentence corresponding to the target emotion information is generated according to the target emotion information. For example, the robot applying the data interaction method of this embodiment is preferably aimed at a pregnant woman as the user: if the obtained target emotion information is happiness, the generated emotion feedback sentence may be, for instance, "Your smile is so beautiful; keep up the good mood so that the baby develops well"; if the obtained target emotion information is anger, an emotion feedback sentence that soothes is required; and if the obtained target emotion information is sadness, an emotion feedback sentence that comforts is required. The emotion feedback sentences may be set in advance, or a model may be trained in advance, so as to generate different emotion feedback sentences for different types of users and different target emotion information.
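By way of illustration only, a minimal Python sketch of such a preset lookup is given below; the user types, emotion labels and sentences are hypothetical placeholders rather than data from this embodiment, and the alternative of a pre-trained generation model is not shown.

```python
# Minimal sketch of a preset lookup for emotion feedback sentences.
# All keys and sentences below are illustrative placeholders.
FEEDBACK_SENTENCES = {
    ("pregnant_woman", "happy"): "Your smile is lovely; keeping a good mood helps the baby grow well.",
    ("pregnant_woman", "angry"): "Take a slow breath; let's put on some calm music together.",
    ("pregnant_woman", "sad"):   "I'm here with you. Would you like to hear something cheerful?",
}

def generate_feedback_sentence(user_type: str, emotion: str) -> str:
    """Return a preset emotion feedback sentence; fall back to a neutral reply."""
    return FEEDBACK_SENTENCES.get((user_type, emotion), "I'm here if you need me.")

print(generate_feedback_sentence("pregnant_woman", "happy"))
```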
S104, converting the emotion feedback sentences into emotion feedback voices by utilizing a voice synthesis technology;
After the emotion feedback sentence is generated through the above steps, the emotion feedback sentence is converted into emotion feedback voice by using a speech synthesis technology. Text-To-Speech (TTS) is the process of converting text into speech; the speech synthesis technology employed in this embodiment is Baidu's open-source TTS tool.
And S105, outputting the emotion feedback voice.
Through the steps, the emotion feedback voice is obtained and output so that the user can hear the emotion feedback voice, and therefore the effects of encouraging, comforting or soothing are achieved.
The data interaction method of this embodiment first obtains the current information of a user; inputs the current information into a pre-constructed detection model for processing to obtain target emotion information corresponding to the current information; generates an emotion feedback sentence corresponding to the target emotion information according to the target emotion information; converts the emotion feedback sentence into emotion feedback voice by using a speech synthesis technology; and outputs the emotion feedback voice so as to give the user feedback matched to the user's emotion. By adopting the technical scheme of this embodiment, the current emotion of the user can be judged by analyzing the current information of the user, and corresponding voice can be fed back to the user according to the current emotion, for example comforting the user when the user is sad and soothing the user when the user is angry, which enhances the accompanying effect of the robot applying the data interaction method of this embodiment and improves the practicability of the robot.
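For orientation, a minimal end-to-end sketch of steps S101 to S105 follows. It is a sketch only: detect_emotion and generate_feedback_sentence are stand-ins for the pre-constructed detection model and the sentence generation described above, and pyttsx3 is used merely as an illustrative offline TTS engine, not the Baidu open-source toolkit referenced in this embodiment.

```python
import pyttsx3  # offline TTS used only as a stand-in for the embodiment's open-source toolkit

def detect_emotion(current_info: dict) -> str:
    """Stand-in for the pre-constructed detection model of step S102."""
    # A real implementation would run the picture / speech emotion detection models here.
    return "happy"

def generate_feedback_sentence(emotion: str) -> str:
    """Stand-in for the emotion feedback sentence generation of step S103."""
    return f"I can tell you feel {emotion}."

def interact_once(current_info: dict) -> None:
    emotion = detect_emotion(current_info)           # S102: target emotion information
    sentence = generate_feedback_sentence(emotion)   # S103: emotion feedback sentence
    engine = pyttsx3.init()                          # S104: speech synthesis (text to speech)
    engine.say(sentence)
    engine.runAndWait()                              # S105: output the emotion feedback voice

interact_once({"expression_picture": None, "voice": None})
```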
Fig. 2 is a flowchart of a second embodiment of the data interaction method of the present invention. As shown in fig. 2, the data interaction method of this embodiment is further described in more detail on the basis of the embodiment shown in fig. 1.
As shown in fig. 2, the data interaction method of this embodiment may specifically include the following steps:
S201, obtaining an expression picture of a user;
the execution process of this step is the same as the execution process of S101 shown in fig. 1, and is not described here again.
In addition, in this embodiment, after the expression picture of the user is acquired, the expression picture can be uploaded to the cloud, and the user can view it through the cloud at any time.
S202, inputting the expression picture into a picture emotion detection model which is constructed in advance for processing to obtain picture emotion information corresponding to the expression picture as target emotion information;
Through the above steps, after the expression picture of the user is obtained, the expression picture is input into the picture emotion detection model in the pre-constructed detection model, and the expression picture is analyzed and processed by the picture emotion detection model, so that picture emotion information corresponding to the expression picture is obtained and serves as target emotion information. The construction process of the picture emotion detection model comprises the following steps:
Firstly, a neural network structure search is performed by using a neural network search algorithm to determine a target neural network model. The neural network search algorithm is preferably the Efficient Neural Architecture Search (ENAS) algorithm; the ENAS algorithm together with a reinforcement learning algorithm is used to construct the structure of a convolutional and recurrent neural network, thereby determining the target neural network model. In addition, if the face image information of the user is acquired before the neural network structure search is performed and the target neural network model is determined, the neural network structure search is then performed for that face image information, and the target neural network model corresponding to the face image information is determined, so that a personalized design for the user can be realized: when an expression picture received by the generated picture emotion detection model does not match the user in the face image information, the corresponding picture emotion information is not generated, and no emotion feedback voice is provided for a user who does not match the face image information. In this way the personalized design is realized, and the robot serves only the user whose face image information was acquired. If the face image information of the user is not acquired before the neural network structure search is performed and the target neural network model is determined, there is no personalized design; the model can be used by everyone, and emotion feedback voice can be provided for anyone who provides an expression picture.
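The personalized gating described above could, for instance, be realized by comparing a face embedding of the incoming expression picture with the registered user's embedding. The sketch below assumes a hypothetical embed_face function and an illustrative cosine-similarity threshold; neither is specified by this embodiment.

```python
import numpy as np

def embed_face(image) -> np.ndarray:
    """Hypothetical face-embedding function, e.g. the output of a face recognition network."""
    raise NotImplementedError  # stand-in only

def is_registered_user(face_image, registered_embedding: np.ndarray, threshold: float = 0.6) -> bool:
    """Gate: only produce picture emotion information when the face matches the registered user."""
    emb = embed_face(face_image)
    cos = float(np.dot(emb, registered_embedding) /
                (np.linalg.norm(emb) * np.linalg.norm(registered_embedding)))
    return cos >= threshold
```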
Secondly, a prestored expression picture sample set is obtained, wherein the expression picture sample set comprises expression picture samples and sample picture emotion information corresponding to the expression picture samples. The pre-stored expression picture sample set may include data sets published on the Internet; in this embodiment it further includes pictures and emoticon packs crawled from the Internet according to expression labels, as well as, in cooperation with related technology companies and with the users' consent, a sufficient number of expression pictures of specific users. If the robot adopting the data interaction method of this embodiment is specially used to serve pregnant women, the specific users here may be pregnant women.
And finally, according to a first preset iteration rule, the target neural network model is iteratively trained with the expression picture samples and the sample picture emotion information, and the well-trained target neural network model is obtained as the picture emotion detection model. The first preset iteration rule in this embodiment may be a set number of iterations, with training stopped when the number of training iterations reaches that value, or a rule such as training iteratively until the detection accuracy converges.
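A minimal PyTorch training sketch of the iterative training just described is shown below. The small hand-written CNN merely stands in for the ENAS-searched target neural network, the random tensors stand in for the labelled expression picture sample set, and the seven emotion classes, 48x48 input size, learning rate and five epochs are assumptions used only for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# A small hand-written CNN standing in for the ENAS-searched target neural network.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, 7),          # 7 emotion classes is an assumption
)

# Random tensors standing in for the labelled expression picture sample set.
images = torch.randn(64, 3, 48, 48)
labels = torch.randint(0, 7, (64,))
loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# First preset iteration rule, realised here as a fixed number of training iterations.
for epoch in range(5):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```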
S203, acquiring voice information of a user;
the execution process of this step is the same as the execution process of S101 shown in fig. 1, and is not described here again.
S204, inputting the voice information into a pre-constructed voice emotion detection model for processing to obtain voice emotion information corresponding to the voice information as target emotion information;
Through the above steps, after the voice information of the user is obtained, the voice information is input into the voice emotion detection model in the pre-constructed detection model to be analyzed, so that the voice emotion information corresponding to the voice information is obtained, and the voice emotion information is used as the target emotion information. The construction process of the voice emotion detection model comprises the following steps:
Firstly, a pre-stored voice information sample set is obtained, wherein the voice information sample set comprises voice samples and sample voice emotion information corresponding to the voice samples. The voice sample set can be obtained from the network, and the sample voice emotion information is the emotion label assigned to each voice sample.
Then, the pre-constructed, pre-trained neural network model is iteratively trained with the voice samples and the sample voice emotion information corresponding to the voice samples, and the trained, mature neural network model is obtained as the voice emotion detection model. The training rules in the iterative training process may be a set number of iterations, with training stopped when the number of training iterations reaches that value, or a rule such as training iteratively until the detection accuracy converges.
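The "train until the detection accuracy converges" rule mentioned for both emotion detection models can be expressed as a simple early-stopping loop. The sketch below is generic: train_one_epoch and evaluate are callables supplied by the caller, and the tolerance and patience values are illustrative assumptions.

```python
from typing import Callable

def train_until_converged(train_one_epoch: Callable[[], None],
                          evaluate: Callable[[], float],
                          max_epochs: int = 100, tol: float = 1e-3, patience: int = 3) -> float:
    """Iterate training until the detection accuracy stops improving, or a maximum epoch count is hit."""
    best_acc, stale = 0.0, 0
    for _ in range(max_epochs):
        train_one_epoch()
        acc = evaluate()                 # detection accuracy on a validation set
        if acc > best_acc + tol:
            best_acc, stale = acc, 0
        else:
            stale += 1
        if stale >= patience:            # accuracy has converged
            break
    return best_acc
```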
S205, generating emotion feedback sentences corresponding to the target emotion information according to the target emotion information;
the execution process of this step is the same as the execution process of S103 shown in fig. 1, and is not described here again.
S206, converting the emotion feedback sentence into emotion feedback voice by utilizing a voice synthesis technology;
the execution process of this step is the same as the execution process of S104 shown in fig. 1, and is not described here again.
S207, outputting emotion feedback voice;
the execution process of this step is the same as the execution process of S105 shown in fig. 1, and is not described here again.
S208, converting the voice information into a text sentence by using a pre-constructed mixed structure voice recognition model;
After the voice information of the user is acquired in step S203, the voice information may be converted into text sentences by using a pre-constructed mixed-structure speech recognition model. Fig. 3 is a structural diagram of the mixed-structure speech recognition model in fig. 2. The mixed-structure speech recognition model is an end-to-end speech recognition model with the following specific structure, as shown in fig. 3: first an input layer; behind the input layer a convolutional stack consisting of two Conv3×3 layers, one Max pooling layer, two further Conv3×3 layers and another Max pooling layer; behind the convolutional stack an Encoder structure of BiLSTM layers; and then an Attention-based Decoder structure and a CTC-based Decoder structure connected respectively. A fully connected layer FC is connected between the BiLSTM Encoder structure and the CTC-based Decoder structure, and a fully connected layer FC together with a Softmax loss function Softmax-Loss is connected behind the Attention-based Decoder structure; the CTC-based Decoder structure also contains a CTC loss function CTC-Loss. Finally, Softmax-Loss and CTC-Loss are each connected to a total loss function, so that the gradient is calculated with the total loss function and the accuracy of the model can be continuously improved by continuously optimizing the model.
In this embodiment, a mixed-structure speech recognition model is adopted that contains both a CTC decoding model and an attention decoding model, so that during the conversion of voice information into text sentences, not only is the dependence between preceding and following speech features captured during decoding, but the monotonic temporal order of the speech is also preserved.
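A rough PyTorch sketch of this mixed structure follows. It assumes log-mel spectrogram input; the channel widths, feature and hidden sizes, blank/padding index and equal loss weighting are illustrative assumptions (the embodiment only states that Softmax-Loss and CTC-Loss both feed the total loss), and the Attention-based decoder is left abstract as any teacher-forced decoder that returns per-token logits.

```python
import torch
import torch.nn as nn

class MixedStructureEncoder(nn.Module):
    """Convolutional front end plus BiLSTM encoder, following the layout described for Fig. 3."""
    def __init__(self, n_mels: int = 80, hidden: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # first Max pooling layer
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # second Max pooling layer
        )
        self.blstm = nn.LSTM(32 * (n_mels // 4), hidden,
                             batch_first=True, bidirectional=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:    # feats: (batch, time, n_mels)
        x = self.conv(feats.unsqueeze(1))                      # (batch, 32, time/4, n_mels/4)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)
        out, _ = self.blstm(x)                                 # (batch, time/4, 2*hidden)
        return out

def total_loss(enc_out, ctc_fc, att_decoder, targets, in_lens, tgt_lens, lam=0.5):
    """Total loss combining the attention branch (softmax cross-entropy) and the CTC branch."""
    # CTC branch: fully connected layer over the encoder output, then CTC-Loss.
    log_probs = ctc_fc(enc_out).log_softmax(-1).transpose(0, 1)          # (time, batch, vocab)
    ctc = nn.CTCLoss(blank=0)(log_probs, targets, in_lens, tgt_lens)     # blank index 0 is an assumption
    # Attention branch: teacher-forced decoder producing per-token logits, then Softmax-Loss.
    att_logits = att_decoder(enc_out, targets)                           # (batch, tgt_len, vocab)
    ce = nn.CrossEntropyLoss(ignore_index=0)(
        att_logits.reshape(-1, att_logits.size(-1)), targets.reshape(-1))
    return lam * ce + (1.0 - lam) * ctc
```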
S209, inputting the text sentences into a pre-constructed natural language processing model for processing to obtain reply sentences corresponding to the text sentences;
Through the above steps, after the text sentence converted from the voice information is obtained, the text sentence is input into a pre-constructed natural language processing model, and the reply sentence corresponding to the text sentence is obtained through the analysis and processing of the natural language processing model. A text sentence may correspond to several reply sentences; one of them may be selected at random, or a suitable reply sentence may be selected according to the voice emotion information corresponding to the text sentence. The construction process of the natural language processing model comprises the following steps:
First, a pre-stored dialogue data set is obtained. The dialogue data set may include dialogue data sets published on the Internet as well as some recorded dialogue data, and it may be collected according to the type of user for which the robot applying the data interaction method of this embodiment is intended; for example, if the robot is mainly intended for pregnant women, question-and-answer dialogues of pregnant women may be collected as the dialogue data set.
Then, according to a second preset iteration rule, the BERT model to which the token-annotation mechanism has been added is iteratively trained with the dialogue data set, and the trained, mature BERT model is obtained as the natural language processing model. The BERT model is a pre-training model; adding a token-annotation mechanism to it lets the model focus more on certain key words in the text sentences, which improves the generalization capability of the model and the feedback quality. The token-annotation mechanism labels each token in the sentence with the fine-tuned BERT, defining whether or not the token is important. An auxiliary task, mainly a binary classification task, is then added to BERT to judge whether a word is important, and the logits obtained from the auxiliary task are used as attention to help the main task perform better. The second preset iteration rule in this embodiment may be a set number of iterations, with training stopped when the number of training iterations reaches that value, or a rule such as training iteratively until the detection accuracy converges.
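As one possible reading of this mechanism, the sketch below adds a binary token-importance head to a Hugging Face BERT model and uses its logits as attention weights for pooling; the main task is shown as classification over candidate replies, and the model name, head shapes and weighting scheme are assumptions rather than details given by this embodiment.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class TokenAnnotationBert(nn.Module):
    """BERT with an auxiliary binary head scoring token importance; the resulting
    weights pool the hidden states for the main task (shown here as reply selection)."""
    def __init__(self, num_replies: int, model_name: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.importance_head = nn.Linear(hidden, 1)       # auxiliary binary task: is this token important?
        self.reply_head = nn.Linear(hidden, num_replies)  # main task head (illustrative)

    def forward(self, input_ids, attention_mask):
        states = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        importance_logits = self.importance_head(states).squeeze(-1)            # (batch, seq)
        weights = torch.softmax(
            importance_logits.masked_fill(attention_mask == 0, float("-inf")), dim=-1)
        pooled = torch.bmm(weights.unsqueeze(1), states).squeeze(1)             # importance-weighted pooling
        return self.reply_head(pooled), importance_logits
```

Training would then combine the main-task loss with a binary cross-entropy loss on importance_logits against the token-importance labels produced by the fine-tuned BERT.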
S210, converting the reply sentence into reply voice by using a voice synthesis technology;
Through the above steps, after the reply sentence corresponding to the text sentence is obtained, the reply sentence is converted into reply voice by using the speech synthesis technology, which has already been described in the above embodiments and is not described again here.
S211, outputting reply voice;
Through the above steps, after the reply voice corresponding to the reply sentence is obtained, the reply voice is output to complete the dialogue with the user.
S212, obtaining feedback information of the user carrying the adjustment object identifier;
Through the above steps, after the emotion feedback voice or the reply voice is output, the user can give feedback on the emotion feedback voice or the reply voice, so the feedback information of the user needs to be acquired, wherein the feedback information carries an adjustment object identifier. That is, if the user gives feedback on the reply voice, the adjustment object identifier represents adjustment of the natural language processing model; if the user gives feedback on emotion feedback voice generated from target emotion information detected from an expression picture, the adjustment object identifier represents adjustment of the picture emotion detection model; and if the user gives feedback on emotion feedback voice generated from target emotion information detected from voice information, the adjustment object identifier represents adjustment of the voice emotion detection model.
S213, if the adjustment object identification represents the adjustment of the picture emotion detection model, adjusting the picture emotion detection model according to the feedback information;
through the steps, if the adjustment object identification carried by the acquired feedback information represents the adjustment of the picture emotion detection model, the picture emotion detection model is adjusted according to the feedback information. If the feedback provided by the user is positive feedback, the picture emotion detection model is correctly detected, the expression picture and the target emotion information corresponding to the expression picture are used as training data sets, training data are added, the picture emotion detection model is retrained, and the generalization capability of the model is improved; if the feedback provided by the user is negative feedback, the picture emotion detection model is detected wrongly, and the picture emotion detection model continues to be trained or adjusted.
S214, if the adjustment object identification represents the adjustment of the natural language processing model, adjusting the natural language processing model according to the feedback information;
through the steps, if the obtained adjustment object identification carried by the feedback information represents the adjustment of the natural language processing model, the natural language processing model is adjusted according to the feedback information. If the feedback provided by the user is positive feedback, the detection of the natural language processing model is correct, the voice information and the reply sentence corresponding to the voice information are both used as a training data set, the training data is added, the natural language processing model is retrained, and the generalization capability of the model is improved; and if the feedback provided by the user is negative feedback, the detection error of the natural language processing model is proved, and the natural language processing model continues to be trained or adjusted.
S215, if the adjustment object identification represents the adjustment of the voice emotion detection model, the voice emotion detection model is adjusted according to the feedback information.
Through the above steps, if the adjustment object identifier carried by the acquired feedback information represents adjustment of the voice emotion detection model, the voice emotion detection model is adjusted according to the feedback information. If the feedback provided by the user is positive feedback, the detection of the voice emotion detection model is shown to be correct; the voice information and the target emotion information corresponding to it are both added to the training data set, and the voice emotion detection model is retrained with the enlarged training data to improve the generalization capability of the model. If the feedback provided by the user is negative feedback, the detection of the voice emotion detection model is shown to be wrong, and the voice emotion detection model continues to be trained or is adjusted.
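A schematic dispatch of steps S213 to S215 is sketched below; the identifier values and the add_training_sample/retrain/continue_training_or_adjust interface are purely illustrative assumptions about how such an adjustment could be wired up, not part of this embodiment.

```python
def adjust_models(feedback: dict, models: dict) -> None:
    """Route user feedback to the model named by the adjustment object identifier (steps S213 to S215)."""
    target = feedback["adjust_object_id"]        # e.g. "picture_emotion", "nlp" or "speech_emotion"
    model = models[target]
    if feedback["is_positive"]:
        # Positive feedback: the detection was correct, so enlarge the training set and retrain.
        model.add_training_sample(feedback["input"], feedback["output"])
        model.retrain()
    else:
        # Negative feedback: the detection was wrong, so continue training or adjust the model.
        model.continue_training_or_adjust()
```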
In this embodiment, the order of executing steps S213, S214 and S215 is not limited; any one of steps S213, S214 and S215 may be executed first.
The types of users targeted by this embodiment may include the elderly, children, pregnant women and the like. The targeted user is preferably a pregnant woman, so that functions such as voice interaction and emotion monitoring for the pregnant woman can be realized, helping her to pass the gestational period smoothly and with less worry.
According to the data interaction method of this embodiment, a pre-constructed picture emotion detection model or voice emotion detection model can be used to detect the acquired expression picture or voice information of the user and obtain the corresponding target emotion information, and the emotion feedback sentence corresponding to the target emotion information is then generated; the emotion feedback sentence is converted into emotion feedback voice by using a speech synthesis technology; and the emotion feedback voice is output so as to give the user feedback matched to the user's emotion. In this way the current emotion of the user can be judged by analyzing the user's expression pictures or voice information, and corresponding voice can be fed back to the user according to that emotion, for example comforting the user when the user is sad and soothing the user when the user is angry. Moreover, the picture emotion detection model can be constructed specifically for the user, realizing a personalized design. In addition, this embodiment also enables man-machine conversation, answering questions and chatting with the user, which enhances the accompanying effect of the robot applying the data interaction method of this embodiment and improves the practicability of the robot.
In order to be more comprehensive, the application also provides a data interaction device corresponding to the data interaction method provided by the embodiment of the invention.
Fig. 4 is a schematic structural diagram of a first data interaction device according to an embodiment of the present invention. As shown in fig. 4, the data interaction apparatus of the present embodiment includes an obtaining module 11, a processing module 12, a generating module 13, a conversion module 14, and an output module 15.
An obtaining module 11, configured to obtain current information of a user;
the processing module 12 is configured to input the current information into a pre-constructed detection model for processing, so as to obtain target emotion information corresponding to the current information;
the generating module 13 is configured to generate an emotion feedback statement corresponding to the target emotion information according to the target emotion information;
a conversion module 14, configured to convert the emotion feedback statement into emotion feedback voice by using a voice synthesis technology;
and the output module 15 is used for outputting the emotion feedback voice so as to feed back the user according to the emotion of the user.
In the data interaction device of this embodiment, the obtaining module 11 first obtains the current information of the user; the processing module 12 inputs the current information into a pre-constructed detection model for processing to obtain target emotion information corresponding to the current information; the generating module 13 generates an emotion feedback sentence corresponding to the target emotion information according to the target emotion information; the conversion module 14 converts the emotion feedback sentence into emotion feedback voice by using a speech synthesis technology; and the output module 15 outputs the emotion feedback voice so as to give the user feedback matched to the user's emotion. By adopting the technical scheme of this embodiment, the current emotion of the user can be judged by analyzing the current information of the user, and corresponding voice can be fed back to the user according to the current emotion, for example comforting the user when the user is sad and soothing the user when the user is angry, which enhances the accompanying effect of the robot applying the data interaction device of this embodiment and improves the practicability of the robot.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 5 is a schematic structural diagram of a second data interaction device according to the present invention. As shown in fig. 5, in the data interaction apparatus of this embodiment, on the basis of the embodiment shown in fig. 4, the obtaining module 11 includes a first obtaining unit 111 and a second obtaining unit 112, and the processing module 12 includes a first processing unit 121 and a second processing unit 122.
A first obtaining unit 111, configured to obtain an expression picture of a user;
the first processing unit 121 is configured to input the expression picture into the picture emotion detection model for processing, and obtain picture emotion information corresponding to the expression picture as target emotion information.
The first processing unit 121 is further configured to construct a picture emotion detection model, where the construction process includes:
searching a neural network structure by using a neural network searching algorithm to determine a target neural network model;
acquiring a prestored expression picture sample set, wherein the expression picture sample set comprises expression picture samples and sample picture emotion information corresponding to the expression picture samples;
and according to a first preset iteration rule, performing iterative training on the target neural network model by using the expression picture sample and the sample picture emotion information to obtain a well-trained target neural network model serving as a picture emotion detection model.
In addition, the first processing unit 121 is further configured to obtain facial image information of the user before performing a neural network structure search using a neural network search algorithm to determine a target neural network model.
Correspondingly, searching a neural network structure by using a neural network searching algorithm to determine a target neural network model, and the method comprises the following steps: and searching a neural network structure aiming at the face image information by using a neural network searching algorithm, and determining a target neural network model corresponding to the face image information so as to realize personalized design aiming at the user.
A second acquiring unit 112, configured to acquire voice information of a user;
and the second processing unit 122 is configured to input the voice information into the voice emotion detection model for processing, and obtain voice emotion information corresponding to the voice information as target emotion information.
Further, in the data interaction apparatus of the present embodiment, the processing module 12 further includes a third processing unit 123;
the conversion module 14 is further configured to convert the voice information into text statements by using a pre-constructed mixed structure voice recognition model;
a third processing unit 123, configured to input the text statement into a natural language processing model that is constructed in advance for processing, so as to obtain a reply statement corresponding to the text statement;
the third processing unit 123 is further configured to construct a natural language processing model, where the construction process includes: acquiring a pre-stored dialogue data set; and according to a second preset iteration rule, carrying out iterative training on the BERT model added with the token-entry mechanism by using the dialogue data set to obtain a trained and mature BERT model serving as a natural language processing model.
The conversion module 14 is further configured to convert the reply sentence into a reply voice by using a voice synthesis technology;
the output module 15 is further configured to output a reply voice to enable the user to obtain the required information.
Further, the data interaction apparatus of the present embodiment further includes an adjusting module 16, and the obtaining module 11 further includes a third obtaining unit 113;
a third obtaining unit 113, configured to obtain feedback information of the user, where the feedback information carries an adjustment object identifier;
the adjusting module 16 is configured to adjust the picture emotion detection model according to the feedback information if the adjustment object identifier indicates that the picture emotion detection model is adjusted; if the adjustment object identification represents the adjustment of the natural language processing model, adjusting the natural language processing model according to the feedback information; and if the adjustment object identification represents the adjustment of the voice emotion detection model, adjusting the voice emotion detection model according to the feedback information.
In the data interaction device of this embodiment, the first obtaining unit 111 or the second obtaining unit 112 obtains the expression picture or voice information of the user, detection is performed with the pre-constructed picture emotion detection model or voice emotion detection model to obtain the corresponding target emotion information, and the generating module 13 then generates the emotion feedback sentence corresponding to the target emotion information; the conversion module 14 converts the emotion feedback sentence into emotion feedback voice by using a speech synthesis technology; and the output module 15 outputs the emotion feedback voice so as to give the user feedback matched to the user's emotion. In this way the current emotion of the user can be judged by analyzing the user's expression pictures or voice information, and corresponding voice can be fed back to the user according to that emotion, for example comforting the user when the user is sad and soothing the user when the user is angry. Moreover, the picture emotion detection model can be constructed specifically for the user, realizing a personalized design. In addition, this embodiment also enables man-machine conversation, answering questions and chatting with the user, which enhances the accompanying effect of the robot applying the data interaction device of this embodiment and improves the practicability of the robot.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 6 is a schematic structural diagram of an embodiment of the robot of the present invention. As shown in fig. 6, the robot of the present embodiment includes a pickup device 21, a processor 22, a voice player 23, and a memory 24. Wherein, the acquisition device 21, the voice player 23 and the memory 24 are all connected with the processor 22;
the acquisition device 21 is used for acquiring the current information of the user and sending the current information to the processor 22;
a voice player 23 for receiving and playing the voice output by the processor 22;
a memory 24 for storing a computer program for performing at least the data interaction method of the above embodiments;
a processor 22 for invoking and executing the computer program in the memory 24.
The robot of this embodiment acquires the current information of the user through the acquisition device 21 and sends the current information to the processor 22; the processor 22 invokes and executes the data interaction method of the above embodiments stored in the memory 24, generates target emotion information corresponding to the current information of the user, and generates emotion feedback voice according to the target emotion information; and the voice player 23 receives and plays the voice output by the processor 22. In this way the current emotion of the user can be judged by analyzing the current information of the user, and corresponding voice can be fed back to the user according to that emotion, comforting the user when the user is sad and soothing the user when the user is angry, which enhances the accompanying effect of the robot applying the data interaction method of this embodiment and improves the practicability of the robot.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
Claims (10)
1. A method for data interaction, comprising:
acquiring current information of a user;
inputting the current information into a pre-constructed detection model for processing to obtain target emotion information corresponding to the current information;
generating an emotion feedback sentence corresponding to the target emotion information according to the target emotion information;
converting the emotion feedback sentence into emotion feedback voice by utilizing a voice synthesis technology;
and outputting the emotion feedback voice so as to provide feedback to the user according to the emotion of the user.
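By way of illustration only, the following minimal Python sketch traces the five steps of claim 1 end to end; every component (the emotion detector, the sentence templates, the text-to-speech stub) is a hypothetical placeholder rather than the detection model or voice synthesis technology recited in the claims.

```python
# Minimal sketch of the claimed interaction flow; all components below
# are hypothetical placeholders, not the patent's actual models.

def detect_emotion(current_info: dict) -> str:
    """Placeholder for the pre-constructed detection model."""
    return "sad" if current_info.get("tone") == "low" else "neutral"

def generate_feedback_sentence(emotion: str) -> str:
    """Placeholder for emotion-conditioned sentence generation."""
    templates = {
        "sad": "You sound a little down. Would you like to talk about it?",
        "neutral": "I'm here if you need anything.",
    }
    return templates.get(emotion, templates["neutral"])

def synthesize_speech(sentence: str) -> bytes:
    """Placeholder for a text-to-speech engine."""
    return sentence.encode("utf-8")  # stands in for audio samples

def interact(current_info: dict) -> bytes:
    emotion = detect_emotion(current_info)          # step 2: detection model
    sentence = generate_feedback_sentence(emotion)  # step 3: feedback sentence
    audio = synthesize_speech(sentence)             # step 4: voice synthesis
    return audio                                    # step 5: output to user

if __name__ == "__main__":
    print(interact({"tone": "low"}))
```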
2. The method of claim 1, wherein the current information comprises an expression picture;
the detection model comprises a picture emotion detection model;
the step of inputting the current information into a pre-constructed detection model for processing to obtain target emotion information corresponding to the current information includes:
and inputting the expression picture into the picture emotion detection model for processing to obtain picture emotion information corresponding to the expression picture as the target emotion information.
3. The method of claim 2, wherein the construction process of the picture emotion detection model comprises:
searching a neural network structure by using a neural network searching algorithm to determine a target neural network model;
acquiring a prestored expression picture sample set, wherein the expression picture sample set comprises expression picture samples and sample picture emotion information corresponding to the expression picture samples;
and according to a first preset iteration rule, performing iterative training on the target neural network model by using the expression picture sample and the sample picture emotion information to obtain a well-trained target neural network model as the picture emotion detection model.
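A hedged sketch of the iterative-training step of claim 3 in PyTorch: the architecture produced by the neural network structure search is replaced by a small placeholder CNN, and the label space, the stand-in sample tensors, and the fixed epoch count standing in for the "first preset iteration rule" are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

NUM_EMOTIONS = 7  # assumed label space (e.g. happy, sad, angry, ...)

class PlaceholderEmotionNet(nn.Module):
    """Stand-in for the architecture found by the structure search."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1)
        )
        self.classifier = nn.Linear(16, NUM_EMOTIONS)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def train_picture_emotion_model(samples, labels, epochs=5):
    """samples: (N, 3, H, W) expression pictures; labels: (N,) emotion ids."""
    model = PlaceholderEmotionNet()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):  # stands in for the "first preset iteration rule"
        optimizer.zero_grad()
        loss = loss_fn(model(samples), labels)
        loss.backward()
        optimizer.step()
    return model  # the trained picture emotion detection model

# Example with random stand-in data:
pictures = torch.randn(8, 3, 48, 48)
emotions = torch.randint(0, NUM_EMOTIONS, (8,))
picture_emotion_model = train_picture_emotion_model(pictures, emotions)
```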
4. The method of claim 3, wherein, before the searching of the neural network structure by using the neural network search algorithm to determine the target neural network model, the method further comprises:
acquiring face image information of a user;
the searching of the neural network structure by using the neural network searching algorithm to determine the target neural network model comprises the following steps:
and searching a neural network structure for the face image information by utilizing the neural network search algorithm, and determining the target neural network model corresponding to the face image information, so as to realize a personalized design for the user.
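Under heavy simplification, the per-user structure search of claim 4 could look like the following: a random search over a tiny candidate space (hidden widths) scored against the user's face images. The candidate space, the scoring criterion, and the use of random search in place of the claimed search algorithm are assumptions for illustration only.

```python
import random
import torch
import torch.nn as nn

def score_candidate(hidden_width: int, user_faces: torch.Tensor) -> float:
    """Hypothetical proxy score; a real search would train and evaluate."""
    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(user_faces[0].numel(), hidden_width),
        nn.ReLU(),
        nn.Linear(hidden_width, 7),
    )
    with torch.no_grad():
        logits = model(user_faces)
    return float(logits.std())  # placeholder criterion

def search_structure_for_user(user_faces, candidates=(16, 32, 64, 128), trials=4):
    best_width, best_score = None, float("-inf")
    for width in random.sample(candidates, k=min(trials, len(candidates))):
        s = score_candidate(width, user_faces)
        if s > best_score:
            best_width, best_score = width, s
    return best_width  # defines the user-personalized target model

user_faces = torch.randn(4, 3, 32, 32)  # stand-in face image information
print(search_structure_for_user(user_faces))
```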
5. The method of claim 2, wherein the current information further comprises voice information;
the detection model further comprises a speech emotion detection model;
the inputting the current information into a pre-constructed detection model for processing to obtain target emotion information corresponding to the current information further comprises:
and inputting the voice information into the voice emotion detection model for processing to obtain voice emotion information corresponding to the voice information as the target emotion information.
6. The method of claim 5, wherein after obtaining the current information, further comprising:
converting the voice information into text sentences by utilizing a pre-constructed mixed structure voice recognition model;
inputting the text sentences into a pre-constructed natural language processing model for processing to obtain reply sentences corresponding to the text sentences;
converting the reply sentence into reply voice by utilizing a voice synthesis technology;
and outputting the reply voice to enable the user to acquire the required information.
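The question-answering path of claim 6 can be pictured with stand-in stubs; nothing below reproduces the mixed structure voice recognition model or the natural language processing model of the claims.

```python
# Illustrative stubs only; each function stands in for a claimed component.

def recognize_speech(voice_info: bytes) -> str:
    """Stub for the mixed-structure voice recognition model."""
    return voice_info.decode("utf-8", errors="ignore")

def reply_to(text: str) -> str:
    """Stub for the natural language processing (dialogue) model."""
    return f"You asked: '{text}'. Here is the information you requested."

def synthesize(sentence: str) -> bytes:
    """Stub text-to-speech conversion."""
    return sentence.encode("utf-8")

def answer_user(voice_info: bytes) -> bytes:
    text = recognize_speech(voice_info)   # voice information -> text sentence
    reply = reply_to(text)                # text sentence -> reply sentence
    return synthesize(reply)              # reply sentence -> reply voice

print(answer_user(b"what is the weather today"))
```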
7. The method of claim 6, wherein the natural language processing model building process comprises:
acquiring a pre-stored dialogue data set;
and according to a second preset iteration rule, performing iterative training on a BERT model to which a token-annotation mechanism is added by using the dialogue data set, to obtain a trained BERT model as the natural language processing model.
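One plausible reading of claim 7, sketched with the Hugging Face transformers library: fine-tune a stock BERT as a response-selection scorer over (utterance, candidate reply) pairs. The claimed token-annotation mechanism is not reproduced here, and the model checkpoint, the toy data set, and the loop bounds standing in for the "second preset iteration rule" are assumptions.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # match / no-match reply scoring
)

# Tiny stand-in dialogue data set: (user utterance, candidate reply, label)
dialogue_data = [
    ("i feel a bit down today", "I'm sorry to hear that, I'm here for you.", 1),
    ("i feel a bit down today", "Tomorrow is Wednesday.", 0),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # stands in for the "second preset iteration rule"
    for query, reply, label in dialogue_data:
        inputs = tokenizer(query, reply, return_tensors="pt",
                           padding=True, truncation=True)
        loss = model(**inputs, labels=torch.tensor([label])).loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

A response-selection head is only one way to turn BERT into a dialogue model; a generative decoder conditioned on BERT encodings would be another.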
8. The method of claim 6, further comprising:
acquiring feedback information of the user, wherein the feedback information carries an adjustment object identifier;
if the adjustment object identification represents the adjustment of the picture emotion detection model, adjusting the picture emotion detection model according to the feedback information;
if the adjustment object identification represents the adjustment of the natural language processing model, adjusting the natural language processing model according to the feedback information;
and if the adjustment object identification represents the adjustment of the voice emotion detection model, adjusting the voice emotion detection model according to the feedback information.
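The feedback routing of claim 8 amounts to dispatching on the adjustment object identifier; the identifier values and the adjuster callbacks below are illustrative assumptions, not the patent's interfaces.

```python
# Dispatch table keyed by a hypothetical adjustment-object identifier.
ADJUSTERS = {
    "picture_emotion": lambda fb: print("adjust picture emotion detection model:", fb),
    "nlp": lambda fb: print("adjust natural language processing model:", fb),
    "voice_emotion": lambda fb: print("adjust voice emotion detection model:", fb),
}

def handle_feedback(feedback: dict) -> None:
    target = feedback.get("adjust_object_id")  # which model to adjust
    adjuster = ADJUSTERS.get(target)
    if adjuster is not None:
        adjuster(feedback)

handle_feedback({"adjust_object_id": "nlp", "comment": "reply was off-topic"})
```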
9. A data interaction device, comprising:
the acquisition module is used for acquiring the current information of the user;
the processing module is used for inputting the current information into a pre-constructed detection model for processing to obtain target emotion information corresponding to the current information;
the generating module is used for generating an emotion feedback sentence corresponding to the target emotion information according to the target emotion information;
the conversion module is used for converting the emotion feedback sentence into emotion feedback voice by utilizing a voice synthesis technology;
and the output module is used for outputting the emotion feedback voice so as to feed back the user according to the emotion of the user.
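Claim 9 mirrors claim 1 as a device of five modules. A minimal composition sketch follows; the module names and the callable interface are assumptions, and the lambdas simply reuse the kind of stubs shown under claim 1.

```python
class DataInteractionDevice:
    """Illustrative wiring of the five claimed modules."""
    def __init__(self, acquire, detect, generate, convert, output):
        self.acquire, self.detect = acquire, detect
        self.generate, self.convert, self.output = generate, convert, output

    def run_once(self):
        info = self.acquire()                 # acquisition module
        emotion = self.detect(info)           # processing module
        sentence = self.generate(emotion)     # generating module
        voice = self.convert(sentence)        # conversion module
        self.output(voice)                    # output module

device = DataInteractionDevice(
    acquire=lambda: {"tone": "low"},
    detect=lambda info: "sad" if info.get("tone") == "low" else "neutral",
    generate=lambda e: f"I can tell you feel {e}.",
    convert=lambda s: s.encode("utf-8"),
    output=print,
)
device.run_once()
```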
10. A robot, comprising: the device comprises a collecting device, a voice player, a processor and a memory;
the acquisition device, the voice player and the memory are all connected with the processor;
the acquisition device is used for acquiring the current information of a user and sending the current information to the processor;
the voice player is used for receiving and playing the voice output by the processor;
the memory is used for storing a computer program, and the computer program is at least used for executing the data interaction method of any one of claims 1-8;
the processor is used for calling and executing the computer program in the memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910876510.8A CN110599999A (en) | 2019-09-17 | 2019-09-17 | Data interaction method and device and robot |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910876510.8A CN110599999A (en) | 2019-09-17 | 2019-09-17 | Data interaction method and device and robot |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110599999A true CN110599999A (en) | 2019-12-20 |
Family
ID=68860070
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910876510.8A Pending CN110599999A (en) | 2019-09-17 | 2019-09-17 | Data interaction method and device and robot |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110599999A (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107272607A (en) * | 2017-05-11 | 2017-10-20 | 上海斐讯数据通信技术有限公司 | A kind of intelligent home control system and method |
CN108305642A (en) * | 2017-06-30 | 2018-07-20 | 腾讯科技(深圳)有限公司 | The determination method and apparatus of emotion information |
US20190244108A1 (en) * | 2018-02-08 | 2019-08-08 | Cognizant Technology Solutions U.S. Corporation | System and Method For Pseudo-Task Augmentation in Deep Multitask Learning |
CN109101663A (en) * | 2018-09-18 | 2018-12-28 | 宁波众鑫网络科技股份有限公司 | A kind of robot conversational system Internet-based |
CN109658928A (en) * | 2018-12-06 | 2019-04-19 | 山东大学 | A kind of home-services robot cloud multi-modal dialog method, apparatus and system |
CN109620262A (en) * | 2018-12-12 | 2019-04-16 | 华南理工大学 | A kind of Emotion identification system and method based on wearable bracelet |
CN109871451A (en) * | 2019-01-25 | 2019-06-11 | 中译语通科技股份有限公司 | A kind of Relation extraction method and system incorporating dynamic term vector |
CN110147936A (en) * | 2019-04-19 | 2019-08-20 | 深圳壹账通智能科技有限公司 | Service evaluation method, apparatus based on Emotion identification, storage medium |
CN110175671A (en) * | 2019-04-28 | 2019-08-27 | 华为技术有限公司 | Construction method, image processing method and the device of neural network |
Non-Patent Citations (4)
Title |
---|
JACOB DEVLIN et al.: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", HTTPS://ARXIV.ORG/1810.04805 *
LIU WEIFENG (刘伟锋): "Research on Facial Expression Recognition", China Doctoral Dissertations Full-text Database *
LIU QIANG (刘强): "Research on Expression Recognition Based on Basic Face Shapes", China Master's Theses Full-text Database *
CHEN DEPIN (陈德品): "Research on the Accuracy and Robustness of 3D Facial Expression Recognition", China Master's Theses Full-text Database *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111179903A (en) * | 2019-12-30 | 2020-05-19 | 珠海格力电器股份有限公司 | Voice recognition method and device, storage medium and electric appliance |
CN113571062A (en) * | 2020-04-28 | 2021-10-29 | 中国移动通信集团浙江有限公司 | Client tag identification method and device based on voice data and computing equipment |
CN113571062B (en) * | 2020-04-28 | 2024-05-24 | 中国移动通信集团浙江有限公司 | Customer label identification method and device based on voice data and computing equipment |
CN112017403A (en) * | 2020-09-15 | 2020-12-01 | 青岛联合创智科技有限公司 | Community-house integrated intelligent service electronic board |
CN113516972A (en) * | 2021-01-12 | 2021-10-19 | 腾讯科技(深圳)有限公司 | Speech recognition method, speech recognition device, computer equipment and storage medium |
CN113516972B (en) * | 2021-01-12 | 2024-02-13 | 腾讯科技(深圳)有限公司 | Speech recognition method, device, computer equipment and storage medium |
CN113053388A (en) * | 2021-03-09 | 2021-06-29 | 北京百度网讯科技有限公司 | Voice interaction method, device, equipment and storage medium |
CN112860877A (en) * | 2021-03-31 | 2021-05-28 | 中国工商银行股份有限公司 | Customer service operation processing method and device, electronic equipment and storage medium |
CN112860877B (en) * | 2021-03-31 | 2024-02-02 | 中国工商银行股份有限公司 | Customer service operation processing method and device, electronic equipment and storage medium |
CN113990315A (en) * | 2021-10-22 | 2022-01-28 | 南京联了么信息技术有限公司 | An intelligent audio amplifier for elderly people suffering from cognitive disorders |
CN115240715A (en) * | 2022-08-03 | 2022-10-25 | 上海三力信息科技有限公司 | Child care monitoring method based on child bracelet |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110599999A (en) | Data interaction method and device and robot | |
CN110728997B (en) | Multi-modal depression detection system based on context awareness | |
CN106469212B (en) | Man-machine interaction method and device based on artificial intelligence | |
KR102627948B1 (en) | Automated assistants that accommodate multiple age groups and/or vocabulary levels | |
US11908451B2 (en) | Text-based virtual object animation generation method, apparatus, storage medium, and terminal | |
Morency et al. | A probabilistic multimodal approach for predicting listener backchannels | |
Merdivan et al. | Dialogue systems for intelligent human computer interactions | |
CN114051639A (en) | Emotion detection using speaker baseline | |
KR20210070213A (en) | Voice user interface | |
JP2018055548A (en) | Interactive device, learning device, interactive method, learning method, and program | |
US11132994B1 (en) | Multi-domain dialog state tracking | |
CN114911932A (en) | Heterogeneous graph structure multi-conversation person emotion analysis method based on theme semantic enhancement | |
JPWO2018230345A1 (en) | Dialogue robot, dialogue system, and dialogue program | |
WO2023226239A1 (en) | Object emotion analysis method and apparatus and electronic device | |
Granell et al. | Multimodal crowdsourcing for transcribing handwritten documents | |
CN110675292A (en) | Child language ability evaluation method based on artificial intelligence | |
CN117349427A (en) | Artificial intelligence multi-mode content generation system for public opinion event coping | |
KR20230151162A (en) | An Apparatus and method for generating lip sync avatar face based on emotion analysis in voice | |
CN114373443A (en) | Speech synthesis method and apparatus, computing device, storage medium, and program product | |
CN113409768A (en) | Pronunciation detection method, pronunciation detection device and computer readable medium | |
CN116580691A (en) | Speech synthesis method, speech synthesis device, electronic device, and storage medium | |
CN111177346A (en) | Man-machine interaction method and device, electronic equipment and storage medium | |
CN116978367A (en) | Speech recognition method, device, electronic equipment and storage medium | |
KR20230151155A (en) | An apparatus for providing avatar speech services and a method for operating it | |
KR20230151157A (en) | A method of an avatar speech service providing device using TTS and STF technology based on artificial intelligence neural network learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191220 |
RJ01 | Rejection of invention patent application after publication |