CN106658129B - Terminal control method and device based on emotion and terminal

Terminal control method and device based on emotion and terminal

Info

Publication number
CN106658129B
Authority
CN
China
Prior art keywords
emotion
result
user
category
pushing
Prior art date
Legal status
Active
Application number
CN201611226550.0A
Other languages
Chinese (zh)
Other versions
CN106658129A
Inventor
杨鹏
何嘉
朱频频
Current Assignee
Shanghai Xiaoi Robot Technology Co Ltd
Original Assignee
Shanghai Xiaoi Robot Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Xiaoi Robot Technology Co Ltd filed Critical Shanghai Xiaoi Robot Technology Co Ltd
Priority to CN201611226550.0A
Publication of CN106658129A
Application granted
Publication of CN106658129B

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/466Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N21/4668Learning process for intelligent management, e.g. learning user preferences for recommending movies for recommending content, e.g. movies


Abstract

A terminal control method, a device and a terminal based on emotion are provided, wherein the terminal control method based on emotion comprises the following steps: carrying out voice recognition processing on the received voice data to obtain a voice recognition result; determining a first emotion result based on the voice recognition result and a preset emotion word bank; and determining the emotion of the user at least according to the first emotion result, and pushing the playing content related to the emotion of the user. The technical scheme of the invention realizes analysis of the user's intention and accurate program recommendation.

Description

Terminal control method and device based on emotion and terminal
Technical Field
The invention relates to the technical field of intelligent control, in particular to a terminal control method, device and terminal based on emotion.
Background
With the development of the technology, the existing intelligent terminal can be controlled in various ways, for example, the intelligent television can be controlled by gestures, voice and the like.
In the prior art, gesture recognition means that a user makes a specified gesture action facing a television image acquisition device to trigger related operations to control a television. The voice control means receiving sound through a television box or a remote controller microphone, understanding the accurate purpose of user instructions through voice recognition, and sending a control signal to a television.
However, gesture and voice control are based on user operations to control the terminal. When a user selects a program, how to analyze the user intention and accurately recommend the program is a problem to be solved urgently.
Disclosure of Invention
The invention solves the technical problem of how to analyze the user intention and accurately recommend programs.
In order to solve the above technical problem, an embodiment of the present invention provides a terminal control method based on emotion, where the terminal control method based on emotion includes:
carrying out voice recognition processing on the received voice data to obtain a voice recognition result; determining a first emotion result based on the voice recognition result and a preset emotion word bank; and determining the emotion of the user at least according to the first emotion result, and pushing the playing content related to the emotion of the user.
Optionally, the determining the first emotion result based on the voice recognition result and the preset emotion word bank includes:
extracting keywords from the voice recognition result to obtain keywords; matching the keywords with emotion score word banks of all emotion categories in the preset emotion word bank to determine emotion scores of the voice recognition results for all emotion categories according to matching results, wherein the higher the matching degree of the keywords with the emotion score word banks is, the higher the emotion scores are; and determining the emotion category with the highest emotion score as the first emotion result.
Optionally, the pushing the playing content associated with the user emotion includes: and pushing the playing content according to the emotion type corresponding to the first emotion result, wherein the playing content corresponds to the emotion type one to one.
Optionally, the terminal control method further includes: face recognition is performed on the received face image to determine a second emotional result.
Optionally, the performing facial recognition on the received facial image includes: matching the facial image with a preset image to determine emotion scores of the facial image for all emotion categories; determining the emotion category with the highest emotion score as the second emotion result.
Optionally, the pushing the playing content associated with the user emotion includes: and pushing corresponding playing content according to the first emotion result and the second emotion result.
Optionally, the pushing the playing content associated with the user emotion includes: when the first emotion result indicates an emotion, pushing the playing content according to the emotion category corresponding to the first emotion result; and when the first emotion result indicates no emotion, pushing the playing content according to the emotion category corresponding to the second emotion result.
Optionally, the pushing the playing content associated with the user emotion includes: when the emotion categories corresponding to the first emotion result and the second emotion result are consistent, pushing the playing content according to that emotion category; when the emotion categories corresponding to the first emotion result and the second emotion result are not consistent, pushing the playing content according to the emotion category with the highest emotion score; and when the first emotion result indicates no emotion, pushing the playing content according to the emotion category corresponding to the second emotion result.
Optionally, the playing content is pushed in one or more of the following ways: displaying a dialog box prompting whether to play the playing content, and issuing a voice prompt asking whether to play the playing content.
Optionally, the terminal control method further includes: matching the voice recognition result with a preset instruction library to obtain a first control instruction; and controlling the terminal to execute a corresponding operation according to the first control instruction, wherein the corresponding operation does not include pushing the playing content.
Optionally, the terminal control method further includes: matching the received gesture action data with a preset action library to obtain a second control instruction; and controlling the terminal to execute a corresponding operation according to the second control instruction, wherein the corresponding operation does not include pushing the playing content.
In order to solve the above technical problem, an embodiment of the present invention further provides a terminal control device based on emotion, where the terminal control device based on emotion includes:
the voice recognition unit is used for carrying out voice recognition processing on the received voice data to obtain a voice recognition result; the first emotion result determining unit is used for determining a first emotion result based on the voice recognition result and a preset emotion word bank; and the pushing unit is used for determining the emotion of the user at least according to the first emotion result and pushing the playing content related to the emotion of the user.
Optionally, the first emotion result determining unit includes:
a keyword extraction subunit, configured to perform keyword extraction processing on the voice recognition result to obtain a keyword; the first matching subunit is used for matching the keywords with the emotion score word banks of all emotion categories in the preset emotion word bank so as to determine emotion scores of the voice recognition result for all emotion categories according to the matching result, wherein the higher the matching degree of the keywords with the emotion score word banks is, the higher the emotion scores are, and the preset emotion word bank is used for storing the emotion word banks corresponding to the emotion categories; and the first determining subunit is used for determining the emotion category with the highest emotion score as the first emotion result.
Optionally, the pushing unit pushes the playing content according to the emotion type corresponding to the first emotion result, and the playing content corresponds to the emotion type one to one.
Optionally, the terminal control device further includes: a face recognition unit for performing face recognition on the received face image to determine a second emotion result.
Optionally, the face recognition unit includes: the second matching subunit is used for matching the face image with a preset image so as to determine the emotion score of the face image for each emotion category; a second determining subunit, configured to determine the emotion category with the highest emotion score as the second emotion result.
Optionally, the pushing unit pushes corresponding playing content according to the first emotion result and the second emotion result.
Optionally, the pushing unit includes: the first pushing subunit, used for pushing the playing content according to the emotion category corresponding to the first emotion result when the first emotion result indicates an emotion; and the second pushing subunit, used for pushing the playing content according to the emotion category corresponding to the second emotion result when the first emotion result indicates no emotion.
Optionally, the pushing unit includes: the third pushing subunit, used for pushing the playing content according to the emotion category when the emotion categories corresponding to the first emotion result and the second emotion result are consistent; a fourth pushing subunit, configured to, when the emotion categories corresponding to the first emotion result and the second emotion result are not consistent, push the playing content according to the emotion category with the highest emotion score; and the fifth pushing subunit, configured to, when the first emotion result indicates no emotion, push the playing content according to the emotion category corresponding to the second emotion result.
Optionally, the pushing unit pushes the playing content in one or more of the following manners: displaying a dialog box prompting whether to play the playing content, and issuing a voice prompt asking whether to play the playing content.
Optionally, the terminal control device further includes: the first matching unit, used for obtaining a first control instruction based on matching the voice recognition result with a preset instruction library, wherein the preset instruction library is used for storing the first control instruction, and the first control instruction comprises at least one word; and the first control unit, used for controlling the terminal to execute a corresponding operation according to the first control instruction, wherein the corresponding operation does not include pushing the playing content.
Optionally, the terminal control device further includes: the second matching unit, used for matching the received gesture action data with a preset action library to obtain a second control instruction, wherein the preset action library is used for storing the second control instruction corresponding to the gesture action; and the second control unit, used for controlling the terminal to execute a corresponding operation according to the second control instruction, wherein the corresponding operation does not include pushing the playing content.
In order to solve the technical problem, an embodiment of the present invention further provides a terminal, where the terminal includes the emotion-based terminal control apparatus.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
the technical scheme of the invention carries out voice recognition processing on the received voice data to obtain a voice recognition result; determining a first emotion result based on the voice recognition result and a preset emotion word bank; and determining the emotion of the user at least according to the first emotion result, and pushing the playing content related to the emotion of the user. According to the technical scheme, the voice data of the user are identified and matched with the preset emotion word bank, so that a first emotion result corresponding to the voice data can be determined, and further the emotion of the user when the user sends the voice data can be determined; then, playing content related to the emotion of the user is pushed for the user according to the emotion of the user, so that the playing content of the terminal is matched with the emotion of the user, and the technical effect that the terminal automatically pushes programs according to the emotion of the user is achieved; that is to say, the user emotion is judged to determine that the user intention is the playing content which needs to be adapted to the user emotion, so that the terminal is more intelligent and humanized, and the user experience is improved.
Further, extracting keywords from the voice recognition result to obtain keywords; matching the keywords with emotion score word banks of all emotion categories in the preset emotion word bank to determine emotion scores of the voice recognition results for all emotion categories according to matching results, wherein the higher the matching degree of the keywords with the emotion score word banks is, the higher the emotion scores are; and determining the emotion category with the highest emotion score as the first emotion result. According to the technical scheme, keyword extraction processing is carried out on a voice recognition result of a user, and the emotion category with the highest emotion score is determined as the emotion of the user according to comparison between the keyword and words in a preset emotion word bank; namely, the emotion of the user is judged from the semantic level of the voice data, and the pushing accuracy of the playing content is guaranteed.
Further, performing facial recognition on the received facial image to determine a second emotion result; and pushing corresponding playing content according to the first emotion result and the second emotion result. According to the technical scheme, the emotion judged in the user voice data is combined with the emotion judged by the face of the user, the current emotion of the user is comprehensively judged, and the emotion judgment accuracy of the user is improved, so that the pushing accuracy of the played content is improved, and the intelligent degree and the humanization degree of the terminal are further improved.
Drawings
Fig. 1 is a flowchart of a terminal control method based on emotion according to an embodiment of the present invention;
FIG. 2 is a flow diagram of one implementation of step S102 shown in FIG. 1;
fig. 3 is a flowchart of another emotion-based terminal control method according to an embodiment of the present invention;
FIG. 4 is a flowchart of one implementation of step S305 shown in FIG. 3;
FIG. 5 is a flow chart of another implementation of step S305 shown in FIG. 3;
fig. 6 is a schematic structural diagram of a terminal control device based on emotion according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of the first emotion result determination unit 602 shown in fig. 6;
fig. 8 is a schematic structural diagram of another emotion-based terminal control device according to an embodiment of the present invention.
Detailed Description
As described in the background, gesture and voice control in the related art controls the terminal based on user operations. When a user selects a program, how to analyze the user's intention and accurately recommend programs is a problem to be solved urgently.
The inventor creatively proposes judging the emotion of the user while speaking at least based on the user's voice data, and then using that emotion to push programs to the user, thereby analyzing the user's intention and recommending programs accurately.
According to the embodiment of the invention, the user's voice data is recognized and matched with the preset emotion word bank, so that a first emotion result corresponding to the voice data can be determined, and thus the emotion of the user when uttering the voice data can be determined; then, playing content associated with the user's emotion is pushed to the user, so that the playing content of the terminal matches the user's emotion, achieving the technical effect that the terminal automatically pushes programs according to the user's emotion; that is to say, by judging the user's emotion, the terminal determines that the user's intention is to obtain playing content adapted to that emotion, which makes the terminal more intelligent and humanized and improves the user experience.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Fig. 1 is a flowchart of a terminal control method based on emotion according to an embodiment of the present invention.
The emotion-based terminal control method shown in fig. 1 may include the steps of:
step S101: carrying out voice recognition processing on the received voice data to obtain a voice recognition result;
step S102: determining a first emotion result based on the voice recognition result and a preset emotion word bank;
step S103: and determining the emotion of the user at least according to the first emotion result, and pushing the playing content related to the emotion of the user.
In a specific implementation, in step S101, the voice data in the voice format may be converted into the voice recognition result in the text format, so as to perform the subsequent processing. It should be understood by those skilled in the art that the speech recognition processing of the speech data may be implemented in any practicable manner, and the embodiment of the present invention is not limited thereto.
In a specific implementation, the preset emotion lexicon can be configured in advance. The preset emotion word bank can store words corresponding to various emotions. In step S102, the speech recognition result is matched with words in a preset emotion word bank, and an emotion corresponding to the matched word is used as a first emotion result. That is, the first emotional result may characterize an emotion of the user carried by the voice data.
Further, in step S103, the user emotion is determined at least according to the first emotion result. That is, the emotion carried by the voice data may be determined as the user emotion, or it may be combined with the user emotion determined in any other implementable manner. After the user emotion is determined, playing content associated with the user emotion can be pushed accordingly. For example, when the user is happy, the pushed playing content may be comedies or romance films; when the user is sad, the pushed playing content may be comedies or feel-good programs; when the user's emotion is fear, the pushed playing content may be comedies, variety shows and the like.
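For illustration only, the following Python sketch shows how such an emotion-to-content association might be implemented; the category names, genres and the random selection strategy are assumptions, not details taken from the embodiment itself.

```python
import random
from typing import Optional

# Hypothetical association between emotion categories and playing content genres;
# the concrete categories and genres below are assumptions for illustration only.
EMOTION_TO_CONTENT = {
    "happy": ["comedy", "romance film"],
    "sad": ["comedy", "feel-good program"],
    "fear": ["comedy", "variety show"],
}

def push_content(user_emotion: str) -> Optional[str]:
    """Pick one piece of playing content associated with the determined user emotion."""
    candidates = EMOTION_TO_CONTENT.get(user_emotion)
    if not candidates:
        return None                     # no associated content: nothing is pushed
    return random.choice(candidates)    # random choice is only one possible selection strategy

if __name__ == "__main__":
    print(push_content("sad"))
```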
According to the embodiment of the invention, the user's voice data is recognized and matched with the preset emotion word bank, so that a first emotion result corresponding to the voice data can be determined, and thus the emotion of the user when uttering the voice data can be determined; then, playing content associated with the user's emotion is pushed to the user, so that the playing content of the terminal matches the user's emotion, achieving the technical effect that the terminal automatically pushes programs according to the user's emotion; that is to say, by judging the user's emotion, the terminal determines that the user's intention is to obtain playing content adapted to that emotion, which makes the terminal more intelligent and humanized and improves the user experience.
Preferably, in step S103, a dialog box may be sent to prompt whether to play the playing content, or a voice prompt may be sent to prompt whether to play the playing content. Specifically, the user may determine whether to play the playback content by performing a selection operation on the terminal. For example, selection may be by way of gesture recognition; the selection can also be made by means of voice control.
Preferably, the emotion-based terminal control method shown in fig. 1 may further include: matching the voice recognition result with a preset instruction library to obtain a first control instruction; and controlling the terminal to execute a corresponding operation according to the first control instruction, wherein the corresponding operation does not include pushing the playing content. In this embodiment, since the voice data of the user may further include a control instruction for the terminal, the voice recognition result may be further processed. Specifically, the voice recognition result is matched with the preset instruction library, and after a first control instruction is obtained through matching, the terminal is controlled according to the first control instruction and can respond to it by executing a corresponding operation, such as switching channels or adjusting the volume.
It is understood that the preset instruction library may be preconfigured, and the corresponding operation executed by the terminal is limited to the operation indicated by the first control instruction in the preset instruction library.
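A minimal sketch of how the voice recognition result might be matched against such a preset instruction library is given below; the trigger phrases, command identifiers and the substring matching rule are assumptions for illustration only.

```python
from typing import Optional

# Hypothetical preset instruction library: trigger phrase -> first control instruction.
INSTRUCTION_LIBRARY = {
    "switch channel": "CMD_SWITCH_CHANNEL",
    "volume up": "CMD_VOLUME_UP",
    "volume down": "CMD_VOLUME_DOWN",
}

def match_first_control_instruction(speech_text: str) -> Optional[str]:
    """Return the instruction whose trigger phrase occurs in the voice recognition result."""
    text = speech_text.lower()
    for phrase, command in INSTRUCTION_LIBRARY.items():
        if phrase in text:
            return command
    return None   # nothing matched: no extra operation is executed
```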
Preferably, the emotion-based terminal control method shown in fig. 1 may include: matching the received gesture action data with a preset action library to obtain a second control instruction; and controlling the terminal to execute a corresponding operation according to the second control instruction, wherein the corresponding operation does not include pushing the playing content. In this embodiment, the gesture of the user may also be detected and, once detected, further processed to determine whether to further control the terminal. Specifically, the gesture action is matched with a preset action library, and when the gesture action matches an action in the preset action library, the second control instruction corresponding to that action is determined; the terminal is then controlled according to the second control instruction and can respond to it by executing a corresponding operation, such as switching channels or adjusting the volume.
It is to be understood that the preset action library may be preconfigured, and the corresponding operation executed by the terminal is limited to the operation indicated by the second control instruction corresponding to the matched action in the preset action library.
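Gesture matching could be sketched in a similar way; representing gesture action data as feature vectors and matching them to templates by distance is an assumption, not a requirement of the embodiment.

```python
import math
from typing import Optional, Sequence

# Hypothetical preset action library: gesture template vector -> second control instruction.
ACTION_LIBRARY = {
    (1.0, 0.0): "CMD_NEXT_CHANNEL",       # e.g. a swipe to the right
    (-1.0, 0.0): "CMD_PREVIOUS_CHANNEL",  # e.g. a swipe to the left
}

def match_second_control_instruction(gesture: Sequence[float],
                                     threshold: float = 0.5) -> Optional[str]:
    """Return the instruction of the closest gesture template, if it is close enough."""
    best_cmd, best_dist = None, float("inf")
    for template, command in ACTION_LIBRARY.items():
        dist = math.dist(gesture, template)
        if dist < best_dist:
            best_cmd, best_dist = command, dist
    return best_cmd if best_dist <= threshold else None
```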
Preferably, reference may be made to fig. 2 for one implementation of step S102, and fig. 2 is a flowchart for one specific implementation of step S102 shown in fig. 1.
Step S102 may include the steps of:
step S201: extracting keywords from the voice recognition result to obtain keywords;
step S202: matching the keywords with emotion score word banks of all emotion categories in the preset emotion word bank so as to determine emotion scores of the voice recognition results for all emotion categories according to matching results;
step S203: and determining the emotion category with the highest emotion score as the first emotion result.
In a specific implementation, in step S201, a keyword of the speech recognition result is obtained by performing keyword extraction on the speech recognition result. In particular, the keywords of the speech recognition result may be used to characterize the semantics of the speech data, and then the emotion of the user carried by the speech data may be determined according to the keywords.
In specific implementation, the preset emotion word library may store a plurality of emotion categories and emotion score word libraries corresponding to the emotion categories. In particular, each mood category may represent a mood; each mood category may have multiple mood scores, with higher mood scores indicating closer to the mood category. The emotion score word bank of each emotion category may store words corresponding to a plurality of emotion scores for the emotion category. Then in step S202, the keywords of the speech recognition result may be matched with words in the emotion score word bank of each emotion category, and the emotion score of the speech recognition result for each emotion category may be determined according to the matched words. The higher the matching degree of the keywords and the emotion score word bank is, the higher the emotion score is.
Specifically, the speech recognition result may be matched with the emotion score lexicons corresponding to all emotion categories in the preset emotion lexicon, and an emotion score of the speech recognition result for every emotion category is obtained. For example, the emotion scores determined from the speech recognition result might be: "sad" with the highest score, "dislike" six points, "surprise" two points, "angry" one point, and "happy" zero points.
In a specific implementation, after the emotion scores of the speech recognition results for the respective emotion categories are determined, since a higher emotion score indicates that the keyword is closer to the emotion category corresponding to the emotion score, in step S203, the emotion category with the highest emotion score is determined as the first emotion result.
For example, if a user says "I'm so angry, I just want to find a comedy film", the keywords "angry" and "comedy film" can be obtained; after matching with words in the emotion score word bank of each emotion category, scores of two points for "angry" and one point for "happy" are determined, so the first emotion result can be determined as "angry". It is understood that if the emotion scores of the voice recognition result for the respective emotion categories are all zero, or no emotion scores can be obtained in step S202, the first emotion result is determined as no emotion.
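Steps S201 to S203 can be illustrated with the following Python sketch; the lexicon contents, the keyword extraction rule and the concrete scores are assumptions used only to demonstrate the matching-and-highest-score logic.

```python
import re
from typing import Dict, List, Optional

# Hypothetical preset emotion word bank: each emotion category maps words to emotion scores.
EMOTION_LEXICON: Dict[str, Dict[str, int]] = {
    "angry": {"angry": 2, "furious": 3},
    "happy": {"great": 2, "comedy": 1},
    "sad": {"sad": 3, "upset": 2},
}

def extract_keywords(speech_text: str) -> List[str]:
    # Placeholder keyword extraction; a real system would use word segmentation,
    # stop-word filtering and so on.
    return re.findall(r"[a-z']+", speech_text.lower())

def first_emotion_result(speech_text: str) -> Optional[str]:
    """Match keywords against every emotion score word bank and keep the highest-scoring category."""
    keywords = extract_keywords(speech_text)
    scores = {category: sum(words.get(k, 0) for k in keywords)
              for category, words in EMOTION_LEXICON.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None   # all zero -> first emotion result is no emotion

if __name__ == "__main__":
    # "angry" scores 2 and "happy" scores 1 (via "comedy"), so the result is "angry".
    print(first_emotion_result("I'm so angry, I just want to find a comedy"))
```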
With reference to fig. 1, in step S103, the playing content may be pushed according to the determination result of step S203, that is, according to the emotion category corresponding to the first emotion result. The playing content may correspond to the emotion categories one to one. For example, when the user is happy, the pushed playing content may be a romance film; when the user is sad, the pushed playing content may be a comedy; when the user's emotion is fear, the pushed playing content may be a variety show, and the like.
It should be noted that, the corresponding relationship between the played content and the emotion category may be adaptively configured in a user-defined manner according to the actual application environment, which is not limited in this embodiment of the present invention.
It is to be understood that the playing content associated with the user emotion may be playing content carrying content tags associated with emotion categories, where at least one content tag is associated with each emotion category and at least one content tag is attached to each piece of playing content, and the pushed content should be playing content whose tags are associated with the first emotion result. For example, when the user is happy, the pushed playing content may be content with tags such as comedy or romance; when the user is sad, the pushed playing content may be content with tags such as comedy or feel-good program; when the user's emotion is fear, the pushed playing content may be content with tags such as comedy or variety show. If there are multiple pieces of playing content associated with the user emotion that can be pushed, one piece may be selected at random, one piece may be selected according to a custom configuration adapted to the actual application environment, or all of them may be pushed for the user to choose from, and so on.
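A tag-based selection of this kind might look like the following sketch; the catalogue, the tags and the emotion-to-tag association are assumptions for illustration only.

```python
import random
from typing import List, Optional

# Hypothetical catalogue: each piece of playing content carries at least one content tag.
CONTENT_CATALOGUE = {
    "Sitcom A": {"comedy"},
    "Romance B": {"romance"},
    "Variety Show C": {"variety"},
    "Feel-good Drama D": {"feel-good", "comedy"},
}

# Hypothetical association of at least one content tag with each emotion category.
EMOTION_TAGS = {
    "happy": {"comedy", "romance"},
    "sad": {"comedy", "feel-good"},
    "fear": {"comedy", "variety"},
}

def candidates(first_emotion_result: str) -> List[str]:
    """All playing content whose tags intersect the tags associated with the emotion."""
    wanted = EMOTION_TAGS.get(first_emotion_result, set())
    return [name for name, tags in CONTENT_CATALOGUE.items() if tags & wanted]

def push_one(first_emotion_result: str) -> Optional[str]:
    found = candidates(first_emotion_result)
    return random.choice(found) if found else None   # random pick, one of the strategies in the text
```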
The embodiment of the invention extracts keywords from the voice recognition result of the user, compares the keywords with words in a preset emotion word bank, and determines the emotion category with the highest emotion score as the emotion of the user; namely, the emotion of the user is judged from the semantic level of the voice data, and the pushing accuracy of the playing content is guaranteed.
Fig. 3 is a flowchart of another emotion-based terminal control method according to an embodiment of the present invention.
In this embodiment, the facial image of the user may be collected in advance to be used for the judgment of facial emotion. After determining the first emotional result, facial recognition may also be performed on the received facial image to determine a second emotional result.
In a specific implementation, the emotion-based terminal control method shown in fig. 3 may include the following steps:
step S301: carrying out voice recognition processing on the received voice data to obtain a voice recognition result;
step S302: determining a first emotion result based on the voice recognition result and a preset emotion word bank;
step S303: matching the facial image with a preset image to determine emotion scores of the facial image for all emotion categories;
step S304: determining the emotion category with the highest emotion score as the second emotion result;
step S305: and pushing corresponding playing content according to the first emotion result and the second emotion result.
In this embodiment, the specific implementation of step S301 and step S302 can refer to step S101 and step S102 in the embodiment shown in fig. 1, and details are not repeated here.
In particular implementations, the facial image may include a user's face and may include a facial expression of the user, which may characterize the user's mood. Specifically, when the emotions of the users are different, the corresponding user images are different. Before step S303, preset images may be preset, each preset image may represent one emotion category, and each emotion category may represent one emotion of the user. Then, in performing step S303, the face image may be matched with a preset image, and the emotion score of the face image for each emotion category may be determined according to the degree of matching of the face image with the preset image. That is, the higher the matching degree of the face image with the preset image is, the higher the emotion score of the face image for the emotion category corresponding to the preset image is; the higher the emotion score of the face image, the closer the emotion carried by the face image is to the emotion category to which the emotion score corresponds.
Further, in step S304, the emotion classification with the highest emotion score may be the second emotion result. That is, the second emotion result represents an emotion judged from the face image of the user.
For example, the user may make a frowning expression; after the facial image is matched with the preset images, scores of two points for "sad" and one point for "upset" may be determined, and the second emotion result can then be determined as "sad".
It is to be understood that, if in step S303, the emotion scores of the face images for the respective emotion categories are all zero, or the emotion scores of the face images for the respective emotion categories are not available, the second emotion result is determined as no emotion.
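The matching of steps S303 and S304 might be sketched as follows; representing images as feature vectors and using a toy similarity function are assumptions standing in for a real facial expression matcher.

```python
from typing import Dict, Optional, Sequence

def matching_degree(face: Sequence[float], preset: Sequence[float]) -> float:
    """Toy matching degree between two images represented as equal-length feature vectors."""
    return sum(1.0 - abs(a - b) for a, b in zip(face, preset)) / len(preset)

def second_emotion_result(face: Sequence[float],
                          preset_images: Dict[str, Sequence[float]]) -> Optional[str]:
    """Score the facial image against one preset image per emotion category and keep the best."""
    scores = {category: matching_degree(face, image)
              for category, image in preset_images.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None   # no usable score -> second emotion result is no emotion
```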
In a specific implementation, in step S305, the corresponding playing content is pushed according to the first emotion result and the second emotion result. Compared to step S103 shown in fig. 1, this step pushes the program to the user by combining the first emotional result and the second emotional result. According to the embodiment of the invention, the current emotion of the user is comprehensively judged by combining the emotion judged in the voice data of the user with the emotion judged by the face of the user, so that the emotion judgment accuracy of the user is improved, the push accuracy of the playing content is improved, and the intelligence and humanization of the terminal are further improved.
Preferably, in step S305, a dialog box may be sent to prompt whether to play the playing content, or a voice prompt may be sent to prompt whether to play the playing content. Specifically, the user may determine whether to play the playback content by performing a selection operation on the terminal. For example, selection may be by way of gesture recognition; the selection can also be made by means of voice control.
Preferably, one embodiment of step S305 can refer to fig. 4, and fig. 4 is a flowchart of one specific implementation of step S305 shown in fig. 3.
Step S305 may include the steps of:
step S401: when the first emotion result indicates an emotion, pushing the playing content according to the emotion category corresponding to the first emotion result;
step S402: and when the first emotion result indicates no emotion, pushing the playing content according to the emotion category corresponding to the second emotion result.
In this embodiment, the first emotion result may represent an emotion determined according to the user voice data, and the second emotion result may represent an emotion judged from the face of the user. When the second emotion result is determined, it is difficult to acquire and configure preset images that correspond completely to every emotion category, which affects the accuracy of the second emotion result. This embodiment therefore gives priority to the first emotion result when combining the first emotion result with the second emotion result.
In a specific implementation, in step S401 and step S402, if a first emotion result is an emotion, the emotion category corresponding to the first emotion result may be determined as a user emotion, and the playing content associated with the user emotion is pushed; if the first emotion result is no emotion, the emotion category corresponding to the second emotion result can be determined as a user emotion, and the playing content associated with the user emotion is pushed.
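The priority rule of steps S401 and S402 might be expressed as the following sketch; the category strings and the use of None for "no emotion" are assumptions.

```python
from typing import Optional

def combined_emotion_fig4(first_result: Optional[str],
                          second_result: Optional[str]) -> Optional[str]:
    """Fig. 4 strategy: the first emotion result takes priority; fall back to the second one."""
    if first_result is not None:   # the speech carried an emotion
        return first_result
    return second_result           # otherwise use the emotion judged from the facial image
```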
The embodiment of the invention provides one implementation for pushing playing content by combining the first emotion result and the second emotion result; giving priority to the first emotion result ensures the accuracy of the user emotion judgment, and even when the first emotion result is no emotion, a judgment of the user emotion is still obtained and the playing content is pushed, further improving the intelligence and humanization of the terminal.
Preferably, another implementation of step S305 can refer to fig. 5, and fig. 5 is a flowchart of another specific implementation of step S305 shown in fig. 3.
Step S305 may include the steps of:
step S501: when the emotion types corresponding to the first emotion result and the second emotion result are consistent, pushing the playing content according to the emotion types;
step S502: when the emotion categories corresponding to the first emotion result and the second emotion result are not consistent, pushing the playing content according to the emotion category with the highest emotion score;
step S503: and when the first emotion result indicates no emotion, pushing the playing content according to the emotion category corresponding to the second emotion result.
In this embodiment, the first emotion result may represent an emotion determined according to the user voice data; the second emotion result may represent an emotion judged from the face of the user. Since the first emotion result and the second emotion result are determined according to the emotion scores when the first emotion result and the second emotion result are obtained, the embodiment can determine the final emotion of the user according to the emotion scores when the first emotion result and the second emotion result are combined.
In a specific implementation, in step S501 and step S502, if the emotion categories corresponding to the first emotion result and the second emotion result are consistent, the emotion category may be determined as a user emotion, and the playing content associated with the user emotion is pushed; and if the emotion categories corresponding to the first emotion result and the second emotion result are not consistent, determining the emotion category with the highest emotion score as the emotion of the user, and pushing the playing content related to the emotion of the user.
In specific implementation, the first emotion result is obtained by matching the keywords of the voice recognition result with the preset emotion word bank, so the first emotion result is no emotion when no keyword can be matched in the preset emotion word bank. The second emotion result, in contrast, is obtained by matching the facial image with the preset images, so a no-emotion result rarely occurs. Therefore, in step S503, when the first emotion result is no emotion, the emotion category corresponding to the second emotion result may be determined as the user emotion, and the playing content associated with the user emotion may be pushed.
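The combination rule of steps S501 to S503 might be sketched as follows; passing the emotion scores in explicitly and using None for "no emotion" are assumptions.

```python
from typing import Optional

def combined_emotion_fig5(first_result: Optional[str], first_score: float,
                          second_result: Optional[str], second_score: float) -> Optional[str]:
    """Fig. 5 strategy: agree -> use that category; disagree -> highest emotion score wins."""
    if first_result is None:                 # speech carried no emotion (step S503)
        return second_result
    if first_result == second_result:        # both modalities agree (step S501)
        return first_result
    # categories differ: keep the category with the highest emotion score (step S502)
    return first_result if first_score >= second_score else second_result
```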
The embodiment of the invention provides another implementation mode when the first emotion result and the second emotion result are combined to push the playing content, the final emotion of the user is determined according to the emotion score, and the emotion judgment accuracy of the user is ensured; and when the first emotion result is no emotion, the judgment result of the emotion of the user is still obtained, the playing content is pushed, and the intelligent degree and the humanized degree of the terminal are further improved.
Fig. 6 is a schematic structural diagram of a terminal control device based on emotion according to an embodiment of the present invention.
The emotion-based terminal control apparatus 60 shown in fig. 6 may include a voice recognition unit 601, a first emotion result determination unit 602, and a push unit 603.
The voice recognition unit 601 is configured to perform voice recognition processing on the received voice data to obtain a voice recognition result; the first emotion result determination unit 602 is configured to determine a first emotion result based on the speech recognition result and a preset emotion word bank; the pushing unit 603 is configured to determine a user emotion according to at least the first emotion result, and push the playing content associated with the user emotion.
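As a structural illustration only, the three units might be composed as in the following sketch; the class and parameter names are assumptions and do not appear in the embodiment.

```python
from typing import Callable, Optional

class EmotionBasedTerminalController:
    """Structural sketch of the device of Fig. 6; the names here are illustrative only."""

    def __init__(self,
                 recognize_speech: Callable[[bytes], str],
                 determine_first_emotion: Callable[[str], Optional[str]],
                 push_content: Callable[[Optional[str]], Optional[str]]):
        self.speech_recognition_unit = recognize_speech            # corresponds to unit 601
        self.first_emotion_result_unit = determine_first_emotion   # corresponds to unit 602
        self.push_unit = push_content                               # corresponds to unit 603

    def handle_voice_data(self, voice_data: bytes) -> Optional[str]:
        text = self.speech_recognition_unit(voice_data)            # voice data -> recognition result
        first_result = self.first_emotion_result_unit(text)        # recognition result -> first emotion result
        return self.push_unit(first_result)                        # user emotion -> pushed playing content
```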
In a specific implementation, the voice recognition unit 601 may convert voice data in a voice format into a voice recognition result in a text format, so as to perform subsequent processing. It should be understood by those skilled in the art that the speech recognition processing of the speech data may be implemented in any practicable manner, and the embodiment of the present invention is not limited thereto.
In a specific implementation, the preset emotion lexicon can be configured in advance. The preset emotion word bank can store words corresponding to various emotions. The first emotion result determination unit 602 may match the speech recognition result with words in a preset emotion word library, and use an emotion corresponding to the matched word as a first emotion result. That is, the first emotional result may characterize an emotion of the user carried by the voice data.
The push unit 603 may further determine the user emotion at least from the first emotion result. That is, the emotion carried by the voice data may be determined as the user emotion, or it may be combined with the user emotion determined in any other implementable manner. After the user emotion is determined, playing content associated with the user emotion can be pushed accordingly. For example, when the user is happy, the pushed playing content may be comedies or romance films; when the user is sad, the pushed playing content may be comedies or feel-good programs; when the user's emotion is fear, the pushed playing content may be comedies, variety shows and the like.
According to the embodiment of the invention, the user's voice data is recognized and matched with the preset emotion word bank, so that a first emotion result corresponding to the voice data can be determined, and thus the emotion of the user when uttering the voice data can be determined; then, playing content associated with the user's emotion is pushed to the user, so that the playing content of the terminal matches the user's emotion, achieving the technical effect that the terminal automatically pushes programs according to the user's emotion; that is to say, by judging the user's emotion, the terminal determines that the user's intention is to obtain playing content adapted to that emotion, which makes the terminal more intelligent and humanized and improves the user experience.
Preferably, the pushing unit 603 may send a dialog box prompt whether to play the playing content, or may send a voice prompt whether to play the playing content. Specifically, the user may determine whether to play the playback content by performing a selection operation on the terminal. For example, selection may be by way of gesture recognition; the selection can also be made by means of voice control.
Preferably, the emotion-based terminal control device 60 shown in fig. 6 may further include a first matching unit (not shown) and a first control unit (not shown). The first matching unit can obtain a first control instruction based on matching the voice recognition result with a preset instruction library; the first control unit may control the terminal to perform a corresponding operation according to the first control instruction, where the corresponding operation does not include pushing the playing content. In this embodiment, since the voice data of the user may further include a control instruction for the terminal, the voice recognition result may be further processed. Specifically, the first matching unit matches the voice recognition result with the preset instruction library, and after a first control instruction is obtained through matching, the first control unit controls the terminal according to the first control instruction, and the terminal can execute a corresponding operation in response to it, such as switching channels or adjusting the volume.
It is understood that the preset instruction library may be preconfigured, and the corresponding operation executed by the terminal is limited to the operation indicated by the first control instruction in the preset instruction library.
Preferably, the emotion-based terminal control device 60 shown in fig. 6 may further include a second matching unit (not shown) and a second control unit (not shown). The second matching unit matches the received gesture action data with a preset action library to obtain a second control instruction; and the second control unit controls the terminal to execute a corresponding operation according to the second control instruction, where the corresponding operation does not include pushing the playing content. In this embodiment, the gesture of the user may also be detected and, once detected, further processed to determine whether to further control the terminal. Specifically, the second matching unit matches the gesture action with the preset action library, and when the gesture action matches an action in the preset action library, the second control instruction corresponding to that action is determined; the second control unit then controls the terminal according to the second control instruction, and the terminal can respond to it by executing a corresponding operation, such as switching channels or adjusting the volume.
It is to be understood that the preset action library may be preconfigured, and the corresponding operation executed by the terminal is limited to the operation indicated by the second control instruction corresponding to the matched action in the preset action library.
Preferably, an embodiment of the first emotion result determination unit 602 may refer to fig. 7, and fig. 7 is a schematic structural diagram of the first emotion result determination unit 602 shown in fig. 6.
The first emotion result determination unit 602 may include the keyword extraction subunit 71, the first matching subunit 72, and the first determination subunit 73.
The keyword extraction subunit 71 is configured to perform keyword extraction processing on the voice recognition result to obtain a keyword; the first matching subunit 72 is configured to match the keyword with an emotion score thesaurus of each emotion category in the preset emotion thesaurus, so as to determine an emotion score of the voice recognition result for each emotion category according to a matching result, where the higher the matching degree of the keyword with the emotion score thesaurus is, the higher the emotion score is, and the preset emotion thesaurus is used to store the emotion thesaurus corresponding to the emotion category; the first determining subunit 73 is configured to determine the emotion category with the highest emotion score as the first emotion result.
In a specific implementation, the keyword extraction subunit 71 may perform keyword extraction on the speech recognition result to obtain a keyword of the speech recognition result. In particular, the keywords of the speech recognition result may be used to characterize the semantics of the speech data, and then the emotion of the user carried by the speech data may be determined according to the keywords.
In specific implementation, the preset emotion word library may store a plurality of emotion categories and emotion score word libraries corresponding to the emotion categories. In particular, each mood category may represent a mood; each mood category may have multiple mood scores, with higher mood scores indicating closer to the mood category. The emotion score word bank of each emotion category may store words corresponding to a plurality of emotion scores for the emotion category. The first matching subunit 72 may match the keyword of the speech recognition result with the words in the emotion score word banks of the respective emotion categories and determine emotion scores of the speech recognition result for the respective emotion categories based on the matched words. The higher the matching degree of the keywords and the emotion score word bank is, the higher the emotion score is.
Specifically, the speech recognition result may be matched with the emotion score lexicons corresponding to all emotion categories in the preset emotion lexicon, and an emotion score of the speech recognition result for every emotion category is obtained. For example, the emotion scores determined from the speech recognition result might be: "sad" with the highest score, "dislike" six points, "surprise" two points, "angry" one point, and "happy" zero points.
In a specific implementation, after determining the emotion scores of the speech recognition result for the respective emotion categories, since a higher emotion score indicates that the keyword is closer to the emotion category corresponding to the emotion score, the first determining subunit 73 may determine the emotion category with the highest emotion score as the first emotion result. For example, with the emotion scores given above, "sad" has the highest score, so the first emotion result is determined as "sad".
It is understood that if the emotion scores of the voice recognition result for the respective emotion categories are all zero or the emotion scores of the voice recognition result for the respective emotion categories cannot be obtained, the first emotion result is determined as no emotion.
With reference to fig. 6, the pushing unit 603 may push the playing content according to the judgment result of the first determining subunit 73, that is, according to the emotion category corresponding to the first emotion result. The playing content may correspond to the emotion categories one to one. For example, when the user is happy, the pushed playing content may be a romance film; when the user is sad, the pushed playing content may be a comedy; when the user's emotion is fear, the pushed playing content may be a variety show, and the like.
It should be noted that, the corresponding relationship between the played content and the emotion category may be adaptively configured in a user-defined manner according to the actual application environment, which is not limited in this embodiment of the present invention.
It is to be understood that the playing content associated with the user emotion may be playing content carrying content tags associated with emotion categories, where at least one content tag is associated with each emotion category and at least one content tag is attached to each piece of playing content, and the pushed content should be playing content whose tags are associated with the first emotion result. For example, when the user is happy, the pushed playing content may be content with tags such as comedy or romance; when the user is sad, the pushed playing content may be content with tags such as comedy or feel-good program; when the user's emotion is fear, the pushed playing content may be content with tags such as comedy or variety show. If there are multiple pieces of playing content associated with the user emotion that can be pushed, one piece may be selected at random, one piece may be selected according to a custom configuration adapted to the actual application environment, or all of them may be pushed for the user to choose from, and so on.
The embodiment of the invention extracts keywords from the voice recognition result of the user, compares the keywords with words in a preset emotion word bank, and determines the emotion category with the highest emotion score as the emotion of the user; namely, the emotion of the user is judged from the semantic level of the voice data, and the pushing accuracy of the playing content is guaranteed.
Fig. 8 is a schematic structural diagram of another emotion-based terminal control device according to an embodiment of the present invention.
The emotion-based terminal control apparatus 80 shown in fig. 8 may include a voice recognition unit 801, a first emotion result determination unit 802, a face recognition unit 803, and a push unit 804; the face recognition unit 803 may include a second matching subunit 8031 and a second determining subunit 8032.
The voice recognition unit 801 is configured to perform voice recognition processing on the received voice data to obtain a voice recognition result; the first emotion result determination unit 802 is configured to determine a first emotion result based on the speech recognition result and a preset emotion word bank; a face recognition unit 803 is configured to perform face recognition on the received face image to determine a second emotion result; the pushing unit 804 is configured to determine a user emotion according to at least the first emotion result, and push the playing content associated with the user emotion.
Specifically, the second matching subunit 8031 is configured to match the face image with a preset image to determine an emotion score of the face image for each emotion category.
The second determining subunit 8032 is configured to determine the emotion classification with the highest emotion score as the second emotion result.
In this embodiment, the facial image of the user may be collected in advance to be used for the judgment of facial emotion. After determining the first emotion result, the face recognition unit 803 may also perform face recognition on the received face image to determine a second emotion result.
The specific implementation of the speech recognition unit 801 and the first emotion result determination unit 802 in this embodiment may refer to the speech recognition unit 601 and the first emotion result determination unit 602 in the embodiment shown in fig. 6, and details are not repeated here.
In a specific implementation, the facial image includes the user's face and, in particular, the user's facial expression, which can characterize the user's emotion: when the user's emotion differs, the corresponding facial image differs. A set of preset images may be configured in advance, each preset image representing one emotion category and each emotion category representing one emotion of the user. The second matching subunit 8031 may match the facial image against the preset images and determine the emotion score of the facial image for each emotion category according to the degree of matching. That is, the higher the matching degree between the facial image and a preset image, the higher the emotion score of the facial image for the emotion category corresponding to that preset image; and the higher an emotion score, the closer the emotion carried by the facial image is to the corresponding emotion category.
The second determining subunit 8032 may then determine the emotion category with the highest emotion score as the second emotion result. That is, the second emotion result represents the emotion judged from the face image of the user.
It is understood that the second emotion result is determined to be no emotion if the emotion scores of the face image for the respective emotion categories are all zero or cannot be obtained.
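For illustration, the sketch below mirrors the speech-side example with a face-side decision. Representing images as small feature vectors and measuring the matching degree by an inverse absolute difference are assumptions of the example; the embodiment only requires some matching degree between the facial image and each preset image.

```python
# One preset (reference) feature vector per emotion category (illustrative values).
PRESET_IMAGES = {
    "happy":   [0.9, 0.1, 0.0],
    "sad":     [0.1, 0.8, 0.1],
    "fearful": [0.0, 0.2, 0.9],
}

def matching_degree(a, b):
    """Toy similarity measure: inverse of the summed absolute difference."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)))

def second_emotion_result(face_features):
    """Return (category, score) with the highest face-image score, or None for 'no emotion'."""
    if face_features is None:  # score cannot be obtained
        return None
    scores = {cat: matching_degree(face_features, ref)
              for cat, ref in PRESET_IMAGES.items()}
    if all(s == 0 for s in scores.values()):
        return None
    best = max(scores, key=scores.get)
    return best, scores[best]

print(second_emotion_result([0.85, 0.15, 0.05]))  # ('happy', ...)
print(second_emotion_result(None))                # None
```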
In a specific implementation, the pushing unit 804 may push the corresponding playing content according to both the first emotion result and the second emotion result. Unlike the pushing unit 603 shown in fig. 6, the pushing unit 804 pushes playing content to the user by combining the first and second emotion results. In the embodiment of the invention, the current emotion of the user is judged comprehensively by combining the emotion determined from the user's voice data with the emotion determined from the user's face, which improves the accuracy of the emotion judgment, improves the pushing accuracy of the playing content, and further improves the degree of intelligence and humanization of the terminal.
Preferably, the pushing unit 804 may present a dialog box asking whether to play the pushed content, or may issue a voice prompt asking whether to play it. Specifically, the user may decide whether to play the content by performing a selection operation on the terminal, for example through gesture recognition or through voice control, as in the short sketch below.
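A toy confirmation step might look as follows; the prompt wording and the accepted answers are purely illustrative, and a real terminal would obtain the answer through its dialog box, voice-control or gesture-recognition path.

```python
def confirm_playback(content_title, user_answer):
    """Ask whether the pushed content should be played and interpret the user's answer.

    The answer would normally come from voice control or gesture recognition;
    here it is passed in as a plain string for simplicity.
    """
    print(f"Play '{content_title}' now? (yes/no)")
    return user_answer.strip().lower() in {"yes", "y", "play"}

print(confirm_playback("Midnight Laughs", "yes"))  # True
```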
Preferably, the pushing unit 804 may include a first pushing subunit (not shown) and a second pushing subunit (not shown). The first pushing subunit is configured to push the playing content according to the emotion category corresponding to the first emotion result when the first emotion result is an emotion.
The second pushing subunit is configured to push the playing content according to the emotion category corresponding to the second emotion result when the first emotion result is no emotion.
In this embodiment, the first emotion result represents the emotion determined from the user's voice data, and the second emotion result represents the emotion judged from the user's face. When determining the second emotion result, it is difficult to acquire and configure preset images that correspond exactly to every emotion category, which affects the accuracy of the second emotion result. This embodiment therefore gives priority to the first emotion result when combining the two results.
In a specific implementation, if the first emotion result is an emotion, the first pushing subunit may determine the emotion category corresponding to the first emotion result as the user emotion and push the playing content associated with that emotion; if the first emotion result is no emotion, the second pushing subunit may determine the emotion category corresponding to the second emotion result as the user emotion and push the playing content associated with that emotion, as in the sketch below.
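The following sketch illustrates this priority rule under the same (category, score) convention used in the earlier examples; it is an assumption of the example that each result is either None ('no emotion') or such a pair.

```python
def decide_user_emotion_priority(first_result, second_result):
    """Always prefer the speech-based result when it exists; otherwise fall back to the face-based one."""
    if first_result is not None:
        return first_result[0]   # emotion category of the first emotion result
    if second_result is not None:
        return second_result[0]  # emotion category of the second emotion result
    return None                  # no user emotion could be determined

print(decide_user_emotion_priority(("sad", 7), ("happy", 0.87)))  # 'sad'
print(decide_user_emotion_priority(None, ("happy", 0.87)))        # 'happy'
```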
This embodiment of the invention thus provides one implementation for pushing playing content by combining the first and second emotion results: giving priority to the first emotion result helps ensure the accuracy of the emotion judgment, and even when the first emotion result is no emotion a judgment of the user emotion is still obtained and playing content is still pushed, further improving the intelligence and humanization of the terminal.
In this embodiment, the first emotion result represents the emotion determined from the user's voice data, and the second emotion result represents the emotion judged from the user's face. Since both results are obtained on the basis of emotion scores, this embodiment can determine the final emotion of the user from those scores when the two results are combined.
Preferably, the pushing unit 804 may include a third pushing subunit (not shown), a fourth pushing subunit (not shown), and a fifth pushing subunit (not shown).
The third pushing subunit is configured to push the playing content according to the emotion category when the emotion categories corresponding to the first emotion result and the second emotion result are consistent.
The fourth pushing subunit is configured to push the playing content according to the emotion category with the highest emotion score when the emotion categories corresponding to the first emotion result and the second emotion result are not consistent.
The fifth pushing subunit is configured to push the playing content according to the emotion category corresponding to the second emotion result when the first emotion result is no emotion.
In a specific implementation, if the emotion categories corresponding to the first emotion result and the second emotion result are consistent, the third pushing subunit may determine the emotion category as a user emotion, and push the playing content associated with the user emotion; if the emotion categories corresponding to the first emotion result and the second emotion result are not consistent, the fourth pushing subunit may determine the emotion category with the highest emotion score as a user emotion, and push the play content associated with the user emotion.
In a specific implementation, the first emotion result is obtained by matching keywords of the voice recognition result against the preset emotion word bank, so the first emotion result is no emotion whenever no keyword can be matched in the preset emotion word bank. The second emotion result, by contrast, is obtained by matching the face image against the preset images, and a no-emotion case rarely occurs there. Therefore, when the first emotion result is no emotion, the fifth pushing subunit may determine the emotion category corresponding to the second emotion result as the user emotion and push the playing content associated with it, as in the sketch below.
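A sketch of this score-based combination is given below. It reuses the (category, score) convention of the earlier examples; note that comparing a keyword score with an image-matching score on a single scale is an assumption the embodiment leaves to the implementation (for example after normalizing both scores).

```python
def decide_user_emotion_by_score(first_result, second_result):
    """Combine the two results: agree -> that category; disagree -> higher score; speech empty -> face result."""
    if first_result is None:                 # speech gave no emotion
        return second_result[0] if second_result else None
    if second_result is None:
        return first_result[0]
    if first_result[0] == second_result[0]:  # emotion categories are consistent
        return first_result[0]
    # categories differ: keep the category with the higher emotion score
    return max(first_result, second_result, key=lambda r: r[1])[0]

print(decide_user_emotion_by_score(("sad", 7), ("sad", 0.9)))    # 'sad'
print(decide_user_emotion_by_score(("sad", 7), ("happy", 9.5)))  # 'happy'
print(decide_user_emotion_by_score(None, ("happy", 0.87)))       # 'happy'
```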
This embodiment of the invention provides another implementation for pushing playing content by combining the first and second emotion results: the final emotion of the user is determined according to the emotion scores, which helps ensure the accuracy of the emotion judgment, and even when the first emotion result is no emotion a judgment of the user emotion is still obtained and playing content is still pushed, further improving the intelligence and humanization of the terminal.
The embodiment of the invention also discloses a terminal, which may comprise the emotion-based terminal control device 60 shown in fig. 6 or the emotion-based terminal control device 80 shown in fig. 8. The terminal includes, but is not limited to, a television, a computer, a mobile phone, a notebook computer, a tablet computer, and the like.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium may include a ROM, a RAM, a magnetic disk, an optical disk, and the like.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (15)

1. A terminal control method based on emotion is characterized by comprising the following steps:
carrying out voice recognition processing on the received voice data to obtain a voice recognition result;
determining a first emotion result based on the voice recognition result and a preset emotion word bank;
determining the emotion of the user at least according to the first emotion result, and pushing playing content related to the emotion of the user, wherein each emotion category is related to at least one content tag, each playing content is provided with at least one content tag, and the pushed content is the playing content with the content tag related to the first emotion result; if a plurality of playing contents associated with the emotion of the user exist, carrying out adaptive custom configuration according to the actual application environment to select one playing;
the determining a first emotion result based on the speech recognition result and a preset emotion lexicon comprises:
extracting keywords from the voice recognition result to obtain keywords; the keywords are used for representing the semantics of the voice data;
matching the keywords with emotion score word banks of all emotion categories in the preset emotion word bank to determine emotion scores of the voice recognition results for all emotion categories according to matching results, wherein the higher the matching degree of the keywords with the emotion score word banks is, the higher the emotion scores are;
determining the emotion category with the highest emotion score as the first emotion result;
the preset emotion word bank stores a plurality of emotion categories and emotion score word banks corresponding to the emotion categories, each emotion category represents an emotion, each emotion category has a plurality of emotion scores, and the emotion score word bank of each emotion category stores words corresponding to the emotion scores of the emotion category;
performing facial recognition on the received facial image to determine a second emotion result; the performing face recognition on the received face image includes: matching the facial image with a preset image to determine emotion scores of the facial image for all emotion categories; determining the mood category with the highest mood score as the second mood result; each preset image represents one emotion category, and each emotion category represents the emotion of one user;
the pushing the playing content associated with the user emotion comprises: when the first emotion result is no emotion, pushing the playing content according to the emotion category corresponding to the second emotion result;
judging the first emotion result to be free of emotion under the condition that the emotion scores of the voice recognition result aiming at all emotion categories are zero or the emotion scores of the voice recognition result aiming at all emotion categories cannot be obtained; judging the second emotion result to be emotionless under the condition that the emotion scores of the face image for all emotion categories are zero or the emotion scores of the face image for all emotion categories cannot be obtained;
matching the keywords with the emotion score word banks of the emotion categories in the preset emotion word bank, and determining the emotion scores of the voice recognition results for the emotion categories according to matching results specifically comprises: and matching the keywords of the voice recognition result with words in the emotion score word bank of each emotion category, and determining the emotion score of the voice recognition result for each emotion category according to the matched words.
2. The terminal control method according to claim 1, wherein the pushing the play content associated with the user emotion comprises:
and pushing the playing content according to the emotion type corresponding to the first emotion result, wherein the playing content corresponds to the emotion type one to one.
3. The terminal control method according to claim 1, wherein the pushing the play content associated with the user emotion comprises:
and pushing corresponding playing content according to the first emotion result and the second emotion result.
4. The terminal control method according to claim 1, wherein the pushing the play content associated with the user emotion comprises:
and when the first emotion result is an emotion, pushing the playing content according to the emotion category corresponding to the first emotion result.
5. The terminal control method according to claim 1, wherein the playing content is pushed in one or more of the following manners:
and sending a dialog box to prompt whether the playing content is played or not, and sending a voice prompt whether the playing content is played or not.
6. The terminal control method according to any one of claims 1 to 5, further comprising:
matching the voice recognition result with a preset instruction library to obtain a first control instruction;
and controlling the terminal to execute corresponding operation according to the first control instruction, wherein the corresponding operation does not comprise pushing and playing content.
7. The terminal control method according to any one of claims 1 to 5, further comprising:
matching the received gesture action data with a preset action library to obtain a second control instruction;
and controlling the terminal to execute corresponding operation according to the second control instruction, wherein the corresponding operation does not comprise pushing and playing content.
8. A terminal control apparatus based on emotion, characterized by comprising:
the voice recognition unit is used for carrying out voice recognition processing on the received voice data to obtain a voice recognition result;
the first emotion result determining unit is used for determining a first emotion result based on the voice recognition result and a preset emotion word bank;
the pushing unit is used for determining the emotion of the user at least according to the first emotion result and pushing the playing content related to the emotion of the user; each emotion category is associated with at least one content tag, each playing content is provided with at least one content tag, and the pushed content is the playing content provided with the content tag associated with the first emotion result; if a plurality of playing contents associated with the emotion of the user exist, the pushing unit performs adaptive custom configuration according to the actual application environment to select one playing;
a face recognition unit for performing face recognition on the received face image to determine a second emotion result; the face recognition unit includes: the second matching subunit is used for matching the face image with a preset image so as to determine the emotion score of the face image for each emotion category;
a second determining subunit, configured to determine the emotion category with the highest emotion score as the second emotion result; each preset image represents one emotion category, and each emotion category represents the emotion of one user;
the pushing the playing content associated with the user emotion comprises: when the first emotion result is no emotion, pushing the playing content according to the emotion category corresponding to the second emotion result;
judging the first emotion result to be free of emotion under the condition that the emotion scores of the voice recognition result aiming at all emotion categories are zero or the emotion scores of the voice recognition result aiming at all emotion categories cannot be obtained; judging the second emotion result to be emotionless under the condition that the emotion scores of the face image for all emotion categories are zero or the emotion scores of the face image for all emotion categories cannot be obtained;
the first emotion result determination unit includes:
a keyword extraction subunit, configured to perform keyword extraction processing on the voice recognition result to obtain a keyword; the keywords are used for representing the semantics of the voice data;
the first matching subunit is used for matching the keywords with the emotion score word banks of all emotion categories in the preset emotion word bank so as to determine emotion scores of the voice recognition result for all emotion categories according to the matching result, wherein the higher the matching degree of the keywords with the emotion score word banks is, the higher the emotion scores are, and the preset emotion word bank is used for storing the emotion word banks corresponding to the emotion categories;
a first determining subunit, configured to determine the emotion category with the highest emotion score as the first emotion result;
the preset emotion word bank stores a plurality of emotion categories and emotion score word banks corresponding to the emotion categories, each emotion category represents an emotion, each emotion category has a plurality of emotion scores, and the emotion score word bank of each emotion category stores words corresponding to the emotion scores of the emotion category;
matching the keywords with the emotion score word banks of the emotion categories in the preset emotion word bank, and determining the emotion scores of the voice recognition results for the emotion categories according to matching results specifically comprises: and matching the keywords of the voice recognition result with words in the emotion score word bank of each emotion category, and determining the emotion score of the voice recognition result for each emotion category according to the matched words.
9. The terminal control device according to claim 8, wherein the pushing unit pushes the playback content according to the emotion category corresponding to the first emotion result, and the playback content corresponds to the emotion category one to one.
10. The terminal control device according to claim 8, wherein the pushing unit pushes the corresponding play content according to the first emotion result and the second emotion result.
11. The terminal control device according to claim 8, wherein the pushing unit includes:
the first pushing subunit is used for pushing the playing content according to the emotion category corresponding to the first emotion result when the first emotion result is emotional;
and the second pushing subunit is used for pushing the playing content according to the emotion category corresponding to the second emotion result when the first emotion result is not emotion.
12. The terminal control device according to claim 8, wherein the pushing unit pushes the playing content in one or more of the following manners:
and sending a dialog box to prompt whether the playing content is played or not, and sending a voice prompt whether the playing content is played or not.
13. The terminal control device according to any one of claims 8 to 12, characterized by further comprising:
the first matching unit is used for obtaining a first control instruction based on the matching of the voice recognition result and a preset instruction library, wherein the preset instruction library is used for storing the first control instruction, and the first control instruction comprises at least one word;
and the first control unit is used for controlling the terminal to execute corresponding operation according to the first control instruction, wherein the corresponding operation does not comprise push playing content.
14. The terminal control device according to any one of claims 8 to 12, characterized by further comprising:
the second matching unit is used for matching the received gesture action data with a preset action library to obtain a second control instruction, and the preset action library is used for storing the second control instruction corresponding to the gesture action;
and the second control unit is used for controlling the terminal to execute corresponding operation according to the second control instruction, wherein the corresponding operation does not comprise pushing and playing content.
15. A terminal characterized by comprising an emotion-based terminal control apparatus as claimed in any of claims 8 to 14.
CN201611226550.0A 2016-12-27 2016-12-27 Terminal control method and device based on emotion and terminal Active CN106658129B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611226550.0A CN106658129B (en) 2016-12-27 2016-12-27 Terminal control method and device based on emotion and terminal

Publications (2)

Publication Number Publication Date
CN106658129A CN106658129A (en) 2017-05-10
CN106658129B true CN106658129B (en) 2020-09-01

Family

ID=58832756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611226550.0A Active CN106658129B (en) 2016-12-27 2016-12-27 Terminal control method and device based on emotion and terminal

Country Status (1)

Country Link
CN (1) CN106658129B (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220850A (en) * 2017-05-25 2017-09-29 努比亚技术有限公司 A kind of method for pushing of advertisement, terminal and computer-readable recording medium
CN107450367A (en) * 2017-08-11 2017-12-08 上海思依暄机器人科技股份有限公司 A kind of voice transparent transmission method, apparatus and robot
CN111247505B (en) * 2017-10-27 2024-04-09 索尼公司 Information processing device, information processing method, recording medium, and information processing system
CN108009231B (en) * 2017-11-29 2021-05-18 西安影视数据评估中心有限公司 Emotion recognition method for movie and television script characters
CN107886950A (en) * 2017-12-06 2018-04-06 安徽省科普产品工程研究中心有限责任公司 A kind of children's video teaching method based on speech recognition
CN108595406B (en) * 2018-01-04 2022-05-17 广东小天才科技有限公司 User state reminding method and device, electronic equipment and storage medium
CN108197115B (en) * 2018-01-26 2022-04-22 上海智臻智能网络科技股份有限公司 Intelligent interaction method and device, computer equipment and computer readable storage medium
CN110085220A (en) * 2018-01-26 2019-08-02 上海智臻智能网络科技股份有限公司 Intelligent interaction device
CN110085262A (en) * 2018-01-26 2019-08-02 上海智臻智能网络科技股份有限公司 Voice mood exchange method, computer equipment and computer readable storage medium
CN108446635B (en) * 2018-03-19 2022-03-22 西北大学 Collaborative filtering recommendation system and method for acquiring preference with assistance of electroencephalogram signals
CN108874895B (en) * 2018-05-22 2021-02-09 北京小鱼在家科技有限公司 Interactive information pushing method and device, computer equipment and storage medium
CN108877357A (en) * 2018-06-21 2018-11-23 广东小天才科技有限公司 A kind of exchange method and private tutor's machine based on private tutor's machine
CN110728983B (en) * 2018-07-16 2024-04-30 科大讯飞股份有限公司 Information display method, device, equipment and readable storage medium
CN109491499B (en) * 2018-11-05 2022-05-24 创维集团智能科技有限公司 Electric equipment control method and device, electric equipment and medium
CN109550133B (en) * 2018-11-26 2021-05-11 赵司源 Emotion pacifying method and system
JP2020091302A (en) * 2018-12-03 2020-06-11 本田技研工業株式会社 Emotion estimation device, emotion estimation method, and program
CN109672937A (en) * 2018-12-28 2019-04-23 深圳Tcl数字技术有限公司 TV applications method for switching theme, TV, readable storage medium storing program for executing and system
CN112037821A (en) * 2019-06-03 2020-12-04 阿里巴巴集团控股有限公司 Visual representation method and device of voice emotion and computer storage medium
CN111371838A (en) * 2020-02-14 2020-07-03 厦门快商通科技股份有限公司 Information pushing method and system based on voiceprint recognition and mobile terminal
CN111583968A (en) * 2020-05-25 2020-08-25 桂林电子科技大学 Speech emotion recognition method and system
CN112185422B (en) * 2020-09-14 2022-11-08 五邑大学 Prompt message generation method and voice robot thereof
CN112911324B (en) * 2021-01-29 2022-10-28 北京达佳互联信息技术有限公司 Content display method and device for live broadcast room, server and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103024499A (en) * 2012-12-14 2013-04-03 深圳创维-Rgb电子有限公司 Method and device for implementing smart television
CN103024521A (en) * 2012-12-27 2013-04-03 深圳Tcl新技术有限公司 Program screening method, program screening system and television with program screening system
CN104038836A (en) * 2014-06-03 2014-09-10 四川长虹电器股份有限公司 Television program intelligent pushing method
CN204795452U (en) * 2015-07-07 2015-11-18 北京联合大学 Can realize gesture, voice interaction's TV set system
CN105578277A (en) * 2015-12-15 2016-05-11 四川长虹电器股份有限公司 Intelligent television system for pushing resources based on user moods and processing method thereof
CN106250553A (en) * 2016-08-15 2016-12-21 珠海市魅族科技有限公司 A kind of service recommendation method and terminal

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8640021B2 (en) * 2010-11-12 2014-01-28 Microsoft Corporation Audience-based presentation and customization of content

Also Published As

Publication number Publication date
CN106658129A (en) 2017-05-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant