CN116343788A - Interaction method, interaction device, electronic equipment and storage medium


Info

Publication number
CN116343788A
Authority
CN
China
Prior art keywords
interaction
information
data
reply
terminal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310214804.0A
Other languages
Chinese (zh)
Inventor
李守毅
朱翠玲
刘庆升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Toycloud Technology Co Ltd
Original Assignee
Anhui Toycloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Toycloud Technology Co Ltd
Priority to CN202310214804.0A
Publication of CN116343788A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 - Querying
    • G06F 16/332 - Query formulation
    • G06F 16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/30 - Semantic analysis
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/08 - Speech classification or search
    • G10L 15/18 - Speech classification or search using natural language modelling
    • G10L 15/1822 - Parsing for meaning understanding
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00
    • G10L 25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00 specially adapted for particular use
    • G10L 25/51 - Speech or voice analysis techniques specially adapted for particular use, for comparison or discrimination
    • G10L 25/63 - Speech or voice analysis techniques specially adapted for comparison or discrimination, for estimating an emotional state
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/225 - Feedback of the input speech
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides an interaction method, an interaction device, an electronic device and a storage medium. The interaction method comprises: acquiring interaction data, and determining semantic information and emotion information based on the interaction data; determining, based on the semantic information, the emotion information and device information of the terminal used for interaction, reply data with which the terminal replies to the interaction data, the device information including an appearance category and/or a supportable action set; and interacting based on the reply data. By determining the reply data from the interaction data together with device information that includes the appearance category and/or the supportable action set, the method, device, electronic device and storage medium make the reply fit the terminal's own persona more closely and provide more anthropomorphic human-machine interaction; in particular, when the terminal temporarily cannot answer the user, a flexible and engaging interaction style further improves the user's experience and the enjoyment of interacting with the terminal.

Description

Interaction method, interaction device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of human-machine interaction technologies, and in particular to an interaction method, an interaction device, an electronic device and a storage medium.
Background
During interaction with a person, an existing terminal often encounters questions from the user that it temporarily cannot answer. Currently, when this happens, the terminal usually answers with a preset voice message, for example, "I haven't learned this question yet; I will answer you once I have learned it."
However, answering with a preset voice makes the human-machine interaction content stiff and rigid, quickly becomes tiresome, and is poorly anthropomorphic, which degrades the user experience.
Disclosure of Invention
The invention provides an interaction method, an interaction device, an electronic device and a storage medium, which are used to overcome the defects in the prior art that human-machine interaction content is stiff, rigid, quickly tiresome and poorly anthropomorphic.
The invention provides an interaction method, which comprises the following steps:
acquiring interaction data, and determining semantic information and emotion information based on the interaction data;
determining, based on the semantic information, the emotion information and device information of the terminal used for interaction, reply data with which the terminal replies to the interaction data, wherein the device information includes an appearance category and/or a supportable action set;
and interacting based on the reply data.
According to the interaction method provided by the invention, the determining, based on the semantic information, the emotion information and the device information of the terminal used for interaction, the reply data with which the terminal replies to the interaction data comprises:
determining state information for the terminal's reply to the interaction data based on whether a reply text corresponding to the interaction data exists, together with the semantic information, the emotion information and the device information;
and determining the reply data based on the state information.
According to the interaction method provided by the invention, in the absence of the reply text, the determining the reply data based on the state information comprises:
determining candidate resources for replying to the interaction data based on the interaction field to which the interaction data belongs;
and determining, based on the state information, reply data that guides the user to consult the candidate resources.
According to the interaction method provided by the invention, when the reply text exists, the determining the reply data based on the state information comprises:
generating reply data corresponding to the state information based on the reply text.
According to the interaction method provided by the invention, the determining state information for the terminal's reply to the interaction data based on whether a reply text corresponding to the interaction data exists, together with the semantic information, the emotion information and the device information, comprises:
determining the state information based on whether a reply text corresponding to the interaction data exists, together with the semantic information, the emotion information, the device information and identity information of the user corresponding to the interaction data.
According to the interaction method provided by the invention, determining whether a reply text corresponding to the interaction data exists comprises:
acquiring the voice quality of the user voice in the interaction data;
determining that no reply text corresponding to the interaction data exists when the number of times the voice quality fails to meet a preset condition exceeds a preset number of times;
and judging whether a reply text corresponding to the user voice exists when the voice quality meets the preset condition.
According to the interaction method provided by the invention, the device information further includes at least one of device emotion information, device personality category and human-machine affinity.
The invention also provides an interaction device, comprising:
an acquisition unit, configured to acquire interaction data and determine semantic information and emotion information based on the interaction data;
a reply-data determination unit, configured to determine, based on the semantic information, the emotion information and the device information of the terminal used for interaction, the reply data with which the terminal replies to the interaction data, the device information including an appearance category and/or a supportable action set;
and an interaction unit, configured to interact based on the reply data.
The invention also provides a terminal comprising a housing, a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the memory and the processor are arranged inside the housing, the processor implements the interaction method when executing the program, and the appearance category and/or the supportable action set is determined based on the housing.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an interaction method as described in any of the above.
The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the interaction method as described in any of the above.
According to the interaction method, device, electronic device and storage medium provided by the invention, the reply data with which the terminal replies to the interaction data is determined based on the interaction data and the device information including the appearance category and/or the supportable action set. The reply can thus better fit the terminal's own persona, providing more anthropomorphic human-machine interaction; in particular, when the terminal temporarily cannot answer the user, a flexible and engaging interaction style further improves the user's experience and the enjoyment of interacting with the terminal.
Drawings
To illustrate the technical solutions of the invention or of the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the invention, and a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of an interaction method provided by the invention;
FIG. 2 is a flow chart of a method for determining reply data according to the present invention;
FIG. 3 is a flow chart of a method for determining reply text provided by the invention;
FIG. 4 is a schematic diagram of an interactive device according to the present invention;
FIG. 5 is a schematic structural diagram of a terminal provided by the invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
During interaction between a terminal and a person, especially when the terminal encounters a question it temporarily cannot answer, the terminal answers with a preset voice message. This makes the human-machine interaction content stiff and rigid and quickly tiresome, so the interaction is poorly anthropomorphic and the user experience suffers.
To address these problems, the invention provides an interaction method for achieving more lifelike human-machine interaction. FIG. 1 is a schematic flow chart of the interaction method provided by the invention; as shown in FIG. 1, the method comprises:
Step 110, acquiring interaction data, and determining semantic information and emotion information based on the interaction data;
specifically, in the human-computer interaction process, the terminal may acquire interaction data of the user, where the interaction data may include at least one of voice data, image data, and haptic data. Further, the voice data can be picked up by voice acquisition equipment such as a microphone and the like; the image data can be acquired through image acquisition equipment such as a camera and the like to obtain the image data; the tactile data may be acquired by a pressure sensor or the like.
After the interaction data is obtained, the semantic information and emotion information contained in it are determined. For the semantic information, an interaction text can be obtained by transcribing the voice, and the semantic information corresponding to the interaction data is then obtained by semantic extraction from the interaction text. For the emotion information, several routes are possible: the interaction text can be classified by a text emotion classifier; the interaction voice can be fed directly into an emotion classification model; face recognition can be performed on images or video in the image data, expression recognition can be performed on the detected face region, and the facial expression can be used to determine the emotion information; or the haptic data can serve as an auxiliary signal, with information such as the force and the positions of the contact points reflecting the user's emotion. For example, the larger the force measured by the touch sensor and the more contact points there are, the more intense the user's emotion can be taken to be; conversely, the smaller the force and the fewer the contact points, the calmer the user's emotion.
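The multimodal fusion just described can be sketched as follows; this is illustrative only, with a toy word list standing in for a trained text emotion classifier, and every name (`InteractionData`, `estimate_emotion`) is hypothetical rather than taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class InteractionData:
    transcript: str           # text obtained by transcribing the user's voice
    touch_force: float = 0.0  # normalized pressure reading in [0, 1]
    touch_points: int = 0     # number of contact points on the touch sensor

# Toy lexicons standing in for a trained text emotion classifier.
NEGATIVE_WORDS = {"sad", "angry", "tired"}
POSITIVE_WORDS = {"great", "happy", "fun"}

def estimate_emotion(data: InteractionData) -> dict:
    """Classify the transcript, then apply the haptic heuristic from the text:
    larger force and more contact points suggest a more intense emotion."""
    words = set(data.transcript.lower().split())
    if words & NEGATIVE_WORDS:
        label = "negative"
    elif words & POSITIVE_WORDS:
        label = "positive"
    else:
        label = "neutral"
    intensity = min(1.0, 0.5 * data.touch_force + 0.1 * data.touch_points)
    return {"label": label, "intensity": intensity}

print(estimate_emotion(InteractionData("I am so happy today", 0.8, 3)))
```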
Step 120, determining, based on the semantic information, the emotion information and the device information of the terminal used for interaction, the reply data with which the terminal replies to the interaction data, where the device information includes an appearance category and/or a supportable action set.
Specifically, the semantic information reflects the user's interaction intent, which directly influences the reply content or reply state when the terminal replies to the interaction data. The reply content may be the specific voice, text or image content in the reply data; for example, if the user's semantic information covers a question in the aviation field, the reply content may be an answer to that question, or, when the terminal cannot answer the question, content that guides the user to consult a relevant knowledge base in the aviation field or expresses a wish to learn more about that field. The reply state may include the emotional color or action expression carried by the reply data; for example, when the content expressed by the semantic information is light-hearted, the reply state can be relaxed and cheerful, and the reply data can carry relaxed, large-amplitude mechanical actions such as waving or jumping, whereas when the content is serious and objective, the reply state can also be serious, and small-amplitude actions such as a slight head shake or a nod can be carried in the reply data.
The emotion information reflects the user's emotion during the interaction, specifically the emotion type, emotion intensity and the like, and it also directly affects the reply state of the terminal when replying to the interaction data. For example, the reply state may be determined through preset rules mapping emotion information to reply states, and the reply data may then be generated from the reply state: when the emotion information in the interaction data is sadness, the reply state of the terminal is soothing, and the generated reply data contains comforting statements and actions, such as a gentle pat.
The device information of the terminal reflects the terminal's own characteristics, status and the like, and may, for example, include the terminal's appearance category and/or supportable action set.
The appearance category is a classification of the terminal's outward form, for example animal or cartoon character, and may be subdivided further, for example cat robot, dog robot or egg-shaped robot. When determining the reply data used to reply to the interaction data, the distinctive attributes corresponding to the terminal's appearance category can be combined with the semantic information and emotion information of the interaction data. These distinctive attributes may include the personality traits of the persona corresponding to the appearance category, which can serve as the expression style of the reply text in the reply data; for example, a kitten persona is relatively gentle, so the reply data of a cat-shaped terminal may include a soft voice and a gentle action set. The distinctive attributes may also include catchphrases specific to the persona of the appearance category, which can be added to the reply text or substituted for corresponding words in it; for example, cat-shaped and dog-shaped terminals can use the persona-specific interjections "meow" and "woof" when replying to the user.
The supportable action set of the terminal is the set of all actions the terminal can perform with its current appearance, such as clapping, blinking, nodding and shaking its head. To enrich the expressive forms of human-machine interaction, the reply data can contain reply actions. Specifically, different reply actions can be matched in advance to different reply states, to different emotion information, or in combination with the personality traits of the persona corresponding to the appearance category. For example, when the word "clap" appears in the reply text, the terminal can clap while replying; when the emotion information of the interaction data is sadness, the terminal can lower its head while replying.
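A minimal sketch of how the appearance category and the supportable action set could shape a reply; the profile table, interjections and action names below are illustrative assumptions, not values from the patent:

```python
# Hypothetical mapping from appearance category to an expression style and
# a persona-specific interjection.
PROFILE_STYLE = {
    "cat": {"tone": "soft", "tic": "meow"},
    "dog": {"tone": "lively", "tic": "woof"},
    "egg": {"tone": "gentle", "tic": ""},
}

def build_reply(text: str, emotion: str, profile: str, actions: set) -> dict:
    style = PROFILE_STYLE.get(profile, {"tone": "neutral", "tic": ""})
    # Append the persona-specific interjection to the reply text.
    reply_text = f"{text} {style['tic']}".strip()
    # Pick a reply action the current shell can actually perform.
    preferred = "clap" if emotion == "positive" else "nod"
    action = preferred if preferred in actions else next(iter(actions), None)
    return {"text": reply_text, "tone": style["tone"], "action": action}

print(build_reply("Good job!", "positive", "cat", {"nod", "clap", "blink"}))
```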
It can be understood that taking the terminal's device information as a factor in determining the reply data allows the reply data to reflect the terminal's appearance characteristics, so that the reply fits the terminal's own persona. The stronger personification improves the engagement of the interaction with the user and thus the user experience.
Step 130, interacting based on the reply data.
Specifically, after the reply data for replying to the user's interaction data is obtained, the terminal presents it according to the forms of expression it contains: when the reply data contains voice data, the terminal can play the voice through its speech synthesizer; when the reply data contains image data, the terminal can present it through its multimedia playback device; and when the reply data contains action data, the terminal can perform the actions through a preset motion control system.
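A minimal dispatch sketch for this step, assuming the reply data is a dictionary keyed by modality; the three channel functions are stand-ins for the speech synthesizer, multimedia playback device and motion control system:

```python
def play_speech(payload): print(f"[speaker] {payload}")
def show_image(payload):  print(f"[screen]  {payload}")
def run_action(payload):  print(f"[motors]  {payload}")

CHANNELS = {"speech": play_speech, "image": show_image, "action": run_action}

def perform_interaction(reply: dict) -> None:
    # Render each modality present in the reply data on its output channel.
    for modality, payload in reply.items():
        handler = CHANNELS.get(modality)
        if handler and payload is not None:
            handler(payload)

perform_interaction({"speech": "Hello!", "action": "wave", "image": "smile.png"})
```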
According to the method provided by the embodiment of the invention, the reply data with which the terminal replies to the interaction data is determined based on the semantic information and emotion information in the interaction data and the device information including the appearance category and/or supportable action set. The reply can thus better fit the terminal's own persona, providing more anthropomorphic human-machine interaction; in particular, when the terminal temporarily cannot answer the user, a flexible and engaging interaction style further improves the user's experience and the enjoyment of interacting with the terminal.
In current human-machine interaction, when the terminal temporarily cannot reply to the user's interaction data, the interaction is usually handled with a fixed, tedious answer, so the interaction between the user and the terminal is insufficiently anthropomorphic and the experience is poor. To address this problem, based on any of the above embodiments, FIG. 2 is a flow chart of the method for determining reply data provided by the invention. As shown in FIG. 2, in step 120, the determining, based on the semantic information, the emotion information and the device information of the terminal used for interaction, the reply data with which the terminal replies to the interaction data comprises:
Step 210, determining state information for the terminal's reply to the interaction data based on whether a reply text corresponding to the interaction data exists, together with the semantic information, the emotion information and the device information.
Here, the reply text may be the answer text obtained for the semantic information in the interaction data when that semantic information is successfully extracted; for example, the question text in the semantic information can be input into a preset question-answering system, and the corresponding answer text is obtained and used as the reply text. It can be understood that when no answer text corresponding to the semantic information can be obtained for the moment, or when the semantic information in the interaction data is not successfully extracted, no reply text corresponding to the interaction data can be obtained; this is the case where the reply text does not exist.
Whether a reply text exists directly influences the state information of the terminal when it replies to the interaction data. The state information here reflects the device state at reply time, i.e. the reply state, and may specifically include the emotional color and/or action expression carried by the reply data. It can be understood that when a reply text exists, the state information can be determined jointly from the semantic information and emotion information in the interaction data, the device information of the terminal, and the reply text itself: for example, when the reply text is detailed, accurate popular-science knowledge, the emotion contained in the state information can be earnest and serious, and when the reply text is everyday small talk, the emotion can be relaxed and cheerful. When no reply text exists, the terminal cannot give the user an appropriate answer, so the state information may include emotions such as embarrassment or shyness.
Step 220, determining the reply data based on the state information.
Specifically, once the state information for the terminal's reply to the interaction data is obtained, the reply data can be determined from it. For example, when the state information includes a relaxed, cheerful emotion, the reply data can carry relaxed, large-amplitude mechanical actions, the voice in the reply data can have a relaxed, cheerful tone, and the expression shown on the terminal screen can likewise be relaxed and cheerful. Conversely, when the state information includes a serious emotion, the reply data can carry serious, small-amplitude actions such as a slight head shake or a nod, the voice can take a serious tone, and the expression shown on the terminal screen can be focused and attentive.
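Steps 210 and 220 can be condensed into the following sketch; the rules and tables are assumed for illustration, not prescribed by the patent:

```python
def reply_state(has_reply_text: bool, user_emotion: str) -> str:
    # No reply text available -> the terminal is embarrassed ("shy").
    if not has_reply_text:
        return "shy"
    return "soothing" if user_emotion == "sad" else "cheerful"

# Map each reply state to concrete elements of the reply data.
STATE_TO_REPLY = {
    "cheerful": {"tone": "light", "action": "wave", "expression": "smile"},
    "soothing": {"tone": "calm", "action": "pat", "expression": "gentle"},
    "shy":      {"tone": "apologetic", "action": "lower_head", "expression": "blush"},
}

print(STATE_TO_REPLY[reply_state(False, "happy")])  # the no-reply-text case
```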
According to the method provided by the embodiment of the invention, the state information of the terminal when replying to the interaction data is determined based on whether a reply text corresponding to the interaction data exists, together with the semantic information, emotion information and device information, and the reply data is then determined based on the state information. This provides flexible reply data for both cases, whether or not the terminal can answer the interaction data, and in particular improves the flexibility and personification of the reply when the terminal temporarily cannot answer the user's interaction data.
Based on any of the above embodiments, in the absence of the reply text, step 220 includes:
determining candidate resources for replying to the interaction data based on the interaction field to which the interaction data belongs;
and determining, based on the state information, reply data that guides the user to consult the candidate resources.
Specifically, the absence of a reply text is usually caused by the inability to obtain an answer text corresponding to the semantic information in the interaction data. In this case, although the terminal cannot give an accurate answer to the interaction data, guidance toward solving the question raised in the interaction data can be supplemented in the reply data.
Specifically, the interaction field to which the interaction data belongs can first be obtained, for example by extracting keywords from the semantic information of the interaction data and classifying them by field. The interaction field reflects the domain of the information the interaction data focuses on; it may be a relatively broad field such as sports, photography or aviation, or those broad fields may be divided at a finer granularity. Candidate resources related to the interaction field can then be selected from a large pool of resources, for example by searching the pool with the interaction field as a keyword; a candidate resource for the sports field might be a "sports encyclopedia". It can be understood that determining candidate resources for replying to the interaction data through the interaction field provides guidance and supplementary learning for the question raised in the interaction data within that field, and avoids replying to the user with a single canned answer, which would give the user a poor experience.
Further, the state information of the terminal when replying to the interaction data can be combined with reply scripts preset for different candidate resources, so that the reply script is presented in a form matching the terminal's state information, yielding the reply data. A reply script can guide the user to consult the corresponding candidate resource, for example a book ("you could check whether the sports encyclopedia covers this") or family and friends. It can be understood that the state information can express the terminal's emotion when it cannot accurately reply to the user's interaction data, so besides replying by voice, the terminal's emotion at that moment can in particular be expressed through actions, making the reply more lifelike. For example, the action may be swaying, waving a hand, or covering the face with both hands. Meanwhile, the voice may include an interjection matching the emotion; for example, the interjection corresponding to shyness may be "oops".
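The no-reply-text branch could then look like this sketch; the keyword table, resource names and reply wording are illustrative assumptions:

```python
FIELD_KEYWORDS = {
    "sports":   {"football", "olympics", "swim"},
    "aviation": {"plane", "rocket", "airport"},
}
FIELD_RESOURCES = {
    "sports":   "the sports encyclopedia",
    "aviation": "an aviation knowledge base",
}

def guide_reply(question: str) -> str:
    """Classify the interaction field from keywords, pick a candidate
    resource, and wrap it in a guiding reply script."""
    words = set(question.lower().split())
    field = next((f for f, kws in FIELD_KEYWORDS.items() if words & kws), None)
    resource = FIELD_RESOURCES.get(field, "a teacher or a family member")
    return f"Oops... I haven't learned that yet. Maybe {resource} can help you!"

print(guide_reply("Why can a plane fly so high?"))
```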
According to the method provided by the embodiment of the invention, candidate resources for replying to the interaction data are determined based on the interaction field to which the interaction data belongs, and reply data that guides the user to consult the candidate resources is determined based on the state information. Thus, when the terminal temporarily cannot answer the user's interaction data, it provides a more lifelike reply that is still related to the interaction data. Compared with the previous single reply unrelated to the interaction data, this reply is more flexible and varied and stays close to the user's interaction data, improving the user's experience.
Based on any of the above embodiments, when the reply text exists, determining the reply data based on the state information comprises:
and generating reply data corresponding to the state information based on the reply text when the reply text exists.
The reply data may be the terminal's packaging of the reply text with voice, images and actions, presenting the otherwise plain reply text in those forms. For example, after the reply text is obtained, it can be converted into voice, presented as an image, or accompanied by actions, yielding reply data that answers the interaction data while matching the user's emotion in the interaction scene and the terminal's device information.
Based on any of the above embodiments, step 210 includes:
determining the state information based on whether a reply text corresponding to the interaction data exists, together with the semantic information, the emotion information, the device information and the identity information of the user corresponding to the interaction data.
The identity information of the user corresponding to the interaction data can reflect the user's age, level of knowledge, preferences and so on. For example, the user's age can be estimated from the acoustic characteristics of the voice data in the interaction data, or the user's face image can be collected and the age determined through a face recognition algorithm; personal information such as age, level of knowledge and preferences can also be entered in advance through a user-information entry procedure to form the user identity information, which can be updated continuously during human-machine interaction. Further, the state information of the terminal for different users can be determined through preset mappings from user identity information to the terminal's reply state. For example, for an infant user, the state information of the terminal may favor a combination of voice and images, while for older children it may be voice only, or voice with actions added during the interaction; as another example, for a user who prefers quiet the state information may lean toward a mild emotion, while for a user who prefers novelty it may lean toward a lively emotion, and the embodiment of the invention does not specifically limit this. It can be understood that adding the user's identity information on top of whether a reply text exists and the semantic information, emotion information and device information allows the terminal to better accommodate the user's interaction preferences and receptivity; in particular, when the terminal temporarily cannot answer, replying with reply data that suits the user's age, level of knowledge and preferences improves the user's interaction experience.
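A minimal sketch, assuming age and a single preference attribute are the only identity information considered, of the preset mappings described above:

```python
def modalities_for(age: int) -> list[str]:
    # Thresholds are illustrative assumptions.
    if age <= 6:
        return ["speech", "image"]   # young children: voice plus pictures
    return ["speech", "action"]      # older children: voice plus motion

def emotion_bias(preference: str) -> str:
    # Preset rule: quiet users get a milder reply emotion, users who
    # prefer novelty get a livelier one.
    return "mild" if preference == "quiet" else "lively"

print(modalities_for(5), emotion_bias("quiet"))     # ['speech', 'image'] mild
print(modalities_for(10), emotion_bias("novelty"))  # ['speech', 'action'] lively
```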
Based on any one of the above embodiments, FIG. 3 is a flow chart of the method for determining the reply text provided by the invention. As shown in FIG. 3, in step 210, determining whether a reply text corresponding to the interaction data exists comprises:
acquiring the voice quality of the user voice in the interaction data;
determining that no reply text corresponding to the interaction data exists when the number of times the voice quality fails to meet a preset condition exceeds a preset number of times;
and judging whether a reply text corresponding to the user voice exists when the voice quality meets the preset condition.
Specifically, the voice quality of the user voice in the interaction data can be determined through a voice-quality evaluation model or through voice recognition technology, and can be graded by level, score and the like. The voice quality is then evaluated: if it is low, the user voice is judged not to meet the preset condition for interaction; if it is medium or high, the user voice is judged to meet the preset condition.
When the voice quality of the user voice does not meet the preset condition, the usual practice is to re-acquire the user voice and evaluate its quality again until a user voice whose quality meets the preset condition is obtained. However, repeatedly re-acquiring the user voice may annoy the user and harm the experience. For this reason, in the embodiment of the invention the number of re-acquisitions is recorded and a maximum number of acquisitions, i.e. the preset number of times, is set, typically two or three.
Further, when the number of times the acquired voice quality fails to meet the preset condition exceeds the preset number, i.e. none of the preset number of user utterances acquired by the terminal meets the condition, interaction cannot proceed based on the user voice. At this point it can be determined that no reply text corresponding to the interaction data exists; rather than asking the user to repeat the question, the reply data can be generated directly, in the no-reply-text case, based on the historical interaction data and the device information.
In addition, when the voice quality meets the preset condition, whether a reply text corresponding to the user voice exists can be further judged based on the user voice: the semantic information obtained from the user voice is looked up in a preset question-answer library for matching answer information; if answer information is found, a reply text corresponding to the user voice exists, and if not, no such reply text exists.
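The decision flow of FIG. 3 can be sketched as follows, with a numeric quality score and a small dictionary standing in for the voice-quality evaluation model and the question-answer library:

```python
from typing import Callable, Optional

MAX_RETRIES = 2          # the "preset number of times"
QUALITY_THRESHOLD = 0.6  # stands in for the "preset condition"
QA_LIBRARY = {"why is the sky blue": "Because air scatters blue light most."}

def find_reply_text(capture: Callable[[], tuple]) -> Optional[str]:
    """capture() returns (quality_score, transcript) for one utterance."""
    for _ in range(MAX_RETRIES + 1):
        quality, transcript = capture()
        if quality >= QUALITY_THRESHOLD:
            # Quality OK: look the question up; None means no reply text.
            return QA_LIBRARY.get(transcript.lower().rstrip("?"))
    # Too many low-quality captures: treat as "no reply text" and stop
    # asking the user to repeat the question.
    return None

samples = iter([(0.3, ""), (0.9, "Why is the sky blue?")])
print(find_reply_text(lambda: next(samples)))
```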
According to the method provided by the embodiment of the invention, the voice quality of the user voice in the interaction data is obtained, and when the number of times the voice quality fails to meet the preset condition exceeds the preset number, it is determined that no reply text corresponding to the interaction data exists. In scenarios where the user voice cannot be captured accurately, this avoids asking the user repeatedly and switches directly to the no-reply-text mode of operation, preventing the annoyance caused by repeated questioning and improving the user experience.
Based on any of the above embodiments, the device information further includes at least one of device emotion information, device personality category and human-machine affinity.
Here, the terminal's device emotion information may be derived solely from other device information. For example, it may be determined from the terminal's battery level: when the level is above 80%, the device emotion is happy, and when it is below 30%, the device emotion is sad. As another example, it may be determined from the terminal's device state: after the terminal is woken up successfully its device emotion is happy, and while it is in a dormant state its device emotion is low. Alternatively, the device emotion may be derived solely from the user's emotion information through preset correspondence rules, for example, when the user's emotion is happy, the terminal's device emotion is also happy. Or it may be derived jointly from the terminal's other device information and the user's emotion information, for example by computing a weighted score over both and mapping the resulting score to an emotion, which becomes the terminal's device emotion information.
It can be understood that the device emotion information directly influences the terminal's emotion at reply time and therefore the emotional color of the reply data. For example, different device emotions can be preset to correspond to different voice tones, actions and image styles: when the device emotion is happy, the voice tone in the reply data can lean cheerful, the action can be clapping, and the image can contain a smiling expression.
Here, the device personality category may be personality information matching the terminal's appearance category; for example, when the terminal is dolphin-shaped, the device personality category may be lively and clever. Alternatively, the device personality category may be a default setting or set by the user. Notably, the device personality category can directly affect the expression style of the reply data. For example, when the device personality category is lively, the overall expression style of the reply data can lean lively, e.g. the voice is relatively animated and the action may be waving a hand.
In addition, the human-machine affinity reflects the degree of closeness between the terminal and the user, and can be determined from the historical interaction count and historical interaction time recorded by the terminal. For example, an affinity score can be computed by weighting the historical interaction count and historical interaction time differently; the higher the score, the higher the human-machine affinity, and the lower the score, the lower the affinity.
It can be understood that the human-machine affinity can influence the closeness exhibited by the reply data. For example, when the affinity is high, the reply data may include a form of address indicating a close relationship, such as "good friend", or an action indicating a hug, such as opening the arms.
According to the method provided by the embodiment of the invention, the reply data for replying to the interaction data is further determined based on at least one of the device emotion information, the device personality category and the human-machine affinity, so that the terminal blends human-like emotion, personality and closeness into the interaction, making the human-machine interaction more anthropomorphic and flexible.
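The extra device information can be sketched as below; the battery thresholds follow the example in the text, while the "calm" middle band, the 0.7/0.3 weights and the normalization constants are assumptions:

```python
def device_emotion(battery_pct: float, awake: bool) -> str:
    if not awake:
        return "sad"        # dormant state
    if battery_pct > 80:
        return "happy"
    if battery_pct < 30:
        return "sad"
    return "calm"

def affinity_score(interaction_count: int, total_minutes: float) -> float:
    # Weighted sum over the recorded interaction history.
    return (0.7 * min(interaction_count / 100, 1.0)
            + 0.3 * min(total_minutes / 600, 1.0))

print(device_emotion(85, True), round(affinity_score(40, 300), 2))
```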
Based on any one of the above embodiments, the present invention further provides an interaction method, which includes:
First, when the voice quality meets the preset condition but no reply text corresponding to the user voice exists, the state information for the terminal's reply to the interaction data is determined based on the absence of a reply text corresponding to the interaction data, the semantic information and emotion information determined from the interaction data, and the device information. The device information here includes at least one of an appearance category and/or supportable action set, device emotion information, device personality category and human-machine affinity.
Then, candidate resources for replying to the interaction data are determined based on the interaction field to which the interaction data belongs; the candidate resources may be books, such as encyclopedias, or people with different identities, such as parents, friends or teachers.
Further, reply data including information guiding the user to consult the candidate resources is determined based on the state information of the terminal.
Alternatively, when the voice quality does not meet the preset condition, the user is not asked to repeat the question, and the reply data is generated directly, in the no-reply-text case, from the semantic information and emotion information in the historical interaction data and the current device information.
Finally, based on the reply data, interaction is performed.
For example, suppose the user asks, "Eggy, do you know why Dad doesn't take me to see the ocean?" When the terminal's appearance category is egg-shaped, it can output an "oops" interjection through audio and display a shy expression on its screen as the reply data for the interaction; when the appearance category is humanoid, it can sway its head from side to side or wave a hand; and when the appearance category is a puppy, it can lower its head, cover its face with both paws, and output a "woof" by voice.
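This embodiment amounts to a lookup from appearance category to a fallback reply, as in the sketch below; the table entries paraphrase the example above and all key names are hypothetical:

```python
FALLBACK_BY_PROFILE = {
    "egg":      {"audio": "Oops...", "screen": "shy_expression", "action": None},
    "humanoid": {"audio": None, "screen": None, "action": "sway_head_or_wave"},
    "puppy":    {"audio": "Woof...", "screen": None,
                 "action": "lower_head_and_cover_face"},
}

def fallback_reply(appearance_category: str) -> dict:
    # Unknown appearance categories fall back to a neutral apology.
    default = {"audio": "Let me learn that first!", "screen": None, "action": None}
    return FALLBACK_BY_PROFILE.get(appearance_category, default)

print(fallback_reply("egg"))
```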
Based on any of the above embodiments, FIG. 4 is a schematic structural diagram of the interaction device provided by the invention. As shown in FIG. 4, the device comprises:
an acquisition unit 410, configured to acquire interaction data and determine semantic information and emotion information based on the interaction data;
a reply-data determination unit 420, configured to determine, based on the semantic information, the emotion information and the device information of the terminal used for interaction, the reply data with which the terminal replies to the interaction data, the device information including an appearance category and/or a supportable action set;
and an interaction unit 430, configured to interact based on the reply data.
The device provided by the embodiment of the invention determines the reply data with which the terminal replies to the interaction data based on the semantic information and emotion information in the interaction data and the device information including the appearance category and/or supportable action set, so that the reply better fits the terminal's own persona and provides more anthropomorphic human-machine interaction; in particular, when the terminal temporarily cannot answer the user, a flexible and engaging interaction style further improves the user's experience and the enjoyment of interacting with the terminal.
Based on any of the above embodiments, the reply-data determination unit is specifically configured to:
determine state information for the terminal's reply to the interaction data based on whether a reply text corresponding to the interaction data exists, together with the semantic information, the emotion information and the device information;
and determine the reply data based on the state information.
Based on any of the above embodiments, the reply-data determination unit is specifically configured to:
determine candidate resources for replying to the interaction data based on the interaction field to which the interaction data belongs;
and determine, based on the state information, reply data that guides the user to consult the candidate resources.
Based on any of the above embodiments, the reply-data determination unit is specifically configured to:
generate reply data corresponding to the state information based on the reply text when the reply text exists.
Based on any of the above embodiments, the reply-data determination unit is specifically configured to:
determine the state information based on whether a reply text corresponding to the interaction data exists, together with the semantic information, the emotion information, the device information and the identity information of the user corresponding to the interaction data.
Based on any of the above embodiments, the reply-data determination unit is specifically configured to:
acquire the voice quality of the user voice in the interaction data;
determine that no reply text corresponding to the interaction data exists when the number of times the voice quality fails to meet a preset condition exceeds a preset number of times;
and judge whether a reply text corresponding to the user voice exists when the voice quality meets the preset condition.
Based on any of the above embodiments, the device information further includes at least one of device emotion information, device personality category and human-machine affinity.
FIG. 5 illustrates a schematic diagram of the physical structure of a terminal. As shown in FIG. 5, the terminal may include a housing 500, a processor 510, a communication interface 520, a memory 530 and a communication bus 540, wherein the processor 510 and the memory 530 are arranged inside the housing 500, and the processor 510, the communication interface 520 and the memory 530 communicate with one another through the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform an interaction method comprising: acquiring interaction data, and determining semantic information and emotion information based on the interaction data; determining, based on the semantic information, the emotion information and the device information of the terminal used for interaction, the reply data with which the terminal replies to the interaction data, the device information including an appearance category and/or a supportable action set; and interacting based on the reply data. The appearance category and/or the supportable action set is determined based on the housing 500.
Furthermore, the logic instructions in the memory 530 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
In another aspect, the invention also provides a computer program product comprising a computer program storable on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can perform the interaction method provided by the methods above, the method comprising: acquiring interaction data, and determining semantic information and emotion information based on the interaction data; determining, based on the semantic information, the emotion information and the device information of the terminal used for interaction, the reply data with which the terminal replies to the interaction data, the device information including an appearance category and/or a supportable action set; and interacting based on the reply data.
In yet another aspect, the invention also provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the interaction method provided by the methods above, the method comprising: acquiring interaction data, and determining semantic information and emotion information based on the interaction data; determining, based on the semantic information, the emotion information and the device information of the terminal used for interaction, the reply data with which the terminal replies to the interaction data, the device information including an appearance category and/or a supportable action set; and interacting based on the reply data.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Persons of ordinary skill in the art can understand and implement this without inventive effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the above technical solution, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk or an optical disk, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments or in parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the invention.

Claims (10)

1. An interaction method, comprising:
acquiring interaction data, and determining semantic information and emotion information based on the interaction data;
determining, based on the semantic information, the emotion information and device information of the terminal used for interaction, reply data with which the terminal replies to the interaction data, wherein the device information comprises an appearance category and/or a supportable action set;
and interacting based on the reply data.
2. The interaction method according to claim 1, wherein the determining, based on the semantic information, the emotion information and the device information of the terminal used for interaction, the reply data with which the terminal replies to the interaction data comprises:
determining state information for the terminal's reply to the interaction data based on whether a reply text corresponding to the interaction data exists, together with the semantic information, the emotion information and the device information;
and determining the reply data based on the state information.
3. The interaction method according to claim 2, wherein, in the absence of the reply text, the determining the reply data based on the state information comprises:
determining candidate resources for replying to the interaction data based on the interaction field to which the interaction data belongs;
and determining, based on the state information, reply data that guides a user to consult the candidate resources.
4. The interaction method according to claim 2, wherein, when the reply text exists, the determining the reply data based on the state information comprises:
generating reply data corresponding to the state information based on the reply text.
5. The interaction method according to claim 2, wherein the determining state information for the terminal's reply to the interaction data based on whether a reply text corresponding to the interaction data exists, together with the semantic information, the emotion information and the device information, comprises:
determining the state information based on whether a reply text corresponding to the interaction data exists, together with the semantic information, the emotion information, the device information and identity information of the user corresponding to the interaction data.
6. The interaction method according to claim 2, wherein the determining whether a reply text corresponding to the interaction data exists comprises:
acquiring the voice quality of a user voice in the interaction data;
determining that no reply text corresponding to the interaction data exists when the number of times the voice quality fails to meet a preset condition exceeds a preset number of times;
and judging whether a reply text corresponding to the user voice exists when the voice quality meets the preset condition.
7. The interaction method according to any one of claims 1 to 6, wherein the device information further comprises at least one of device emotion information, device personality category and human-machine affinity.
8. An interaction apparatus, comprising:
an acquisition unit, configured to acquire interaction data and determine semantic information and emotion information based on the interaction data;
a reply-data determination unit, configured to determine, based on the semantic information, the emotion information and device information of the terminal used for interaction, reply data with which the terminal replies to the interaction data, the device information comprising an appearance category and/or a supportable action set;
and an interaction unit, configured to interact based on the reply data.
9. A terminal, comprising a housing, a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the memory and the processor are arranged inside the housing, the processor implements the interaction method according to any one of claims 1 to 7 when executing the program, and the appearance category and/or the supportable action set is determined based on the housing.
10. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the interaction method according to any one of claims 1 to 7.
CN202310214804.0A 2023-03-02 2023-03-02 Interaction method, interaction device, electronic equipment and storage medium Pending CN116343788A (en)

Priority Applications (1)

Application Number: CN202310214804.0A | Priority Date: 2023-03-02 | Filing Date: 2023-03-02 | Title: Interaction method, interaction device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number: CN202310214804.0A | Priority Date: 2023-03-02 | Filing Date: 2023-03-02 | Title: Interaction method, interaction device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116343788A true CN116343788A (en) 2023-06-27

Family

ID=86887042

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310214804.0A Pending CN116343788A (en) 2023-03-02 2023-03-02 Interaction method, interaction device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116343788A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination