CN115328303A - User interaction method and device, electronic equipment and computer-readable storage medium - Google Patents


Info

Publication number
CN115328303A
CN115328303A (application CN202210900586.1A)
Authority
CN
China
Prior art keywords
information
user
response
emotion
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210900586.1A
Other languages
Chinese (zh)
Inventor
简仁贤
沈奕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Emotibot Technologies Ltd
Original Assignee
Emotibot Technologies Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Emotibot Technologies Ltd
Priority to CN202210900586.1A
Publication of CN115328303A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90Pitch determination of speech signals

Abstract

The application belongs to the technical field of artificial intelligence and discloses a user interaction method and apparatus, an electronic device, and a computer-readable storage medium. The method includes: analyzing emotion information of a user based on the user's interaction information; determining response action information of an avatar according to the interaction information and the emotion information; generating an avatar response video based on the response action information; and playing the avatar response video so that the avatar responds to the user. In this way, real-time avatar responses are ensured while the labor, time, and resource costs consumed are reduced.

Description

User interaction method and device, electronic equipment and computer-readable storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for user interaction, an electronic device, and a computer-readable storage medium.
Background
With the development of internet technology and smart devices, a smart device can respond to a user through an avatar when the user interacts with it. An avatar is a figure that exists only in the non-physical world and is created and used by computer means; it may be, for example, a digital human with human characteristics or a digital animal with animal characteristics.
In the prior art, a smart device (e.g., a smart speaker with a display screen) generally determines a response text according to the user's voice content and the like, captures a performer's movements in real time, and drives the avatar to respond according to the captured movements.
However, in this way, a professional performer is required to respond in real time, which consumes a lot of labor, time, and resource costs.
Disclosure of Invention
Embodiments of the present application provide a method, an apparatus, an electronic device, and a computer-readable storage medium for user interaction, so as to reduce consumed labor cost, time cost, and resource cost when interacting with a user through an avatar.
In one aspect, a method of user interaction is provided, including:
analyzing emotion information of the user based on the interaction information of the user;
determining response action information of the virtual image according to the interaction information and the emotion information;
generating an avatar response video based on the response action information;
and playing the avatar response video so as to respond to the user through the avatar.
In this implementation, emotion analysis is performed on the user's interaction information so that the response is made according to both the interaction information and the emotion information, which improves the accuracy of the avatar's response. Because the avatar response video is generated from the user's interaction information and emotion information, the avatar appears more vivid and natural, real-time avatar responses are ensured, and the labor, time, and resource costs consumed are reduced.
In one embodiment, before analyzing emotional information of a user based on interaction information of the user, the method further comprises:
when any one of the following interaction triggering conditions is determined to be met, acquiring interaction information:
detecting a biometric input operation of a user; detecting a touch input operation of a user on a touch screen; and detecting a key input operation of the user.
In the implementation process, the interactive information can be acquired in various ways.
In one embodiment, analyzing emotional information of a user based on interaction information of the user includes:
if the interaction information contains only one type of input information, determining an emotion vector of the user based on the interaction information and using the emotion vector as the emotion information;
and if the interaction information contains at least two types of input information, determining an emotion vector for each type of input information in the interaction information and performing a weighted summation of the emotion vectors to obtain the emotion information.
In the implementation process, various input information can be comprehensively processed to determine the emotion of the user.
In one embodiment, the interaction information includes at least one of the following input information: biometric information, touch information and key information; determining an emotion vector of the user based on the interaction information, including:
if the interactive information is determined to be biological characteristic information, performing biological characteristic analysis on the biological characteristic information to obtain an emotion vector;
if the interactive information is determined to be touch information, determining the touch frequency, touch pressure and touch area of the user according to the touch information, and determining an emotion vector according to the touch frequency, touch pressure and touch area;
and if the interactive information is determined to be the key information, determining the key frequency and the key pressure of the user according to the key information, and determining the emotion vector according to the key frequency and the key pressure.
In the implementation process, different emotion analyses can be performed according to different input information, and the emotion analysis accuracy is improved.
In one embodiment, the biometric information includes at least one of the following: voice information, a face image, and an iris image; performing biometric analysis on the biometric information to obtain the emotion vector includes:
if the interactive information is determined to be voice information, performing text conversion on the voice information to obtain a voice text, extracting keywords in the voice text, and determining an emotion vector according to the keywords and the tone of the voice information;
if the interactive information is determined to be the face image, performing expression analysis on the face image, and determining an emotion vector according to an expression analysis result;
if the interactive information is determined to be the iris image, comparing and analyzing the emotion of the iris image in an image matching mode, and determining an emotion vector.
In the implementation process, different manners can be adopted for analyzing different biological characteristic information of the user to obtain an accurate emotion vector.
In one embodiment, the response action information includes at least one of the following: a limb action tag for indicating a limb action of the avatar, a face action tag for indicating a facial expression action of the avatar, and a lip action tag for indicating a lip action of the avatar, the response action information of the avatar being determined according to the interaction information and the emotion information, including:
determining a response text, a limb action tag and a response emotion tag according to the interaction information and the emotion information;
acquiring a lip action label set for the response text;
a face action tag set for the responding emotion tag is acquired.
In the implementation, a limb action tag for indicating a limb action of the avatar, a face action tag for indicating a facial expression action of the avatar, and a lip action tag for indicating a lip action of the avatar, which are matched with the interaction information and the emotion information, are obtained to drive the limb action, the lip action, and the face action of the avatar in subsequent steps.
In one embodiment, generating an avatar response video based on response action information includes:
generating a response audio based on the response text;
acquiring a multimedia card corresponding to the response text;
and generating an avatar response video based on the response audio, the multimedia card, the limb action tag, the face action tag and the lip action tag.
In this implementation, a natural and vivid avatar response video is generated based on the response audio, the multimedia card, and the action tags, and rich information can be conveyed to the user.
In one aspect, an apparatus for user interaction is provided, including:
the analysis unit is used for analyzing the emotion information of the user based on the interaction information of the user;
the determining unit is used for determining the response action information of the virtual image according to the interaction information and the emotion information;
a generating unit for generating an avatar response video based on the response action information;
and a playing unit for playing the avatar response video so that a response is made to the user through the avatar.
In one embodiment, the analysis unit is further configured to:
when any one of the following interaction triggering conditions is determined to be met, acquiring interaction information:
detecting a biometric input operation of a user; detecting a touch input operation of a user on a touch screen; and detecting a key input operation of the user.
In one embodiment, the analysis unit is configured to:
if the interaction information contains only one type of input information, determining an emotion vector of the user based on the interaction information and using the emotion vector as the emotion information;
and if the interactive information is determined to contain at least two kinds of input information, respectively determining the emotion vector of each input information in the interactive information, and performing weighted summation on each emotion vector to obtain emotion information.
In one embodiment, the interaction information includes at least one of the following input information: biometric information, touch information and key information; the determination unit is to:
if the interactive information is determined to be biological characteristic information, performing biological characteristic analysis on the biological characteristic information to obtain an emotion vector;
if the interactive information is determined to be touch information, determining the touch frequency, the touch pressure and the touch area of the user according to the touch information, and determining an emotion vector according to the touch frequency, the touch pressure and the touch area;
and if the interactive information is determined to be the key information, determining the key frequency and the key pressure of the user according to the key information, and determining the emotion vector according to the key frequency and the key pressure.
In one embodiment, the biometric information includes at least one of the following: voice information, a face image, and an iris image; the determining unit is configured to:
if the interactive information is determined to be voice information, performing text conversion on the voice information to obtain a voice text, extracting keywords in the voice text, and determining an emotion vector according to the keywords and the tone of the voice information;
if the interactive information is determined to be the face image, performing expression analysis on the face image, and determining an emotion vector according to an expression analysis result;
if the interactive information is determined to be the iris image, comparing and analyzing the emotion of the iris image in an image matching mode, and determining an emotion vector.
In one embodiment, the response action information includes at least one of the following: a limb action tag for indicating a limb action of the avatar, a face action tag for indicating a facial expressive action of the avatar, and a lip action tag for indicating a lip action of the avatar, the determination unit being for:
determining a response text, a limb action tag and a response emotion tag according to the interaction information and the emotion information;
acquiring a lip action label set for the response text;
a face action tag set for a responding emotion tag is acquired.
In one embodiment, the generating unit is configured to:
generating a response audio based on the response text;
acquiring a multimedia card corresponding to the response text;
and generating an avatar response video based on the response audio, the multimedia card, the limb action tag, the face action tag and the lip action tag.
In one aspect, an electronic device is provided, comprising a processor and a memory, the memory storing computer readable instructions which, when executed by the processor, perform the steps of the method provided in any of the various alternative implementations of user interaction described above.
In one aspect, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, performs the steps of the method as provided in any of the various alternative implementations of user interaction described above.
In one aspect, a computer program product is provided which, when run on a computer, causes the computer to perform the steps of the method as provided in the various alternative implementations of any of the user interactions described above.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a flowchart of a method for user interaction according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a method for user voice interaction according to an embodiment of the present application;
fig. 3 is a flowchart of a method for user touch interaction according to an embodiment of the present disclosure;
FIG. 4 is a block diagram illustrating an exemplary user interaction apparatus according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. The components of the embodiments of the present application, as generally described and illustrated in the figures herein, could be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
First, some terms referred to in the embodiments of the present application will be described to facilitate understanding by those skilled in the art.
A terminal device: may be a mobile terminal, a fixed terminal, or a portable terminal such as a mobile handset, station, unit, device, multimedia computer, multimedia tablet, internet node, communicator, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, personal communication system device, personal navigation device, personal digital assistant, audio/video player, digital camera/camcorder, positioning device, television receiver, radio broadcast receiver, electronic book device, gaming device, or any combination thereof, including the accessories and peripherals of these devices, or any combination thereof. It is also contemplated that the terminal device can support any type of interface to the user (e.g., wearable device), and the like.
A server: may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, big data, and artificial intelligence platforms.
In order to reduce consumed labor cost, time cost and resource cost when interacting with a user through an avatar, the embodiment of the application provides a method, a device, an electronic device and a computer-readable storage medium for user interaction.
Referring to fig. 1, which is a flowchart of a user interaction method provided in an embodiment of the present application, the method is applied to an electronic device, where the electronic device may be a server or a terminal device. The specific implementation flow of the method is as follows:
step 101: and analyzing emotion information of the user based on the interaction information of the user.
As an example, a terminal device (e.g., a smart speaker) collects the user's interaction information and sends it to the electronic device (e.g., a server).
As another example, the electronic device itself (e.g., a smart speaker) collects the user's interaction information.
In one embodiment, the following steps may be adopted when collecting the user's interaction information:
when any one of the following interaction triggering conditions is determined to be met, acquiring interaction information:
detecting a biometric input operation of a user; detecting a touch input operation of a user on a touch screen; and detecting a key input operation of the user.
Further, the interaction triggering condition may also be: determining that a set interaction time point is reached (e.g., an alarm clock goes off), determining that a set event is complete (e.g., a file download finishes), or determining that environmental information meets a set environmental condition (e.g., it is raining today, or the temperature reaches a high-temperature threshold).
In practical application, the interaction triggering condition may be set according to a practical application scenario, which is not limited herein.
In one embodiment, the interaction information includes at least one of the following input information: biometric information, touch information, and key information.
Optionally, the interaction information may be acquired in real time by a sensor or other device (e.g., a camera device).
In practical application, the interaction information may be set according to a practical application scenario, which is not limited herein.
In one embodiment, the biometric information includes at least one of the following features: voice information, face images, and iris images.
Further, the biometric information may also be other characteristic information such as finger veins, which is not limited herein.
In practical applications, the biometric information may be set according to practical application scenarios, which is not limited herein.
As one example, when it is determined that a biometric input operation of the user is detected, the user's biometric information is collected.
For example, a smart speaker detects the user's voice command, a residential-community access control device captures the user's face image, and a laboratory access control device captures the user's iris image.
As one example, when it is determined that a touch input operation of a user on a touch screen is detected, touch information of the user is collected.
As another example, when it is determined that a key input operation of the user is detected, the user's key information is collected.
Therefore, when the interaction triggering condition is determined to be met, at least one piece of input information of the user can be collected, and the interaction information can be obtained.
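As a concrete illustration of this collection step, the sketch below gathers whichever input modalities triggered into a single interaction-information object. The `device` wrapper and its `*_detected()` / `read_*()` methods are hypothetical placeholders for illustration only, not part of the disclosed implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class InteractionInfo:
    # One entry per input modality that was captured, e.g. {"biometric": ..., "touch": ...}.
    inputs: Dict[str, Any] = field(default_factory=dict)

def collect_interaction_info(device) -> InteractionInfo:
    """Collect interaction information once any trigger condition is met.

    `device` is an assumed wrapper exposing the detection and read methods
    used below; real hardware access is out of scope for this sketch.
    """
    info = InteractionInfo()
    if device.biometric_input_detected():   # e.g. voice, face or iris capture started
        info.inputs["biometric"] = device.read_biometric()
    if device.touch_input_detected():       # touch operation on the touch screen
        info.inputs["touch"] = device.read_touch_events()
    if device.key_input_detected():         # physical key press
        info.inputs["key"] = device.read_key_events()
    return info
```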
In one embodiment, when step 101 is executed, any one of the following manners may be adopted:
mode 1: and if the interactive information only contains one input information, determining the emotion vector of the user based on the interactive information, and determining the emotion vector as the emotion information.
Mode 2: and if the interactive information is determined to contain at least two kinds of input information, respectively determining the emotion vector of each input information in the interactive information, and performing weighted summation on each emotion vector to obtain emotion information.
In one embodiment, performing the weighted summation of the emotion vectors in mode 2 to obtain the emotion information includes: performing a weighted summation of the emotion vectors and using the resulting weighted sum as the emotion information.
Therefore, when multiple kinds of input information are obtained simultaneously, the emotions corresponding to the input information are integrated to obtain the comprehensively determined emotions.
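A minimal sketch of the weighted summation in mode 2, assuming each modality yields an emotion vector with one score per emotion class; the class order and the example weights are illustrative assumptions, since the patent does not prescribe specific values.

```python
import numpy as np

# Assumed emotion classes; the patent does not fix a particular set.
EMOTIONS = ("happy", "neutral", "anxious", "angry")

def fuse_emotion_vectors(vectors, weights=None):
    """Weighted sum of per-modality emotion vectors.

    `vectors` maps a modality name (e.g. "voice", "touch") to an emotion
    vector aligned with EMOTIONS. If no weights are given, all modalities
    contribute equally.
    """
    if weights is None:
        weights = {name: 1.0 / len(vectors) for name in vectors}
    fused = np.zeros(len(EMOTIONS))
    for name, vec in vectors.items():
        fused += weights.get(name, 0.0) * np.asarray(vec, dtype=float)
    return fused

# Example: the voice cue suggests "happy" while the touch cue leans "anxious".
emotion_info = fuse_emotion_vectors(
    {"voice": [0.7, 0.2, 0.1, 0.0], "touch": [0.1, 0.2, 0.5, 0.2]},
    weights={"voice": 0.6, "touch": 0.4},
)
```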
The specific technique for determining an emotion vector is the same in mode 1 and mode 2.
In one embodiment, the emotion vector of a given piece of input information may be determined in any of the following ways:
mode 1: if the interactive information is determined to be touch information, determining the touch frequency, the touch pressure and the touch area of the user according to the touch information, and determining the emotion vector according to the touch frequency, the touch pressure and the touch area.
As an example, the touch information may be instruction information whose content is a keyword. If the user taps a digital human (i.e., an avatar) displayed on the electronic device, the electronic device obtains the touch information corresponding to the tap operation, i.e., "touch the digital human". The higher the touch frequency, the greater the touch pressure, and the larger the touch area, the more anxious the user is determined to be.
In one embodiment, a user interacts with a terminal device in a touch manner. The terminal equipment determines the touch frequency, the touch pressure and the touch area of a user according to the touch information, and sends the touch frequency, the touch pressure and the touch area to the electronic equipment. The electronic equipment determines an emotion vector according to the touch frequency, the touch pressure and the touch area.
As an example, a user contacts an electronic device by touch.
In practical applications, the emotion vector may be determined according to at least one of a touch frequency, a touch pressure, and a touch area.
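The sketch below shows one way such a mapping could look, assuming a two-class [calm, anxious] vector; the normalization constants are illustrative assumptions, since the patent only states the qualitative rule that faster, harder, and larger touches indicate a more anxious user.

```python
def touch_emotion_vector(touch_frequency_hz, touch_pressure, touch_area_ratio):
    """Map touch frequency, pressure and area to a [calm, anxious] vector.

    touch_frequency_hz: taps per second.
    touch_pressure:     fraction of the sensor's maximum pressure, 0..1.
    touch_area_ratio:   fraction of the screen covered by the touch, 0..1.
    The divisor 5.0 (assumed maximum tap rate) is an illustrative constant.
    """
    freq = min(touch_frequency_hz / 5.0, 1.0)
    pressure = min(max(touch_pressure, 0.0), 1.0)
    area = min(max(touch_area_ratio, 0.0), 1.0)
    anxious = (freq + pressure + area) / 3.0
    return [1.0 - anxious, anxious]
```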
Mode 2: and if the interactive information is determined to be the key information, determining the key frequency and the key pressure of the user according to the key information, and determining the emotion vector according to the key frequency and the key pressure.
For example, if the user's key-press frequency has a steady rhythm and the key-press pressure is relatively even, the user is determined to be relatively calm.
Mode 3: and if the interactive information is determined to be the biological characteristic information, performing biological characteristic analysis on the biological characteristic information to obtain an emotion vector.
In one embodiment, performing the biometric analysis on the biometric information to obtain the emotion vector may include the following ways:
mode 1: and if the interactive information is determined to be voice information, performing text conversion on the voice information to obtain a voice text, extracting keywords in the voice text, and determining an emotion vector according to the keywords and the tone of the voice information.
Further, the terminal device can also perform text conversion on the voice information to obtain a voice text, and send the voice information and the voice text to the electronic device. The electronic equipment extracts keywords in the voice text and determines an emotion vector according to the keywords and the tone of the voice information.
As one example, a user interacts with a terminal device through speech. The terminal device uses an Automatic Speech Recognition (ASR) model to convert the user's speech stream (i.e., the voice information) into a speech text, e.g., "That's great!", and transmits the speech stream and the speech text to the electronic device. The electronic device performs emotion analysis on the speech stream and the speech text to obtain an emotion vector (e.g., one characterizing a happy mood).
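A sketch of how the keyword and tone cues could be combined after the ASR step, assuming the speech text and an estimated pitch are already available; the keyword lexicon, pitch threshold, and three-class vector are illustrative assumptions rather than the patent's concrete models.

```python
# Hypothetical keyword lexicon mapping salient words to emotion classes.
KEYWORD_EMOTIONS = {"great": "happy", "awesome": "happy", "slow": "angry", "broken": "angry"}

def speech_emotion_vector(speech_text: str, pitch_hz: float):
    """Combine keywords in the recognized text with a tone (pitch) cue.

    Returns a normalized [happy, neutral, angry] vector. The upstream
    speech-to-text conversion (e.g. an ASR model) is taken as given.
    """
    scores = {"happy": 0.0, "neutral": 1.0, "angry": 0.0}
    for word in speech_text.lower().replace("!", " ").split():
        label = KEYWORD_EMOTIONS.get(word)
        if label:
            scores[label] += 1.0
    if pitch_hz > 250.0:          # a raised tone strengthens the non-neutral classes
        scores["happy"] *= 1.2
        scores["angry"] *= 1.2
    total = sum(scores.values())
    return [scores[k] / total for k in ("happy", "neutral", "angry")]

# e.g. speech_emotion_vector("That's great!", pitch_hz=280.0) leans toward "happy".
```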
Mode 2: and if the interactive information is determined to be the face image, performing expression analysis on the face image, and determining an emotion vector according to an expression analysis result.
In one embodiment, a facial image is input into an expression analysis model to obtain an emotion vector.
For example, it may be determined whether the user is smiling or angry based on the curvature of the lips in the face image.
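A minimal sketch of this branch, assuming an expression-analysis model is available as a callable that returns one score per expression class; the class set and the model itself are placeholders, not components disclosed by the patent.

```python
import numpy as np

EXPRESSIONS = ("smiling", "neutral", "angry")   # assumed expression classes

def face_emotion_vector(face_image: np.ndarray, expression_model):
    """Run the (hypothetical) expression model and normalize its scores.

    `expression_model` is any callable mapping an image array to one
    non-negative score per entry in EXPRESSIONS, e.g. a CNN classifier.
    """
    scores = np.clip(np.asarray(expression_model(face_image), dtype=float), 0.0, None)
    return (scores / (scores.sum() + 1e-9)).tolist()
```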
Mode 3: if the interactive information is determined to be the iris image, comparing and analyzing the emotion of the iris image in an image matching mode, and determining an emotion vector.
In one embodiment, the iris image is matched against each image sample in an image library, and the emotion vector associated with the matched sample is obtained.
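A sketch of the matching step under the assumption that the image library stores (sample image, emotion vector) pairs; cosine similarity on raw pixels stands in here for whatever matcher a real implementation would use.

```python
import numpy as np

def iris_emotion_vector(iris_image: np.ndarray, image_library):
    """Return the emotion vector attached to the best-matching library sample.

    `image_library` is an iterable of (sample_image, emotion_vector) pairs
    with the same resolution as `iris_image`.
    """
    probe = iris_image.astype(float).ravel()
    best_score, best_vector = -np.inf, None
    for sample, emotion_vector in image_library:
        ref = sample.astype(float).ravel()
        score = float(probe @ ref) / (np.linalg.norm(probe) * np.linalg.norm(ref) + 1e-9)
        if score > best_score:
            best_score, best_vector = score, emotion_vector
    return best_vector
```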
Step 102: and determining response action information of the virtual image according to the interaction information and the emotion information.
Specifically, the response action information includes at least one of the following: a limb action tag, a face action tag, and a lip action tag of the avatar. Keywords can be extracted from the interaction information, and response action information matched with the keywords and the emotion information is obtained.
In one embodiment, an Artificial Intelligence (AI) algorithm is used to determine a response text, a limb action tag, and a response emotion tag matched with the interaction information and the emotion information, and then a lip action tag set for the response text and a face action tag set for the response emotion tag are acquired.
For example, based on the speech text (e.g., "You're amazing!") and the emotion information, the response text is determined to be: "You won the lottery? That's wonderful, I'm so happy!"; the lip action tag is: laugh; the limb action tag is: clap hands; the response emotion tag is: laughing; and the face action tag is: laugh.
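A toy sketch of step 102 that mirrors the example above; a small rule table stands in for the AI algorithm, the interaction information is left out for brevity, and all tag names and lookup tables are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class ResponseAction:
    response_text: str
    limb_action_tag: str
    face_action_tag: str
    lip_action_tag: str

# Hypothetical mappings "set for" response texts and response emotion tags.
LIP_TAG_FOR_TEXT = {"That's wonderful, I'm so happy!": "laugh", "Hi": "greeting"}
FACE_TAG_FOR_EMOTION = {"laughing": "laugh", "smiling": "smile"}

def determine_response_action(emotion_label: str) -> ResponseAction:
    """Pick a response text, limb action tag and response emotion tag, then
    look up the lip and face action tags set for them."""
    if emotion_label == "happy":
        text, limb, emotion_tag = "That's wonderful, I'm so happy!", "clap_hands", "laughing"
    else:
        text, limb, emotion_tag = "Hi", "wave", "smiling"
    return ResponseAction(
        response_text=text,
        limb_action_tag=limb,
        face_action_tag=FACE_TAG_FOR_EMOTION[emotion_tag],
        lip_action_tag=LIP_TAG_FOR_TEXT[text],
    )
```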
Step 103: and generating an avatar response video based on the response action information.
Specifically, a response audio is generated based on the response text, and an avatar response video is generated based on the response audio, the limb action tag, the face action tag, and the lip action tag.
In one embodiment, the response audio is generated based on the response text; the multimedia card correspondingly set for the response text is acquired; and the avatar response video is generated based on the response audio, the multimedia card, the limb action tag, the face action tag, and the lip action tag.
In one embodiment, the following steps may be adopted when step 103 is executed:
s1031: based on the response text, response audio is generated, and based on the lip motion tag, a lip motion image sequence is obtained.
As an example, the lip motion image sequence is a plurality of lip motion images sorted in time.
S1032: and obtaining a limb action image sequence based on the limb action label.
S1033: based on the facial motion labels, a sequence of facial motion images is obtained.
As an example, from the motion library, a lip motion image sequence matched with a lip motion tag, a limb motion image sequence matched with a limb motion tag, and a face motion image sequence matched with a face motion tag are acquired, respectively.
Thus, in subsequent steps, the lip motion image sequence can drive the avatar to perform lip motions matching the response audio being played, the limb motion image sequence can drive the avatar's limb motions, and the face motion image sequence can drive the avatar's facial expression motions.
S1034: and generating an avatar response video based on the response audio, the lip motion image sequence, the limb motion image sequence and the face motion image sequence.
As an example, the touch information acquired by the electronic device is: touch the digital human; the matched response text is: "Hi"; the limb action tag is: wave; the face action tag is: smile; and the lip action tag is: greeting.
Therefore, the avatar response video can be generated through each image sequence and the response audio, so that when the avatar responds to the user, the avatar can have vivid and natural limb actions and facial expressions while broadcasting the voice.
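The sketch below strings S1031-S1034 together: the text-to-speech step, the action-library lookups, and the frame/audio muxing are passed in as callables because the patent does not specify concrete components; every name here is an assumption for illustration.

```python
def compose_avatar_frame(lip_frame, limb_frame, face_frame):
    # Placeholder compositing: a real renderer would blend the lip, limb and
    # face layers onto the avatar model; here they are simply bundled.
    return {"lip": lip_frame, "limb": limb_frame, "face": face_frame}

def generate_avatar_response_video(action, action_library, tts, write_video):
    """Assemble the avatar response video from the response action information.

    action:          a ResponseAction-like object with text and action tags.
    action_library:  lookup(kind, tag) -> time-ordered image sequence.
    tts:             response text -> response audio.
    write_video:     (frames, audio) -> encoded avatar response video.
    """
    response_audio = tts(action.response_text)                            # S1031: response audio
    lip_frames = action_library.lookup("lip", action.lip_action_tag)      # S1031: lip image sequence
    limb_frames = action_library.lookup("limb", action.limb_action_tag)   # S1032: limb image sequence
    face_frames = action_library.lookup("face", action.face_action_tag)   # S1033: face image sequence
    frames = [compose_avatar_frame(lip, limb, face)
              for lip, limb, face in zip(lip_frames, limb_frames, face_frames)]
    return write_video(frames, response_audio)                            # S1034: mux into the video
```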
Furthermore, a multimedia card can be obtained based on the response text, and the multimedia card and the avatar response video are fused to obtain a fused avatar response video.
As an example, the multimedia card may be a graphic card or a video card, etc.
Therefore, the multimedia card and the virtual image response video can be fused, the visual presentation effect is enriched, more information can be efficiently transmitted to a user, and the user experience is improved.
In practical applications, the avatar response video may also be generated in other manners based on the response audio, the multimedia card, the limb action tag, the face action tag, and the lip action tag, for example, the avatar response video is generated based on the response audio, the multimedia card, the action library correspondingly set to the limb action tag, the action library correspondingly set to the face action tag, and the action library corresponding to the lip action tag, which is not limited herein.
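As a small illustration of the card-fusion step, the sketch below overlays a multimedia card onto the frames produced above; the frame dictionaries and the overlay position are assumptions carried over from the previous sketch.

```python
def fuse_card_into_video(video_frames, card_image, position=("right", "top")):
    """Attach the multimedia card as an overlay on every avatar frame."""
    fused = []
    for frame in video_frames:
        frame = dict(frame)  # leave the original frame untouched
        frame["card_overlay"] = {"image": card_image, "position": position}
        fused.append(frame)
    return fused
```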
Step 104: and playing the virtual image response video so as to respond to the user through the virtual image.
In one embodiment, the electronic device plays an avatar answer video.
Furthermore, after receiving the interaction information from the terminal device, the electronic device can return the avatar response video to the terminal device, and the avatar response video is then played by the terminal device.
Referring to fig. 2, a flowchart of a method for user voice interaction according to an embodiment of the present application is shown, and the method for user interaction in fig. 1 is described with reference to fig. 2. The specific implementation flow of the method is as follows:
step 200: and when the terminal equipment detects the voice information input operation of the user, acquiring the voice information of the user.
Step 201: and the terminal equipment performs text conversion on the voice information to obtain a voice text.
Step 202: the terminal equipment sends the voice information and the voice text to the electronic equipment.
Step 203: the electronic equipment extracts the keywords in the voice text and determines emotional information according to the keywords and the tone of the voice information.
Step 204: the electronic device determines the response text, the body action tag of the avatar, the face action tag, and the lip action tag according to the voice information and the emotion information.
Step 205: the electronic device generates a response audio based on the response text and acquires the multimedia card correspondingly set for the response text.
Step 206: the electronic device generates an avatar response video based on the response audio, the multimedia card, the limb action tag, the face action tag, and the lip action tag.
Step 207: the terminal device receives and plays the avatar response video sent by the electronic device.
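A compact sketch of the terminal side of this flow; the HTTP endpoint, payload fields, and the `record_speech` / `run_asr` / `play_video` callables are hypothetical stand-ins for the device's actual capture, ASR, and playback components.

```python
import base64
import requests

SERVER_URL = "https://example.com/avatar/respond"   # hypothetical endpoint

def handle_voice_interaction(record_speech, run_asr, play_video):
    speech_audio = record_speech()            # step 200: capture the user's voice information
    speech_text = run_asr(speech_audio)       # step 201: text conversion on the terminal device
    payload = {                               # step 202: send voice information and speech text
        "speech_audio": base64.b64encode(speech_audio).decode("ascii"),
        "speech_text": speech_text,
    }
    reply = requests.post(SERVER_URL, json=payload, timeout=30)
    play_video(reply.content)                 # step 207: play the returned avatar response video
```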
Referring to fig. 3, a flowchart of a method for user touch interaction provided in an embodiment of the present application is shown, and the method for user interaction in fig. 1 is described with reference to fig. 3. The specific implementation flow of the method is as follows:
step 300: when the terminal equipment detects the touch input operation of a user, the touch information of the user is obtained.
Step 301: and the terminal equipment determines the touch frequency, the touch pressure and the touch area of the user according to the touch information.
Step 302: the terminal device sends the touch frequency, the touch pressure, and the touch area to the electronic device.
Step 303: the electronic equipment determines emotion information according to the touch frequency, the touch pressure and the touch area.
Step 304: and the electronic equipment determines a response text, a limb action label of the virtual image, a face action label and a lip action label according to the touch information and the emotion information.
Step 305: the electronic device generates a response audio based on the response text and acquires the multimedia card correspondingly set for the response text.
Step 306: the electronic device generates an avatar response video based on the response audio, the multimedia card, the limb action tag, the face action tag, and the lip action tag.
Step 307: the terminal device receives and plays the avatar response video sent by the electronic device.
In the embodiment of the application, emotion analysis is performed on the user's interaction information so that the response is made according to both the interaction information and the emotion information, which improves the accuracy of the avatar's response. Because the avatar response video is generated from the user's interaction information and emotion information, the avatar appears more vivid and natural, real-time avatar responses are ensured, and the labor, time, and resource costs consumed are reduced.
Based on the same inventive concept, an embodiment of the present application further provides a user interaction apparatus. Since the principle by which the apparatus solves the problem is similar to that of the user interaction method, the implementation of the apparatus can refer to the implementation of the method, and repeated details are omitted.
As shown in fig. 4, which is a schematic structural diagram of an apparatus for user interaction according to an embodiment of the present application, including:
an analysis unit 401 configured to analyze emotion information of the user based on the interaction information of the user;
a determining unit 402, configured to determine response action information of the avatar according to the interaction information and the emotion information;
a generating unit 403 for generating an avatar response video based on the response action information;
a playing unit 404 for playing the avatar response video so that the user is responded by the avatar.
In one embodiment, the analysis unit 401 is further configured to:
when any one of the following interaction triggering conditions is determined to be met, acquiring interaction information:
detecting a biometric input operation of a user; detecting a touch input operation of a user on a touch screen; and detecting a key input operation of the user.
In one embodiment, the analysis unit 401 is configured to:
if the interaction information contains only one type of input information, determining an emotion vector of the user based on the interaction information and using the emotion vector as the emotion information;
and if the interactive information is determined to contain at least two kinds of input information, respectively determining the emotion vector of each input information in the interactive information, and performing weighted summation on each emotion vector to obtain emotion information.
In one embodiment, the interaction information includes at least one of the following input information: biometric information, touch information and key information; the determining unit 402 is configured to:
if the interactive information is determined to be the biological characteristic information, performing biological characteristic analysis on the biological characteristic information to obtain an emotion vector;
if the interactive information is determined to be touch information, determining the touch frequency, touch pressure and touch area of the user according to the touch information, and determining an emotion vector according to the touch frequency, touch pressure and touch area;
and if the interactive information is determined to be the key information, determining the key frequency and the key pressure of the user according to the key information, and determining the emotion vector according to the key frequency and the key pressure.
In one embodiment, the biometric information includes at least one of the following: voice information, a face image, and an iris image; the determining unit 402 is configured to:
if the interactive information is determined to be voice information, performing text conversion on the voice information to obtain a voice text, extracting keywords in the voice text, and determining an emotion vector according to the keywords and the tone of the voice information;
if the interactive information is determined to be the face image, performing expression analysis on the face image, and determining an emotion vector according to an expression analysis result;
if the interactive information is determined to be the iris image, comparing and analyzing the emotion of the iris image in an image matching mode, and determining an emotion vector.
In one embodiment, the response action information includes at least one of the following: a limb action tag for indicating a limb action of the avatar, a face action tag for indicating a facial expressive action of the avatar, and a lip action tag for indicating a lip action of the avatar, the determination unit 402 is configured to:
determining a response text, a limb action tag and a response emotion tag according to the interaction information and the emotion information;
acquiring a lip action label set for the response text;
a face action tag set for the responding emotion tag is acquired.
In one embodiment, the generating unit 403 is configured to:
generating a response audio based on the response text;
acquiring a multimedia card correspondingly set for the response text;
and generating an avatar response video based on the response audio, the multimedia card, the limb action tag, the face action tag and the lip action tag.
In the user interaction method and apparatus, electronic device, and computer-readable storage medium provided by the embodiments of the application, emotion information of a user is analyzed based on the user's interaction information; response action information of an avatar is determined according to the interaction information and the emotion information; an avatar response video is generated based on the response action information; and the avatar response video is played so that the avatar responds to the user. In this way, emotion analysis of the user's interaction information allows the response to be based on both the interaction information and the emotion information, which improves the accuracy of the avatar's response. Because the avatar response video is generated from the interaction information and the emotion information, the avatar appears more vivid and natural, real-time avatar responses are ensured, and the labor, time, and resource costs consumed are reduced.
Fig. 5 shows a schematic structural diagram of an electronic device 5000. Referring to fig. 5, the electronic device 5000 includes a processor 5010 and a memory 5020, and may optionally further include a power supply 5030, a display unit 5040, and an input unit 5050.
The processor 5010 is a control center of the electronic apparatus 5000, connects various components using various interfaces and lines, and performs various functions of the electronic apparatus 5000 by running or executing software programs and/or data stored in the memory 5020, thereby monitoring the electronic apparatus 5000 as a whole.
In the embodiment of the present application, the processor 5010 executes each step in the above embodiments when calling a computer program stored in the memory 5020.
Optionally, the processor 5010 may include one or more processing units; preferably, the processor 5010 may integrate an application processor, which mainly handles the operating system, user interfaces, applications, and the like, and a modem processor, which mainly handles wireless communication. It is to be appreciated that the modem processor may also not be integrated into the processor 5010. In some embodiments, the processor and the memory may be implemented on a single chip, or, in some embodiments, they may be implemented separately on separate chips.
The memory 5020 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, various applications, and the like, and the data storage area may store data created according to the use of the electronic device 5000, and the like. Further, the memory 5020 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other non-volatile solid-state storage device.
The electronic device 5000 also includes a power supply 5030 (e.g., a battery) that provides power to the various components and that may be logically coupled to the processor 5010 through a power management system to enable management of charging, discharging, and power consumption functions through the power management system.
The display unit 5040 may be configured to display information input by a user or information provided to the user, and various menus of the electronic device 5000, and in this embodiment of the present invention, the display unit is mainly configured to display a display interface of each application in the electronic device 5000 and objects such as texts and pictures displayed in the display interface. The display unit 5040 may include a display panel 5041. The Display panel 5041 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The input unit 5050 may be used to receive information such as numbers or characters input by a user. Input units 5050 may include touch panel 5051 as well as other input devices 5052. Among other things, the touch panel 5051, also referred to as a touch screen, may collect touch operations by a user on or near the touch panel 5051 (e.g., operations by a user on or near the touch panel 5051 using a finger, a stylus, or any other suitable object or attachment).
Specifically, the touch panel 5051 can detect a touch operation by a user, detect signals resulting from the touch operation, convert the signals into touch point coordinates, transmit the touch point coordinates to the processor 5010, and receive and execute a command transmitted from the processor 5010. In addition, the touch panel 5051 may be implemented using various types of resistive, capacitive, infrared, and surface acoustic waves. Other input devices 5052 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, power on/off keys, etc.), a trackball, a mouse, a joystick, etc.
Of course, the touch panel 5051 may cover the display panel 5041, and when the touch panel 5051 detects a touch operation thereon or thereabout, it is transferred to the processor 5010 to determine the type of touch event, and then the processor 5010 provides a corresponding visual output on the display panel 5041 in accordance with the type of touch event. Although in fig. 5, the touch panel 5051 and the display panel 5041 are implemented as two separate components to implement input and output functions of the electronic device 5000, in some embodiments, the touch panel 5051 and the display panel 5041 may be integrated to implement input and output functions of the electronic device 5000.
The electronic device 5000 may also include one or more sensors, such as pressure sensors, gravitational acceleration sensors, proximity light sensors, and the like. Of course, the electronic device 5000 may further include other components such as a camera according to the requirements of a specific application, and these components are not shown in fig. 5 and are not described in detail since they are not components used in this embodiment of the present application.
Those skilled in the art will appreciate that fig. 5 is merely an example of an electronic device and is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or different components.
In an embodiment of the present application, a computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the electronic device is enabled to execute the steps in the above embodiments.
For convenience of description, the above parts are described separately as modules (or units) according to functions. Of course, the functionality of the various modules (or units) may be implemented in the same one or more pieces of software or hardware when implementing the present application.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method of user interaction, comprising:
analyzing emotion information of a user based on interaction information of the user;
determining response action information of the virtual image according to the interaction information and the emotion information;
generating an avatar response video based on the response action information;
and playing the avatar response video so as to respond to the user through the avatar.
2. The method of claim 1, wherein prior to the analyzing the emotional information of the user based on the interaction information of the user, the method further comprises:
when any one of the following interaction triggering conditions is determined to be met, acquiring the interaction information:
detecting a biometric input operation of a user; detecting a touch input operation of a user on a touch screen; and detecting a key input operation of the user.
3. The method of claim 1, wherein analyzing the emotion information of the user based on the interaction information of the user comprises:
if it is determined that the interaction information contains only one type of input information, determining an emotion vector of the user based on the interaction information, and using the emotion vector as the emotion information;
and if it is determined that the interaction information contains at least two types of input information, determining an emotion vector for each type of input information in the interaction information, and performing a weighted summation of the emotion vectors to obtain the emotion information.
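As an illustrative sketch of the single-input and weighted-summation cases in claim 3, using NumPy; the emotion dimensions and weights are hypothetical examples, not values fixed by the claims.

```python
# Illustrative sketch only; dimensions and weights are hypothetical.
import numpy as np

# One emotion vector per type of input information (e.g. voice, touch).
emotion_vectors = {
    "voice": np.array([0.7, 0.1, 0.2]),   # [joy, anger, neutral]
    "touch": np.array([0.4, 0.3, 0.3]),
}
weights = {"voice": 0.6, "touch": 0.4}

if len(emotion_vectors) == 1:
    # Single input: the emotion vector itself is the emotion information.
    emotion_info = next(iter(emotion_vectors.values()))
else:
    # Several inputs: weighted sum of the per-input emotion vectors.
    emotion_info = sum(weights[k] * v for k, v in emotion_vectors.items())

print(emotion_info)  # approximately [0.58 0.18 0.24]
```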
4. The method of claim 3, wherein the interaction information comprises at least one of the following input information: biometric information, touch information, and key information; and the determining an emotion vector of the user based on the interaction information comprises:
if it is determined that the interaction information is biometric information, performing biometric analysis on the biometric information to obtain the emotion vector;
if it is determined that the interaction information is touch information, determining a touch frequency, a touch pressure, and a touch area of the user according to the touch information, and determining the emotion vector according to the touch frequency, the touch pressure, and the touch area;
and if it is determined that the interaction information is key information, determining a key frequency and a key pressure of the user according to the key information, and determining the emotion vector according to the key frequency and the key pressure.
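Purely as an illustration of the touch and key branches of claim 4, a sketch that maps the measured quantities to an emotion vector; the weighting scheme, thresholds, and two-dimensional [calm, agitated] representation are hypothetical assumptions.

```python
# Illustrative sketch only; weights and dimensions are hypothetical.
import numpy as np

def touch_emotion(frequency: float, pressure: float, area: float) -> np.ndarray:
    # Rapid, hard, large-area touches are read as agitation; gentle touches as calm.
    agitation = min(1.0, 0.4 * frequency + 0.4 * pressure + 0.2 * area)
    return np.array([1.0 - agitation, agitation])  # [calm, agitated]

def key_emotion(frequency: float, pressure: float) -> np.ndarray:
    # Key presses expose no contact area, so only frequency and pressure are used.
    agitation = min(1.0, 0.5 * frequency + 0.5 * pressure)
    return np.array([1.0 - agitation, agitation])

print(touch_emotion(frequency=0.8, pressure=0.6, area=0.2))  # approx. [0.4 0.6]
print(key_emotion(frequency=0.2, pressure=0.1))              # approx. [0.85 0.15]
```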
5. The method of claim 4, wherein the biometric information comprises at least one of the following feature information: voice information, a face image, and an iris image; and the performing biometric analysis on the biometric information to obtain the emotion vector comprises:
if it is determined that the interaction information is voice information, converting the voice information into a voice text, extracting keywords from the voice text, and determining the emotion vector according to the keywords and the tone of the voice information;
if it is determined that the interaction information is a face image, performing expression analysis on the face image, and determining the emotion vector according to an expression analysis result;
and if it is determined that the interaction information is an iris image, performing comparative emotion analysis on the iris image by means of image matching, and determining the emotion vector.
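For illustration only, a sketch of the keyword-plus-tone branch of claim 5, operating on text assumed to have already been transcribed from the voice information; the keyword lexicon, the pitch-ratio tone measure, and the weighting are all hypothetical.

```python
# Illustrative sketch only; lexicon, tone measure, and weights are hypothetical.
import numpy as np

POSITIVE = {"great", "thanks", "love"}
NEGATIVE = {"broken", "angry", "refund"}

def speech_emotion(transcript: str, pitch_ratio: float) -> np.ndarray:
    words = transcript.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    keyword_score = (pos - neg) / max(1, pos + neg)      # -1 .. 1 from keywords
    tone_score = np.clip(pitch_ratio - 1.0, -1.0, 1.0)   # deviation from a baseline pitch
    valence = 0.7 * keyword_score + 0.3 * tone_score
    return np.array([(1 + valence) / 2, (1 - valence) / 2])  # [positive, negative]

print(speech_emotion("this is broken and I am angry", pitch_ratio=1.2))
```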
6. The method of any one of claims 1-5, wherein the response action information comprises at least one of the following: a limb action tag for indicating a limb action of the avatar, a face action tag for indicating a facial expression action of the avatar, and a lip action tag for indicating a lip action of the avatar; and the determining the response action information of the avatar according to the interaction information and the emotion information comprises:
determining a response text, the limb action tag, and a response emotion tag according to the interaction information and the emotion information;
acquiring a lip action tag set for the response text;
and acquiring a face action tag set for the response emotion tag.
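As an illustrative sketch of claim 6, the lookup tables and the rule that picks the response below are hypothetical placeholders for the dialogue and emotion models this step would actually use.

```python
# Illustrative sketch only; tag tables and the selection rule are hypothetical.

LIP_TAGS_BY_TEXT = {"Sorry to hear that, let me help.": "lip_sorry_help"}
FACE_TAGS_BY_EMOTION = {"comforting": "face_soft_smile", "cheerful": "face_big_smile"}

def determine_response_action(interaction_info: str, emotion_info: dict) -> dict:
    # Determine the response text, limb action tag, and response emotion tag
    # from the interaction information and the emotion information.
    if emotion_info.get("negative", 0.0) > 0.5:
        text, limb_tag, emotion_tag = "Sorry to hear that, let me help.", "limb_open_palms", "comforting"
    else:
        text, limb_tag, emotion_tag = "Glad to hear it!", "limb_thumbs_up", "cheerful"
    return {
        "response_text": text,
        "limb_tag": limb_tag,
        "lip_tag": LIP_TAGS_BY_TEXT.get(text, "lip_default"),               # lip tag set for the text
        "face_tag": FACE_TAGS_BY_EMOTION.get(emotion_tag, "face_neutral"),  # face tag set for the emotion tag
    }

print(determine_response_action("my order is broken", {"negative": 0.82}))
```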
7. The method of claim 6, wherein generating an avatar response video based on the response action information comprises:
generating response audio based on the response text;
acquiring a multimedia card set in correspondence with the response text;
and generating the avatar response video based on the response audio, the multimedia card, the limb action tag, the face action tag, and the lip action tag.
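Purely as an illustration of claim 7, a sketch that gathers the inputs of the video-generation step; the synthesis and card-lookup calls are hypothetical stand-ins for a real text-to-speech engine, card store, and avatar renderer.

```python
# Illustrative sketch only; synthesis, lookup, and rendering are placeholders.
from dataclasses import dataclass

@dataclass
class AvatarResponseVideo:
    audio: bytes
    card: str
    limb_tag: str
    face_tag: str
    lip_tag: str

def synthesize_audio(response_text: str) -> bytes:
    return response_text.encode("utf-8")   # placeholder for text-to-speech

def lookup_card(response_text: str) -> str:
    return "card_order_help"               # placeholder multimedia-card lookup

def generate_avatar_video(action: dict) -> AvatarResponseVideo:
    audio = synthesize_audio(action["response_text"])
    card = lookup_card(action["response_text"])
    return AvatarResponseVideo(audio, card, action["limb_tag"],
                               action["face_tag"], action["lip_tag"])

video = generate_avatar_video({"response_text": "Sorry to hear that, let me help.",
                               "limb_tag": "limb_open_palms",
                               "face_tag": "face_soft_smile",
                               "lip_tag": "lip_sorry_help"})
print(video.card, len(video.audio))
```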
8. An apparatus for user interaction, comprising:
an analysis unit, configured to analyze emotion information of a user based on interaction information of the user;
a determining unit, configured to determine response action information of an avatar according to the interaction information and the emotion information;
a generating unit, configured to generate an avatar response video based on the response action information;
and a playing unit, configured to play the avatar response video so as to respond to the user through the avatar.
9. An electronic device comprising a processor and a memory, the memory storing computer-readable instructions which, when executed by the processor, cause the electronic device to perform the method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202210900586.1A 2022-07-28 2022-07-28 User interaction method and device, electronic equipment and computer-readable storage medium Pending CN115328303A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210900586.1A CN115328303A (en) 2022-07-28 2022-07-28 User interaction method and device, electronic equipment and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210900586.1A CN115328303A (en) 2022-07-28 2022-07-28 User interaction method and device, electronic equipment and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN115328303A (en) 2022-11-11

Family

ID=83919994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210900586.1A Pending CN115328303A (en) 2022-07-28 2022-07-28 User interaction method and device, electronic equipment and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN115328303A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116758908A (en) * 2023-08-18 2023-09-15 中国工业互联网研究院 Interaction method, device, equipment and storage medium based on artificial intelligence
CN116758908B (en) * 2023-08-18 2023-11-07 中国工业互联网研究院 Interaction method, device, equipment and storage medium based on artificial intelligence

Similar Documents

Publication Publication Date Title
WO2022078102A1 (en) Entity identification method and apparatus, device and storage medium
CN110807388B (en) Interaction method, interaction device, terminal equipment and storage medium
CN111277706B (en) Application recommendation method and device, storage medium and electronic equipment
WO2021232930A1 (en) Application screen splitting method and apparatus, storage medium and electric device
US9547471B2 (en) Generating computer responses to social conversational inputs
CN107977928B (en) Expression generation method and device, terminal and storage medium
CN110765294B (en) Image searching method and device, terminal equipment and storage medium
CN110598046A (en) Artificial intelligence-based identification method and related device for title party
CN111027419B (en) Method, device, equipment and medium for detecting video irrelevant content
CN111491123A (en) Video background processing method and device and electronic equipment
CN114049892A (en) Voice control method and device and electronic equipment
CN111798259A (en) Application recommendation method and device, storage medium and electronic equipment
CN114357278A (en) Topic recommendation method, device and equipment
CN115328303A (en) User interaction method and device, electronic equipment and computer-readable storage medium
CN114428842A (en) Method and device for expanding question-answer library, electronic equipment and readable storage medium
CN112843681B (en) Virtual scene control method and device, electronic equipment and storage medium
CN111796926A (en) Instruction execution method and device, storage medium and electronic equipment
CN106777066B (en) Method and device for image recognition and media file matching
CN114547242A (en) Questionnaire investigation method and device, electronic equipment and readable storage medium
CN112364649B (en) Named entity identification method and device, computer equipment and storage medium
CN110750193B (en) Scene topology determination method and device based on artificial intelligence
CN112752155B (en) Media data display method and related equipment
CN113392686A (en) Video analysis method, device and storage medium
CN114416931A (en) Label generation method and device and related equipment
CN111723783A (en) Content identification method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination