CN111506184A - Avatar presenting method and electronic equipment - Google Patents

Avatar presenting method and electronic equipment

Info

Publication number
CN111506184A
CN111506184A (Application CN201910099285.1A)
Authority
CN
China
Prior art keywords
avatar
user
electronic device
image
updating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910099285.1A
Other languages
Chinese (zh)
Inventor
齐晓宇
宋睿华
李笛
崔庆才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC
Priority to CN201910099285.1A
Priority to PCT/US2019/065315 (WO2020159621A1)
Publication of CN111506184A
Legal status: Pending

Classifications

    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 - Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/20 - Input arrangements for video game devices
    • A63F13/21 - Input arrangements for video game devices characterised by their sensors, purposes or types
    • A63F13/213 - Input arrangements for video game devices characterised by their sensors, purposes or types comprising photodetecting means, e.g. cameras, photodiodes or infrared cells
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 - Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/40 - Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment
    • A63F13/42 - Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle
    • A63F13/428 - Processing input control signals of video game devices, e.g. signals generated by the player or derived from the environment by mapping the input signals into game commands, e.g. mapping the displacement of a stylus on a touch screen to the steering angle of a virtual vehicle involving motion or position input signals, e.g. signals representing the rotation of an input controller or a player's arm motions sensed by accelerometers or gyroscopes
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 - Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/60 - Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/67 - Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/20 - Scenes; Scene-specific elements in augmented reality scenes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 - Facial expression recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 - Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/20 - Input arrangements for video game devices
    • A63F13/21 - Input arrangements for video game devices characterised by their sensors, purposes or types
    • A63F13/217 - Input arrangements for video game devices characterised by their sensors, purposes or types using environment-related information, i.e. information generated otherwise than by the player, e.g. ambient temperature or humidity
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 - Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/80 - Special adaptations for executing a specific game genre or game mode
    • A63F13/825 - Fostering virtual characters
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 - Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/85 - Providing additional services to players
    • A63F13/87 - Communicating with other players during game play, e.g. by e-mail or chat
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 - Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/50 - Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by details of game servers
    • A63F2300/55 - Details of game data or player data management
    • A63F2300/5546 - Details of game data or player data management using player registration data, e.g. identification, account, preferences, game history
    • A63F2300/5553 - Details of game data or player data management using player registration data, e.g. identification, account, preferences, game history user representation in the game field, e.g. avatar

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • User Interface Of Digital Computer (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present disclosure relates to an avatar presentation method and an electronic device. The avatar presentation method includes presenting an avatar related to a service mode; acquiring related information when a user interacts with the avatar; obtaining an analysis result obtained by analyzing the related information using a trained neural network model; and updating the avatar according to the analysis result and presenting the updated avatar. By analyzing the related information of the user's interaction with the avatar using the trained neural network model, and by applying the user's expression, posture, environment, chat information, interaction duration, and the like to the presentation of the avatar, the avatar can respond to the user's needs more accurately based on the combination of the various analysis results.

Description

Avatar presenting method and electronic equipment
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly, to an avatar presentation method and an electronic device.
Background
With the development of artificial intelligence technology, artificial intelligence chat robots have been widely used in various industries. In a typical application, a user issues an instruction to the intelligent robot by voice input, and the intelligent robot receives and recognizes the voice instruction through speech recognition technology in order to provide various services to the user or make a corresponding response. To make the interaction between the user and the artificial intelligence chat robot more accurate, further development of the interaction modes between the user and the intelligent robot is needed.
Disclosure of Invention
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In the technical solution of the present disclosure, the avatar is presented based on the results of analyzing, with a trained neural network model, the related information of the user's interaction with the avatar. By applying the user's expression, the user's posture, the environment in which the user is located, the chat information, the interaction duration, and the like to the presentation of the avatar, the avatar can respond to the user's needs more accurately by combining the various analysis results.
The foregoing description is only an overview of the technical solutions of the present disclosure. In order that the technical means of the present disclosure may be more clearly understood, and that the above and other objects, features, and advantages of the present disclosure may be more readily appreciated, embodiments of the present disclosure are described below.
Drawings
FIG. 1 is a schematic diagram of an avatar rendering solution of an embodiment of the present disclosure;
FIG. 2 is a flow diagram of an avatar rendering method of an embodiment of the present disclosure;
FIG. 3 is a diagram of an exemplary scenario of presenting an updated avatar according to an expression of a user in accordance with an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a neural network model of an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an exemplary scenario for presenting an avatar according to environmental information, in accordance with an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of an exemplary scenario for adjusting the background of an avatar according to environmental information, according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of an exemplary scenario of presenting a combined image of a user and an updated avatar in the background of the updated avatar according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of an exemplary scenario of rendering an updated avatar according to a user's gesture, according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of an exemplary scenario in which an avatar controls an external device according to chat information, according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of rendering an avatar according to changes in interaction duration according to an embodiment of the present disclosure; and
fig. 11 is a block diagram of an electronic device of an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 is a schematic diagram of an avatar presentation technical solution of an embodiment of the present disclosure. It should be understood that the schematic diagram shown in fig. 1 is merely exemplary and should not be construed to limit the functionality or scope of the implementations described in this disclosure in any way. As shown in fig. 1, the electronic device provides a service required by the user 110. For example, the electronic device may obtain service mode information indicating a service selected by the user, present to the user 110 an avatar 120 corresponding to the selected service according to the obtained service mode information, update the avatar 120 according to related information obtained while interacting with the user 110, and present the updated avatar.
It should be noted that the avatar in the present disclosure may be an avatar presented on a screen of an electronic device, an avatar presented by VR technology, or a physical robot; the specific implementation of the avatar is not limited in the present disclosure.
The following describes the avatar rendering scheme of the embodiments of the present disclosure in further detail with reference to the accompanying drawings.
Fig. 2 is a flowchart of an avatar presentation method of an embodiment of the present disclosure. Referring to fig. 2, the avatar rendering method of the embodiment of the present disclosure specifically includes the following steps:
in step S210, an avatar associated with the service mode is presented.
In step S210, the user selects a desired service mode by voice input, text input, touch input, hover input, or the like. The electronic device receives the user's instruction including an indication of the selected service mode and presents to the user an avatar corresponding to the service mode selected by the user.
The service mode may include a chat application service, a music application service, a news application service, etc., which the electronic device presents through the avatar.
In one embodiment, the user selects a chat service mode by way of voice instructions, and the electronic device presents an avatar related to the chat service to the user according to the user selected chat service mode, such as a chat application presented in an avatar manner.
Step S220: related information is acquired while the user interacts with the avatar.
After the electronic device presents the avatar related to the service mode selected by the user, the electronic device may acquire, in step S220, related information while the user interacts with the avatar. The related information may include an image of the user during the interaction (a still image or a moving image of the user), chat information exchanged between the user and the avatar, and the duration of the user's interaction with the avatar.
The image of the user may reflect the expression of the user, the posture of the user, and the environment information of the user.
The expression of the user when interacting with the avatar may be, for example, neutral, smiling, grinning, sad, surprised, disgusted, angry, contemptuous, fearful, laughing, or pained, and can represent the emotion of the user.
The user's gestures may be, for example, head gestures (e.g., nodding or shaking the head, etc.), hand gestures (e.g., waving the hand, etc.), and body gestures (e.g., hugging, etc.) of the user when interacting with the avatar.
The environment information may be, for example, information of an environment in which the user is located, for example, an indoor environment (e.g., bedroom, etc.) or an outdoor environment (e.g., park, etc.). Further, the environmental information may also include information that may indicate weather in the environment in which the user is located (e.g., sun, snow, rain, etc. that appear in the environment in which the user is located).
Step S230: and obtaining an analysis result obtained by analyzing the related information by using the trained neural network model.
In step S230, after providing the avatar corresponding to the service mode selected by the user to the user, the obtained related information may be analyzed using the neural network model to predict the behavior, attitude, emotion, intention, and the like of the user so as to respond according to the analysis result.
Specifically, the trained neural network model may be stored locally on the electronic device, or may be stored on a cloud server. When the trained neural network model is stored locally on the electronic device, the related information is input into the locally stored trained neural network model for analysis and recognition to obtain the analysis result. When the trained neural network model is stored on a cloud server, the electronic device sends an analysis request including the related information to the cloud server; in response to the analysis request, the cloud server inputs the related information into the trained neural network model for analysis and recognition to obtain the analysis result, and returns the analysis result to the electronic device as feedback to the analysis request.
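Purely as an illustrative sketch of this local-versus-cloud arrangement (the endpoint URL, the JSON payload, and the local_model.predict interface are assumptions, not part of the disclosed embodiment):

```python
import json
import urllib.request

CLOUD_ANALYZE_URL = "https://example-cloud-server/analyze"  # hypothetical endpoint

def analyze_related_info(related_info, local_model=None):
    """Return an analysis result, using a local model if one is present,
    otherwise sending an analysis request to the cloud server."""
    if local_model is not None:
        # The trained neural network model is stored on the device itself.
        return local_model.predict(related_info)

    # Otherwise, package the related information into an analysis request.
    request = urllib.request.Request(
        CLOUD_ANALYZE_URL,
        data=json.dumps({"related_info": related_info}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # The cloud server runs the trained model and returns the analysis
        # result as feedback to the analysis request.
        return json.loads(response.read().decode("utf-8"))
```

Keeping the model local avoids a network round trip, while the cloud path allows several devices to share a single trained model.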
Step S240: and updating the avatar according to the analysis result, and presenting the updated avatar.
In step S240, the electronic device responds to the user according to the analysis result, for example, whether the expression of the user is happy, whether the environment is indoors or outdoors, the meaning expressed by the user's action, and the like; it updates the avatar accordingly and presents the updated avatar to the user.
According to the avatar presentation method of the embodiments of the present disclosure, the avatar can be updated by analyzing the related information and integrating various factors such as the behavior, attitude, emotion, and intention of the user, so that the avatar can respond to the user more accurately and intelligently, improving the immersion of the interaction between the user and the avatar.
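The flow of steps S210-S240 can be summarized in a minimal sketch; the Avatar fields, the render stand-in, and the injected callables are illustrative assumptions rather than the actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Avatar:
    expression: str = "neutral"
    pose: str = "idle"
    background: str = "bedroom"
    age_years: int = 3

def render(avatar: Avatar) -> None:
    # Stand-in for the device display; a real embodiment would draw the avatar.
    print(f"Avatar[{avatar.expression}, {avatar.pose}, bg={avatar.background}, age={avatar.age_years}]")

def run_session(acquire_info, analyze, update, is_active):
    avatar = Avatar()                    # S210: present the avatar for the selected service mode
    render(avatar)
    while is_active():
        info = acquire_info()            # S220: user image, chat information, interaction duration
        result = analyze(info)           # S230: analysis by the trained neural network model
        avatar = update(avatar, result)  # S240: update the avatar according to the result
        render(avatar)                   # ... and present the updated avatar
    return avatar
```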
The specific flow of the embodiment of the present disclosure has been introduced above with reference to the flowchart of the avatar presentation method of fig. 2. Technical solutions for presenting the updated avatar are described in detail below with reference to fig. 3 to fig. 10.
Fig. 3 is a schematic diagram of an exemplary scene in which an updated avatar is presented according to an expression of a user according to an embodiment of the present disclosure.
In the exemplary application scenario of FIG. 3, the avatar 320 is, for example, an avatar of a girl that may be displayed on the electronic device 330. During the interaction between the user 310 and the avatar 320, images of the user 310 may be acquired by a camera of the electronic device 330, and the acquired images may be analyzed by the trained neural network model to identify the expression of the user. For example, if the user 310 presents a smiling expression when interacting with the avatar 320, the camera of the electronic device 330 may capture a picture or video of the user's smile. The captured picture or video is analyzed using a trained neural network model provided within the electronic device or obtained from a cloud server, and the user's expression in the picture or video is recognized as "smile". According to the recognized smiling expression, an expression pattern corresponding to the smile is retrieved from an expression database provided within the electronic device or obtained from a cloud server, the expression of the avatar is updated to a smile or to an expression corresponding to the smile, and an avatar having an expression corresponding to the user's "smile" expression is presented to the user, e.g., an avatar having a smiling expression is presented to the user 310 as shown in fig. 3.
Further, in a chat interaction between the user and the avatar, the electronic device may capture an image of the user in real time and adjust the expression presented by the avatar based on real-time analysis of the image. For example, when an expression of "anger" is predicted from an image of the user, an expression pattern corresponding to "anger" may be acquired from an expression database provided within the electronic device or from an external server, the expression of the avatar may be updated to an expression corresponding to the user's "anger", and the corresponding expression may be presented to the user; for example, the avatar may present an expression of "fear" corresponding to the "anger".
Further, in an embodiment, an action pattern (e.g., trembling) corresponding to the expression of "fear" may also be obtained from an action database in the electronic device or from the cloud server, so that both the expression and the action of the avatar are updated according to the expression pattern ("fear") and the action pattern ("trembling") corresponding to the user's "anger", and the updated avatar is presented to the user. For example, the updated avatar may not only have the expression of "fear" but also perform a "trembling" action that conveys the meaning of "fear".
Further, in one embodiment, in identifying the expression of the user, a Convolutional Neural Network (CNN) model may be specifically used to identify the expression of the user. Fig. 4 is a schematic diagram of a neural network model of an embodiment of the present disclosure.
As shown in fig. 4, the convolutional neural network model includes an input layer, convolutional layers (e.g., convolutional layers 420 and 440), a max-pooling layer 430, and fully connected layers (e.g., fully connected layers 450 and 460). First, facial expression images are extracted in advance from an expression database and preprocessed; for example, the facial expression images are normalized to obtain image samples containing the key feature points of the face. The obtained image samples are used as training samples and are input to the convolutional layers of the CNN, and the training of the CNN model for expression recognition is completed by adjusting the weights of the neural network model. After the camera of the electronic device captures an expression photo of the user, the photo is input into the trained CNN model for prediction; if the predicted value for recognizing the expression in the captured photo of the user (for example, a photo of the user smiling) as smiling is greater than a certain threshold, the expression of the user is determined to be smiling.
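As an illustration only, the following is a minimal PyTorch sketch of a CNN of the kind described above (input, convolutional layer 420, max-pooling layer 430, convolutional layer 440, fully connected layers 450 and 460), together with a thresholded prediction step. The 48x48 grayscale input size, the channel counts, and the helper names are assumptions for illustration, not values taken from the patent.

```python
import torch
import torch.nn as nn

class ExpressionCNN(nn.Module):
    """Minimal expression-recognition CNN following the layer order of fig. 4 (48x48 grayscale input assumed)."""
    def __init__(self, num_expressions: int = 11):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # convolutional layer 420
            nn.ReLU(),
            nn.MaxPool2d(2),                              # max-pooling layer 430
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # convolutional layer 440
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 24 * 24, 128),                 # fully connected layer 450
            nn.ReLU(),
            nn.Linear(128, num_expressions),              # fully connected layer 460
        )

    def forward(self, x):
        return self.classifier(self.features(x))

def predict_expression(model, face_image, labels, threshold=0.6):
    """Return the recognized expression only if its predicted score exceeds the threshold."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(face_image.unsqueeze(0)), dim=1)[0]
    score, index = probs.max(dim=0)
    return labels[int(index)] if score.item() >= threshold else None
```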
After the expression of the user is recognized, the electronic device may extract an expression pattern corresponding to the expression of the user from an expression database in the electronic device or from a cloud server and present the expression through an avatar.
Further, the manner in which the electronic device presents the expression may include providing basic facial elements corresponding to human facial expressions; for example, basic facial elements of different sizes and shapes may be provided for the eyebrows, nose, eyes, upper lip, lower lip, teeth, cheek contours, the area under the nose, and the like. When the expression pattern corresponding to the expression of the user is acquired from the expression database, all or some of the basic facial elements may be combined according to the acquired expression pattern so as to present that expression. For example, the avatar's face may be composed of the above-described basic facial elements, and when the user's "smiling" expression is recognized, the avatar's smiling expression may be presented by adjusting only the basic facial elements of the upper and lower lips. Optionally, the electronic device may instead draw the expression of the avatar in real time according to the recognized expression of the user and present the expression drawn in real time to the user.
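A minimal sketch of composing an expression pattern from such basic facial elements follows; the element names and the per-expression adjustments are illustrative assumptions:

```python
# Default shapes of the basic facial elements that make up the avatar's face.
BASE_FACE = {
    "eyebrows": "neutral", "eyes": "open", "nose": "default",
    "upper_lip": "relaxed", "lower_lip": "relaxed", "teeth": "hidden", "cheeks": "neutral",
}

# Only the elements listed for an expression pattern are replaced; all other
# elements keep their current shape, e.g. a smile only adjusts the two lips.
EXPRESSION_PATTERNS = {
    "smile": {"upper_lip": "raised_corners", "lower_lip": "raised_corners"},
    "fear":  {"eyebrows": "raised", "eyes": "wide", "upper_lip": "parted"},
}

def apply_expression(face: dict, expression: str) -> dict:
    updated = dict(face)
    updated.update(EXPRESSION_PATTERNS.get(expression, {}))
    return updated

smiling_face = apply_expression(BASE_FACE, "smile")
```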
Further, the electronic device may also present an avatar associated with the appearance of the user according to the image of the user. For example, the electronic device may generate an avatar that resembles the user. Optionally, embodiments of the disclosure may also generate an avatar favored by the user according to the user's preferences, so that the user perceives the updated avatar as more intimate and real, thereby improving the user experience.
Further, the electronic device may also analyze the environmental information in the related information and present an avatar corresponding to the environmental information. A technical solution of presenting an updated avatar according to the environmental information is described in detail below with reference to fig. 5.
Fig. 5 is a schematic diagram of an exemplary scenario of presenting an avatar corresponding to environmental information, according to an embodiment of the present disclosure.
Referring to fig. 5, the environment in which the user 510 is located is an outdoor snowy environment; thus the user 510 has the feature of a "down jacket" 540, and the environment contains the environmental information "snowflakes" 530. The avatar is, for example, an animated character of a girl that may be displayed on a display of the electronic device. In this embodiment, an image of the user is captured by a camera of the electronic device, and the captured image is analyzed using the trained neural network model; environmental information such as the "snowflakes" 530, or the corresponding "snowy weather", is recognized, and it is determined from this environmental information that the user 510 is located in a cold outdoor environment. An environmental feature corresponding to the cold environment is therefore added to the avatar 520, such as the "scarf" 550 shown in fig. 5. Furthermore, embodiments of the present disclosure may also add the environmental feature "scarf" 550 to the avatar in conjunction with the recognized feature of the user's "down jacket" 540, and present the updated avatar 520 wearing the "scarf" 550 to the user 510.
Further, if the electronic device recognizes that the environment in which the user is located is an outdoor rainy environment, an environmental feature "umbrella" corresponding to "raindrops" or its corresponding "rainy" environmental information may be added to the avatar.
Specifically, when the electronic device identifies the environment in which the user is located, the environment may be identified through a neural network model. For example, a CNN model for object detection may be trained by analogy with the expression-recognition CNN model of fig. 4. The accuracy of object detection is improved by providing training data to the CNN model and adjusting the weights of the neural network model by back propagation. For example, after an image of the user is captured using the camera of the electronic device, the object "snowflake" 530 may be extracted, features of the object (e.g., texture, shape, and color) may be captured, and the extracted features may be input into the trained object-detection CNN model. If the prediction result exceeds a prediction threshold, the recognized object is confirmed as a "snowflake", a picture associated with the object "snowflake" 530 is extracted from an image database in the electronic device or on a cloud server, or is generated in real time, and an avatar having the environmental feature "snowflake" 530, or an environmental feature associated with "snowflake", in an updated background is presented.
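As an illustration, a detected-object-to-avatar-feature mapping of the kind described above might be sketched as follows; the object labels, feature names, and threshold value are assumptions:

```python
# Environment objects detected in the user's image mapped to features added to the avatar.
ENVIRONMENT_FEATURES = {
    "snowflake": "scarf",    # cold outdoor environment -> add a scarf to the avatar
    "raindrop":  "umbrella", # rainy environment -> add an umbrella
}

def update_avatar_for_environment(avatar_features: set, detected_objects, scores, threshold=0.7):
    """Add an environment-related feature for every object whose detection score passes the threshold."""
    for obj, score in zip(detected_objects, scores):
        if score >= threshold and obj in ENVIRONMENT_FEATURES:
            avatar_features.add(ENVIRONMENT_FEATURES[obj])
    return avatar_features
```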
It should be noted that, in the technical solution of the present disclosure, a plurality of neural network models may also be trained on different training data to recognize, respectively, the gesture of the user, an object associated with the gesture of the user, and the like; the training may refer to the structure of the expression-recognition neural network model of fig. 4.
Further, the electronic device may also analyze the environmental information in the related information and adjust the background of the avatar according to the identified environmental information. A technical solution of adjusting the background of the avatar according to the environmental information of the user is described in detail below with reference to fig. 6.
FIG. 6 is a schematic diagram of an example scenario of adjusting a background of an avatar according to environmental information, according to an embodiment of the present disclosure.
Referring to FIG. 6, the environment in which the user 610 is located is an outdoor environment under the sun 630, while the background of the avatar 620 is an indoor bedroom environment, including objects common in bedroom scenes such as beds, windows, photo frames, etc. in the background of the avatar 620.
During the interaction of the user 610 with the avatar 620, an image of the user 610 is captured by a camera of the electronic device, and environmental information of the user 610 is recognized from the captured image through the trained neural network model. For example, the environmental information "sun" 630 is recognized from the image of the user 610, it is determined that the background of the avatar 620 should also reflect the environmental information "sun" 630, and an image 640 corresponding to the "sun" 630 is added to the background of the avatar 620; for example, the image 640 corresponding to the "sun" 630 is added outside the window of the bedroom background of the avatar 620 in fig. 6, so that the environments of the user and the avatar are kept consistent.
Further, in one embodiment, if the electronic device recognizes the environmental information "snow" or "rain" from the image of the user as in fig. 5 above, it may determine that the weather of the environment in which the user is located is "rain/snow", and the electronic device may add environmental information representing "rain/snow" to the background of the avatar, e.g., add "snow" or "rain" outside the window in the bedroom background of the avatar of fig. 6.
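A minimal sketch of keeping the avatar's background consistent with the recognized weather follows; the overlay names are illustrative assumptions:

```python
# Weather recognized in the user's environment mapped to an overlay added to the avatar's background
# (e.g. drawn outside the window of the bedroom scene).
WEATHER_OVERLAYS = {
    "sun":  "sunlight_through_window",
    "snow": "snow_outside_window",
    "rain": "rain_outside_window",
}

def sync_background(background_layers: list, recognized_weather: str) -> list:
    overlay = WEATHER_OVERLAYS.get(recognized_weather)
    if overlay and overlay not in background_layers:
        # Keep the avatar's background consistent with the user's environment.
        background_layers.append(overlay)
    return background_layers
```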
In the embodiment of the disclosure, the avatar and the background of the avatar are adjusted according to the environmental information of the user so that they are consistent with the environment of the user. The avatar and its background are thus more closely associated with the user, the interactivity is better, and the user experience is improved.
Further, embodiments of the present disclosure may also present a combined image of the user and the updated avatar in the context of the updated avatar. This scheme will be described in detail below in conjunction with fig. 7.
FIG. 7 is a schematic diagram of an exemplary scenario of presenting a combined image of a user and an updated avatar in the background of the updated avatar according to an embodiment of the present disclosure.
Referring to FIG. 7, after the electronic device obtains an image of the user 710, a group photo of the user 710 with the avatar 720 may be generated and added to the photo frame 730 in the bedroom background of the avatar 720, making the user's interaction with the avatar more realistic, so that the user may feel personally present in the scene.
Further, after the electronic device acquires the image of the user, the trained neural network model may be used to analyze the image to recognize the posture of the user, update the posture of the avatar according to the posture of the user, and present the updated posture of the avatar. The electronic device may also recognize an object associated with the posture of the user, update the avatar according to the posture of the user and the associated object, and present the updated posture of the avatar together with the associated object. This is described in detail below with reference to fig. 8.
FIG. 8 is a schematic diagram of an exemplary scenario of presenting an updated avatar according to a user's gesture, according to an embodiment of the present disclosure. In the scenario shown in fig. 8, a user 810 holds a gift in his hand. When the user 810 approaches the display 830 of the electronic device and moves the gift toward the display 830, the electronic device uses the trained neural network model to recognize the action of the user 810 approaching the display and extending his hand toward it, as well as the object "gift" 840 associated with that action. Combining this with the user's voice information "happy birthday", the electronic device analyzes that the intention of the user 810 is "send a birthday gift". According to the analyzed intention, the electronic device acquires an action corresponding to "send a birthday gift" from a video database in the electronic device or from an external server, updates the avatar according to the acquired action, and presents the updated avatar to the user; for example, the electronic device may present an avatar performing the action 850 of receiving the user's "gift" while generating and playing the voice message "Thank you". By presenting the updated avatar according to the user's gesture and the object corresponding to the gesture, the embodiment of the disclosure makes the interaction between the user and the avatar more realistic and enhances the immersion of the interaction.
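For illustration only, the combination of a recognized gesture, its associated object, and speech into an intention and a response might be sketched as follows; the rule table stands in for the trained model, and all of the labels are assumptions:

```python
def infer_intention(gesture: str, associated_object: str, speech: str) -> str:
    """Combine gesture, associated object, and speech into a single intention label."""
    if (gesture == "extend_hand_toward_display"
            and associated_object == "gift"
            and "happy birthday" in speech.lower()):
        return "send_birthday_gift"
    return "unknown"

# Avatar responses keyed by the inferred intention.
RESPONSES = {
    "send_birthday_gift": {"action": "receive_gift", "speech": "Thank you!"},
    "unknown":            {"action": "idle",         "speech": None},
}

intention = infer_intention("extend_hand_toward_display", "gift", "Happy birthday!")
response = RESPONSES[intention]   # -> present the receiving action and play "Thank you!"
```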
In one embodiment, when the electronic device recognizes a gesture of the user (e.g., a hand-waving gesture), the avatar may be made to perform a responsive action corresponding to the waving gesture. For example, depending on the context of the interaction between the user and the avatar, when a hand-waving gesture of the user is recognized (e.g., the gesture represents the user greeting or bidding farewell to the avatar), the avatar may greet or bid farewell to the user by waving its hand, or reply "hello" or "goodbye" to the user by voice output.
In one embodiment, if the electronic device recognizes a "hug" action of the user, the avatar may present a corresponding "hug" action. While recognizing the "hug" action, the electronic device may also recognize the user's expression (e.g., a smiling expression) and, while responding to the user's "hug" action, present a "smiling" expression to the user.
In one embodiment, the electronic device may also detect, from a gesture of the user, the meaning represented by the gesture. For example, when numbers are involved in the user's interaction with the avatar, a number may be expressed using gestures in addition to being expressed by voice input or text input. For example, when the user presents a "five fingers open" gesture to the avatar, the electronic device recognizes that the gesture represents the number "five"; when the user presents a gesture of "thumb and index finger open, little finger, ring finger, and middle finger closed", the gesture is recognized as representing the number "eight"; and when the user presents a "clenched fist" gesture, the gesture is recognized as representing the number "ten". After such counting gestures are recognized, the interaction between the user and the avatar can be supplemented with the user's chat information, the user's intention can be understood more accurately, and the avatar presented to the user becomes more accurate and intelligent.
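A minimal sketch of such a gesture-to-number mapping, with assumed gesture labels:

```python
# Counting gestures mapped to the numbers they represent.
GESTURE_NUMBERS = {
    "five_fingers_open": 5,
    "thumb_and_index_open": 8,   # little, ring, and middle fingers closed
    "clenched_fist": 10,
}

def number_from_gesture(gesture: str):
    """Return the number a gesture represents, or None if it is not a counting gesture."""
    return GESTURE_NUMBERS.get(gesture)
```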
In one embodiment, during the interaction with the avatar, the user may engage in a game with the avatar, and the avatar may play with the user by recognizing the user's gestures. For example, in the traditional game of "rock, paper, scissors", the electronic device may recognize the three gestures corresponding to "rock", "paper", and "scissors", and respond according to the gesture of the user.
In one embodiment, when the trained neural network model is used to identify a head pose of the user (e.g., nodding or shaking the head, where nodding corresponds to an affirmative attitude of the user and shaking the head corresponds to a negative attitude), the presented avatar may respond according to the user's head pose. For example, when the avatar's response to the user is inaccurate because of noise in the user's environment or because the data obtained by the avatar is insufficient or inaccurate, and it is necessary to respond to the user according to the user's attitude, the head pose of the user may be detected by a camera or the like of the electronic device. If a nodding pose is detected, the user's attitude is affirmative; if a head-shaking pose is detected, the user's attitude is negative. The user's intent can be understood more accurately based on the chat information of the context in which the user interacts with the avatar, combined with the attitude expressed by the user's head pose, so that a more accurate and intelligent avatar can be presented.
Further, the related information may also include chat information, and the avatar presentation method may further include acquiring an intention of the avatar obtained by analyzing the chat information with the neural network model, and controlling an external device to perform an operation corresponding to the intention. A technical solution in which the avatar controls an external device according to the chat information is described in detail below with reference to fig. 9.
FIG. 9 is a schematic diagram of an exemplary scenario for presenting an avatar based on chat information, according to an embodiment of the present disclosure.
Continuing the exemplary scenario of fig. 8 above, after the electronic device presents the avatar receiving the user's "gift", the avatar mentions "send the user a thank-you card" in the chat information in order to thank the user. The chat information is analyzed by the trained neural network model, the avatar's intention to "send the user a thank-you card" is extracted from the chat information, and an operation corresponding to that intention is performed; for example, an external device (e.g., the printer 920 in fig. 9) is controlled over the network to provide the "thank-you card" 930 to the user. Optionally, embodiments of the present disclosure may also display the card 910 in the background of the avatar 900.
In one embodiment, in a smart home environment, the electronic device may also analyze the chat information of the user's interaction with the avatar using the trained neural network model, extract the avatar's intention to control various home appliances, generate a control instruction for the corresponding appliance, and control appliances in the smart home system, such as an air conditioner, a television, or a light bulb, according to the control instruction.
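As an illustrative sketch only (the device names, command fields, and send_instruction helper are assumptions, not a real smart-home API):

```python
# Intentions extracted from the chat information mapped to control instructions.
APPLIANCE_COMMANDS = {
    "turn_on_air_conditioner": {"device": "air_conditioner", "command": "power_on"},
    "turn_on_tv":              {"device": "television",      "command": "power_on"},
    "dim_lights":              {"device": "light_bulb",      "command": "set_brightness", "value": 30},
}

def control_appliance(intention: str, send_instruction) -> bool:
    """Translate an extracted intention into a control instruction and send it, if one is defined."""
    instruction = APPLIANCE_COMMANDS.get(intention)
    if instruction is None:
        return False
    send_instruction(instruction)   # e.g. publish the instruction over the home network
    return True
```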
Further, according to the technical solution of the disclosure, the avatar may be updated and presented according to the analysis result and the duration of the user's interaction with the avatar, the updated avatar having an age characteristic corresponding to the interaction duration. A technical solution of presenting an updated avatar according to the interaction duration is described in detail below with reference to fig. 10.
FIG. 10 is a schematic diagram of rendering an avatar according to a change in interaction duration according to an embodiment of the present disclosure.
In one embodiment, the electronic device may record the duration of the user's interaction with the avatar, and configure the avatar with different age characteristics based on changes in the duration of the interaction, where the age characteristics may reflect the apparent growth of the avatar. In this embodiment, the following mapping of the interaction duration and the avatar age may be set, for example:
the duration of interaction of 0.5 hours corresponds to an avatar aged 2 years;
an interaction duration of 1 hour corresponds to a 3 year old avatar;
……
……
an interaction duration of 10 hours corresponds to an avatar aged 17 years;
……
Different age characteristics are configured for avatars of different ages. For example, as shown in fig. 10, the avatar 1010 in the left figure is a 3-year-old avatar corresponding to an interaction duration of 1 hour, and the avatar 1020 in the right figure is a 17-year-old avatar corresponding to an interaction duration of 10 hours. The avatar 1010 may be configured with appearance characteristics reflecting the age of a 3-year-old child, such as short hair, uneven baby teeth, small stature, and children's clothing, while the avatar 1020 may be configured with appearance characteristics reflecting the age of a 17-year-old teenager, such as long hair, fully grown adult teeth, greater height, and more mature clothing.
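The duration-to-age mapping and the age-specific appearance features might be sketched as follows; the thresholds and feature lists merely restate the examples above and are otherwise assumptions:

```python
# (minimum interaction hours, avatar age in years), ordered by increasing duration.
AGE_BY_DURATION_HOURS = [(0.5, 2), (1, 3), (10, 17)]

APPEARANCE_BY_AGE = {
    3:  ["short hair", "uneven baby teeth", "small stature", "children's clothing"],
    17: ["long hair", "adult teeth", "greater height", "mature clothing"],
}

def avatar_age(interaction_hours: float) -> int:
    """Return the avatar age for the accumulated interaction duration."""
    age = 2
    for hours, mapped_age in AGE_BY_DURATION_HOURS:
        if interaction_hours >= hours:
            age = mapped_age
    return age

def appearance_features(interaction_hours: float):
    return APPEARANCE_BY_AGE.get(avatar_age(interaction_hours), [])
```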
Further, in the embodiment of the present disclosure, the electronic device may also invoke a chat model corresponding to the age characteristic to present the chat information of the updated avatar, so that the mind of the avatar also grows as the interaction duration changes.
Referring to fig. 10, for example, while configuring appearance features of the corresponding age for avatars with different interaction durations, ideological features corresponding to the appearance features may also be configured. That is, a mapping of interaction duration, avatar age, appearance features corresponding to the age, and ideological features corresponding to the age may be established. For example, it may be preconfigured that an interaction duration of 1 hour corresponds to a 3-year-old child, the avatar corresponding to the age of 3 is the avatar 1010 in the left figure of fig. 10, and the corresponding ideological feature is a primary language library, which may include the common words, vocabulary, and sentences of a 3-year-old child. Due to the limited learning ability of a 3-year-old child, the common sentences may contain some grammatical errors; alternatively, in embodiments of the present disclosure, expressions containing grammatical errors may be preconfigured in the primary language library so that the language of the avatar 1010 is more consistent with the identity of a 3-year-old child. For example, when the user asks the avatar's age, the avatar 1010 in fig. 10 answers "3 years old, I, this year", an answer that is basically understandable but contains obvious grammatical errors.
In the embodiment of the present disclosure, it may also be preconfigured that an interaction duration of 10 hours corresponds to an avatar aged 17, the appearance corresponding to the age of 17 is the avatar 1020 in the right figure of fig. 10, and the corresponding ideological feature is an advanced language library, which may include the common words, vocabulary, and complete sentences typical of a 17-year-old. For example, when the user asks the avatar's age, the avatar 1020 in fig. 10 answers "I'm 17 years old this year", which is expressed more clearly than the answer of the avatar 1010 and without any grammatical errors.
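A minimal sketch of selecting an age-appropriate language library for the avatar's replies, with assumed library contents:

```python
# Language libraries keyed by avatar age; contents are illustrative placeholders.
LANGUAGE_LIBRARIES = {
    3:  {"name": "primary",  "sample_reply": "3 years old, I, this year"},     # simple, may contain grammatical slips
    17: {"name": "advanced", "sample_reply": "I'm 17 years old this year."},   # complete, grammatical sentences
}

def chat_model_for_age(age_years: int) -> dict:
    """Select the language library for the avatar's current age, falling back to the closest configured age."""
    closest = min(LANGUAGE_LIBRARIES, key=lambda configured: abs(configured - age_years))
    return LANGUAGE_LIBRARIES[closest]
```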
It should be noted that although the embodiments of the present disclosure use avatars in the figure of a girl, the embodiments of the present disclosure place no limitation on the gender of the avatar, nor any limitation on the mapping of interaction duration to the age characteristics of the avatar; avatars with different age characteristics may be presented as the interaction duration changes, according to actual needs.
By adjusting the appearance and the mind of the avatar according to changes in the interaction duration, the avatar's expression becomes more natural and vivid, the user can feel that the avatar grows up along with the user, and the immersion of the interaction between the user and the avatar is improved.
Fig. 11 is a block diagram of an electronic device 1100 of an embodiment of the disclosure, which includes a memory 1110 and a processor 1120. The memory 1110, among other things, stores the programs for the operations described above and may also be configured to store other data to support the operation of the electronic device 1100; examples of such data include instructions for any application or method operating on the electronic device 1100, contact data, messages, pictures, videos, and so forth.
Memory 1110 may be volatile memory (e.g., registers, cache, Random Access Memory (RAM)), non-volatile memory (e.g., Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory), or some combination thereof. The memory may include one or more program modules configured to perform the functions of the various implementations described herein; the program modules may be accessed and executed by the processing unit to perform the corresponding functions. The storage devices may be removable or non-removable media and may include machine-readable media that can be used to store information and/or data and that can be accessed within the electronic device.
The processor 1120 may be a real or virtual processor and is capable of performing various processes according to programs stored in the memory. The processing unit may also be referred to as a Central Processing Unit (CPU), microprocessor, controller, or microcontroller.
The memory 1110 is coupled to the processor 1120 and stores instructions that, when executed by the processor 1120, cause the electronic device 1100 to: presenting an avatar related to the service mode; acquiring related information when a user interacts with an avatar; obtaining an analysis result obtained by analyzing the related information by using the trained neural network model; and updating the avatar according to the analysis result, and presenting the updated avatar.
The related information may be an image of the user, chat information, and interaction duration between the user and the avatar.
In one embodiment, the electronic device acquires the user's expression obtained by analyzing the image of the user with the neural network model, and updates the avatar according to the expression of the user.
In one embodiment, the electronic device obtains environment information obtained by analyzing the image of the user by using the neural network model, and updates the avatar according to the environment information.
In one embodiment, the electronic device makes the background of the avatar correspond to the environmental information.
In one embodiment, the electronic device obtains a gesture of the user obtained by analyzing the image of the user with the neural network model, and updates the avatar according to the gesture of the user.
In one embodiment, the electronic device also obtains an object associated with the gesture of the user from analysis of the image of the user using the neural network model, and renders the object.
In one embodiment, the electronic device acquires an intention of an avatar analyzed from chatting information using a neural network model, and controls the external device to perform an operation corresponding to the intention.
In one embodiment, the electronic device updates the avatar according to the analysis result and the interaction duration of the user and the avatar, and the updated avatar has an age characteristic corresponding to the interaction duration.
In one embodiment, the electronic device invokes a chat model corresponding to the age characteristic to present chat information for the updated avatar.
In one embodiment, the electronic device generates a combined image of the user and the updated avatar and presents the combined image in the background of the updated avatar.
The above processing operations are described in detail in the foregoing method embodiments and the embodiments of specific application scenarios, and those details are also applicable to the electronic device 1100; that is, the specific processing operations mentioned in the foregoing embodiments may be written into the memory 1110 as a program and executed by the processor 1120.
Further, in some embodiments, the electronic device is implemented as various user terminals or service terminals. The service terminal can be a server or a large electronic device provided by various service parties. The user terminal, such as any type of mobile terminal, fixed terminal, or portable terminal, includes a mobile handset, multimedia computer, multimedia tablet, internet node, communicator, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, Personal Communication System (PCS) device, personal navigation device, Personal Digital Assistant (PDA), audio/video player, digital camera/camcorder, positioning device, television receiver, radio broadcast receiver, electronic book device, game device, or any combination thereof, including accessories and peripherals of these devices, or any combination thereof. Further, the electronic device can support any type of interface to the user (such as a wearable device).
Further, in some embodiments, as shown in fig. 11, the electronic device 1100 may also include a communication module 1130, a power module 1140, an audio module 1150, a display 1160, a bus 1170, a camera module 1180, and other components. It should be noted that fig. 11 only schematically shows some of the components and does not mean that the electronic device 1100 includes only the components shown in fig. 11.
The communication module 1130 is configured to facilitate wired or wireless communication between the electronic device 1100 and other devices. The electronic device 1100 may access a wireless network based on a communication standard such as WiFi, LTE, or 5G, or a combination thereof.
The power module 1140 provides power to the various components of the electronic device 1100. The power module 1140 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to the electronic device (e.g., a charger that powers the electronic device wirelessly or by wire).
The audio module 1150 is configured to output and/or input an audio signal. For example, the audio module 1150 includes a Microphone (MIC) configured to receive audio signals from a user when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 1110 or transmitted via the communication module 1130. In some embodiments, the audio module 1150 also includes a speaker for outputting audio signals of the user interacting with the avatar.
The display 1160 includes a screen configured to present the avatar, and the screen may include a liquid crystal display (LCD) and a touch panel.
The camera module 1180 is configured to provide images of the user for presenting and updating the avatar.
The memory 1110, processor 1120, communication module 1130, power module 1140, audio module 1150, display 1160, and camera module 1180 described above may be coupled to the bus 1170. The bus 1170 may provide an interface between the processor 1120 and the rest of the components in the electronic device 1100. The bus 1170 may also provide an interface for the various components in the electronic device 1100 to access the memory 1110 and a communication interface for the various components to access each other.
For example, and without limitation, exemplary types of hardware logic that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and so forth.
Exemplary clauses
An avatar rendering method, comprising:
presenting an avatar related to the service mode;
acquiring related information when a user interacts with the avatar;
obtaining an analysis result obtained by analyzing the related information by using the trained neural network model; and
updating the avatar according to the analysis result, and presenting the updated avatar.
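By way of illustration only, the following Python sketch outlines one possible flow corresponding to the clause above; the NeuralModel class and the capture_related_info, render, and present_avatar names are hypothetical placeholders and are not prescribed by the disclosure.

```python
# Illustrative sketch of the rendering flow in the clause above. NeuralModel,
# capture_related_info, render, and present_avatar are hypothetical placeholder
# names; the disclosure does not prescribe these APIs.
from typing import Any, Dict


class NeuralModel:
    """Stands in for the trained neural network model."""

    def analyze(self, related_info: Dict[str, Any]) -> Dict[str, Any]:
        # A real model would derive expressions, environment information,
        # gestures, or intentions from the related information.
        return {"expression": "smile"}


def capture_related_info() -> Dict[str, Any]:
    """Collects, e.g., an image of the user and/or chat information."""
    return {"image": b"...", "chat": "hello"}


def render(avatar: Dict[str, Any]) -> None:
    print("presenting avatar:", avatar)


def present_avatar(service_mode: str, model: NeuralModel) -> None:
    avatar = {"mode": service_mode, "state": "default"}  # avatar related to the service mode
    render(avatar)

    related_info = capture_related_info()   # acquired while the user interacts with the avatar
    analysis = model.analyze(related_info)  # analysis result from the trained model

    avatar.update(analysis)                 # update the avatar according to the analysis result
    render(avatar)                          # present the updated avatar


present_avatar("companion", NeuralModel())
```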
In one embodiment, the related information includes an image of the user; obtaining the analysis result includes: acquiring the expression of the user obtained by analyzing the image of the user by using the neural network model; updating the avatar includes: updating the avatar according to the expression of the user.

In one embodiment, the related information includes an image of the user; obtaining the analysis result includes: acquiring environment information obtained by analyzing the image of the user by using the neural network model; updating the avatar includes: updating the avatar according to the environment information.
In one embodiment, presenting the updated avatar includes: causing the background of the avatar to correspond to the environment information.
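As an illustrative assumption only, the following sketch shows how the user's expression and the environment information extracted from the user's image might drive the update described in the two clauses above; the mapping tables and field names are hypothetical.

```python
# Hypothetical sketch of updating the avatar from the user's expression and from
# environment information extracted from the user's image; the mapping tables and
# field names are illustrative assumptions, not taken from the disclosure.
EXPRESSION_TO_AVATAR_EMOTION = {
    "smile": "happy",
    "frown": "concerned",
}

ENVIRONMENT_TO_BACKGROUND = {
    "outdoors": "park_scene",
    "office": "desk_scene",
}


def update_from_image_analysis(avatar: dict, analysis: dict) -> dict:
    expression = analysis.get("expression")
    if expression in EXPRESSION_TO_AVATAR_EMOTION:
        avatar["emotion"] = EXPRESSION_TO_AVATAR_EMOTION[expression]

    environment = analysis.get("environment")
    if environment in ENVIRONMENT_TO_BACKGROUND:
        # Make the avatar's background correspond to the environment information.
        avatar["background"] = ENVIRONMENT_TO_BACKGROUND[environment]
    return avatar


print(update_from_image_analysis({"mode": "companion"},
                                 {"expression": "smile", "environment": "office"}))
```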
In one embodiment, the related information includes an image of the user; obtaining the analysis result includes: acquiring the gesture of the user obtained by analyzing the image of the user by using the neural network model; updating the avatar includes: updating the avatar according to the user's gesture.

In one embodiment, an object associated with the user's gesture, obtained by analyzing the user's image using the neural network model, is also acquired; presenting the updated avatar includes: presenting the object.
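The following hypothetical sketch illustrates updating the avatar from the user's gesture and presenting an object associated with that gesture; the gesture-to-action mapping and the props field are illustrative assumptions.

```python
# Hypothetical sketch of updating the avatar according to the user's gesture and,
# when the analysis also returns an object associated with that gesture, presenting
# the object with the avatar. Mappings and field names are illustrative only.
GESTURE_TO_ACTION = {
    "wave": "wave_back",
    "thumbs_up": "cheer",
}


def update_from_gesture(avatar: dict, analysis: dict) -> dict:
    gesture = analysis.get("gesture")
    if gesture in GESTURE_TO_ACTION:
        avatar["action"] = GESTURE_TO_ACTION[gesture]

    obj = analysis.get("object")  # e.g. a cup the user is holding up
    if obj is not None:
        avatar.setdefault("props", []).append(obj)  # present the associated object
    return avatar


print(update_from_gesture({"mode": "companion"},
                          {"gesture": "wave", "object": "cup"}))
```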
In one embodiment, the related information includes chat information; obtaining the analysis result includes: obtaining the intention of the avatar obtained by analyzing the chat information by using the neural network model; the method further comprises: controlling an external device to perform an operation corresponding to the intention.
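A minimal sketch, assuming a simple intent-to-operation table, of how an intention obtained from the chat information might be turned into control of an external device; send_command and the table entries are hypothetical placeholders.

```python
# Minimal sketch, assuming a simple intent-to-operation table, of turning an
# intention obtained from the chat information into control of an external device;
# send_command and the table entries are hypothetical placeholders.
INTENT_TO_OPERATION = {
    "turn_on_light": ("living_room_lamp", "on"),
    "play_music": ("speaker", "play"),
}


def send_command(device_id: str, command: str) -> None:
    # A real implementation might route this through the communication module.
    print(f"sending '{command}' to {device_id}")


def act_on_intention(intention: str) -> None:
    if intention in INTENT_TO_OPERATION:
        device_id, command = INTENT_TO_OPERATION[intention]
        send_command(device_id, command)  # perform the operation corresponding to the intention


act_on_intention("turn_on_light")
```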
In one embodiment, updating the avatar according to the analysis result includes: updating the avatar according to the analysis result and the interaction duration between the user and the avatar, wherein the updated avatar has an age characteristic corresponding to the interaction duration.
In one embodiment, the method further comprises: invoking a chat model corresponding to the age characteristic to present chat information of the updated avatar.
In one embodiment, the method further comprises: generating a combined image of the user and the updated avatar, and presenting the combined image in a background of the updated avatar.
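As an illustration only, the following sketch combines an image of the user with an image of the updated avatar and places the result in the avatar's background; combine_images is a hypothetical stand-in for an actual compositing routine.

```python
# Illustration only: compositing a combined image of the user and the updated
# avatar and placing it in the avatar's background. combine_images is a
# hypothetical stand-in for a real compositing routine (e.g. an imaging library).
def combine_images(user_image: bytes, avatar_image: bytes) -> bytes:
    # Placeholder: a real implementation would blend or stitch the two images.
    return user_image + avatar_image


def present_with_combined_background(avatar: dict,
                                     user_image: bytes,
                                     avatar_image: bytes) -> dict:
    combined = combine_images(user_image, avatar_image)
    avatar["background_image"] = combined  # show the combined image in the avatar's background
    return avatar


print(present_with_combined_background({"mode": "companion"}, b"user", b"avatar"))
```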
An electronic device, comprising: a processor; and a memory coupled to the processor and storing instructions that, when executed by the processor, cause the electronic device to: present an avatar related to the service mode; acquire related information when a user interacts with the avatar; obtain an analysis result obtained by analyzing the related information by using the trained neural network model; and update the avatar according to the analysis result and present the updated avatar.
In one embodiment, the related information includes an image of the user; obtaining the analysis result includes: acquiring the expression of the user obtained by analyzing the image of the user by using the neural network model; updating the avatar includes: updating the avatar according to the expression of the user.
In one embodiment, the related information includes an image of the user;
obtaining the analysis result includes: acquiring environment information obtained by analyzing the image of the user by using the neural network model; updating the avatar includes: updating the avatar according to the environment information.
In one embodiment, presenting the updated avatar includes: causing the background of the avatar to correspond to the environment information.
In one embodiment, the related information includes an image of the user; obtaining the analysis result includes: acquiring the gesture of the user obtained by analyzing the image of the user by using the neural network model; updating the avatar includes: updating the avatar according to the user's gesture.
In one embodiment, the instructions, when executed by the processor, cause the electronic device to further obtain an object associated with the gesture of the user, the object being obtained by analyzing the image of the user with the neural network model; presenting the updated avatar includes: presenting the object.
In one embodiment, the related information includes chat information; obtaining the analysis result includes: obtaining the intention of the avatar obtained by analyzing the chat information by using the neural network model; the instructions, when executed by the processor, cause the electronic device to control an external device to perform an operation corresponding to the intent.
In one embodiment, updating the avatar according to the analysis result includes: updating the avatar according to the analysis result and the interaction duration between the user and the avatar, wherein the updated avatar has an age characteristic corresponding to the interaction duration.
In one embodiment, the instructions, when executed by the processor, cause the electronic device to: invoke a chat model corresponding to the age characteristic to present chat information of the updated avatar.
In one embodiment, the instructions, when executed by the processor, cause the electronic device to: generate a combined image of the user and the updated avatar, and present the combined image in a background of the updated avatar.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of embodiments of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
Those skilled in the art will appreciate that the operations, methods, steps in the processes, acts, and technical solutions discussed in the embodiments of the present disclosure may be replaced, changed, combined, or deleted. Moreover, other steps, measures, and solutions in the operations, methods, and processes discussed in the present disclosure may also be replaced, changed, rearranged, combined, or deleted. Furthermore, steps, measures, and solutions in the prior art that include the operations, methods, and processes discussed in this disclosure may also be replaced, changed, rearranged, combined, or removed.
Therefore, although the disclosure has been described with respect to specific embodiments, it will be understood that the disclosure is intended to cover all modifications and equivalents within the scope of the following claims.

Claims (20)

1. An avatar rendering method, comprising:
presenting an avatar related to the service mode;
acquiring related information when a user interacts with the avatar;
obtaining an analysis result obtained by analyzing the related information by using the trained neural network model; and
updating the avatar according to the analysis result, and presenting the updated avatar.
2. The avatar rendering method of claim 1,
the related information comprises an image of the user;
obtaining the analysis result includes: acquiring the expression of the user obtained by analyzing the image of the user by using the neural network model;
updating the avatar includes: updating the avatar according to the expression of the user.
3. The avatar rendering method of claim 1,
the related information comprises an image of the user;
obtaining the analysis result includes: acquiring environmental information obtained by analyzing the image of the user by using the neural network model;
updating the avatar includes: updating the avatar according to the environment information.
4. The avatar rendering method of claim 3,
presenting the updated avatar includes: causing the background of the avatar to correspond to the environment information.
5. The avatar rendering method of claim 1,
the related information comprises an image of the user;
obtaining the analysis result includes: acquiring the gesture of the user obtained by analyzing the image of the user by using the neural network model;
updating the avatar includes: updating the avatar according to the user's gesture.
6. The avatar rendering method of claim 5, wherein,
further acquiring an object associated with the user's gesture, which is obtained by analyzing the user's image using the neural network model;
presenting the updated avatar includes: presenting the object.
7. The avatar rendering method of claim 1,
the related information comprises chat information;
obtaining the analysis result includes: obtaining the intention of the avatar obtained by analyzing the chat information by using the neural network model;
the method further comprises: controlling an external device to perform an operation corresponding to the intention.
8. The avatar rendering method of claim 1, wherein updating the avatar according to the analysis result comprises:
updating the avatar according to the analysis result and the interaction duration between the user and the avatar, wherein the updated avatar has an age characteristic corresponding to the interaction duration.
9. The avatar rendering method of claim 8, further comprising:
invoking a chat model corresponding to the age characteristic to present chat information of the updated avatar.
10. The avatar rendering method of claim 1, further comprising:
generating a combined image of the user and the updated avatar, and presenting the combined image in a background of the updated avatar.
11. An electronic device, comprising:
a processor; and
a memory coupled to the processor and storing instructions that, when executed by the processor, cause the electronic device to:
present an avatar related to the service mode;
acquire related information when a user interacts with the avatar;
obtain an analysis result obtained by analyzing the related information by using the trained neural network model; and
update the avatar according to the analysis result, and present the updated avatar.
12. The electronic device of claim 11,
the related information comprises an image of the user;
obtaining the analysis result includes: acquiring the expression of the user obtained by analyzing the image of the user by using the neural network model;
updating the avatar includes: updating the avatar according to the expression of the user.
13. The electronic device of claim 11,
the related information comprises an image of the user;
obtaining the analysis result includes: acquiring environmental information obtained by analyzing the image of the user by using the neural network model;
updating the avatar includes: updating the avatar according to the environment information.
14. The electronic device of claim 13,
presenting the updated avatar includes: causing the background of the avatar to correspond to the environment information.
15. The electronic device of claim 11,
the related information comprises an image of the user;
obtaining the analysis result includes: acquiring the gesture of the user obtained by analyzing the image of the user by using the neural network model;
updating the avatar includes: updating the avatar according to the user's gesture.
16. The electronic device of claim 15, wherein the instructions, when executed by the processor, cause the electronic device to further obtain an object associated with the gesture of the user, the object being obtained by analyzing the image of the user with the neural network model;
presenting the updated avatar includes: presenting the object.
17. The electronic device of claim 11,
the related information comprises chat information;
obtaining the analysis result includes: obtaining the intention of the avatar obtained by analyzing the chat information by using the neural network model;
the instructions, when executed by the processor, cause the electronic device to control an external device to perform an operation corresponding to the intent.
18. The electronic device of claim 11, wherein updating the avatar according to the analysis results comprises:
updating the avatar according to the analysis result and the interaction duration between the user and the avatar, wherein the updated avatar has an age characteristic corresponding to the interaction duration.
19. The electronic device of claim 18, wherein the instructions, when executed by the processor, cause the electronic device to:
invoke a chat model corresponding to the age characteristic to present chat information of the updated avatar.
20. The electronic device of claim 11, wherein the instructions, when executed by the processor, cause the electronic device to: generate a combined image of the user and the updated avatar, and present the combined image in a background of the updated avatar.
CN201910099285.1A 2019-01-31 2019-01-31 Avatar presenting method and electronic equipment Pending CN111506184A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910099285.1A CN111506184A (en) 2019-01-31 2019-01-31 Avatar presenting method and electronic equipment
PCT/US2019/065315 WO2020159621A1 (en) 2019-01-31 2019-12-10 Avatar presenting method and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910099285.1A CN111506184A (en) 2019-01-31 2019-01-31 Avatar presenting method and electronic equipment

Publications (1)

Publication Number Publication Date
CN111506184A true CN111506184A (en) 2020-08-07

Family

ID=69063879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910099285.1A Pending CN111506184A (en) 2019-01-31 2019-01-31 Avatar presenting method and electronic equipment

Country Status (2)

Country Link
CN (1) CN111506184A (en)
WO (1) WO2020159621A1 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8144148B2 (en) * 2007-02-08 2012-03-27 Edge 3 Technologies Llc Method and system for vision-based interaction in a virtual environment
US20170098122A1 (en) * 2010-06-07 2017-04-06 Affectiva, Inc. Analysis of image content with associated manipulation of expression presentation

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112138410A (en) * 2020-09-28 2020-12-29 腾讯科技(深圳)有限公司 Interaction method of virtual objects and related device
WO2023071556A1 (en) * 2021-10-28 2023-05-04 腾讯科技(深圳)有限公司 Virtual image-based data processing method and apparatus, computer device, and storage medium

Also Published As

Publication number Publication date
WO2020159621A1 (en) 2020-08-06

Similar Documents

Publication Publication Date Title
JP6902683B2 (en) Virtual robot interaction methods, devices, storage media and electronic devices
CN107894833B (en) Multi-modal interaction processing method and system based on virtual human
US11321385B2 (en) Visualization of image themes based on image content
TWI674516B (en) Animated display method and human-computer interaction device
CN110475069B (en) Image shooting method and device
CN111432267B (en) Video adjusting method and device, electronic equipment and storage medium
KR102491773B1 (en) Image deformation control method, device and hardware device
US10719695B2 (en) Method for pushing picture, mobile terminal, and storage medium
CN112396679B (en) Virtual object display method and device, electronic equipment and medium
KR20210124312A (en) Interactive object driving method, apparatus, device and recording medium
CN113362263B (en) Method, apparatus, medium and program product for transforming an image of a virtual idol
CN109992187B (en) Control method, device, equipment and storage medium
US20210192192A1 (en) Method and apparatus for recognizing facial expression
CN115272537A (en) Audio driving expression method and device based on causal convolution
CN113744286A (en) Virtual hair generation method and device, computer readable medium and electronic equipment
CN111506184A (en) Avatar presenting method and electronic equipment
CN111078005A (en) Virtual partner creating method and virtual partner system
CN111798367A (en) Image processing method, image processing device, storage medium and electronic equipment
CN111292743B (en) Voice interaction method and device and electronic equipment
CN114007145A (en) Subtitle display method and display equipment
CN114021022A (en) Dressing information acquisition method and device, vehicle and storage medium
CN114115533A (en) Intelligent interaction method and device
CN110460719B (en) Voice communication method and mobile terminal
US20240171782A1 (en) Live streaming method and system based on virtual image
CN116841436A (en) Video-based interaction method, apparatus, device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200807

WD01 Invention patent application deemed withdrawn after publication