CN112987932B - Avatar-based human-computer interaction and control method and apparatus - Google Patents

Avatar-based human-computer interaction and control method and apparatus

Info

Publication number
CN112987932B
Authority
CN
China
Prior art keywords
user
avatar
virtual image
exclusive
slider
Prior art date
Legal status
Active
Application number
CN202110316646.0A
Other languages
Chinese (zh)
Other versions
CN112987932A (en)
Inventor
陈睿智
赵晨
章生
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110316646.0A
Publication of CN112987932A
Application granted
Publication of CN112987932B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 15/00 Systems controlled by a computer
    • G05B 15/02 Systems controlled by a computer electric
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 19/00 Programme-control systems
    • G05B 19/02 Programme-control systems electric
    • G05B 19/418 Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 2219/00 Program-control systems
    • G05B 2219/20 Pc systems
    • G05B 2219/26 Pc applications
    • G05B 2219/2642 Domotique, domestic, home control, automation, smart house
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Automation & Control Theory (AREA)
  • Manufacturing & Machinery (AREA)
  • Quality & Reliability (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure provides an avatar-based human-computer interaction method, relating to the field of virtual reality and, in particular, to the fields of human-computer interaction, artificial intelligence, deep learning, the Internet of Things, and voice technology. A specific implementation scheme is: displaying an avatar on a smart device; controlling the avatar to communicate and interact with a user; and controlling the avatar to make consumption recommendations to the user during that communication and interaction.

Description

Avatar-based human-computer interaction and control method and apparatus
Technical Field
The present disclosure relates to the field of virtual reality, and more particularly to the fields of human-computer interaction, artificial intelligence, deep learning, the Internet of Things, and voice technology. More specifically, it relates to an avatar-based human-computer interaction method and apparatus, an avatar-based control method and apparatus, another avatar-based human-computer interaction method and apparatus, an electronic device, a non-transitory computer-readable storage medium storing computer instructions, and a computer program product.
Background
In future augmented reality systems, the avatar will be the primary carrier of human-computer interaction.
Currently, avatar generation apps on the market generally require the user to upload a photo, from which a preliminary avatar is automatically generated based on the portrait in the photo. To arrive at the final avatar, however, the user must manually adjust the preliminary avatar using the app's face-shaping ("face pinching") function.
Disclosure of Invention
The present disclosure provides an avatar-based human-computer interaction method, an avatar-based control method, and corresponding apparatuses, devices, storage media, and computer program products.
According to an aspect of the present disclosure, there is provided an avatar-based human-computer interaction method, including: displaying an avatar on a smart device; controlling the avatar to communicate and interact with a user; and controlling the avatar to make consumption recommendations to the user during that communication and interaction.
According to another aspect of the present disclosure, there is provided an avatar-based control method, including: remotely controlling an avatar displayed on a smart device to communicate and interact with a user; and, during that communication and interaction, issuing a marketing strategy for the user to the smart device so that the avatar makes consumption recommendations to the user based on the marketing strategy.
According to another aspect of the present disclosure, there is provided another avatar-based human-computer interaction method, including: displaying an avatar on a specific interactive device installed in a specific venue; and controlling the avatar to communicate and interact with a user while the user is active in that venue.
According to another aspect of the present disclosure, there is provided an avatar-based human-computer interaction apparatus, including: a first display module for displaying an avatar on a smart device; a first control module for controlling the avatar to communicate and interact with a user; and a second control module for controlling the avatar to make consumption recommendations to the user during that communication and interaction.
According to another aspect of the present disclosure, there is provided an avatar-based control apparatus, including: a third control module for remotely controlling an avatar displayed on a smart device to communicate and interact with a user; and a first sending module for issuing, during that communication and interaction, a marketing strategy for the user to the smart device so that the avatar makes consumption recommendations to the user based on the marketing strategy.
According to another aspect of the present disclosure, there is provided another avatar-based human-computer interaction apparatus, including: a second display module for displaying an avatar on a specific interactive device installed in a specific venue; and a fourth control module for controlling the avatar to communicate and interact with a user while the user is active in that venue.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the methods of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the methods of the embodiments of the present disclosure.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor to limit its scope. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1A illustrates a system architecture of an avatar generation method and apparatus suitable for embodiments of the present disclosure;
fig. 1B illustrates a scene diagram of an avatar generation method and apparatus in which the embodiments of the present disclosure may be implemented;
FIG. 1C illustrates a flow diagram of an avatar-based human-machine interaction method in accordance with an embodiment of the present disclosure;
FIG. 1D illustrates a flow chart of an avatar-based control method according to an embodiment of the present disclosure;
FIG. 1E illustrates a flow chart of an avatar-based human-machine interaction method according to another embodiment of the present disclosure;
fig. 2 illustrates a flowchart of a generation method of an avatar according to an embodiment of the present disclosure;
FIG. 3 illustrates a schematic diagram of a semantic transformation according to an embodiment of the present disclosure;
FIGS. 4A-4D illustrate schematic views of avatar sliders according to embodiments of the present disclosure;
FIG. 5 illustrates a schematic diagram of a reference avatar with finished bone-to-skin binding according to an embodiment of the present disclosure;
FIG. 6 illustrates a schematic diagram of generating avatar sliders, according to an embodiment of the present disclosure;
FIG. 7 illustrates a schematic diagram of generating an avatar according to an embodiment of the present disclosure;
FIG. 8A illustrates a block diagram of an avatar-based human-machine interaction device, according to an embodiment of the present disclosure;
fig. 8B illustrates a block diagram of an avatar-based control apparatus according to an embodiment of the present disclosure;
FIG. 8C illustrates a block diagram of an avatar-based human-machine interaction device, according to another embodiment of the present disclosure; and
fig. 9 illustrates a block diagram of an electronic device for implementing the avatar generation method and apparatus of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Currently, commercial avatar generation apps typically require a user to upload a photo, from which a preliminary avatar is automatically generated based on the portrait in the photo. To arrive at the final avatar, however, the user must manually adjust the preliminary avatar using the app's face-pinching function.
Although this traditional approach of automatic generation followed by manual customization does eventually produce an avatar, users often find it difficult to obtain an ideal avatar that truly satisfies them, because this approach does not allow efficient, personalized customization of the avatar.
For example, suppose a user wants to create an avatar with a high nose bridge, large eyes, and a thin chin. With the conventional approach, the user may well be unable to find a photo of a real person with similar features in the first place. Even if such a photo can be found, the face-pinching function requires the user to search through the facial features one by one, for example in libraries of nose shapes, eye shapes, face shapes, and so on. An app typically offers dozens of shapes for each facial feature, so merely selecting a shape for every feature can take tens of minutes, and many users will give up before finding shapes that satisfy them.
In addition, the traditional avatar customization scheme is inconvenient to use and can even leave users frustrated, harming the overall experience.
The language-description-based intelligent avatar generation scheme of the present disclosure can solve the above technical problems and achieve efficient, personalized customization of avatars. The present disclosure is described in detail below with reference to specific embodiments.
A system architecture of the avatar generation method and apparatus suitable for the embodiments of the present disclosure is introduced as follows.
Fig. 1A illustrates a system architecture of an avatar generation method and apparatus suitable for the embodiments of the present disclosure. It should be noted that fig. 1A is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be used in other environments or scenarios.
As shown in fig. 1A, the system architecture 100 may include: the terminal device 101. It should be noted that the terminal device 101 may be a client or a server, and the disclosure is not limited herein.
Specifically, an initial avatar may be generated as a reference avatar by the terminal device 101 or another device. If the user wants to customize an ideal avatar that satisfies personalized needs, for example one with a high nose bridge, large eyes, and a thin chin, the user can describe that ideal avatar in language. After the terminal device 101 obtains the user's language description, it can extract the corresponding semantic features from the description and then generate the ideal avatar based on the extracted features.
In the embodiment of the present disclosure, the terminal device 101 can accurately identify, from the user's language description, the semantic features of the avatar the user requires, and then generate the avatar intelligently. On one hand, this scheme improves the production efficiency of personalized avatars and offers users a more convenient experience; on the other hand, it improves customization accuracy, delivering an avatar close to the user's ideal.
It should be understood that the number of terminal devices in fig. 1A is merely illustrative. There may be any number of terminal devices, as desired for implementation.
Application scenarios suitable for the avatar generation method and apparatus of the disclosed embodiments are introduced below.
It should be understood that, at present, offline consumption guidance relies mainly on bloggers sharing store-visit content; if consumers do not actively browse that content, effective recommendation is difficult to achieve.
It should also be appreciated that a personalized avatar generally resembles its real-life user, is distinguishable from other users' avatars, and gains affinity through its cartoon appearance, all of which helps raise the user's subjective acceptance of the avatar. Therefore, by creating an avatar, realizing avatar-based human-computer interaction, and having the avatar make timely consumption recommendations and provide consumption companionship during that interaction, the conversion rate of consumption recommendations can be effectively improved and the user experience enhanced.
For example, in embodiments of the present disclosure, the avatar may guide the user toward offline consumption in a home scene, or accompany the user's offline consumption in a consumption venue.
As shown in fig. 1B, a user may create a dedicated avatar through a mobile phone client and upload it to the cloud; the cloud then issues the dedicated avatar to the smart devices associated with the user and to interactive devices (such as mall, supermarket, and restaurant interactive devices) deployed in the public venues the user visits, for use during human-computer interaction.
It should be noted that, in embodiments of the present disclosure, the smart devices include Internet of Things (IoT) devices with display screens, such as refrigerators, televisions, smart mirrors, and smart glass. After an avatar is issued to an IoT device, the avatar can be displayed whenever the device is triggered and, acting as the front-end carrier of an intelligent voice dialogue system, engage the user in vivid functional or casual conversation. During the conversation, the avatar can make timely offline consumption recommendations to the user according to marketing recommendation instructions issued by the cloud.
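For illustration only, the following Python sketch shows how such a front end might be organized; every class, method, and callback name here is an assumption of this description, not part of any product. The avatar is shown when the IoT device is triggered, replies come from the voice dialogue system, and a recommendation pushed by the cloud is spliced into the next reply:

```python
import queue


class AvatarFrontend:
    """Hypothetical avatar front end on an IoT device with a display screen."""

    def __init__(self, respond, fetch_avatar):
        self.respond = respond            # callback into the voice dialogue system
        self.fetch_avatar = fetch_avatar  # callback that downloads the dedicated avatar
        self.promos = queue.Queue()       # marketing instructions pushed by the cloud

    def on_trigger(self, user_id):
        # When the IoT device is triggered, show the avatar issued by the cloud.
        print(f"displaying {self.fetch_avatar(user_id)}")

    def on_cloud_promo(self, promo_text):
        # The cloud can push a marketing recommendation instruction at any time.
        self.promos.put(promo_text)

    def on_user_utterance(self, text):
        # Functional or casual chit-chat, with a timely offline consumption
        # recommendation appended when one is pending.
        reply = self.respond(text)
        if not self.promos.empty():
            reply += " " + self.promos.get()
        print(f"avatar says: {reply}")


frontend = AvatarFrontend(respond=lambda t: f"(reply to: {t})",
                          fetch_avatar=lambda uid: f"avatar-of-{uid}")
frontend.on_trigger("user-a")
frontend.on_cloud_promo("It is cold and rainy; how about hot pot tonight?")
frontend.on_user_utterance("What's the weather like?")
```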
In addition, in embodiments of the present disclosure, the mall interactive devices likewise include electronic devices with display screens that are installed in malls and can interact with users. For example, when a user visits a mall, the user's identity can be determined by face recognition, and the cloud is notified to issue the corresponding avatar. For instance, when user A tries on clothes in a brand clothing store, user A's dedicated avatar can be downloaded from the cloud and displayed at the edge of the fitting mirror near user A, so that the avatar can interact with the user, for example commenting on the fitting. Likewise, when user B eats solo hot pot in a hot pot restaurant, a small display screen can be set up beside user B's pot to show user B's dedicated avatar, which can chat casually or play games with user B.
In one embodiment of the present disclosure, a user may upload a photo; the avatar generation app automatically generates a preliminary avatar from the portrait in the photo, and the user then applies the app's face-pinching function to the preliminary avatar to produce a dedicated avatar.
In another embodiment of the present disclosure, the user may instead generate the dedicated avatar using the language-description-based intelligent avatar generation scheme.
In addition, in another embodiment of the present disclosure, besides receiving the avatar created at the mobile phone client and issuing it to the user's associated smart devices and to interactive devices deployed in malls, supermarkets, and other public venues the user visits, the cloud may also analyze the user's big data to produce timely marketing strategies, so that the avatar displayed on the smart home device completes the marketing task. For example, if on a cold, rainy day the user has not eaten hot pot for a long time, the avatar can warmly recommend a hot pot restaurant to the user.
Through the embodiments of the present disclosure, consumption recommendation and companionship services are provided by an avatar that resembles a real person, giving the user a sense of identification. Avatar recommendations can further guide the user toward offline consumption, and in a real consumption scene the user's dedicated avatar can accompany the user, helping recommend goods and keeping the user company while dining and chatting. In this way, online and offline services are connected.
The present disclosure provides a human-computer interaction method based on an avatar.
As shown in FIG. 1C, the human-computer interaction method 100C can be applied to a home scenario, including the following operations S110C-S130C.
In operation S110C, an avatar is displayed on the smart device.
In operation S120C, the avatar is controlled to communicate and interact with the user.
In operation S130C, the avatar is controlled to make consumption recommendations to the user during that communication and interaction.
In one embodiment of the present disclosure, the smart device may comprise a smart home device. For example, in a home scenario, a dedicated avatar of a user may be displayed on a display screen of the smart home device after the smart home device is triggered. In other embodiments of the present disclosure, after the smart home device is triggered, other avatars that are not specific to the user may be displayed on the display screen of the smart home device.
Displayed on the smart home device, the avatar can act as the front-end carrier of an intelligent voice dialogue system, engaging the user in vivid functional or casual conversation, game interaction, and the like. During such conversation or game interaction, the avatar can make timely offline consumption recommendations according to marketing recommendation instructions (including marketing strategies) issued by the cloud. For example, if on a cold, rainy day the user has not eaten hot pot for a long time, the avatar can warmly recommend a hot pot restaurant.
In a home scene, the user builds trust in and recognition of the avatar through interaction with it on the smart home device, so that the personalized avatar becomes the user's companion. Consequently, when the cloud issues marketing recommendation instructions directly to the avatar, the avatar has a high probability of completing the consumption guidance.
In embodiments of the present disclosure, the companionship of a personalized avatar builds the user's trust in and recognition of the avatar, enabling consumption recommendation with the avatar as the interactive carrier and improving the conversion rate of such recommendations.
As an alternative embodiment, controlling the avatar to make consumption recommendations to the user may include the following operations.
Acquiring the marketing strategy issued by the cloud for the user.
Controlling the avatar to make consumption recommendations to the user based on the marketing strategy.
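As a non-limiting sketch of these two operations on the device side (the cloud interface and avatar API named here are assumptions, not part of the disclosure):

```python
def recommend_consumption(cloud, avatar, user_id):
    # Operation 1: acquire the marketing strategy the cloud issued for this user.
    strategy = cloud.get_marketing_strategy(user_id)
    if strategy is None:
        return  # nothing to recommend right now
    # Operation 2: the avatar turns the strategy into a consumption recommendation.
    avatar.speak(f"Recommendation: {strategy['pitch']}")


class StubCloud:
    def get_marketing_strategy(self, user_id):
        return {"pitch": "a hot pot restaurant near you"}


class StubAvatar:
    def speak(self, text):
        print(f"avatar: {text}")


recommend_consumption(StubCloud(), StubAvatar(), "user-a")
```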
In another embodiment of the present disclosure, besides receiving the avatar created at the mobile phone client and issuing it to the user's associated smart home devices and to interactive devices deployed in malls, supermarkets, and other public venues the user visits, the cloud may also analyze the user's big data to produce timely, targeted marketing strategies for a specific user and control the avatar displayed on the smart home device to complete the marketing task. Precise marketing can thus be achieved, giving users more reasonable, accurate, and satisfying consumption recommendations.
As an alternative embodiment, displaying the avatar on the smart device may include: displaying the user's dedicated avatar on the smart device, the dedicated avatar having been issued to the smart device by the cloud.
For example, a user can create a dedicated avatar through the mobile phone client and upload it to the cloud; when the user later triggers his or her smart device, the cloud issues the created dedicated avatar to that device for display.
In embodiments of the present disclosure, using the user's dedicated avatar makes it easier to win the user's trust and approval during companionship, and hence easier for the avatar to complete consumption recommendations.
The present disclosure also provides a control method based on the avatar.
As shown in fig. 1D, the avatar-based control method 100D may be applied to a server side such as a cloud, and includes the following operations S110D to S120D.
In operation S110D, the avatar displayed on the smart device is remotely controlled to communicate and interact with the user.
In operation S120D, during that communication and interaction, a marketing strategy for the user is issued to the smart device so that the avatar makes consumption recommendations to the user based on the strategy.
In an embodiment of the present disclosure, taking a home scene as an example, the smart device includes a smart home device. After the smart home device is triggered, the cloud can remotely control it, for example displaying the user's dedicated avatar on its display screen. In other embodiments of the present disclosure, the cloud may instead display an avatar that is not dedicated to the user.
Displayed on the smart home device, the avatar can act as the front-end carrier of an intelligent voice dialogue system, engaging the user in vivid functional or casual conversation, game interaction, and the like. During such conversation or game interaction, the avatar can make timely offline consumption recommendations according to marketing recommendation instructions (including marketing strategies) issued by the cloud. For example, if on a cold, rainy day the user has not eaten hot pot for a long time, the avatar can warmly recommend a hot pot restaurant.
In a home scene, the user communicates and interacts with the avatar on the smart home device, making it easier to build trust in the avatar and to accept the consumption recommendations it gives; the personalized avatar readily becomes the user's companion. Consequently, when the cloud issues marketing recommendation instructions directly to the avatar, the avatar has a high probability of completing the consumption guidance.
In embodiments of the present disclosure, the companionship of a personalized avatar increases the user's trust in and recognition of the avatar, enabling consumption recommendation with the avatar as the interactive carrier and improving the conversion rate of such recommendations.
As an alternative embodiment, the method may further include the following operations.
Acquiring the dedicated avatar provided by the user.
Issuing the dedicated avatar to the smart device associated with the user, so that the smart device displays it when facing the user and conducts human-computer interaction with the user through it.
For example, a user can create a dedicated avatar through the mobile phone client and upload it to the cloud; when the user later triggers his or her smart device, the cloud issues the created dedicated avatar to the associated device for display and interaction with the user.
In embodiments of the present disclosure, using the user's dedicated avatar makes it easier to win the user's trust and approval during companionship, and hence easier for the avatar to complete consumption recommendations.
As an alternative embodiment, the method further comprises the following operations.
Consumption data of a user is acquired.
Based on the consumption data, a marketing strategy for the user is generated and sent to the smart device.
For example, the cloud may acquire consumption data uploaded by interactive devices, checkout devices, and the like installed in malls, supermarkets, and other public venues, and perform big-data analysis on it, thereby generating marketing strategies for an individual user or a certain class of users and issuing them to the smart devices associated with those users, so that the avatars displayed on the devices can make timely consumption recommendations.
Through embodiments of the present disclosure, the user's consumption habits can be learned from the consumption data, and marketing strategies matching those habits can then be formulated, achieving precise marketing.
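The hot pot example given earlier can be made concrete with a toy cloud-side rule; the record format and the rule itself are illustrative assumptions, standing in for the big-data analysis the disclosure describes:

```python
from datetime import date, timedelta


def make_strategy(consumption_records, today, weather):
    # consumption_records: (date, category) pairs uploaded by venue devices.
    last_hotpot = max((d for d, category in consumption_records
                       if category == "hotpot"), default=None)
    # Rule of thumb: a long gap since the last hot pot meal plus cold, wet weather.
    if last_hotpot and today - last_hotpot > timedelta(days=30) \
            and weather in ("cold", "rainy"):
        return {"category": "hotpot",
                "pitch": "You haven't had hot pot in a month; try your usual place?"}
    return None


print(make_strategy([(date(2021, 2, 1), "hotpot")], date(2021, 3, 25), "rainy"))
```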
The present disclosure also provides another avatar-based human-computer interaction method.
As shown in fig. 1E, the avatar-based human-machine interaction method 100E may be applied to a consumption scenario, including operations S110E to S120E as follows.
In operation S110E, an avatar is displayed on a specific interactive device installed in a specific venue.
In operation S120E, the avatar is controlled to communicate and interact with the user while the user is active in that venue.
It should be noted that in the embodiments of the present disclosure, the specific location may include a public location such as a mall, a supermarket, a restaurant, and the like. The specific interactive device may comprise an electronic device with a display screen.
Illustratively, when a user visits a mall, the user's identity can be determined by face recognition, and the cloud is notified to issue the corresponding avatar to the mall interactive device near the user, so that the avatar can interact with the user during consumption and accompany the user's offline consumption.
Through embodiments of the present disclosure, in a consumption scene the user's dedicated avatar can appear on the mall interactive devices near the user, accompanying the whole offline consumption process and providing a better consumption experience.
As an alternative embodiment, the controlling of the avatar to interact with the user during the user's activities in the particular location may include at least one of the following.
Controlling the avatar to communicate and interact with the user while the user shops in a mall or supermarket, so as to accompany the shopping.
Controlling the avatar to chat or play interactive games with the user while the user dines in a restaurant, so as to accompany the meal.
Controlling the avatar to communicate and interact with the user while the user spends in leisure and entertainment venues, so as to accompany the user's leisure and entertainment.
For example, when user A tries on clothes in a brand clothing store, user A's dedicated avatar can be downloaded from the cloud and displayed at the edge of the fitting mirror near user A, so that the avatar can interact with the user, for example commenting on the fitting. Likewise, when user B eats solo hot pot in a hot pot restaurant, a small display screen can be set up beside user B's pot to show user B's dedicated avatar, which can chat casually or play games with user B.
Through embodiments of the present disclosure, an avatar with a certain resemblance to a real person provides accompanied-consumption services, giving the user a sense of identification. In a real consumption scene, the user's dedicated avatar accompanies the user and can help the user pick satisfying goods, keep the user company at meals, chat, and so on. In this way, online and offline services are connected.
As an alternative embodiment, the method may further include the following operations.
Consumption data of a user is acquired.
And sending the consumption data to the cloud so that the cloud generates a marketing strategy for the user based on the consumption data.
Illustratively, interactive devices, checkout devices, and the like installed in malls, supermarkets, and other public venues can collect users' consumption data and upload it to the cloud; the cloud then performs big-data analysis on the collected data to generate marketing strategies for individual users or certain classes of users and issues them to the corresponding smart home devices, so that the avatars displayed there can make timely consumption recommendations.
Through embodiments of the present disclosure, users' consumption data can be collected and uploaded so that the cloud can learn users' consumption habits from it and generate marketing strategies matching those habits, achieving precise marketing.
As an alternative embodiment, the method may further include the following operations.
In response to the user entering the specific venue, face recognition is performed on the user to determine the user's identity.
The user's dedicated avatar is obtained based on that identity.
Here, displaying the avatar on the specific interactive device installed in the specific venue may include: displaying the dedicated avatar on that device.
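A compact sketch of this venue-side flow follows; the recognizer, cloud fetch, and device interface are hypothetical stand-ins for whatever services a deployment would actually use:

```python
def greet_visitor(face_image, recognize_face, fetch_dedicated_avatar, device):
    user_id = recognize_face(face_image)      # face recognition -> user identity
    if user_id is None:
        return                                # unknown visitor: show nothing
    avatar = fetch_dedicated_avatar(user_id)  # dedicated avatar issued by the cloud
    device.display(avatar)                    # e.g. the edge of a fitting mirror


class MirrorDevice:
    def display(self, avatar):
        print(f"showing {avatar} beside the fitting mirror")


greet_visitor("<face image>", lambda img: "user-a",
              lambda uid: f"dedicated-avatar-of-{uid}", MirrorDevice())
```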
It should be appreciated that a personalized avatar generally resembles its real-life user, is distinguishable from other users' avatars, and gains affinity through its cartoon appearance, which helps raise the user's subjective acceptance of the avatar. Therefore, creating an avatar, realizing avatar-based human-computer interaction, and having the avatar make timely consumption recommendations and provide consumption companionship during that interaction can effectively improve the conversion rate of consumption recommendations and the user experience.
The user can create a dedicated avatar through the mobile phone client and upload it to the cloud, which then issues it to the smart home devices associated with the user and to the interactive devices (such as mall, supermarket, and restaurant interactive devices) deployed in the public venues the user visits, for use during human-computer interaction.
Illustratively, when a user visits a mall, the user's identity can be determined by face recognition, and the cloud is notified to issue the dedicated avatar associated with that identity to the mall interactive device near the user. For example, when user A tries on clothes in a brand clothing store, user A's dedicated avatar can be downloaded from the cloud and displayed at the edge of the fitting mirror near user A, so that the avatar can communicate and interact with the user, for example commenting on the fitting. Likewise, when user B eats solo hot pot in a hot pot restaurant, a small display screen can be set up beside user B's pot to show user B's dedicated avatar, which can chat casually or play games with user B.
Through embodiments of the present disclosure, based on smart home devices, mall interactive devices, and cloud services, home/mall linked marketing is realized: online marketing in the home scene and accompanied offline consumption, both based on the personalized avatar. This improves the consumption guidance (marketing) model while also improving the user's consumption experience, for example letting the user receive the personalized avatar's accompanied-consumption service immersively.
According to an embodiment of the present disclosure, the present disclosure further provides a method for generating an avatar.
Fig. 2 illustrates a flowchart of a generation method of an avatar according to an embodiment of the present disclosure.
As shown in fig. 2, method 200 may include: operations S210 to S230.
In operation S210, a language description of the user for the target avatar is obtained.
In operation S220, corresponding semantic features are extracted based on the language description.
In operation S230, a target avatar is generated based on the semantic features.
It should be noted that, in operation S210, the language description may be in speech form or in text form; the embodiments of the present disclosure are not limited in this regard. For a language description in speech form, the user's semantic requirements for the target avatar can be captured in operation S210 by automatic speech recognition (ASR).
For example, if the user wants to create a target avatar with a high nose bridge, large eyes, and a thin chin, the user may input the language description "high nose bridge, large eyes, thin chin" for the target avatar. Through the operations of method 200, this description can be obtained and semantic features such as "high nose bridge", "large eyes", and "thin chin" extracted from it. In addition, in embodiments of the present disclosure, an avatar can be created in advance as a reference avatar; after the semantic features for a target avatar are extracted, the deformation of the reference avatar can be controlled based on them, finally yielding the target avatar the user desires.
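The three operations can be read as a small pipeline. The sketch below is a toy rendering of S210 to S230 under stated assumptions: the description is already text (speech would first pass through ASR), feature extraction is a plain comma split rather than a learned extractor, and the deformation step is a caller-supplied function:

```python
def extract_semantic_features(description: str) -> list:
    # S220: toy extraction; the disclosure extracts features from the description.
    return [part.strip() for part in description.split(",") if part.strip()]


def generate_target_avatar(description, reference_avatar, apply_feature):
    features = extract_semantic_features(description)  # S220
    avatar = dict(reference_avatar)
    for feature in features:                           # S230: drive the deformation
        avatar = apply_feature(avatar, feature)
    return avatar


# S210: the user's language description, e.g. transcribed from speech by ASR.
result = generate_target_avatar(
    "high nose bridge, large eyes, thin chin",
    reference_avatar={"applied": []},
    apply_feature=lambda a, f: {"applied": a["applied"] + [f]},
)
print(result)  # {'applied': ['high nose bridge', 'large eyes', 'thin chin']}
```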
Through embodiments of the present disclosure, the user only needs to give a language description of the target avatar, with no additional manual customization; the existing reference avatar is then deformed semantically according to the user's requirements, achieving efficient, highly accurate personalized avatar customization and improving the user's experience of and satisfaction with the customized avatar.
Moreover, because the user only provides a language description and performs no extra manual customization, the scheme is convenient to use.
As an alternative embodiment, generating the target avatar based on the semantic features may include the following operations.
A reference avatar is obtained.
Based on the semantic features extracted from the user's language description, the reference avatar is controlled to deform so as to generate the target avatar.
In the embodiment of the present disclosure, an avatar may be created in advance as the reference avatar; during personalized customization of the target avatar, the reference avatar is obtained directly, and the semantic features extracted from the user's language description drive its deformation, yielding the corresponding target avatar.
For example, an avatar model of the reference avatar may be created first, then a skeleton tree built for the model, and skinning performed on each skeleton node in the tree so as to associate the skeleton nodes with the corresponding skin mesh nodes, giving the bound reference avatar.
Further, controlling the deformation of the reference avatar based on the extracted semantic features may include: first obtaining avatar sliders (hereinafter, sliders) carrying those semantic features, then using the sliders to drive the skeleton nodes of the reference avatar to deform, which in turn drives the skin nodes to deform, finally yielding the target avatar, i.e. the ideal avatar the user desires.
Illustratively, if a user wants to create a target avatar with a high nose bridge, big eyes, and a thin chin, the semantic features "high nose bridge", "big eyes", and "thin chin" are extracted from the language description; the three sliders "high nose bridge slider", "big eye slider", and "thin chin slider" are then obtained and used to drive the pre-created reference avatar to deform, finally yielding an ideal avatar satisfying those features.
According to embodiments of the present disclosure, personalized avatars are customized using an artificial-intelligence algorithm. On one hand, this improves the production efficiency of personalized avatars and gives users a more convenient experience; on the other hand, it improves the accuracy of personalized avatar customization.
As an alternative embodiment, controlling the pre-created reference avatar to deform based on semantic features extracted from the user's language description, so as to generate the target avatar, may include the following operations.
Converting the extracted semantic features into professional-level semantic features.
Controlling the reference avatar to deform based on the converted professional-level semantic features.
It should be understood that, in actual use, different users may describe the same or similar features differently. For a "thin chin", for example, some users may say "pointed face", others "melon seed face", and still others something else. Moreover, in practice it is hard for users to describe shapes at the level of "cheekbones" or "chin"; more often, they choose a broader description that conveys an overall impression of the avatar, such as "girlish", "mature", "sunny", or "ordinary".
Therefore, embodiments of the present disclosure uniformly convert the semantic features extracted from the user's language description into professional-level semantic features, and then control the deformation of the reference avatar based on the converted features to obtain the final target avatar.
Illustratively, the extracted semantic features (i.e. the general semantic features given by the user) can be converted into professional-level semantic features by a semantic converter. The professional-level features are semantic features described from anatomical, biological, and similar perspectives. In embodiments of the present disclosure, the semantic converter can be built through large-scale data collection and deep-learning regression training.
As shown in fig. 3, taking face shape as an example, the semantic keyword "melon seed face" extracted from the user's language description can be converted into the professional-level semantic features "low cheekbones" and "narrow chin"; the keyword "Chinese-character (square) face" can be converted into "high cheekbones" and "wide chin"; and the keyword "cute" can be converted into "big eyes" and "round face".
Furthermore, because the user's general semantic description can be converted into a corresponding professional-level description, the sliders that drive the deformation of the reference avatar can be created around professional-level semantics, for example high-cheekbone, low-cheekbone, narrow-chin, big-eye, and round-face sliders. If the user inputs "melon seed face", it is converted into the two professional-level semantics "low cheekbones" and "narrow chin", and the low-cheekbone and narrow-chin sliders are used directly to drive the reference avatar to deform, finally realizing the ideal "melon seed face" avatar.
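The conversions of fig. 3 can be written out as a lookup table. In the disclosure the converter is a learned regressor built from large-scale data; the dictionary below is only a toy stand-in for that mapping:

```python
GENERAL_TO_PROFESSIONAL = {
    "melon seed face": ["low cheekbones", "narrow chin"],
    "chinese-character face": ["high cheekbones", "wide chin"],
    "cute": ["big eyes", "round face"],
}


def to_professional(general_features):
    professional = []
    for feature in general_features:
        # Terms that are already professional-level pass through unchanged.
        professional.extend(GENERAL_TO_PROFESSIONAL.get(feature, [feature]))
    return professional


print(to_professional(["melon seed face"]))
# ['low cheekbones', 'narrow chin'] -> selects the low-cheekbone
# and narrow-chin sliders
```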
Through embodiments of the present disclosure, even if the user inputs a general language description in practice, a corresponding professional-level description is obtained through semantic conversion, and the deformation of the reference avatar is then controlled precisely, finally yielding the ideal avatar the user desires.
As an alternative embodiment, controlling the reference avatar deformation based on professional-level semantic features may include the following operations.
Based on the professional-level semantic features, at least one slider is determined, each slider being associated with a corresponding specific semantic tag.
Based on the at least one slider, a plurality of corresponding bone nodes in a bone tree for supporting the reference avatar are driven to deform.
Based on the deformations of the plurality of corresponding bone nodes, skin mesh node deformations associated with the plurality of corresponding bone nodes are driven.
Specifically, in embodiments of the present disclosure, after the user's general semantic features are converted into professional-level semantic features, at least one keyword contained in the converted features can be extracted. At least one semantic tag containing those keywords is then found, and the slider associated with each tag is located. Finally, the found sliders drive the corresponding skeleton nodes in the skeleton tree supporting the reference avatar to deform, and the deformation of those skeleton nodes in turn drives the associated skin mesh nodes to deform.
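For intuition, here is a translation-only toy version of that drive chain; the data layout (a slider as per-bone offsets, skinning as a fixed weight matrix) is an assumption made for brevity, since real bones also rotate and scale:

```python
import numpy as np


def apply_slider(bone_offsets, slider, value=1.0):
    # slider: {bone_index: offset vector at slider value 1.0}
    for bone, delta in slider.items():
        bone_offsets[bone] += value * np.asarray(delta, dtype=float)
    return bone_offsets


def deform_skin(rest_vertices, bone_offsets, skin_weights):
    # skin_weights[i, j] is bone j's influence on vertex i (rows sum to 1).
    return rest_vertices + skin_weights @ bone_offsets


bone_offsets = np.zeros((3, 3))                               # 3 bones, one offset each
wide_face_slider = {0: [0.4, 0.0, 0.0], 1: [-0.4, 0.0, 0.0]}  # hypothetical linkage
bone_offsets = apply_slider(bone_offsets, wide_face_slider)
weights = np.array([[1.0, 0.0, 0.0], [0.5, 0.5, 0.0],
                    [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]])
print(deform_skin(np.zeros((4, 3)), bone_offsets, weights))
```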
For example, fig. 4A to 4D sequentially show a "wide-face slider", a "narrow-face slider", a "long-face slider", and a "short-face slider". Fig. 5 shows a reference avatar with a finished skeleton-skin binding. Illustratively, in the case where the general language description or the converted professional language description input by the user includes the "wide face" feature, the "wide face slider" shown in fig. 4A may be directly utilized to drive the reference avatar shown in fig. 5 to deform, so as to obtain the target avatar with the wide face feature.
Through embodiments of the present disclosure, driving the reference avatar with sliders that carry semantic information improves both the output efficiency of the target avatar and the accuracy of the result.
It should be noted that, to deform an avatar at low cost, a three-dimensional modeler usually designs a skeleton tree for the face model and establishes weighted influence relationships between each skin mesh node of the face skin (the skeleton skin) and the skeleton nodes in the tree. By controlling the rotation, translation, and scaling of each skeleton node, its deformation propagates to the skin mesh nodes of the face skin, deforming them.
However, the skeleton tree is designed around the geometry of the face, and most skeleton nodes carry no actual semantics such as "wide face" or "high nose bridge". Therefore, after finishing the skinning setup, the designer must also design sliders that operate on many skeleton nodes in batch, finally giving semantic expressiveness. A wide-face slider, for instance, may jointly adjust eight skeleton nodes such as the left and right temples, left and right cheekbones, left and right mandibular angles, and left and right forehead.
However, designing such many-bone linkages costs designer labor, and the complex relationships among bones often leave the designed semantic-level sliders with poor expressiveness.
Accordingly, the disclosed embodiments provide an improved avatar slider design. Once the skeleton skinning is complete, i.e. the bones are bound to the skeleton skin (also called associating the skeleton with the skeleton skin), the designer only needs to focus on designing the shape model corresponding to each semantic, without also designing the corresponding semantic slider. This is because the bound reference avatar and the corresponding shape models, once fed into the "slider design system", automatically yield the semantic bone linkage coefficients (i.e. the slider information), ensuring high-quality slider design.
As an alternative embodiment, the slider may be generated by the following operations.
A shape model associated with a target semantic tag is obtained, the target semantic tag being the same as the specific semantic tag associated with the slider.
Bone skinning data of the reference avatar is obtained.
The shape model is fitted based on the bone skinning data to obtain the corresponding bone linkage coefficients.
A slider associated with the target semantic tag is generated based on the bone linkage coefficients.
Driving the reference avatar with this slider yields an avatar that conforms to the target semantic feature, the target semantic tag containing that feature.
For example, taking face shape: inputting the "wide face model" (associated with the wide-face tag) together with the bound reference avatar into the bone fitting coefficient solver outputs the "wide-face bone linkage coefficients" (i.e. "wide-face slider information"); likewise, the narrow face, long face, and short face models yield the narrow-face, long-face, and short-face bone linkage coefficients respectively. The bone fitting coefficient solver fits each shape model based on the reference avatar's bone skinning data to obtain the corresponding bone linkage coefficients. Once obtained, the coefficients are associated with the corresponding semantic tag to give a slider with that semantic; for example, associating the "wide-face slider information" with the "wide-face tag" yields the "wide-face slider".
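Under the same translation-only simplification as above, the solver's job reduces to ordinary least squares; this formulation is an assumption of this description, not the patent's stated algorithm. With fixed skinning weights W from the bound reference avatar, a shape model's vertex offsets D satisfy D ~= W T, and the per-bone coefficients T are the slider information:

```python
import numpy as np


def fit_bone_linkage_coefficients(skin_weights, shape_offsets):
    # skin_weights: (num_vertices, num_bones) from the bound reference avatar.
    # shape_offsets: (num_vertices, 3), shape model minus reference vertices.
    coefficients, *_ = np.linalg.lstsq(skin_weights, shape_offsets, rcond=None)
    return coefficients  # (num_bones, 3): the bone linkage coefficients


W = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])  # 3 vertices, 2 bones
D = np.array([[0.2, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.0, 0.0]])  # "wide face"
wide_face_slider = {"tag": "wide face",
                    "coefficients": fit_bone_linkage_coefficients(W, D)}
print(wide_face_slider)
```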
It should be noted that, in another embodiment of the present disclosure, the bone skinning data of the reference avatar and multiple shape models (each corresponding to a different semantic tag) may be passed to the bone fitting coefficient solver together, so that a slider for each semantic tag is produced automatically, thereby ensuring efficient slider production.
For example, as shown in fig. 6, taking face shape as an example, the "wide face model," "narrow face model," "long face model," and "short face model" are input, together with the reference avatar whose bones are bound to the skin, into the bone fitting coefficient solver, which automatically outputs the "wide-face slider," "narrow-face slider," "long-face slider," and "short-face slider."
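Illustratively, reusing the generate_slider sketch above, and assuming the shape_models, reference_avatar, and fit_solver objects have already been prepared, batch production could look like:

```python
# Batch production (illustrative): one solver call per labelled model,
# all against the same skinned reference avatar.
face_tags = ["wide_face", "narrow_face", "long_face", "short_face"]
sliders = {
    tag: generate_slider(tag, shape_models, reference_avatar, fit_solver)
    for tag in face_tags
}
```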
It should be understood that in the related art, semantic slider design is achieved by a designer hand-crafting the linkage of multiple bone nodes. Specifically, a bone generally offers transformations in three degrees of freedom: translation, rotation, and scaling. The designer sets weights for the skin mesh nodes each bone influences, and during actual deformation the skin mesh nodes deform according to the bone transformation data, weighted by those preset weights.
However, one slider usually affects multiple bone nodes, and a designer gives a slider its semantics by designing how the slider influences those nodes. Sliders such as a low-cheekbone slider or a pointed-chin slider can only realize their semantic features through the linkage of multiple bone nodes.
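To make the weighting concrete, here is a minimal linear-blend-skinning sketch of the weighted deformation just described; the array shapes and the use of homogeneous 4x4 transforms are our own assumptions, not taken from the disclosure:

```python
import numpy as np

def linear_blend_skinning(rest_vertices, bone_transforms, weights):
    """Weighted skin deformation (standard linear blend skinning sketch).

    rest_vertices   -- (V, 3) rest-pose positions of the skin mesh nodes
    bone_transforms -- (B, 4, 4) per-bone homogeneous transforms, each
                       combining rotation, translation, and scaling
    weights         -- (V, B) designer-set per-vertex bone weights,
                       each row summing to 1
    """
    V = rest_vertices.shape[0]
    homo = np.hstack([rest_vertices, np.ones((V, 1))])          # (V, 4)
    # Position of every vertex under every bone transform: (B, V, 4)
    per_bone = np.einsum("bij,vj->bvi", bone_transforms, homo)
    # Blend by the designer-set weights: (V, 4)
    blended = np.einsum("vb,bvi->vi", weights, per_bone)
    return blended[:, :3]
```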
With the disclosed embodiments, by contrast, the designer only needs to concentrate on designing the shape models associated with semantic tags; the bone coefficient fitting solver subsequently fits those shape models to complete the slider design. That is, the embodiments integrate the bone-coefficient fitting capability and redefine how sliders are generated and produced, comprehensively reducing the designer's slider-design burden: freed from tedious multi-bone linkage design, the designer can concentrate on the shape models that express each semantic, which improves the production efficiency of digital assets.
As an alternative embodiment, the reference avatar may be created by the following operations.
For the reference avatar, a corresponding skeletal tree is created.
Based on the skeletal tree, the bones are associated with a bone skin to obtain the reference avatar.
For example, taking a human face model: a designer may design a skeleton tree for the face model and associate a face skin (the bone skin) with each bone node in the tree, binding the face skin to the bone nodes and thereby obtaining the corresponding reference avatar.
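Illustratively, such a bone tree and its binding to a skin could be represented as follows; all structure and field names here are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Bone:
    """One node of the skeleton tree (illustrative structure only)."""
    name: str
    children: list = field(default_factory=list)

@dataclass
class ReferenceAvatar:
    """Reference avatar = skeleton tree + skin bound to that tree."""
    root: Bone
    skin_vertices: list      # rest-pose mesh vertices
    skin_weights: dict       # bone name -> per-vertex influence weights

# A toy face skeleton: jaw and cheek bones hanging off a head root.
head = Bone("head", children=[Bone("jaw"), Bone("cheek_l"), Bone("cheek_r")])
avatar = ReferenceAvatar(root=head, skin_vertices=[], skin_weights={})
```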
As an alternative embodiment, fitting the shape model based on the bone skinning data to obtain the corresponding bone linkage coefficients may include the following operations.
Based on the bone skinning data, the shape model is solved iteratively, bone by bone, from the root bone node of the bone tree down to the leaf bone nodes, to obtain the bone linkage coefficients.
Wherein the skeletal tree is created for the reference avatar.
Through this embodiment of the disclosure, the bone linkage coefficients can be obtained by a level-by-level, root-to-leaf iterative algorithm, making the fitting computation more efficient.
As an alternative embodiment, performing the bone-by-bone iterative solution on the shape model from the root bone node to the leaf bone nodes of the bone tree may include: starting from the root bone node of the bone tree, applying the least squares method level by level to fit the rotation, translation, and scaling coefficients of each bone node, until the rotation, translation, and scaling coefficients of all leaf bone nodes in the tree have been solved.
It is noted that in the disclosed embodiments, the bone coefficient fitting solver may adopt a root-to-leaf solution strategy: the least squares method is applied level by level, starting from the root node of the bone tree, to fit the rotation, translation, and scaling coefficients of each bone node of the shape model, until all leaf nodes in the tree have been solved.
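A simplified sketch of one such root-to-leaf least-squares pass follows. For brevity it fits a single affine matrix per bone instead of separate rotation, translation, and scaling coefficients, and the per-bone vertex-influence bookkeeping is our own assumption rather than the disclosure's:

```python
import numpy as np

def fit_bone_coefficients(bones, influence, rest, target):
    """One root-to-leaf fitting pass (simplified stand-in for the solver).

    bones     -- bone names listed root first, leaves last (tree order)
    influence -- dict: bone name -> index array of skin vertices it drives
    rest      -- (V, 3) rest-pose vertices of the reference avatar's skin
    target    -- (V, 3) vertices of the labelled shape model
    """
    coeffs = {}
    current = rest.copy()
    for bone in bones:
        idx = influence[bone]
        src = np.hstack([current[idx], np.ones((len(idx), 1))])   # (n, 4)
        # Least-squares solve of src @ A ~= target[idx] for the 4x3 affine A.
        A, *_ = np.linalg.lstsq(src, target[idx], rcond=None)
        coeffs[bone] = A
        current[idx] = src @ A    # apply before descending to child bones
    return coeffs, current
```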
As an alternative embodiment, fitting the shape model based on the bone skinning data to obtain the corresponding bone linkage coefficients may include: based on the bone skinning data, performing a multi-round, bone-by-bone iterative solution on the shape model from the root bone node to the leaf bone nodes of the bone tree to obtain the bone linkage coefficients, where the bone tree is created for the reference avatar.
Through this embodiment of the disclosure, the bone linkage coefficients can also be obtained by a multi-round, level-by-level iterative algorithm that progressively solves the fitting coefficients of the bone-tree nodes at every level, so that the resulting fit is more accurate.
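Under the same assumptions, the multi-round variant can be sketched as repeated passes of the fit_bone_coefficients function above, each starting from the shape the previous round produced:

```python
import numpy as np

def fit_multi_round(bones, influence, rest, target, rounds=5, tol=1e-8):
    """Multi-round variant (sketch): repeat the root-to-leaf pass until
    the residual against the target shape model stops shrinking."""
    current, history, prev_err = rest.copy(), [], float("inf")
    for _ in range(rounds):
        coeffs, current = fit_bone_coefficients(bones, influence,
                                                current, target)
        history.append(coeffs)
        err = float(np.linalg.norm(current - target))
        if prev_err - err < tol:     # converged: later rounds add nothing
            break
        prev_err = err
    return history
```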
In addition, other types of bone coefficient fitting algorithms can also be supported by the disclosed embodiments, which are not limited herein.
The disclosed embodiments realize intelligent generation of avatar sliders by integrating the relevant algorithms and semantically defining the inputs and outputs of the intelligent slider generation system.
As an alternative embodiment, fitting the shape model based on the bone skinning data to obtain the corresponding bone linkage coefficients may include: inputting the bone skinning data and the shape model into a preset bone coefficient fitting solver, so that the solver fits the shape model to obtain the bone linkage coefficients.
Through this embodiment of the disclosure, the designer only needs to concentrate on designing the shape models associated with the semantic tags; the slider design is then completed by fitting those shape models with the bone coefficient fitting solver.
As an alternative embodiment, the method may further comprise: storing the bone skinning data and the generated slider in the same file.
Illustratively, after the "wide-face slider" is obtained, if the "wide-face slider" and the "reference avatar" are stored in the same file, then once the "wide-face avatar" start-up procedure is triggered, the "wide-face slider" can automatically drive the "reference avatar" to generate the "wide-face avatar."
Through this embodiment of the disclosure, the avatar slider and the reference avatar are stored in the same file, so that at avatar start-up the slider can directly drive the reference avatar to quickly output the target avatar described by the user.
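Illustratively, bundling the skinning data and sliders into one file might look like the sketch below; the JSON format and every name here are assumptions for illustration, since the disclosure does not name a file format:

```python
import json

def save_avatar_bundle(path, skinning_data, sliders):
    """Persist the reference avatar's bone skinning data and its sliders
    in a single file, so loading that file is enough to drive the avatar.
    Coefficient arrays must be JSON-serializable first, e.g. via
    ndarray.tolist()."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"skinning": skinning_data, "sliders": sliders}, f)

# Hypothetical usage: loading "face_avatar.json" later lets the
# "wide_face" slider drive the bundled reference avatar at start-up.
# save_avatar_bundle("face_avatar.json", skinning, {"wide_face": coeffs})
```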
As an alternative embodiment, generating the target avatar based on the semantic features may include the following operations.
At least one semantic tag is determined based on the semantic features.
Based on the semantic tags, at least one corresponding accessory model and/or at least one ornament model is determined.
On the basis of the avatar obtained by controlling the deformation of the reference avatar, the at least one accessory model and/or at least one ornament model is added to obtain the target avatar.
For example, if the user wants to create a target avatar with "a high nose bridge, big eyes, a thin chin, long hair, a student uniform, and white sneakers," the semantic features "high nose bridge, big eyes, thin chin" can be extracted from the description. The corresponding three sliders, namely the high-nose-bridge slider, the big-eyes slider, and the thin-chin slider, are obtained and used to drive the pre-created reference avatar to deform, producing an avatar with a high nose bridge, big eyes, and a thin chin. Meanwhile, the semantic features "long hair, student uniform, white sneakers" can also be extracted from the description: accessory models such as a long-hair model are obtained from the hairstyle accessory digital asset library, and ornament models such as a girls' student-uniform model and a white-sneaker model are obtained from the apparel digital asset library covering clothes, shoes, hats, and the like. Adding the long-hair, student-uniform, and white-sneaker models to the avatar just created with the high nose bridge, big eyes, and thin chin yields the output avatar, which is the target avatar the user wants.
It should be understood that in the disclosed embodiments, the hairstyle and beard accessory digital asset library can include various types of male beard models and male hairstyle models, as well as various types of female hairstyle models. It should also be understood that the apparel digital asset library may include various types of ornament models for men and women, such as glasses, clothes, shoes, watches, gloves, headwear, and scarves. Each model in a digital asset library is associated with a unique semantic tag, so that the corresponding model can be obtained automatically from its semantic tag.
Illustratively, a different semantic tag may be defined for each model in the digital asset library, such as white sneakers, high-heeled shoes, pink shoes, student uniform, professional attire, and the like. Based on the hairstyle, apparel, and other semantic tags output by the semantic converter, the corresponding models can be selected from the digital asset libraries and added to the generated avatar.
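Illustratively, tag-to-model lookup across such libraries could be sketched as follows; all tags and paths are invented for illustration:

```python
# Hypothetical digital-asset libraries keyed by unique semantic tags.
HAIR_LIBRARY = {"long_hair": "models/hair/long.glb"}
APPAREL_LIBRARY = {
    "student_uniform": "models/cloth/uniform.glb",
    "white_sneakers": "models/shoes/sneaker_w.glb",
}

def lookup_assets(tags):
    """Return the asset model for every semantic tag any library knows."""
    assets = []
    for tag in tags:
        for library in (HAIR_LIBRARY, APPAREL_LIBRARY):
            if tag in library:
                assets.append(library[tag])
    return assets

print(lookup_assets(["long_hair", "student_uniform", "white_sneakers"]))
```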
As shown in fig. 7, the customization process for the personalized avatar may be as follows: the user inputs a spoken description; speech recognition is performed automatically by ASR; keywords are extracted from the description; the extracted keywords are converted by the semantic converter; slider semantic tags are obtained from the converted keywords; sliders for driving the bone deformation are obtained based on the slider semantic tags; the corresponding bone deformation coefficients are obtained based on the sliders; the bone skin is driven to deform in linkage based on the coefficients; model semantic tags for hairstyle, apparel, and the like are obtained from the converted keywords; the corresponding models are fetched from the digital asset libraries based on those tags; and the fetched models are added to the avatar generated by the linked skeleton-skin deformation to obtain the final target avatar.
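Stitching these stages together, an end-to-end sketch might read as follows; every callable and method here (asr, semantic_converter, deform, attach) is a hypothetical stand-in for a component the text names, not an actual API:

```python
def customize_avatar(audio, asr, semantic_converter, sliders,
                     asset_libraries, reference_avatar):
    """End-to-end sketch of the fig. 7 flow under invented interfaces."""
    text = asr(audio)                       # automatic speech recognition
    keywords = semantic_converter(text)     # extract and convert keywords
    # Split the converted keywords into slider tags and model tags.
    slider_tags = [k for k in keywords if k in sliders]
    model_tags = [k for k in keywords
                  if any(k in lib for lib in asset_libraries)]
    # Drive the skinned skeleton with each matched slider's coefficients.
    avatar = reference_avatar
    for tag in slider_tags:
        avatar = avatar.deform(sliders[tag]["coefficients"])
    # Dress the deformed avatar with each matched accessory model.
    for tag in model_tags:
        for lib in asset_libraries:
            if tag in lib:
                avatar = avatar.attach(lib[tag])
    return avatar
```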
Through this embodiment of the disclosure, the target avatar can be beautified and enriched with accessories, so that the resulting target avatar better matches the user's ideal.
According to the embodiment of the disclosure, the disclosure further provides a human-computer interaction device based on the virtual image.
Fig. 8A illustrates a block diagram of an avatar-based human-machine interaction device according to an embodiment of the present disclosure.
As shown in fig. 8A, the avatar-based human-machine interaction apparatus 800A includes: a first display module 810A, a first control module 820A, and a second control module 830A.
The first display module 810A is configured to display an avatar on the smart home device.
A first control module 820A for controlling the avatar to interact with the user.
The second control module 830A is configured to control the avatar to perform consumption recommendation on the user during the process of interaction between the avatar and the user.
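Purely as a structural illustration of apparatus 800A, the module composition could be sketched as below; the module names mirror the text, while the method bodies are placeholders rather than the disclosed implementation:

```python
class AvatarInteractionDevice:
    """Structural sketch of apparatus 800A (illustrative only)."""

    def __init__(self, first_display, first_control, second_control):
        self.first_display_module = first_display      # 810A
        self.first_control_module = first_control      # 820A
        self.second_control_module = second_control    # 830A

    def serve(self, user):
        self.first_display_module.display_avatar()
        session = self.first_control_module.interact(user)
        self.second_control_module.recommend(user, session)
```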
As an alternative embodiment, the first control module comprises: an acquisition unit for acquiring a marketing strategy issued by the cloud for the user; and a control unit for controlling the avatar to make consumption recommendations to the user based on the marketing strategy.
As an alternative embodiment, the first display module is further configured to display the user's exclusive avatar on the smart home device, where the exclusive avatar is issued to the smart home device by the cloud.
According to an embodiment of the present disclosure, the present disclosure also provides an avatar-based control apparatus.
Fig. 8B illustrates a block diagram of an avatar-based control apparatus according to an embodiment of the present disclosure.
As shown in fig. 8B, the avatar-based control apparatus 800B includes: a third control module 810B and a first transmit module 820B.
The third control module 810B is configured to remotely control the avatar displayed on the smart home device to interact with the user.
The first sending module 820B is configured to issue a marketing strategy for the user to the smart home device during the communication and interaction between the avatar and the user, so that the avatar makes consumption recommendations to the user based on the marketing strategy.
As an alternative embodiment, the apparatus further comprises: the first acquisition module is used for acquiring the exclusive virtual image provided by the user; and the second sending module is used for sending the exclusive virtual image to the intelligent household equipment associated with the user so that the intelligent household equipment displays the exclusive virtual image when facing the user and carries out human-computer interaction with the user through the exclusive virtual image.
As an alternative embodiment, the apparatus further comprises: the second acquisition module is used for acquiring the consumption data of the user; and the generating module is used for generating the marketing strategy aiming at the user based on the consumption data and sending the marketing strategy to the intelligent household equipment.
According to the embodiment of the disclosure, the disclosure also provides a human-computer interaction device based on the virtual image.
Fig. 8C illustrates a block diagram of an avatar-based human-machine interaction device according to an embodiment of the present disclosure.
As shown in fig. 8C, the avatar-based human-computer interaction apparatus 800C includes: a second display module 810C and a fourth control module 820C.
The second display module 810C is configured to display the avatar on a specific interactive device arranged in a specific place.
The fourth control module 820C is configured to control the avatar to interact with the user during the user's activities in the specific place.
As an alternative embodiment, the fourth control module is further configured to perform at least one of: controlling the avatar to interact with the user while the user shops in a mall or supermarket, to accompany the user's shopping; controlling the avatar to chat with the user or play an interactive game with the user while the user dines at a restaurant, to accompany the user's meal; and controlling the avatar to communicate and interact with the user while the user consumes at a leisure and entertainment venue, to accompany the user's leisure and entertainment.
As an alternative embodiment, the apparatus further comprises: the third acquisition module is used for acquiring consumption data of the user; and the third sending module is used for sending the consumption data to a cloud end so that the cloud end can generate a marketing strategy for the user based on the consumption data.
As an alternative embodiment, the apparatus further comprises: the determining module is used for responding to the user entering the specific place and performing face recognition on the user to determine the identity of the user; and a fourth obtaining module, configured to obtain an exclusive avatar of the user based on the identity of the user, wherein the second displaying module is further configured to: and displaying the exclusive avatar on the specific interactive device arranged in the specific place.
It should be understood that the embodiments of the apparatus portion of the present disclosure are the same as or similar to the embodiments of the method portion of the present disclosure, and the achieved technical effects are also the same as or similar to each other, which are not described herein again.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the electronic device 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 902 or loaded from a storage unit 908 into a random access memory (RAM) 903. The RAM 903 can also store the various programs and data required for the operation of the electronic device 900. The computing unit 901, the ROM 902, and the RAM 903 are connected to one another via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the electronic device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be any of various general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 901 performs the methods and processes described above, such as the avatar-based human-computer interaction and control methods. For example, in some embodiments, the avatar-based human-computer interaction and control methods may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the avatar-based human-computer interaction and control methods described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the avatar-based human-computer interaction and control methods by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (9)

1. An avatar-based control method, comprising:
issuing an exclusive avatar of a user to a smart home device associated with the user, so that the exclusive avatar is displayed when the smart home device faces the user;
remotely controlling the exclusive avatar displayed on the smart home device to communicate and interact with the user;
during the communication and interaction between the exclusive avatar and the user, issuing a recommendation instruction for the user to the smart home device, so that the exclusive avatar makes a consumption recommendation to the user based on the recommendation instruction;
while the user is active in a specific place, issuing the exclusive avatar of the user to a specific interactive device arranged in the specific place, so as to display the exclusive avatar on the specific interactive device; and
remotely controlling the exclusive avatar displayed on the specific interactive device to accompany the user's consumption scenario,
wherein the exclusive avatar is generated based on the following operations:
acquiring the user's language description of the appearance characteristics of the exclusive avatar;
extracting corresponding semantic features based on the language description;
converting the semantic features into professional-level semantic features;
determining a specific semantic tag based on keywords in the professional-level semantic features;
determining a slider associated with the specific semantic tag; and
controlling deformation of a reference avatar by using the slider to generate the exclusive avatar,
wherein the slider is generated based on:
obtaining a shape model associated with a target semantic tag, wherein the target semantic tag is the same as the specific semantic tag associated with the slider;
obtaining bone skinning data of the reference avatar;
fitting the shape model based on the bone skinning data to obtain corresponding bone linkage coefficients; and
generating the slider associated with the target semantic tag based on the bone linkage coefficients,
wherein an avatar conforming to a target semantic feature is obtained by driving the reference avatar with the slider, and the target semantic tag comprises the target semantic feature;
the method further comprises the following steps:
acquiring consumption data of the user through the specific interaction equipment; and
and generating a marketing strategy aiming at the user based on the consumption data, and sending the marketing strategy to the intelligent household equipment.
2. The method of claim 1, wherein remotely controlling the exclusive avatar displayed on the specific interactive device to accompany the user's consumption scenario during the user's activity in the specific place comprises at least one of:
controlling the exclusive avatar to interact with the user while the user shops in a mall or supermarket, to accompany the user's shopping;
controlling the exclusive avatar to chat with the user or play an interactive game with the user while the user dines at a restaurant, to accompany the user's meal; and
controlling the exclusive avatar to communicate and interact with the user while the user consumes at a leisure and entertainment venue, to accompany the user's leisure and entertainment.
3. The method of claim 1, further comprising:
performing face recognition on the user to determine the identity of the user in response to the user entering the particular place; and
obtaining a proprietary avatar of the user based on the user's identity,
wherein displaying an avatar on the specific interactive device arranged in the specific place includes: displaying the exclusive avatar on the specific interactive device arranged in the specific place.
4. An avatar-based control device, comprising:
a third control module, configured to display an exclusive avatar on a smart home device associated with a user and to remotely control the exclusive avatar displayed on the smart home device to communicate and interact with the user, wherein the exclusive avatar is generated based on the following operations: acquiring the user's language description of the appearance characteristics of the exclusive avatar; extracting corresponding semantic features based on the language description; converting the semantic features into professional-level semantic features; determining a specific semantic tag based on keywords in the professional-level semantic features; determining a slider associated with the specific semantic tag; and controlling deformation of a reference avatar by using the slider to generate the exclusive avatar;
a first sending module, configured to issue a recommendation instruction for the user to the smart home device during the communication and interaction between the exclusive avatar and the user, so that the exclusive avatar makes a consumption recommendation to the user based on the recommendation instruction;
a fourth control module, configured to display the exclusive avatar on a specific interactive device arranged in a specific place while the user is active in the specific place, and to remotely control the exclusive avatar displayed on the specific interactive device to accompany the user's consumption scenario;
a second acquisition module, configured to acquire consumption data of the user through the specific interactive device; and
a generating module, configured to generate a marketing strategy for the user based on the consumption data and send the marketing strategy to the smart home device,
wherein the slider is generated based on:
obtaining a shape model associated with a target semantic tag, wherein the target semantic tag is the same as the specific semantic tag associated with the slider;
obtaining bone skinning data of the reference avatar;
fitting the shape model based on the bone skinning data to obtain corresponding bone linkage coefficients; and
generating the slider associated with the target semantic tag based on the bone linkage coefficients,
wherein an avatar conforming to a target semantic feature is obtained by driving the reference avatar with the slider, and the target semantic tag comprises the target semantic feature.
5. The apparatus of claim 4, wherein the fourth control module is further configured to perform at least one of:
controlling the exclusive avatar to interact with the user while the user shops in a mall or supermarket, to accompany the user's shopping;
controlling the exclusive avatar to chat with the user or play an interactive game with the user while the user dines at a restaurant, to accompany the user's meal; and
controlling the exclusive avatar to communicate and interact with the user while the user consumes at a leisure and entertainment venue, to accompany the user's leisure and entertainment.
6. The apparatus of claim 4, further comprising:
a determining module, configured to perform face recognition on the user in response to the user entering the specific place to determine an identity of the user; and
a fourth obtaining module for obtaining the exclusive avatar of the user based on the identity of the user,
wherein the second display module is further configured to: displaying the exclusive avatar on the specific interactive apparatus set in the specific place.
7. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-3.
8. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-3.
9. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-3.
CN202110316646.0A 2021-03-24 2021-03-24 Human-computer interaction and control method and device based on virtual image Active CN112987932B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110316646.0A CN112987932B (en) 2021-03-24 2021-03-24 Human-computer interaction and control method and device based on virtual image

Publications (2)

Publication Number Publication Date
CN112987932A CN112987932A (en) 2021-06-18
CN112987932B true CN112987932B (en) 2023-04-18

Family

ID=76333573

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110316646.0A Active CN112987932B (en) 2021-03-24 2021-03-24 Human-computer interaction and control method and device based on virtual image

Country Status (1)

Country Link
CN (1) CN112987932B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584146A (en) * 2018-10-15 2019-04-05 深圳市商汤科技有限公司 U.S. face treating method and apparatus, electronic equipment and computer storage medium
CN112184921A (en) * 2020-10-30 2021-01-05 北京百度网讯科技有限公司 Avatar driving method, apparatus, device, and medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101359385A (en) * 2008-09-23 2009-02-04 中国移动通信集团广东有限公司 Non-contact exact marketing system and marketing method based on mobile network
US20130138499A1 (en) * 2011-11-30 2013-05-30 General Electric Company Usage measurent techniques and systems for interactive advertising
CN108510917A (en) * 2017-02-27 2018-09-07 北京康得新创科技股份有限公司 Event-handling method based on explaining device and explaining device
US20200402136A1 (en) * 2018-02-26 2020-12-24 Seddi, Inc. Avatar Matching in On-Line Shopping
CN110767220B (en) * 2019-10-16 2024-05-28 腾讯科技(深圳)有限公司 Interaction method, device and equipment of intelligent voice assistant and storage medium
CN111429907B (en) * 2020-03-25 2023-10-20 北京百度网讯科技有限公司 Voice service mode switching method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112987932A (en) 2021-06-18

Similar Documents

Publication Publication Date Title
CN113050795A (en) Virtual image generation method and device
US11842457B2 (en) Method for processing slider for virtual character, electronic device, and storage medium
CN113362263B (en) Method, apparatus, medium and program product for transforming an image of a virtual idol
KR102115573B1 (en) System, method and program for acquiring user interest based on input image data
CN113536007A (en) Virtual image generation method, device, equipment and storage medium
US20230107213A1 (en) Method of generating virtual character, electronic device, and storage medium
CN114723888B (en) Three-dimensional hair model generation method, device, equipment, storage medium and product
KR20170002097A (en) Method for providing ultra light-weight data animation type based on sensitivity avatar emoticon
KR101397951B1 (en) System for operating beauty information portal site
CN112987932B (en) Human-computer interaction and control method and device based on virtual image
CN115359171B (en) Virtual image processing method and device, electronic equipment and storage medium
CN114445528B (en) Virtual image generation method and device, electronic equipment and storage medium
US11830236B2 (en) Method and device for generating avatar, electronic equipment, medium and product
CN114547426A (en) Dressing method, device, system, electronic apparatus, and storage medium
CN114648601A (en) Virtual image generation method, electronic device, program product and user terminal
CN114332365A (en) Virtual character generation method and device, electronic equipment and storage medium
JP2021043841A (en) Virtual character generation apparatus and program
CN113537043B (en) Image processing method, device, electronic equipment and storage medium
CN114187429B (en) Virtual image switching method and device, electronic equipment and storage medium
CN117523040A (en) Commodity display diagram generation method, device, equipment and storage medium
CN114239241A (en) Card generation method and device and electronic equipment
CN116168429A (en) Image processing method, device and computer readable storage medium
CN114638919A (en) Virtual image generation method, electronic device, program product and user terminal
CN117726897A (en) Training data generation method, device, electronic equipment and storage medium
CN117876547A (en) Digital person appearance generating method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant