CN116225234A - Interaction method and cloud server - Google Patents


Info

Publication number
CN116225234A
Authority
CN
China
Prior art keywords
target object
model
anthropomorphic
interaction
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310333569.9A
Other languages
Chinese (zh)
Inventor
陈海青
张佶
张邦
张婧
李彬
高星
张建海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202310333569.9A
Publication of CN116225234A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G06T 13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects

Abstract

The application provides an interaction method and a cloud server. According to the method, attribute information of a target object, an object 3D model and interaction content information are obtained; anthropomorphic 3D materials suited to the target object are automatically generated according to the attribute information of the target object and synthesized onto the object 3D model to obtain an anthropomorphic 3D model of the target object, so that different anthropomorphic images do not need to be designed manually for each object. Further, the anthropomorphic 3D model of the object is used as an integral virtual image: it is displayed in the interaction page and driven to execute interaction behaviors matched with the interaction content information, so that the anthropomorphic 3D model produces matching interaction behaviors without manually designing different animation actions for each object. The method is efficient, has a short production cycle, automatically adapts to different objects and different interaction content, and yields actions that are more diverse and lifelike.

Description

Interaction method and cloud server
Technical Field
The present application relates to the fields of artificial intelligence, deep learning, machine learning, virtual reality, augmented reality and human-computer interaction within computer technology, and in particular to an interaction method and a cloud server.
Background
Existing three-dimensional (3D) animation includes anthropomorphic forms of objects: it gives life to an object, so that an otherwise lifeless object can behave, think and change expression like a person, bringing the audience a completely different experience from the object's point of view. For example, a 3D animation created from a real object can be used to present that object to users in anthropomorphic form.
At present, both the traditional approach in which designers produce the content manually and the approach of automatically generating an anthropomorphic 3D model work by constructing a 3D model of a real object and then making/rendering two-dimensional (2D) anthropomorphic patterns (such as virtual facial features and virtual limb models) onto the 3D model, thereby forming an anthropomorphic 3D model and animation. The anthropomorphic patterns added to the object, such as facial features and limbs, are all 2D materials, and the animation is formed by repeatedly playing preset animation actions, so the actions in the animation are monotonously repetitive. To improve the diversity of anthropomorphic animations for different objects, animation actions have to be designed separately for each object, which is inefficient and results in long production cycles, fixed actions and poor interactivity.
Disclosure of Invention
The application provides an interaction method and a cloud server, which are used to solve the problems of low efficiency, long production cycles, fixed actions and poor interactivity of object anthropomorphization in traditional interaction scenarios.
In a first aspect, the present application provides an interaction method, including:
responding to an interaction request for a target object, and acquiring attribute information, an object 3D model and interaction content information of the target object;
generating an anthropomorphic 3D material suitable for the target object according to the attribute information of the target object, and synthesizing the anthropomorphic 3D material onto the 3D model of the object to obtain an anthropomorphic 3D model of the target object;
and displaying the personified 3D model in an interaction page, and driving the personified 3D model to execute interaction behavior matched with the interaction content information.
In a second aspect, the present application provides an interaction method, including:
responding to a personification request from an end-side device for a target object, and acquiring attribute information of the target object, an object 3D model and broadcast content;
generating an anthropomorphic 3D material suitable for the target object according to the attribute information of the target object, and synthesizing the anthropomorphic 3D material onto the 3D model of the object to obtain an anthropomorphic 3D model of the target object;
displaying the anthropomorphic 3D model in an interaction page, driving the anthropomorphic 3D model to complete the interaction behavior of broadcasting the broadcast content, and generating a broadcast animation;
and responding to a confirmation operation on the broadcast animation, and providing the broadcast animation to the end-side device.
In a third aspect, the present application provides an interaction method, including:
in a real-time interaction process, responding to an interaction request with a target object, and acquiring attribute information, an object 3D model and interaction content information of the target object;
generating an anthropomorphic 3D material suitable for the target object according to the attribute information of the target object, and synthesizing the anthropomorphic 3D material onto the 3D model of the object to obtain an anthropomorphic 3D model of the target object;
and displaying the personified 3D model in a real-time interaction page, and driving the personified 3D model to execute interaction behavior matched with the interaction content information.
In a fourth aspect, the present application provides a cloud server, including: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method of any of the above aspects.
In a fifth aspect, the present application provides a computer-readable storage medium having stored therein computer-executable instructions for performing the method of any one of the above aspects when executed by a processor.
According to the interaction method and the cloud server, attribute information of a target object, the object 3D model and interaction content information are obtained; an anthropomorphic 3D material suitable for the target object is automatically generated according to the attribute information of the target object and synthesized onto the object 3D model to obtain an anthropomorphic 3D model of the target object; the anthropomorphic 3D model is displayed in the interaction page and driven to execute interaction behaviors matched with the interaction content information. Because the applicable anthropomorphic 3D materials are generated automatically from the object's attribute information and synthesized onto the object 3D model to obtain the object's anthropomorphic 3D model, there is no need to design different anthropomorphic images manually for each object; and because the anthropomorphic 3D model of the object serves as an integral virtual image and is driven automatically according to the interaction content information so that it performs matching interaction behaviors, there is also no need to design different animation actions manually for each object. The method is efficient, has a short production cycle, and can automatically adapt to different objects and different interaction content, so that the actions of the interaction behaviors based on the object's anthropomorphic image are more diverse and lifelike, match the interaction content more closely, and improve the expressiveness of the object's anthropomorphic image.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is an exemplary architecture diagram of an interactive system applicable to the present application;
FIG. 2 is a flowchart of an interaction method according to an exemplary embodiment of the present application;
FIG. 3 is an exemplary diagram of interactive behavior based on an avatar in accordance with an exemplary embodiment of the present application;
FIG. 4 is an exemplary diagram of a real-time interaction provided by an exemplary embodiment of the present application;
FIG. 5 is an interaction flow chart for a scenario of customizing an anthropomorphic broadcast animation of an object, provided in an exemplary embodiment of the present application;
FIG. 6 is a flow chart of a real-time interaction method based on an avatar image according to an exemplary embodiment of the present application;
FIG. 7 is a schematic structural diagram of an interaction device according to an exemplary embodiment of the present application;
fig. 8 is a schematic structural diagram of a cloud server according to an embodiment of the present application.
Specific embodiments thereof have been shown by way of example in the drawings and will herein be described in more detail. These drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but to illustrate the concepts of the present application to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
The terms referred to in this application are explained first:
generate content (AI Generated Content, AIGC for short): refers to content automatically generated using artificial intelligence (Artificial Intelligence, AI) algorithm models, such as generating new content (e.g., text or images) from known text, audio, and images. Wherein the AI algorithm model for generating content is called a generative model (AI Generative Model, AIGM for short), also called a generative AI model.
Augmented reality (Augmented Reality, AR for short): AR technology refers to augmented reality technology that can connect real world information and virtual world information, such as adding virtual images to a photographed picture.
Digital human AI drive: the AI algorithm is used for 'understanding' text, voice or music and other content information, so that the expression and action of the digital person are decided and generated, and the digital person moves.
Multimodal interactions: namely, man-machine interaction is performed in various modes such as words, voice, vision, actions, environment and the like, so that the interaction modes between people are fully simulated.
The traditional interaction method based on object anthropomorphization constructs a 3D model of the real object, makes/renders 2D anthropomorphic patterns onto the object 3D model based on preset animation actions to generate an anthropomorphic 3D model and an animation, and then monotonously and repeatedly plays the preset animation actions in the animation; the corresponding animation actions have to be designed and produced for each object, so the approach is inefficient and has long production cycles, fixed actions and poor interactivity.
The application provides an interaction method that obtains attribute information of a target object, an object 3D model and interaction content information; automatically generates an anthropomorphic 3D material suitable for the target object according to the attribute information of the target object, and synthesizes the anthropomorphic 3D material onto the object 3D model to obtain an anthropomorphic 3D model of the target object; and displays the anthropomorphic 3D model in an interaction page and drives it to execute interaction behaviors matched with the interaction content information. Because the applicable anthropomorphic 3D materials are generated automatically from the object's attribute information and synthesized onto the object 3D model to obtain the corresponding anthropomorphic 3D model, different anthropomorphic images do not need to be designed manually for each object; the object's anthropomorphic 3D model serves as an integral virtual image and is driven automatically according to the interaction content information so that it produces matching interaction behaviors.
The interaction method provided by the application can be applied to scenarios such as e-commerce, live video streaming and film/animation production. For example, it can be applied to commodity promotion scenarios in e-commerce, such as generating an anthropomorphic broadcast video of a commodity; or it can be used in a live video scenario, where the anthropomorphic image of an object replaces the host or virtual host to complete interactions with viewing users; or it can be used in an animation scenario to automatically generate the anthropomorphic image of an object character, drive the object's anthropomorphic image to perform corresponding actions based on a given script, and record and save the result as an animation video. The interaction method provided by the application can also be applied to intelligent customer-service scenarios in e-commerce, or to other scenarios that require generating an object's anthropomorphic image and driving it to perform certain interaction behaviors, which are not listed one by one here.
FIG. 1 is an exemplary architecture diagram of an interactive system suitable for use herein. As shown in fig. 1, the system architecture may specifically include a cloud server, an end-side device, and a data service device.
The cloud server may be a server cluster deployed in the cloud, with communication links established between the cloud server and each end-side device so that communication connections between them can be made. The cloud server comprises three modules: image generation, a dialogue model and an AI driving engine. The image generation module can automatically generate an anthropomorphic 3D model of an object based on the attribute information of the target object. The dialogue model implements the question-answering and dialogue functions of the real-time interaction scenario and obtains the interaction content information based on user input data. In some scenarios, the interaction content information may be provided to the cloud server directly by the user through the end-side device, or obtained by the cloud server from the data service device. The AI driving engine is used to drive the object's anthropomorphic 3D model to perform interaction behaviors matched with the interaction content information, thereby realizing interaction between the user and the anthropomorphized object. The different ways of acquiring the interaction content information are shown by dotted lines in fig. 1 as optional paths, and the cloud server may acquire the interaction content information in different ways in different application scenarios.
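To make this three-module split more concrete, the sketch below outlines one possible shape for the interfaces of the image generation module, the dialogue model and the AI driving engine. All class, method and field names here are illustrative assumptions, not the patent's actual implementation.

```python
# Illustrative interface sketch only; names and types are assumptions.
from dataclasses import dataclass
from typing import Protocol, List, Dict, Any

@dataclass
class AnthropomorphicModel:
    object_3d_model: Dict[str, Any]   # mesh/material data of the object 3D model
    materials: List[Dict[str, Any]]   # synthesized anthropomorphic 3D materials (facial features, limbs)

class ImageGeneration(Protocol):
    def generate(self, attributes: Dict[str, Any], object_3d_model: Dict[str, Any]) -> AnthropomorphicModel:
        """Generate anthropomorphic materials for the object and synthesize them onto its 3D model."""
        ...

class DialogueModel(Protocol):
    def reply(self, user_input: str, knowledge: List[Dict[str, Any]]) -> str:
        """Produce interaction content information (reply text) from user input and object knowledge."""
        ...

class AIDriveEngine(Protocol):
    def drive(self, model: AnthropomorphicModel, content: str) -> List[Dict[str, Any]]:
        """Return a frame sequence (expression, mouth shape, limb pose) matching the content."""
        ...
```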
The end-side device may specifically be a hardware device used by a user that has network communication, computing and information display functions, including but not limited to a smartphone, tablet computer, desktop computer, AR device, Internet-of-Things device, and the like.
The data service device may be a device capable of providing the cloud server with the 3D model of the object, attribute information, and at least one source of multi-modal knowledge data, and the data service device externally provides an application program interface (Application Programming Interface, abbreviated as API) for obtaining relevant data of the target object, and the server obtains the required data by calling the API. For example, the data service device may be an e-commerce platform, or a server of a live broadcast platform storing merchandise related data and a 3D model, or a user-specified device.
In one possible application scenario, the method can be applied to generating an anthropomorphic 3D model of a target object based on the target object and broadcast content specified by a user, and to generating a broadcast animation matched with the broadcast content based on the anthropomorphic 3D model. The user can send a personification request for the target object to the cloud server through the end-side device in use, the request carrying information about the target object and the broadcast content. The cloud server identifies and determines the target object based on the received personification request, and acquires attribute information of the target object, the object 3D model and the broadcast content. The cloud server generates an anthropomorphic 3D material suitable for the target object according to the attribute information of the target object, and synthesizes the anthropomorphic 3D material onto the object 3D model to obtain an anthropomorphic 3D model of the target object; it displays the anthropomorphic 3D model in the interaction page, drives the anthropomorphic 3D model to complete the interaction behavior of broadcasting the broadcast content, and generates a broadcast animation. The cloud server provides the broadcast animation to the end-side device, which can then publish the broadcast animation based on the target object's anthropomorphic 3D model through various channels.
In another possible application scenario, the scheme can be applied to real-time interaction. The cloud server provides a real-time interactive interface. During real-time interaction, a user can send an interaction request with a target object to the cloud server through the end-side device in use, the request carrying information about the target object. The cloud server identifies and determines the target object based on the received request, and acquires attribute information of the target object, the object 3D model and interaction content information. The interaction content information may be preset broadcast content corresponding to the target object, or may be reply content generated by the cloud server from the user's input data. The cloud server generates an anthropomorphic 3D material suitable for the target object according to the attribute information of the target object, and synthesizes the anthropomorphic 3D material onto the object 3D model to obtain an anthropomorphic 3D model of the target object; it displays the anthropomorphic 3D model in the real-time interaction page and drives it to execute interaction behaviors matched with the interaction content information. In this way, the target object interacts with the user through its anthropomorphic image in the real-time interaction scenario: through first-person, real-time dialogue with the user, the originally static object can move and speak, which improves the interactivity and fun of interaction between the user and the target object.
The following describes the technical solutions of the present application and how the technical solutions of the present application solve the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 2 is a flowchart of an interaction method according to an exemplary embodiment of the present application. The execution subject of this embodiment is the cloud server in the interactive system architecture. The method provided by this embodiment automatically generates anthropomorphic 3D materials matched with the attributes of the target object, automatically anchors the positions of the anthropomorphic 3D materials on the target object, synthesizes the anthropomorphic 3D materials onto the object 3D model to generate the anthropomorphic 3D model of the target object, and drives the anthropomorphic 3D model to execute interaction behaviors matched with the interaction content information, thereby realizing an interaction method based on the object's anthropomorphic image.
As shown in fig. 2, the method specifically comprises the following steps:
step S201, attribute information, an object 3D model and interaction content information of a target object are acquired in response to an interaction request of the target object.
Wherein the request for interaction with the target object may be a request to initiate interaction with the target object. In different application scenarios, the interactive requests for the target object may be represented as different requests.
For example, the request may be one initiated by a merchant user to generate an anthropomorphic broadcast video of a commodity, and may include at least one of an image of the commodity and descriptive information. The image of the commodity may be a photograph taken by the user that includes the commodity, or a pre-prepared external image of the commodity. The descriptive information of the commodity may include identification information (such as the commodity name or code) and category information. Based on the descriptive information of the commodity, the cloud server can identify and determine the target commodity so as to acquire its attribute information, object 3D model and interaction content information. The cloud server then generates a broadcast video of the target object and returns the broadcast video to the user.
For example, in a live-streaming scenario the anchor/virtual anchor may initiate a request for a specified target object to interact with users through its anthropomorphic image, for instance by uttering a preset phrase such as "please welcome XXX", where "XXX" refers to identification information such as an object name or code. The cloud server may identify and determine the target object (which may be a commodity in the live-streaming room, or an object acting as the virtual anchor) based on the identification information, so as to acquire attribute information of the target object, the object 3D model and interaction content information. The cloud server generates and drives the anthropomorphic 3D model of the target object, and real-time interaction between the object's anthropomorphic image and users is realized through the live-streaming interface.
For example, in an intelligent customer service scenario, a user sends a link to a target object to a cloud server. The cloud server can identify and determine the target object based on the link of the target object, so that attribute information, an object 3D model and interaction content information of the target object can be acquired. The cloud server generates and drives a personified 3D model of the target object, and based on the customer service interaction interface, the real-time dialogue interaction between the target object and the user at the first person viewing angle is realized, and the interactivity and the interestingness of the customer service system are improved.
In this embodiment, the interaction content information refers to the content information that the target object outputs to the user in response to the interaction request. The interaction content information may be content provided by the user through the end-side device, or reply content automatically generated by the cloud server from the user's input data.
Step S202, according to the attribute information of the target object, generating a personified 3D material suitable for the target object, and synthesizing the personified 3D material onto the 3D model of the object to obtain a personified 3D model of the target object.
Wherein the attribute information of the target object includes, but is not limited to, identification information, color, shape, category of the target object.
In this embodiment, the cloud server automatically generates the personified 3D material matched with the attribute of the target object according to the attribute information of the target object. Further, the cloud server automatically anchors the position of the anthropomorphic 3D material on the target object according to the attribute information of the target object and the preset proportion parameters, synthesizes the anthropomorphic 3D material on the corresponding position on the 3D model of the object, and generates the anthropomorphic 3D model of the target object.
For example, the cloud server may determine the connection location of the personified 3D material of the target object according to the contour and shape of the target object, according to the central axis (horizontal or vertical) and aspect ratio of the 3D model of the object. The connection position of the anthropomorphic 3D material can be a point (called a connection point) or an area (called a connection area). For example, the positions of the connection points of the upper limbs, the connection points of the lower limbs, the positions of the connection areas of the eyes, nose, mouth, and the like are determined. The connection points of the limbs can be symmetrically arranged on two sides of the central axis of the object 3D model of the target object, and the specific positions of the connection points need to be determined according to the length-width ratio, for example, for a target object in a vertical cuboid shape, such as lipstick, the connection points of the upper limbs need to be adjusted upwards, and the connection points of the lower limbs need to be adjusted downwards. For a target object with a transverse cuboid shape, such as a tissue box, the connection point of the upper limb of the target object needs to be adjusted in the middle, and the proportion of the finally obtained anthropomorphic image (anthropomorphic 3D model) is more coordinated and reasonable through the adjustment.
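A minimal sketch of this anchoring idea is shown below; the aspect-ratio cut-offs and offsets are assumed, illustrative values rather than the patent's preset proportion parameters.

```python
# Illustrative only: place limb connection points from the front-face bounding box,
# symmetric about the vertical central axis, with aspect-ratio-dependent adjustments.
def limb_connection_points(bbox_length, bbox_width):
    aspect = bbox_length / bbox_width          # length = horizontal, width = vertical
    if aspect < 0.7:                           # vertical cuboid (e.g. lipstick)
        upper_y, lower_y = 0.35 * bbox_width, 0.90 * bbox_width   # upper limbs up, lower limbs down
    elif aspect > 1.4:                         # horizontal cuboid (e.g. tissue box)
        upper_y, lower_y = 0.50 * bbox_width, 0.80 * bbox_width   # upper limbs toward the middle
    else:                                      # roughly sphere/square-like
        upper_y, lower_y = 0.45 * bbox_width, 0.85 * bbox_width
    left_x, right_x = 0.10 * bbox_length, 0.90 * bbox_length      # symmetric about the central axis
    return {
        "upper_limbs": [(left_x, upper_y), (right_x, upper_y)],
        "lower_limbs": [(left_x + 0.15 * bbox_length, lower_y),
                        (right_x - 0.15 * bbox_length, lower_y)],
    }
```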
Step S203, displaying the anthropomorphic 3D model in the interaction page, and driving the anthropomorphic 3D model to execute interaction behavior matched with the interaction content information.
After the anthropomorphic 3D model of the target object is generated, the cloud server can display the anthropomorphic 3D model of the target object in a current or new interaction page, and drive the anthropomorphic 3D model to make interaction behaviors matched with the interaction content information according to the interaction content information by using an AI driving algorithm, wherein the interaction behaviors comprise expression, mouth shape, limb actions and the like, and interaction based on the anthropomorphic image of the object is realized.
The AI driving algorithm can be automatically adapted to different objects and different interactive content information, so that the interactive actions based on the object personification image are more diversified and personified, and the matching degree with the interactive content information is higher. The animation of the target object personification image presented based on the AI driving algorithm is dynamically generated according to the image characteristics of the object and the interactive content information, and the animation produced by the mode is not monotonous repeated play of a plurality of prefabricated actions, but has higher diversity, can be more matched with the object, and improves the expressive force of the object personification.
Fig. 3 is an exemplary diagram of interaction behaviors generated based on an object's anthropomorphic image. Fig. 3 takes a strawberry as the target object: a facial-feature 3D model and a limb 3D model are synthesized onto the 3D model of the strawberry to form the strawberry's anthropomorphic image. Driven by the AI algorithm, the strawberry's anthropomorphic image performs complex and varied actions rather than monotonously repeating a few preset actions. It should be noted that the images in fig. 3 are merely frames extracted from the driven interaction-behavior animation; additional frames exist between any two adjacent images, so the actions across adjacent images are coherent and fluid.
In the embodiment, attribute information of a target object, an object 3D model and interaction content information are acquired; according to the attribute information of the target object, automatically generating a anthropomorphic 3D material suitable for the target object, and synthesizing the anthropomorphic 3D material onto the object 3D model to obtain a anthropomorphic 3D model of the target object; displaying the anthropomorphic 3D model in the interactive page, driving the anthropomorphic 3D model to execute the interactive behavior matched with the interactive content information, automatically generating applicable anthropomorphic 3D materials according to the attribute information of the object, and synthesizing the materials on the 3D model of the object to obtain a corresponding anthropomorphic 3D model of the object, and manually designing different anthropomorphic images for the object is not needed; furthermore, the anthropomorphic 3D model of the object is used as an integral 3D virtual image, the anthropomorphic 3D model is driven according to the interactive content information, so that the anthropomorphic 3D model can be made to perform matched interactive behaviors, different animation actions are not required to be designed manually for the object, the efficiency is high, the period is short, the method is applicable to various objects, the user-defined interactive content information is supported, the multi-mode interaction is supported, and the interactivity is better.
In an alternative embodiment, the user submits an image of the target object to the cloud server. The cloud server identifies attribute information of the target object based on the image of the target object and acquires a 3D model of the target object. In the step S201, when the attribute information of the target object and the 3D model of the object are obtained in response to the interaction request with the target object, the following method is specifically adopted:
responding to an interaction request with a target object, and acquiring an image of the target object; and identifying attribute information of the target object according to the image of the target object, and acquiring an object 3D model of the target object.
Specifically, in response to an interaction request with the target object, the image of the target object sent by the end-side device is received, or the image of the target object carried in the interaction request is obtained. The image of the target object may be an image photographed by the user or an image produced by the owner of the target object.
Further, identifying attribute information of the target object based on the image of the target object, wherein the attribute information includes at least one of: color, shape, class.
Specifically, the color of the target object, such as its main color or the coverage area of each color, is determined by visually detecting the image of the target object. The target object and the background are automatically identified from the image and image segmentation is performed, completely separating the target object from the background; the contour points of the target object are then calculated to obtain its contour information. The shape of the target object may be determined from this contour information. The image of the target object may also be classified to determine the class of the target object, such as its SKU (Stock Keeping Unit) identifier or category.
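As a rough illustration of this visual detection step, the OpenCV-based sketch below extracts a main color, a foreground mask and the object contour from a single image; the plain-background assumption and the thresholds are simplifications, not the patent's detection algorithm.

```python
# Simplified sketch; assumes a roughly uniform background so Otsu thresholding works.
import cv2

def detect_attributes(image_path):
    img = cv2.imread(image_path)
    main_color = img.reshape(-1, 3).mean(axis=0).astype(int).tolist()   # crude main-color estimate (BGR)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)  # object/background split
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    outline = max(contours, key=cv2.contourArea)        # contour points of the target object
    x, y, w, h = cv2.boundingRect(outline)              # rough shape described via bounding box
    return {"main_color_bgr": main_color, "contour": outline, "aspect_ratio": w / h}
```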
Further, the object 3D model of the target object is obtained, which can be specifically implemented in the following manner:
according to the image of the target object, detecting the contour information and the 3D geometric information of the target object; and generating a 3D model of the target object according to the contour information and the 3D geometric information of the target object to obtain the 3D model of the object. 3D modeling is carried out on the target object by detecting the contour information and the 3D geometric information of the target object to obtain a 3D model of the target object, and for the object without the 3D model built in advance, the 3D model of the target object can be automatically generated in a 3D modeling mode, so that the method can be applied to any object, and the application range of the interaction method based on the object personification image is wider. The 3D modeling of the target object may be implemented using existing modeling software, which is not described here in detail.
Alternatively, 3D models of objects of various categories may be built and stored in advance. When the object 3D model of the target object is obtained, the object 3D model corresponding to the category of the target object can be retrieved, according to that category, from the pre-built object 3D models of the various categories. This saves the time needed to build a 3D model of the object, improves the efficiency of generating the target object's anthropomorphic image, and thus improves the efficiency of interaction based on the object's anthropomorphic image.
Alternatively, when the object 3D model of the target object is acquired, the object 3D model corresponding to the category of the target object may be acquired from the object 3D models corresponding to the categories of the target object, which are constructed in advance. Under the condition that an object 3D model corresponding to the category of the target object is not found, detecting outline information and 3D geometric information of the target object according to an image of the target object by a 3D modeling method; and generating a 3D model of the target object according to the contour information and the 3D geometric information of the target object.
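The retrieval-with-fallback logic described above might look like the sketch below; the model store and the reconstruction routine are hypothetical placeholders.

```python
# Illustrative sketch: reuse a pre-built 3D model per category, otherwise fall back to 3D modelling.
PREBUILT_3D_MODELS = {}   # e.g. populated in advance: {"beverage_bottle": {...}, "tissue_box": {...}}

def get_object_3d_model(category, image=None):
    model = PREBUILT_3D_MODELS.get(category)
    if model is not None:
        return model                                  # hit: use the pre-built model for this category
    return reconstruct_3d_from_image(image)           # miss: model from contour + 3D geometric information

def reconstruct_3d_from_image(image):
    # placeholder: actual reconstruction depends on the chosen modelling software
    raise NotImplementedError
```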
The scheme of the embodiment supports an application scenario in which a user provides only an image of a target object to a cloud server.
In another alternative embodiment, the user submits the description information of the target object to the cloud server. Illustratively, the description information of the target object may include at least one of the following: name, category, and detail page link. In response to an interaction request with the target object, the cloud server retrieves the attribute information of the target object and the object 3D model based on the description information of the target object.
Specifically, the attribute information of the target object and the 3D model of the object are called, and specifically, any mode of local calling and external calling is included. For example, attribute information of different objects and an object 3D model may be stored in advance on a cloud server, in which case the cloud server may directly acquire the attribute information of the target object and the object 3D model from the local storage space. The attribute information of different objects and the 3D model of the object may be pre-stored on another data service device (such as a server or other devices), and the cloud server may retrieve the attribute information and the 3D model of the object corresponding to the identification information of the target object from the data service device based on the identification information (such as a name or a category) of the target object.
The scheme of the embodiment supports an application scenario in which a user provides one or more pieces of description information of a target object to a cloud server.
In an alternative embodiment, the attribute information of the target object acquired in the above step S201 includes a category and a color. In the step S202, according to the attribute information of the target object, the anthropomorphic 3D material suitable for the target object is generated, which may be specifically implemented in the following manner:
acquiring a basic anthropomorphic 3D material corresponding to the target object; and adjusting the configuration parameters of the basic anthropomorphic 3D material according to the category and color of the target object to obtain an anthropomorphic 3D material suitable for the target object. The anthropomorphic 3D material includes a facial-feature 3D model and/or a limb 3D model.
For example, one set of anthropomorphic 3D materials (including 3D models of facial features and limbs) may be set as the default basic anthropomorphic 3D materials, i.e., the basic anthropomorphic 3D materials corresponding to any object. Alternatively, different basic anthropomorphic 3D materials may be set for different classes of objects, depending on the class of the object. Or, objects may be divided into several major classes, with different basic anthropomorphic 3D materials set for different major classes and different objects in the same major class sharing the same basic anthropomorphic 3D materials.
Optionally, the cloud server may display a preset anthropomorphic 3D material, so that the user may select the anthropomorphic 3D material desired to be used as the anthropomorphic 3D material based on the target object. Responding to the selection operation of a user on any preset anthropomorphic 3D material, and taking the selected anthropomorphic 3D material as a basic anthropomorphic 3D material corresponding to a target object.
Further, the cloud server can adjust the configuration parameters of the basic anthropomorphic 3D material according to the category and color of the target object, including but not limited to adjusting its material (such as texture mapping and reflection) and shape, so that the adjusted anthropomorphic 3D material better matches the category, color and tone of the target object. For example, the shape may be adjusted by applying mesh deformation to the basic anthropomorphic 3D material with a mesh deformer.
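A small sketch of this parameter-adjustment step is given below; the field names and the category-to-deformation mapping are assumptions made purely for illustration.

```python
# Illustrative only: adjust the basic material's configuration by object category and color.
def adapt_base_material(base_material: dict, category: str, main_color_rgb: tuple) -> dict:
    material = dict(base_material)
    # re-tint so the facial features contrast with the object's main color
    material["tint_rgb"] = tuple(255 - c for c in main_color_rgb)
    # choose a mesh-deformation preset per broad category (e.g. rounder features for fruit)
    material["mesh_deform"] = {"fruit": "round", "bottle": "slim"}.get(category, "default")
    return material
```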
The scheme of this embodiment can also let the user fine-tune the automatically generated anthropomorphic 3D model of the target object. After the configuration parameters of the basic anthropomorphic 3D material have been adjusted according to the category and color of the target object to obtain an anthropomorphic 3D material suitable for the target object, the cloud server may also display the configuration parameters of the target object's anthropomorphic 3D material. The user may edit the displayed configuration parameters. In response to an editing operation on the configuration parameters of the target object's anthropomorphic 3D material, the cloud server updates the anthropomorphic 3D material according to the edited configuration parameters. In addition, one or more sets of configuration parameters with different characteristics or styles can be preset, and the user can switch the configuration parameters of the basic anthropomorphic 3D material to one of them.
In an optional embodiment, in the step S202, the anthropomorphic 3D material is synthesized onto the 3D model of the object to obtain the anthropomorphic 3D model of the target object, which may be implemented specifically as follows:
determining the connection position of the anthropomorphic 3D material on the 3D model of the object according to the shape and contour information of the 3D model of the object; and synthesizing the anthropomorphic 3D material to the corresponding connection position according to the connection position to obtain the anthropomorphic 3D model of the target object.
In this embodiment, the cloud server may automatically anchor the connection position of the anthropomorphic 3D material on the target object according to the attribute information of the target object and the preset proportional parameter, and synthesize the anthropomorphic 3D material onto the corresponding connection position on the 3D model of the object, so as to obtain the anthropomorphic 3D model of the target object.
Specifically, several shape types are divided according to different shapes of the object, different anchoring algorithms/rules are set for the different shape types, one anchoring algorithm/rule containing a set of scale parameters. And when the connection position of the anthropomorphic 3D material on the 3D model of the target object is anchored, determining the shape type of the target object according to the shape of the target object. According to an anchoring algorithm/rule corresponding to the shape type of the target object, namely according to a preset proportional parameter corresponding to the shape type of the target object, determining the connection position of the anthropomorphic 3D material on the 3D model of the object.
Shape types include, but are not limited to, at least one of: vertical tube, horizontal tube, sphere, approximate sphere, square, approximate square. The various shape types can be divided according to the aspect ratio of the object's bounding box. The bounding box of the object may be a hexahedral bounding box, which can be understood as an enclosing box. The geometric center of the hexahedral bounding box is the center of the object; that is, the geometric center of the object is taken as the center of the object 3D model. The length and width of the object 3D model refer to the length (horizontal direction) and width (vertical direction) of the front face (corresponding to the front view of the object) of the hexahedral bounding box of the object 3D model, and the aspect ratio of the object 3D model is the ratio of that length to that width. The front face of the hexahedral bounding box, i.e., the front of the object, can be set differently for different objects. For an object without an obvious front, one face of the hexahedral bounding box may be selected at random as the front, or the face with a larger area may be set as the front.
The preset proportion parameters corresponding to any shape type comprise a proportion range of the distance between the connection position of each anthropomorphic 3D material on the object 3D model and the side face of the bounding box of the object 3D model to the length of the bounding box, and a proportion range of the distance between the connection position and the top face and the bottom face of the bounding box of the object 3D model to the width of the bounding box.
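The sketch below shows one way such shape-type rules could be expressed; the aspect-ratio cut-offs and proportion ranges are invented placeholder values, not the patent's parameters.

```python
# Illustrative only: classify shape type from the front-face aspect ratio of the hexahedral
# bounding box, and look up the proportion-parameter ranges used for anchoring.
def shape_type(length, width):
    ratio = length / width                      # length = horizontal, width = vertical
    if ratio < 0.6:
        return "vertical_tube"
    if ratio > 1.6:
        return "horizontal_tube"
    return "sphere_or_square_like"

PROPORTION_PARAMS = {                           # fractions of bounding-box length/width
    "vertical_tube":         {"face_area_y": (0.55, 0.80), "upper_limb_y": (0.30, 0.40)},
    "horizontal_tube":       {"face_area_y": (0.35, 0.65), "upper_limb_y": (0.40, 0.55)},
    "sphere_or_square_like": {"face_area_y": (0.40, 0.60), "upper_limb_y": (0.40, 0.50)},
}
```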
Optionally, different anchoring algorithms/rules are set for different shape types, where one anchoring algorithm/rule includes a set of proportion parameters, and may further include a preset anchoring rule, for example, a connection region of the five sense organs cannot cover a preset pattern on the object, such as a logo pattern like a trademark, an object-specific functional region, and the like; the difference between the colors in the corresponding connection area and the surrounding area of the anthropomorphic 3D material and the colors of the anthropomorphic 3D material reach a preset difference threshold value, so that the colors of the anthropomorphic 3D material and the surrounding area have larger contrast. The preset difference threshold can be set according to the requirements of actual application scenes, different personified 3D materials of different types of objects can be set to different preset difference thresholds, and different personified 3D materials can also correspond to different preset difference thresholds.
For example, for vertical tubular objects such as lipsticks and beverage bottles, the connection area of the facial-feature 3D model may be selected on the lower half of the lipstick, and a facial-feature 3D model whose color contrasts strongly with the color of the lower half of the lipstick body is chosen; for the beverage bottle, the connection area of the facial-feature 3D model avoids its trademark pattern. For horizontal tubular objects such as tissue boxes, connection points on the two sides of the tissue box would be relatively far apart, so the connection points of the upper limbs can be positioned relatively close to the middle. For objects of shape types with an aspect ratio close to 1 (such as spheres, approximate spheres, squares and approximate squares), for example strawberries and apples, the connection area of the facial-feature 3D model may be selected near the middle and evenly distributed on the front face of the object 3D model.
In addition, any anchoring algorithm/rule comprises preset proportion parameters and preset anchoring rules; the user is also supported in customizing the proportion parameters and anchoring rules, or in generating a new anchoring algorithm/rule by adjusting the preset proportion parameters and anchoring rules.
In this embodiment, the size of the object is consistent with the size of the 3D model of the object, or the size of the object is scaled down or scaled up in proportion to the size of the 3D model of the object. Optionally, before the anthropomorphic 3D material is synthesized to the corresponding connection position, the size of the anthropomorphic 3D material can be adjusted according to the connection position, so that the size of the anthropomorphic 3D material is matched with the size of the 3D model of the object, and the size of the anthropomorphic 3D material on the anthropomorphic 3D model of the generated object is more reasonable and coordinated.
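As an illustration of this size matching, the tiny sketch below derives a uniform scale factor for a material from the object model's bounding box; the target ratio is an assumed value.

```python
# Illustrative only: keep a material's height at a fixed fraction of the object model's height.
def material_scale_factor(material_height, object_bbox_height, target_ratio=0.18):
    return (target_ratio * object_bbox_height) / material_height
```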
Optionally, according to the connection position, the anthropomorphic 3D material is synthesized to the corresponding connection position, and before the anthropomorphic 3D model of the target object is obtained, the cloud server can also display the connection position of the anthropomorphic 3D material on the object 3D model, so that a user can adjust the connection position of any anthropomorphic 3D material on the object 3D model. Further, in response to an adjustment operation of the connection position of the anthropomorphic 3D material on the object 3D model, the cloud server updates the connection position of the anthropomorphic 3D material on the object 3D model.
Optionally, before synthesizing the anthropomorphic 3D material to the corresponding connection position to obtain the anthropomorphic 3D model of the target object, the cloud server may further adjust the connection position and/or the color of the anthropomorphic 3D material according to the colors of the area surrounding the connection position of the anthropomorphic 3D material on the object 3D model.
For example, the connection position corresponding to the anthropomorphic 3D material is adjusted according to the color of the surrounding area of the connection position of the anthropomorphic 3D material on the 3D model of the object, so that the color of the surrounding area of the anthropomorphic 3D material and the corresponding connection position have larger contrast.
For example, the colors of the anthropomorphic 3D material are adjusted according to the colors of the surrounding areas of the connection positions of the anthropomorphic 3D material on the 3D model of the object, so that the colors of the surrounding areas of the anthropomorphic 3D material and the corresponding connection positions have larger contrast.
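A minimal sketch of this contrast rule follows; the difference metric and threshold are assumptions used only to illustrate the idea.

```python
# Illustrative only: if the material color is too close to the color around its connection
# position, push it toward the complementary color of the surroundings.
def ensure_contrast(material_rgb, surrounding_rgb, min_diff=120):
    diff = sum(abs(m - s) for m, s in zip(material_rgb, surrounding_rgb))
    if diff >= min_diff:
        return material_rgb                     # contrast already reaches the preset threshold
    return tuple(255 - c for c in surrounding_rgb)
```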
Optionally, according to the connection position, the anthropomorphic 3D material is synthesized to the corresponding connection position, after the anthropomorphic 3D model of the target object is obtained, the cloud server can display the anthropomorphic 3D model, and the user is supported to adjust the connection position of the anthropomorphic 3D material on the object 3D model. Further, in response to an adjustment operation of the connection location of the personified 3D material on the personified 3D model, the cloud server updates the personified 3D model.
In this embodiment, after the final anthropomorphic 3D model of the target object is obtained, the anthropomorphic 3D model is driven as an integral virtual image to perform the corresponding actions. When the camera or scene of the interaction page rotates, the anthropomorphic 3D material and the object 3D model are automatically tracked and kept aligned, and the anthropomorphic 3D model follows the rotation of the camera or scene as an integral virtual image.
In an alternative embodiment, in the real-time interaction scenario, in step S201, the interactive content information of the target object is obtained in response to the interaction request with the target object, which may be specifically implemented in the following manner:
In response to a first interaction request with a target object, input dialog content is obtained, along with multimodal knowledge data of at least one source of the target object. Wherein the first interactive request is a real-time interactive request. The request contains information of the target object and dialogue content entered by the user.
And the cloud server acquires dialogue content input by the user, for example, the cloud server receives dialogue text sent by the user through the terminal side equipment, and the dialogue text is the dialogue content. Or receiving dialogue audio sent by the user through the terminal side equipment, and converting the dialogue audio into dialogue text to obtain dialogue content. The cloud server can determine the target object according to the information of the target object, acquire attribute information of the target object and an object 3D model, generate anthropomorphic 3D materials suitable for the target object according to the attribute information of the target object, and synthesize the anthropomorphic 3D materials onto the object 3D model to obtain an anthropomorphic 3D model of the target object. The cloud server can determine the target object according to the information of the target object, and can acquire multi-mode knowledge data of at least one source of the target object. The cloud server may obtain multimodal knowledge data of at least one source of the target object from a local storage space or other data service platform.
Further, in the step S203, the anthropomorphic 3D model is displayed in the interaction page, and the anthropomorphic 3D model is driven to execute the interaction behavior matched with the interaction content information, which may be specifically implemented in the following manner:
calling a dialogue model, and generating reply information of dialogue content according to multi-mode knowledge data of at least one source of a target object; and displaying the anthropomorphic 3D model of the target object in the first interaction page, and driving the anthropomorphic 3D model to play the action of broadcasting the reply information according to the reply information. Wherein the action of broadcasting the reply information includes at least one of an expression, a mouth shape, and a limb action.
Wherein the multi-modal knowledge data of at least one source of the target object includes, but is not limited to: common question-answering (FAQ) knowledge base, attribute knowledge, title, knowledge graph, comment data.
The dialogue model used to generate the reply information for the dialogue content may be any dialogue model supporting multimodal interaction. By way of example, a unified "four-in-one" dialogue model based on knowledge, personality, emotion and memory, such as a generative model, may be used. Such a large unified dialogue model aggregates massive data, has a certain degree of general conversational ability as well as multi-turn dialogue and multi-source heterogeneous knowledge capabilities, and can access various multi-source heterogeneous knowledge related to the target object, such as the FAQ knowledge base, attribute knowledge, titles, knowledge graph and comment data, to conduct question answering and dialogue.
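One way to feed such multi-source heterogeneous knowledge into a generative dialogue model is sketched below; the prompt format and the `call_generative_model` stub are hypothetical, not the patent's dialogue model.

```python
# Illustrative only: assemble multi-source knowledge into a prompt and query a generative model.
def generate_reply(user_utterance, knowledge_sources):
    # knowledge_sources, e.g. {"faq": [...], "attributes": {...}, "title": "...",
    #                          "knowledge_graph": [...], "comments": [...]}
    knowledge_text = "\n".join(f"[{name}] {items}" for name, items in knowledge_sources.items())
    prompt = (
        "You are the anthropomorphic image of the target object, replying in the first person.\n"
        f"Known information about the object:\n{knowledge_text}\n"
        f"User: {user_utterance}\nReply:"
    )
    return call_generative_model(prompt)

def call_generative_model(prompt: str) -> str:
    # placeholder for the unified dialogue model
    raise NotImplementedError
```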
For example, fig. 4 shows an exemplary diagram of real-time interaction. As shown in fig. 4, the target object is a pumpkin-seed commodity, and the dialogue model generates reply information for the dialogue content input by the user based on heterogeneous multimodal knowledge data from multiple sources such as FAQ, commodity attributes, commodity titles, point-of-interest (Point Of Interest, POI) labels and commodity comments.
Further, after the reply information for the dialogue content is obtained, the anthropomorphic 3D model of the target object is displayed in the first interaction page, and the AI driving engine drives the anthropomorphic 3D model, according to the reply information, to perform the action of broadcasting the reply information. The reply information may be text or voice, and the AI driving engine can drive the anthropomorphic 3D model to make reasonable expressions, mouth shapes, and limb actions based on the input text or voice. The AI driving engine may be implemented using any AI driving algorithm that supports text- and/or voice-driven virtual characters, which is not described in detail herein. The AI driving algorithm drives the anthropomorphic 3D model of the target object to produce interaction behaviors matching the input reply information, and the same target object can adapt to different reply information, so that the anthropomorphic 3D model produces different interaction behaviors. The output of the AI driving algorithm can therefore automatically adapt to different objects and to interaction content of different styles for the same object, making the interaction behaviors of the object's anthropomorphic image more diversified and anthropomorphic, with a higher degree of matching with the interaction content.
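The driving step can be pictured as in the sketch below: the reply information is turned into time-aligned animation curves that are played on the displayed model. The `ai_engine`, `renderer`, and `tts` objects are illustrative placeholders for a text/voice-driven virtual-character algorithm, a rendering front end, and a speech synthesizer; none of them is an interface defined by this application.

```python
# Hedged sketch of one possible driving step for broadcasting reply information.
def drive_reply_broadcast(ai_engine, renderer, anthropomorphic_model, reply, tts=None):
    # The reply may already be speech; otherwise synthesize it from text.
    if isinstance(reply, (bytes, bytearray)):
        speech = reply
    else:
        speech = tts.synthesize(reply)

    # The driving engine predicts animation curves aligned with the speech.
    motion = ai_engine.predict(
        audio=speech,
        channels=("expression", "mouth_shape", "limbs"),
    )

    # Play speech and animation together in the interaction page.
    renderer.play(anthropomorphic_model, audio=speech, animation=motion)
```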
In an alternative embodiment, in the scenario of customizing an anthropomorphic broadcast video of an object, step S201 of obtaining the interaction content information of the target object in response to the interaction request with the target object may be implemented as follows:
In response to a second interaction request with the target object, the input broadcast content is obtained. The second interaction request is an interaction request for acquiring an anthropomorphic broadcast video of the target object; it contains information of the target object and the broadcast content input by the user.
The cloud server can determine the target object according to the information of the target object, acquire the attribute information of the target object and the object 3D model, generate anthropomorphic 3D materials suitable for the target object according to the attribute information, and synthesize the anthropomorphic 3D materials onto the object 3D model to obtain the anthropomorphic 3D model of the target object. The cloud server acquires the broadcast content input by the user: for example, it receives broadcast text sent by the user through the terminal-side device to obtain the broadcast content, or receives broadcast audio sent by the user through the terminal-side device to obtain the broadcast content.
Further, in step S203, displaying the anthropomorphic 3D model in the interaction page and driving the anthropomorphic 3D model to perform the interaction behavior matching the interaction content information may be implemented as follows:
The anthropomorphic 3D model of the target object is displayed in the second interaction page, and the anthropomorphic 3D model is driven, according to the broadcast content, to perform the action of broadcasting the broadcast content. The action of broadcasting the broadcast content includes at least one of an expression, a mouth shape, and a limb action.
In this embodiment, after the broadcast content is obtained and the anthropomorphic 3D model of the target object is generated, the anthropomorphic 3D model is displayed in the second interaction page, and the AI driving engine drives it, according to the broadcast content, to perform the action of broadcasting that content. The broadcast content may be text or voice, and the AI driving engine can drive the anthropomorphic 3D model to make reasonable expressions, mouth shapes, and limb actions based on the input text or voice. The AI driving engine may be implemented using any AI driving algorithm that supports text- and/or voice-driven virtual characters, which is not described in detail herein. The AI driving algorithm drives the anthropomorphic 3D model of the target object to produce interaction behaviors matching the input broadcast content, and the same target object can adapt to different broadcast content, so that the anthropomorphic 3D model produces different interaction behaviors. The output of the AI driving algorithm can automatically adapt to interaction content of different objects and of different styles for the same object, making the interaction actions of the object's anthropomorphic image in the broadcast video more diversified and anthropomorphic, with a higher degree of matching with the broadcast content.
In an alternative embodiment, in the scenario of customizing a video in which the anthropomorphic image of an object dances along with specified audio data (such as music), step S201 of obtaining the interaction content information of the target object in response to the interaction request with the target object may be implemented as follows:
In response to a third interaction request with the target object, the input audio data is obtained. The third interaction request is an interaction request for acquiring a video in which the anthropomorphic image of the target object dances along with the specified audio data; it contains information of the target object and the audio data input by the user. The audio data may be a piece of music or a recording of a person's voice.
The cloud server can determine the target object according to the information of the target object, acquire the attribute information of the target object and the object 3D model, generate anthropomorphic 3D materials suitable for the target object according to the attribute information, and synthesize the anthropomorphic 3D materials onto the object 3D model to obtain the anthropomorphic 3D model of the target object.
Further, in step S203, displaying the anthropomorphic 3D model in the interaction page and driving the anthropomorphic 3D model to perform the interaction behavior matching the interaction content information may be implemented as follows:
The anthropomorphic 3D model of the target object is displayed in the third interaction page, and the anthropomorphic 3D model is driven, according to the audio data, to make dance movements matching the audio data. The dance movements include at least one of expressions, mouth shapes, and limb movements.
In this embodiment, after the audio data is obtained and the anthropomorphic 3D model of the target object is generated, the anthropomorphic 3D model is displayed in the third interaction page, and the AI driving engine drives it, according to the audio data, to make dance movements matching the audio data. The AI driving engine may be implemented using any AI driving algorithm that supports audio-driven virtual characters, which is not described in detail herein. The AI driving algorithm drives the anthropomorphic 3D model of the target object to make dance movements matching the input audio data, thereby realizing the function of music-driven dancing of the anthropomorphic image of the target object.
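One possible shape of such an audio-driven step is sketched below: beats are extracted from the audio and used to select and time dance segments. The librosa calls are real library functions; the `dance_library` and `renderer` interfaces are illustrative assumptions rather than components defined by this application.

```python
# Hedged sketch of music-driven dancing based on beat tracking.
import librosa


def drive_dance(renderer, anthropomorphic_model, audio_path, dance_library):
    # Estimate tempo and beat positions from the input audio data.
    waveform, sr = librosa.load(audio_path)
    tempo, beat_frames = librosa.beat.beat_track(y=waveform, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)

    # Compose dance segments whose timing matches the beats, producing
    # expression, mouth-shape and limb-motion curves.
    motion = dance_library.compose(tempo=float(tempo), beats=beat_times)

    renderer.play(anthropomorphic_model, audio=waveform, sample_rate=sr, animation=motion)
```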
Based on the scheme provided by any of the method embodiments, there may be one or more target objects, and the interaction content information of any target object may be determined based on input data of a user or based on output data of another virtual character (such as a virtual human or the anthropomorphic image of another object), so as to realize a linked interaction effect among multiple objects.
In an alternative embodiment, a multi-modal AI driving engine that combines deep learning and motion-graph algorithms is used, so that reasonable expressions, mouth shapes, and limb movements can be generated from text, speech, video, or random-signal inputs, including guiding gestures, dialogue interaction behaviors (greeting, smiling), dancing to music, and the like. The AI driving engine can automatically adapt to interaction content of different objects and different modalities, making the interaction actions based on the object's anthropomorphic image more diversified and anthropomorphic, with a higher degree of matching with the interaction content. The animation of the target object's anthropomorphic image presented by the AI driving engine is generated dynamically according to the characteristics of the object image and the input information (text, image, video, and settings). Animation produced in this way is not a monotonous repetition of a few prefabricated actions; it has greater diversity, matches the object better, and improves the expressiveness of the personified object.
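The combination of a learned predictor with a motion graph can be pictured roughly as below: a deep model proposes the next motion label from the multi-modal input, and the motion graph constrains the transitions so that the played clips remain plausible. Every name in this sketch is an assumption made for illustration; it is not the engine's actual structure.

```python
# Simplified, hypothetical sketch of deep learning + motion-graph driving.
import random


class MotionGraph:
    def __init__(self, clips, transitions):
        self.clips = clips                # label -> animation clip
        self.transitions = transitions    # label -> set of allowed next labels

    def next_clip(self, current, proposed):
        # Accept the proposal only if the graph allows the transition,
        # otherwise fall back to a random allowed neighbour.
        allowed = self.transitions[current]
        label = proposed if proposed in allowed else random.choice(sorted(allowed))
        return label, self.clips[label]


def run_engine(predictor, graph, signal_stream, start="idle"):
    """Yield animation clips driven by text / speech / video / random signals."""
    current = start
    for signal in signal_stream:
        proposed = predictor.propose(signal, current)   # deep-learning step
        current, clip = graph.next_clip(current, proposed)
        yield clip
```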
Fig. 5 is an interactive flowchart of a customized anthropomorphic broadcast animation scenario according to an exemplary embodiment of the present application. As shown in fig. 5, the method specifically comprises the following steps:
Step S501: in response to a personification request from the terminal-side device for the target object, acquire the attribute information of the target object, the object 3D model, and the broadcast content.
In this embodiment, a user sends a personification request for the target object to the cloud server through the terminal-side device in use. The request may be a request for obtaining an anthropomorphic broadcast animation of the target object; the user inputs the broadcast content and sends it to the cloud server through the terminal-side device. The personification request for the target object carries information of the target object, such as a name, a code, and a category.
In response to the personification request from the terminal-side device for the target object, the cloud server acquires the information of the target object, determines the target object according to that information, and acquires the attribute information of the target object and the object 3D model. The cloud server also receives the broadcast content sent by the terminal-side device. The broadcast content may be text, voice, video, or the like to be broadcast. The attribute information of the object includes, but is not limited to, the color, category, and shape of the object.
Step S502: according to the attribute information of the target object, generate anthropomorphic 3D materials suitable for the target object, and synthesize the anthropomorphic 3D materials onto the object 3D model to obtain the anthropomorphic 3D model of the target object.
The specific implementation of this step is consistent with that of step S202 described above, and the implementation of step S202 provided in any of the foregoing embodiments may be adopted; for details, refer to the content related to step S202 in the foregoing embodiments, which is not repeated here.
Step S503: display the anthropomorphic 3D model in the interaction page, drive the anthropomorphic 3D model to complete the interaction behavior of broadcasting the broadcast content, and generate the broadcast animation.
In this step, the specific implementation of displaying the anthropomorphic 3D model in the interaction page and driving it to complete the interaction behavior of broadcasting the broadcast content is consistent with that of step S203, and the implementation of step S203 provided in any of the foregoing embodiments may be adopted; for details, refer to the content related to step S203 in the foregoing embodiments, which is not repeated here.
In this embodiment, while the anthropomorphic 3D model is displayed in the interaction page and driven to complete the interaction behavior of broadcasting the broadcast content, the picture of the interaction behavior of the anthropomorphic 3D model of the target object in the interaction page is saved as video data by screen capturing or screen recording, so as to obtain the broadcast animation.
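A minimal sketch of saving the driven behavior as video data is shown below. It assumes the renderer can hand back RGB frames; imageio is used only as one possible encoder (writing, e.g., an MP4 requires the imageio-ffmpeg backend), and the `renderer.render_frames` interface is an assumption.

```python
# Hedged sketch: encode rendered interaction-behaviour frames into a video file.
import imageio.v2 as imageio


def record_broadcast_animation(renderer, anthropomorphic_model, audio, motion,
                               out_path: str, fps: int = 25) -> str:
    frames = renderer.render_frames(
        anthropomorphic_model, audio=audio, animation=motion, fps=fps
    )
    with imageio.get_writer(out_path, fps=fps) as writer:
        for frame in frames:          # each frame: H x W x 3 uint8 array
            writer.append_data(frame)
    return out_path
```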
Step S504: in response to a confirmation operation on the broadcast animation, provide the broadcast animation to the terminal-side device.
After generating the broadcast animation, the cloud server can display it through another interaction page for the user to view. After confirming that the broadcast animation meets their requirements, the user can obtain the broadcast animation from the cloud server by triggering a confirmation operation on it.
In response to the confirmation operation on the broadcast animation, the cloud server provides the broadcast animation to the terminal-side device. For example, the cloud server transmits the broadcast animation to the terminal-side device, or the cloud server provides a download link of the broadcast animation so that the terminal-side device can download it through the link.
The scheme of this embodiment is applied to the scenario in which a user customizes an anthropomorphic broadcast animation of an object through the terminal-side device. Based on this scheme, the cloud server can automatically generate anthropomorphic 3D models for various objects, drive the anthropomorphic 3D model of an object to make the corresponding interaction behaviors according to the broadcast content input by the user, generate the broadcast video, and provide it to the terminal-side device. Different animation actions do not need to be designed manually for each object, so the efficiency is high and the period is short; the scheme is applicable to various objects and supports user-defined broadcast content. The broadcast animation of the object's anthropomorphic image presented by the AI driving is generated dynamically according to the attribute characteristics of the object and the input broadcast content. Animation produced in this way is not a monotonous repetition of a few prefabricated actions; it has greater diversity, matches the characteristics of the object better, improves the expressiveness of the personified object, makes the broadcast animation content based on the object's anthropomorphic image richer and more vivid, and improves the efficiency and quality of generating broadcast animations of the object's anthropomorphic image.
Fig. 6 is a flowchart of a real-time interaction method based on an avatar image according to an exemplary embodiment of the present application. As shown in fig. 6, the method specifically comprises the following steps:
step S601, providing a real-time interaction page.
In this embodiment, the cloud server provides a real-time interaction page through which human-computer interaction between the user (via the terminal device in use) and the interaction system is realized. The real-time interaction page may be an interaction page in a real-time interaction scenario such as intelligent customer service or live streaming.
Step S602: during real-time interaction, in response to an interaction request with a target object, acquire the attribute information of the target object, the object 3D model, and the interaction content information.
In different real-time interaction scenarios, the interaction request with the target object may take different forms. For example, in an intelligent customer service scenario, the interaction request may be that the user sends a link to the target object to the intelligent customer service, poses a question related to the target object, sends a picture of the target object, or sends any combination of the link, the question, and the picture. In a live-streaming scenario, the interaction request may be a request by the anchor or virtual anchor to initiate interaction with users through the anthropomorphic image of a specified target object; for example, the anchor or virtual anchor utters a preset phrase such as "please look at object XXX" or "see this XXX in my hand", where "XXX" refers to identification information such as the object's name or code, or makes a preset gesture toward a target object in the live scene, such as pointing at the object.
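As a rough illustration only, recognizing such a live-streaming trigger phrase and extracting the object identification from it could look like the sketch below; the patterns are assumptions, and a production system might use an intent classifier and gesture recognition instead of regular expressions.

```python
# Hedged sketch: detect a preset trigger utterance and extract the object name.
import re

TRIGGER_PATTERNS = [
    re.compile(r"please look at (?P<obj>.+)"),
    re.compile(r"see this (?P<obj>.+) in my hand"),
]


def extract_interaction_request(utterance: str):
    """Return target-object information if the utterance is a trigger phrase."""
    for pattern in TRIGGER_PATTERNS:
        match = pattern.search(utterance.lower())
        if match:
            return {"object_info": {"name": match.group("obj").strip()}}
    return None  # this utterance does not request interaction with an object
```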
The interaction request with the target object may carry information of the target object, such as a name, a code, a category, or a picture. The cloud server can identify and determine the target object according to this information and acquire the attribute information of the target object and the object 3D model.
In this embodiment, the interaction content information refers to the content information that the target object outputs to the user in response to the interaction request. The interaction content information is reply content information generated by the cloud server from the user's input data. The input data of the user may include at least one of: an image of the target object and description information of the target object. For the specific implementation of acquiring the interaction content information of the target object according to the input data, refer to the implementation of acquiring the content interaction information in the real-time interaction scenario in step S201 of the foregoing embodiments, which is not repeated here.
Step S603: according to the attribute information of the target object, generate anthropomorphic 3D materials suitable for the target object, and synthesize the anthropomorphic 3D materials onto the object 3D model to obtain the anthropomorphic 3D model of the target object.
The specific implementation of this step is consistent with that of step S202, and the implementation of step S202 in the real-time interaction scenario in the foregoing embodiments may be adopted; for details, refer to the related content of the foregoing embodiments, which is not repeated here.
Step S604: display the anthropomorphic 3D model in the real-time interaction page, and drive the anthropomorphic 3D model to perform the interaction behavior matching the interaction content information.
In this embodiment, after the interaction content information is obtained and the anthropomorphic 3D model of the target object is generated, the anthropomorphic 3D model is displayed in the real-time interaction page, and the AI driving engine drives it to perform the interaction behavior matching the interaction content information. The interaction behavior of the target object can thus adaptively match different interaction content information, and the output of the AI driving can automatically adapt to interaction content information of different objects and of different styles for the same object, making the interaction behaviors of the object's anthropomorphic image more diversified and anthropomorphic, with a higher degree of matching with the interaction content information, thereby improving the quality of interaction based on the object's anthropomorphic image.
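A condensed, hypothetical sketch tying steps S601-S604 together for the real-time case is given below. Every collaborating component (`knowledge_store`, `personifier`, `ai_engine`, `page`) is an illustrative assumption, not an interface defined by this application, and the actual implementation is not limited to this structure.

```python
# Hedged sketch of the real-time interaction flow (S601-S604).
def realtime_interaction(request, knowledge_store, dialogue_model,
                         personifier, ai_engine, page):
    # S602: resolve the target object, its attributes and 3D model, and the
    # interaction content information carried by or derived from the request.
    obj = knowledge_store.resolve_object(request["object_info"])
    content = request.get("dialogue_content") or request.get("broadcast_content")

    # S603: generate anthropomorphic 3D materials from the attributes and
    # synthesize them onto the object 3D model.
    materials = personifier.generate_materials(obj.attributes)
    avatar = personifier.synthesize(obj.model_3d, materials)

    # S604: display the anthropomorphic 3D model in the real-time interaction
    # page and drive it to perform behaviour matching the content information.
    reply = dialogue_model.reply(content, knowledge_store.knowledge(obj))
    motion = ai_engine.predict(text=reply, channels=("expression", "mouth_shape", "limbs"))
    page.show(avatar)
    page.play(avatar, animation=motion, speech=reply)
```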
Fig. 7 is a schematic structural diagram of an interaction device according to an exemplary embodiment of the present application. The interaction device provided by the embodiment of the application can execute the processing flow provided by the interaction method embodiment. As shown in fig. 7, the interaction device 70 includes: an information acquisition module 71, a personification processing module 72 and a driving module 73.
The information acquisition module 71 is configured to acquire attribute information of a target object, an object 3D model, and interaction content information in response to an interaction request for the target object.
The personification processing module 72 is configured to generate a personification 3D material suitable for the target object according to the attribute information of the target object, and synthesize the personification 3D material onto the object 3D model to obtain a personification 3D model of the target object.
The driving module 73 is used for displaying the anthropomorphic 3D model in the interaction page and driving the anthropomorphic 3D model to execute the interaction behavior matched with the interaction content information.
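As a structural illustration only, the cooperation of the three modules in fig. 7 could be organized as in the sketch below; the method names and call order are assumptions, and the real modules carry the logic described in the embodiments above.

```python
# Hypothetical structural sketch of the interaction device 70.
class InteractionDevice:
    def __init__(self, info_module, personification_module, driving_module):
        self.info_module = info_module                        # module 71
        self.personification_module = personification_module  # module 72
        self.driving_module = driving_module                  # module 73

    def interact(self, request, page):
        # Module 71: attribute information, object 3D model, interaction content.
        attrs, model_3d, content = self.info_module.acquire(request)
        # Module 72: generate materials and synthesize the anthropomorphic 3D model.
        materials = self.personification_module.generate_materials(attrs)
        avatar = self.personification_module.synthesize(model_3d, materials)
        # Module 73: display the model and drive the matching interaction behavior.
        self.driving_module.display_and_drive(page, avatar, content)
```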
In an alternative embodiment, when implementing the acquisition of the attribute information of the target object, the 3D model of the object in response to the request for interaction with the target object, the information acquisition module 71 is further configured to:
responding to an interaction request with a target object, and acquiring an image of the target object; and identifying attribute information of the target object according to the image of the target object, and acquiring an object 3D model of the target object.
In an alternative embodiment, in implementing the object 3D model of the acquisition target object, the information acquisition module 71 is further configured to:
according to the image of the target object, detecting the contour information and the 3D geometric information of the target object; and generating a 3D model of the target object according to the contour information and the 3D geometric information of the target object to obtain the 3D model of the object.
In an alternative embodiment, when implementing the acquisition of the attribute information of the target object, the 3D model of the object in response to the request for interaction with the target object, the information acquisition module 71 is further configured to:
responding to an interaction request with a target object, and acquiring description information of the target object; and according to the description information of the target object, the attribute information of the target object and the 3D model of the object are called.
In an alternative embodiment, the attribute information of the target object includes a category and a color, and when implementing generating the personified 3D material applicable to the target object according to the attribute information of the target object, the personification processing module 72 is further configured to:
acquiring basic anthropomorphic 3D materials corresponding to the target object; and adjusting configuration parameters of the basic anthropomorphic 3D materials according to the category and the color of the target object to obtain anthropomorphic 3D materials suitable for the target object, the anthropomorphic 3D materials including: a facial-feature (five sense organs) 3D model and/or a limb 3D model.
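One simple way to picture this adjustment is sketched below: a style preset is looked up by category and the material tint is shifted toward the object's color. The preset names and the blending rule are assumptions made for illustration, not the parameters actually used by the personification processing module.

```python
# Hedged sketch: adapt basic anthropomorphic 3D material by category and color.
CATEGORY_PRESETS = {
    "food":    {"eye_scale": 1.2, "limb_length": 0.8},
    "default": {"eye_scale": 1.0, "limb_length": 1.0},
}


def adapt_material(base_material: dict, category: str, color_rgb, blend: float = 0.4) -> dict:
    preset = CATEGORY_PRESETS.get(category, CATEGORY_PRESETS["default"])
    material = dict(base_material, **preset)

    # Shift the material tint toward the object's dominant colour.
    r0, g0, b0 = material.get("tint", (255, 255, 255))
    r1, g1, b1 = color_rgb
    material["tint"] = (
        round(r0 * (1 - blend) + r1 * blend),
        round(g0 * (1 - blend) + g1 * blend),
        round(b0 * (1 - blend) + b1 * blend),
    )
    return material
```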
In an alternative embodiment, after adjusting the configuration parameters of the basic personified 3D material according to the category and color of the target object, the personified processing module 72 is further configured to:
and responding to the editing operation of the configuration parameters of the anthropomorphic 3D material of the target object, and updating the anthropomorphic 3D material of the target object according to the edited configuration parameters.
In an alternative embodiment, when implementing the synthesis of the personified 3D material onto the 3D model of the object, the personified processing module 72 is further configured to:
determining the connection position of the anthropomorphic 3D material on the 3D model of the object according to the shape and contour information of the 3D model of the object; and synthesizing the anthropomorphic 3D material to the corresponding connection position according to the connection position to obtain the anthropomorphic 3D model of the target object.
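A deliberately simplified sketch of deriving default connection positions is shown below, using the model's bounding box as a stand-in for its shape and contour information. Real placement would work on the mesh surface, and the fractions used here are illustrative defaults that a user may later adjust through the editing operations described below.

```python
# Hedged sketch: propose connection positions from a model's bounding box.
def default_connection_positions(bbox_min, bbox_max):
    """Return anchor points for eyes, mouth and limbs from the model's extent."""
    cx = (bbox_min[0] + bbox_max[0]) / 2
    width = bbox_max[0] - bbox_min[0]
    height = bbox_max[2] - bbox_min[2]
    front_y = bbox_max[1]                      # assume +y faces the viewer

    return {
        "left_eye":  (cx - 0.2 * width, front_y, bbox_min[2] + 0.7 * height),
        "right_eye": (cx + 0.2 * width, front_y, bbox_min[2] + 0.7 * height),
        "mouth":     (cx,               front_y, bbox_min[2] + 0.45 * height),
        "left_arm":  (bbox_min[0],      front_y, bbox_min[2] + 0.5 * height),
        "right_arm": (bbox_max[0],      front_y, bbox_min[2] + 0.5 * height),
    }
```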
In an alternative embodiment, before synthesizing the anthropomorphic 3D material to the corresponding connection location according to the connection location, the anthropomorphic processing module 72 is further configured to:
and in response to the adjustment operation of the connection position of the anthropomorphic 3D material on the 3D model of the object, updating the connection position of the anthropomorphic 3D material on the 3D model of the object.
In an alternative embodiment, in response to the request for interaction with the target object, the information obtaining module 71 is further configured to, when obtaining the interaction content information of the target object:
in response to a first interaction request with a target object, input dialog content is obtained, along with multimodal knowledge data of at least one source of the target object.
The driving module 73 is further configured to, when displaying the anthropomorphic 3D model in the interaction page and driving the anthropomorphic 3D model to perform the interaction behavior matching the interaction content information:
calling a dialogue model, and generating reply information for the dialogue content according to the multi-modal knowledge data of at least one source of the target object; and displaying the anthropomorphic 3D model of the target object in the first interaction page, and driving the anthropomorphic 3D model to perform the action of broadcasting the reply information according to the reply information, wherein the action of broadcasting the reply information comprises at least one of an expression, a mouth shape, and a limb action.
In an alternative embodiment, in response to the request for interaction with the target object, the information obtaining module 71 is further configured to, when obtaining the interaction content information of the target object:
and responding to a second interaction request with the target object, and acquiring the input broadcasting content.
The driving module 73 is further configured to, when displaying the anthropomorphic 3D model in the interaction page and driving the anthropomorphic 3D model to perform the interaction behavior matching the interaction content information:
displaying the anthropomorphic 3D model of the target object in the second interaction page, and driving the anthropomorphic 3D model to make an action of broadcasting the broadcasting content according to the broadcasting content, wherein the action of broadcasting the broadcasting content comprises at least one of expression, mouth shape and limb action.
In an alternative embodiment, in response to the request for interaction with the target object, the information obtaining module 71 is further configured to, when obtaining the interaction content information of the target object:
and responding to a third interaction request with the target object, and acquiring input audio data.
The driving module 73 is further configured to, when displaying the anthropomorphic 3D model in the interaction page and driving the anthropomorphic 3D model to perform the interaction behavior matching the interaction content information:
and displaying the anthropomorphic 3D model of the target object in the third interaction page, and driving the anthropomorphic 3D model to make dance movements matched with the audio data according to the audio data, wherein the dance movements comprise at least one of expressions, mouth shapes and limb movements.
The device provided in the embodiment of the present application may be specifically configured to execute the interaction method provided in any of the foregoing method embodiments, and specific functions and technical effects that can be achieved are not described herein.
Fig. 8 is a schematic structural diagram of a cloud server according to an embodiment of the present application. As shown in fig. 8, the cloud server includes a memory 801 and a processor 802. The memory 801 is used for storing computer-executable instructions and may be configured to store various other data to support operations on the cloud server. The processor 802 is communicatively connected to the memory 801 and is configured to execute the computer-executable instructions stored in the memory 801, so as to implement the technical solution provided in any one of the above method embodiments; the specific functions and achievable technical effects are similar and are not repeated here.
Optionally, as shown in fig. 8, the cloud server further includes: firewall 803, load balancer 804, communication component 805, power component 806, and other components. Only some components are schematically shown in fig. 8, which does not mean that the cloud server only includes the components shown in fig. 8.
The embodiment of the application further provides a computer readable storage medium, in which computer executable instructions are stored, and when the computer executable instructions are executed by a processor, the computer executable instructions are used to implement the scheme provided by any one of the method embodiments, and specific functions and technical effects that can be implemented are not described herein.
An embodiment of the present application also provides a computer program product, including a computer program stored in a readable storage medium. At least one processor of the cloud server can read the computer program from the readable storage medium, and execution of the computer program by the at least one processor causes the cloud server to perform the scheme provided in any one of the method embodiments; the specific functions and achievable technical effects are not repeated here. An embodiment of the present application further provides a chip, including a processing module and a communication interface, where the processing module can execute the technical solution of the cloud server in the foregoing method embodiments. Optionally, the chip further includes a storage module (e.g., a memory) configured to store instructions; the processing module is configured to execute the instructions stored in the storage module, and execution of these instructions causes the processing module to execute the technical solution provided in any one of the foregoing method embodiments.
The memory may be an object store (Object Storage Service, OSS).
The memory may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The communication component is configured to facilitate wired or wireless communication between the device in which it is located and other devices. The device in which the communication component is located may access a wireless network based on a communication standard, such as wireless fidelity (WiFi), or a mobile communication network of the second generation (2G), third generation (3G), fourth generation (4G)/Long Term Evolution (LTE), or fifth generation (5G) mobile communication system, or a combination thereof. In one exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power supply component provides power for various components of equipment where the power supply component is located. The power components may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the devices in which the power components are located.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, compact disk read-only memory (CD-ROM), optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should be noted that, the user information (including but not limited to user equipment information, user attribute information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with related laws and regulations and standards, and provide corresponding operation entries for the user to select authorization or rejection.
In addition, some of the flows described in the above embodiments and the drawings include a plurality of operations appearing in a particular order, but it should be clearly understood that these operations may be performed out of the order in which they appear herein or performed in parallel; the sequence numbers are merely used to distinguish the various operations and do not themselves represent any order of execution. The flows may also include more or fewer operations, and these operations may be performed sequentially or in parallel. It should be noted that the descriptions of "first" and "second" herein are used to distinguish different messages, devices, modules, and the like; they do not represent a sequence, nor are "first" and "second" limited to being of different types. The meaning of "a plurality of" is two or more, unless specifically defined otherwise.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (14)

1. An interaction method, comprising:
responding to an interaction request for a target object, and acquiring attribute information, an object 3D model and interaction content information of the target object;
generating an anthropomorphic 3D material suitable for the target object according to the attribute information of the target object, and synthesizing the anthropomorphic 3D material onto the object 3D model to obtain an anthropomorphic 3D model of the target object;
And displaying the personified 3D model in an interaction page, and driving the personified 3D model to execute interaction behavior matched with the interaction content information.
2. The method of claim 1, wherein obtaining attribute information of a target object, an object 3D model, in response to an interaction request with the target object, comprises:
responding to an interaction request with a target object, and acquiring an image of the target object;
and identifying attribute information of the target object according to the image of the target object, and acquiring an object 3D model of the target object.
3. The method of claim 2, wherein the acquiring the object 3D model of the target object comprises:
detecting contour information and 3D geometric information of the target object according to the image of the target object;
and generating a 3D model of the target object according to the contour information and the 3D geometric information of the target object to obtain the 3D model of the object.
4. The method according to claim 1, wherein the obtaining attribute information of the target object, the object 3D model, in response to the interaction request with the target object, comprises:
Responding to an interaction request with a target object, and acquiring description information of the target object;
and according to the description information of the target object, the attribute information of the target object and the 3D model of the object are called.
5. The method according to any one of claims 1 to 4, wherein the attribute information of the target object includes a category and a color,
the generating the anthropomorphic 3D material applicable to the target object according to the attribute information of the target object comprises the following steps:
acquiring basic anthropomorphic 3D materials corresponding to the target object;
according to the category and the color of the target object, adjusting the configuration parameters of the basic anthropomorphic 3D material to obtain the anthropomorphic 3D material suitable for the target object, wherein the anthropomorphic 3D material comprises: a facial-feature (five sense organs) 3D model and/or a limb 3D model.
6. The method according to claim 5, wherein the adjusting the configuration parameters of the basic anthropomorphic 3D material according to the category and color of the target object, after obtaining the anthropomorphic 3D material suitable for the target object, further comprises:
and responding to the editing operation of the configuration parameters of the anthropomorphic 3D material of the target object, and updating the anthropomorphic 3D material of the target object according to the edited configuration parameters.
7. The method according to any one of claims 1-4, wherein said synthesizing the personified 3D material onto the object 3D model results in a personified 3D model of the target object, comprising:
determining the connection position of the anthropomorphic 3D material on the 3D model of the object according to the shape and contour information of the 3D model of the object;
and synthesizing the anthropomorphic 3D material to the corresponding connection position according to the connection position to obtain the anthropomorphic 3D model of the target object.
8. The method according to claim 7, wherein the synthesizing the anthropomorphic 3D material to the corresponding connection location according to the connection location further comprises, before obtaining the anthropomorphic 3D model of the target object:
and in response to the adjustment operation of the connection position of the anthropomorphic 3D material on the 3D model of the object, updating the connection position of the anthropomorphic 3D material on the 3D model of the object.
9. The method according to any one of claims 1-4, wherein obtaining interactive content information of a target object in response to an interactive request with the target object comprises:
responding to a first interaction request with a target object, and acquiring input dialogue content and multi-modal knowledge data of at least one source of the target object;
Displaying the personified 3D model in an interaction page, and driving the personified 3D model to execute interaction behavior matched with the interaction content information, wherein the method comprises the following steps:
calling a dialogue model, and generating reply information of the dialogue content according to the multi-mode knowledge data of at least one source of the target object;
displaying the anthropomorphic 3D model of the target object in a first interaction page, and driving the anthropomorphic 3D model to perform an action of broadcasting the reply information according to the reply information, wherein the action of broadcasting the reply information comprises at least one of an expression, a mouth shape, and a limb action.
10. The method according to any one of claims 1-4, wherein obtaining interactive content information of a target object in response to an interactive request with the target object comprises:
responding to a second interaction request with the target object, and acquiring input broadcasting content;
displaying the personified 3D model in an interaction page, and driving the personified 3D model to execute interaction behavior matched with the interaction content information, wherein the method comprises the following steps:
displaying the anthropomorphic 3D model of the target object in a second interaction page, and driving the anthropomorphic 3D model to make an action of broadcasting the broadcasting content according to the broadcasting content, wherein the action of broadcasting the broadcasting content comprises at least one of expression, mouth shape and limb action.
11. The method according to any one of claims 1-4, wherein obtaining interactive content information of a target object in response to an interactive request with the target object comprises:
responding to a third interaction request with the target object, and acquiring input audio data;
displaying the personified 3D model in an interaction page, and driving the personified 3D model to execute interaction behavior matched with the interaction content information, wherein the method comprises the following steps:
and displaying the anthropomorphic 3D model of the target object in a third interaction page, and driving the anthropomorphic 3D model to make a dance motion matched with the audio data according to the audio data, wherein the dance motion comprises at least one of expression, mouth shape and limb motion.
12. An interaction method, comprising:
responding to a personification request of end-side equipment on a target object, and acquiring attribute information of the target object, an object 3D model and broadcasting content;
generating an anthropomorphic 3D material suitable for the target object according to the attribute information of the target object, and synthesizing the anthropomorphic 3D material onto the object 3D model to obtain an anthropomorphic 3D model of the target object;
Displaying the personification 3D model in an interactive page, driving the personification 3D model to finish the interactive behavior of broadcasting the broadcasting content, and generating a broadcasting animation;
and responding to the confirmation operation of the broadcasting animation, and providing the broadcasting animation for the terminal side equipment.
13. An interaction method, comprising:
in a real-time interaction process, responding to an interaction request with a target object, and acquiring attribute information, an object 3D model and interaction content information of the target object;
generating an anthropomorphic 3D material suitable for the target object according to the attribute information of the target object, and synthesizing the anthropomorphic 3D material onto the object 3D model to obtain an anthropomorphic 3D model of the target object;
and displaying the personified 3D model in a real-time interaction page, and driving the personified 3D model to execute interaction behavior matched with the interaction content information.
14. A cloud server, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1-13.

