CN117726728A - Avatar generation method, device, electronic equipment and storage medium - Google Patents

Avatar generation method, device, electronic equipment and storage medium

Info

Publication number
CN117726728A
CN117726728A (application number CN202410033947.6A)
Authority
CN
China
Prior art keywords
avatar
editing
text
round
parameters
Prior art date
Legal status
Pending
Application number
CN202410033947.6A
Other languages
Chinese (zh)
Inventor
吴昊潜
李林橙
吴佳阳
陈伟杰
武蕴杰
范长杰
胡志鹏
Current Assignee
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd filed Critical Netease Hangzhou Network Co Ltd
Priority to CN202410033947.6A
Publication of CN117726728A
Legal status: Pending


Abstract

The present application provides an avatar generation method and apparatus, an electronic device, and a storage medium. The method includes: acquiring description text for creating an avatar; tracking an editing intention for the avatar based on the acquired description text, and generating reply text; and displaying the reply text and the avatar generated according to the editing intention, where the reply text indicates the editing response made by the generated avatar to the description text. This interactive avatar generation scheme simplifies the user's operations and provides timely interactive responses.

Description

Avatar generation method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer applications, and in particular, to a method and apparatus for generating an avatar, an electronic device, and a storage medium.
Background
The "pinching face" provides the virtual character image customizing function, and is an important module in role playing games, augmented reality and meta universe. The method allows the user to adjust the skeleton position, the makeup attribute and the like of the virtual character so as to meet the personalized customization requirement of the user, and greatly enhances the immersive experience of the user.
However, hundreds of character parameters, while providing a great degree of freedom in character creation, also increase the operational burden on the user. Currently, the technology for automatically generating the virtual character is rapidly developed, so that a user can generate the virtual character through simple input, such as input images, texts and the like, thereby helping the user save time and further improving playability.
The face pinching scheme in the related art is usually non-interactive, and needs a user to perform one-time complete description on a desired character image, and then directly generate a character customization result, but the complete description of the character image has higher burden on the user, and when the text description content is longer and more complex, the situation that the deviation between the generation result and the description is overlarge easily occurs.
Disclosure of Invention
In view of the foregoing, embodiments of the present application provide at least a method, apparatus, electronic device, and storage medium for generating an avatar, so as to overcome at least one of the above-mentioned drawbacks.
In a first aspect, exemplary embodiments of the present application provide an avatar generation method, the method including: acquiring a description text for creating an avatar; tracking an editing intention for the avatar based on the acquired descriptive text, and generating a reply text; and displaying the reply text and the avatar generated according to the editing intention, wherein the reply text is used for indicating the editing response of the generated avatar to the descriptive text.
In a second aspect, embodiments of the present application further provide an avatar generating apparatus, including: the acquisition module is used for acquiring a description text for creating the virtual image; the recognition module is used for tracking the editing intention aiming at the virtual image based on the acquired description text and generating a reply text; and the display module displays the reply text and the avatar generated according to the editing intention, wherein the reply text is used for indicating the editing response of the generated avatar to the descriptive text.
In a third aspect, embodiments of the present application further provide an electronic device including a processor, a storage medium, and a bus, where the storage medium stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the storage medium through the bus, and the processor executes the machine-readable instructions to perform the steps of the avatar generation method described above.
In a fourth aspect, embodiments of the present application also provide a computer-readable storage medium having a computer program stored thereon, which when executed by a processor performs the steps of the avatar generation method described above.
According to the avatar generation method and apparatus, electronic device, and storage medium provided by the present application, the player's operations are simplified and the user's input burden is reduced; timely interactive responses improve the efficiency of character creation, thereby providing the user with a more natural, more convenient, and more accurate character-creation interaction scheme.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered limiting the scope, and that other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 illustrates a flowchart of an avatar generation method provided in an exemplary embodiment of the present application;
FIG. 2 illustrates a flowchart of the language identification steps for descriptive text provided in an exemplary embodiment of the present application;
FIG. 3 illustrates a flow chart of a multi-round interaction provided by an exemplary embodiment of the present application;
FIG. 4 illustrates a flowchart of steps for training a correlation prediction model provided by exemplary embodiments of the present application;
FIG. 5 illustrates a flowchart of steps provided in exemplary embodiments of the present application for training a differentiable renderer;
FIG. 6 shows a flowchart of steps for determining a priori loss provided by exemplary embodiments of the present application;
fig. 7 illustrates a schematic structure of an avatar generating apparatus provided in an exemplary embodiment of the present application;
fig. 8 shows a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the accompanying drawings in the present application are only for the purpose of illustration and description, and are not intended to limit the protection scope of the present application. In addition, it should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this application, illustrates operations implemented according to some embodiments of the present application. It should be appreciated that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to the flow diagrams and one or more operations may be removed from the flow diagrams as directed by those skilled in the art.
The terms "a," "an," "the," and "said" are used in this specification to denote the presence of one or more elements/components/etc.; the terms "comprising" and "having" are intended to be inclusive and mean that there may be additional elements/components/etc. in addition to the listed elements/components/etc.; the terms "first" and "second" and the like are used merely as labels, and are not intended to limit the number of their objects.
It should be understood that in embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "and/or" is merely an association relationship describing an association object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship. "comprising A, B and/or C" means comprising any 1 or any 2 or 3 of A, B, C.
It should be understood that in the embodiments of the present application, "B corresponding to a", "a corresponding to B", or "B corresponding to a", means that B is associated with a, from which B may be determined. Determining B from a does not mean determining B from a alone, but may also determine B from a and/or other information.
In addition, the described embodiments are only some, but not all, of the embodiments of the present application. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.
The "pinching face" provides the virtual character image customizing function, and is an important module in role playing games, augmented reality and meta universe. The method allows the user to adjust the skeleton position, the makeup attribute and the like of the virtual character so as to meet the personalized customization requirement of the user, and greatly enhances the immersive experience of the user.
However, hundreds of character parameters, while providing a great degree of freedom in character creation, also increase the operational burden on the user. Currently, automatic virtual character generation technology is rapidly developed, allowing a user to generate a virtual character through simple input, for example, generating a virtual character based on an input image or text, helping the user save time, and further improving playability.
The face pinching scheme in the related art is typically single-wheeled, non-interactive. For example, a user is required to make a complete description of a desired character figure at a time and then directly generate a character customization result, but the above-mentioned pinching face has the following drawbacks:
(1) Describing the character completely in one pass places a large input load on the user. A user unfamiliar with the face-pinching scheme may be unable to enter an accurate description on the first attempt and may have to try repeatedly to obtain the desired generation result. In addition, if the user's input is not complete and accurate enough, the generated result may deviate from expectations.
(2) When the character generated by the algorithm deviates from the description or from the user's expectation, it cannot be adjusted further; the only option is to input a long, complete description again.
(3) A long and complex text description also severely challenges the algorithm, and the generated result easily deviates too far from the description.
In addition, face-pinching schemes in the related art suffer from long running times and non-robust results, easily producing abnormal outputs that do not conform to the natural distribution of faces. These problems arise mainly because the related schemes adopt inefficient parameter-search strategies, which lowers running efficiency, and because they generate parameters in an unconstrained space without the constraint of a prior distribution, which easily yields abnormal results inconsistent with the natural distribution of faces.
In view of at least one of the above problems, the present application proposes an interactive avatar generation scheme to reduce the input burden of a user, which helps to make the character creation process more natural, more convenient and more accurate.
First, terms involved in the embodiments of the present application are described.
Terminal equipment:
The terminal device in the embodiments of the present application mainly refers to an electronic device capable of providing a user interface for man-machine interaction. In an exemplary application scenario, the terminal device may be used to provide game images (e.g., a relevant setting/configuration interface in a game, or an interface presenting a game scene) and may be an intelligent device capable of performing control operations on a virtual character. The terminal device may include, but is not limited to, any of the following: smart phones, tablet computers, portable computers, desktop computers, game consoles, personal digital assistants (PDAs), e-book readers, MP4 (Moving Picture Experts Group Audio Layer IV) players, and the like. The terminal device has installed and running in it an application program supporting a game scene, such as an application supporting a three-dimensional game scene. The application may include, but is not limited to, any of a virtual reality application, a three-dimensional map application, a military simulation application, a MOBA (Multiplayer Online Battle Arena) game, a multiplayer gunfight survival game, and a third-person shooting game (TPS). Alternatively, the application may be a stand-alone application, such as a stand-alone 3D (three-dimensional) game program, or a networked online application.
Graphical user interface:
is an interface display format in which a person communicates with a computer, allowing a user to manipulate icons, logos, or menu options on a screen using an input device such as a mouse, a keyboard, and/or a joystick, and also allowing a user to manipulate icons or menu options on a screen by performing a touch operation on a touch screen of a touch terminal to select a command, start a program, or perform some other task, etc. In a game scenario, a game scenario interface may be displayed in the graphical user interface, along with a game configuration interface.
Virtual scene:
is a virtual environment that an application displays (or provides) when running on a terminal device or server. Optionally, the virtual scene is a simulation of the real world, a semi-simulated and semi-fictional virtual environment, or a purely fictional virtual environment. The virtual scene may be any one of a two-dimensional virtual environment, a 2.5-dimensional virtual environment, and a three-dimensional virtual environment, and the virtual environment may be sky, land, sea, or the like. The virtual scene is the scene in which the complete game logic of the user-controlled virtual character unfolds; optionally, the virtual scene is also used for virtual-environment combat between at least two virtual objects, and virtual resources available to at least two virtual characters are arranged in the virtual scene.
Virtual characters:
refers to a virtual character in a virtual environment (e.g., a game scene). It may be a character manipulated by a player, including but not limited to at least one of a virtual person, a virtual animal, and a cartoon character, a non-player character (NPC), or a virtual object such as a static object in the virtual scene, e.g., a virtual prop, a virtual task, a location in the virtual environment, terrain, a house, a bridge, or vegetation. Static objects are often not directly controlled by players, but can respond to the interactive behavior of virtual characters in the scene (e.g., attacks or demolition) with corresponding performances; for example, a virtual character may demolish, pick up, drag, or build on a building. Alternatively, a virtual object may be unable to respond to the interaction of a virtual character; for example, the virtual object may likewise be a building, a door, a window, or a plant in the game scene, but the virtual character cannot interact with it, e.g., the virtual character cannot destroy or remove the window. Alternatively, when the virtual environment is a three-dimensional virtual environment, the virtual characters may be three-dimensional virtual models, each having its own shape and volume in the three-dimensional virtual environment and occupying a part of its space. Optionally, the virtual character is a three-dimensional character constructed based on three-dimensional human skeleton technology, which takes on different external appearances by wearing different skins. In some implementations, the virtual character may also be implemented using a 2.5-dimensional or two-dimensional model, which is not limited by the embodiments of the present application.
There may be multiple virtual characters in the virtual scene, each of which is either a character manipulated by a player (i.e., controlled through an input device or touch screen) or an artificial intelligence (AI) placed in the virtual-environment combat through training. Optionally, the virtual character is a character that competes in the game scene. Optionally, the number of virtual characters in the game-scene combat is preset, or is dynamically determined according to the number of terminal devices joining the virtual combat, which is not limited in the embodiments of the present application. In one possible implementation, a user can control a virtual character to move in the virtual scene, e.g., to run, jump, or crawl, and can also control the virtual character to fight other virtual characters using skills, virtual props, and the like provided by the application.
In an alternative embodiment, the terminal device may be a local terminal device. Taking a game as an example, the local terminal device stores a game program and is used to present a game screen. The local terminal device is used for interacting with the player through the graphical user interface, namely, conventionally downloading and installing the game program through the electronic device and running. The manner in which the local terminal device provides the graphical user interface to the player may include a variety of ways, for example, it may be rendered for display on a display screen of the terminal device, or provided to the player by holographic projection. For example, the local terminal device may include a display screen for presenting a graphical user interface including a game scene screen and a game configuration interface, and a processor for running the game, generating the graphical user interface, and controlling the display of the graphical user interface on the display screen.
Application scenarios to which the present application is applicable are introduced below. The method and apparatus can be applied to the technical field of games, in which multiple participating players join the same virtual match together.
Before entering the virtual match, the player may select different character attributes, e.g., identity attributes, for the virtual characters in the virtual match; assigning different character attributes determines different camps, and the player wins the game by performing the tasks assigned at different stages of the virtual match. For example, multiple virtual characters with the A character attribute "cull" the virtual characters with the B character attribute during the match to win the game. When entering the virtual match, a character attribute may be randomly assigned to each virtual character participating in it.
An implementation environment provided by one embodiment of the present application may include: the system comprises a first terminal device, a server and a second terminal device, wherein the first terminal device and the second terminal device are respectively communicated with the server so as to realize data communication. In this embodiment, the first terminal device and the second terminal device are respectively installed with an application program for executing the avatar generation method provided in the present application, and the server is a server side for executing the avatar generation method provided in the present application. The first terminal device and the second terminal device can communicate with the server respectively through the application program.
Taking the first terminal device as an example, the first terminal device establishes communication with the server by running the application. In an alternative embodiment, the server establishes the virtual combat based on game requests from the application. The parameters of the virtual combat may be determined according to parameters in the received game request; for example, they may include the number of participants in the virtual combat, the level of the roles participating in it, and so on. When the first terminal device receives a response from the game server, it displays the game scene corresponding to the virtual combat through its graphical user interface. The first terminal device is a device controlled by a first user, the virtual character displayed in its graphical user interface is the player character controlled by the first user (i.e., a first virtual character), and the first user inputs character operation instructions through the graphical user interface to control the player character to perform corresponding operations in the game scene.
Taking the second terminal device as an example, the second terminal device establishes communication with the server by running the application. In an alternative embodiment, the server establishes the virtual combat based on game requests from the application. The parameters of the virtual combat may be determined according to parameters in the received game request; for example, they may include the number of participants in the virtual combat, the level of the roles participating in it, and so on. When the second terminal device receives a response from the server, it displays the game scene corresponding to the virtual combat through its graphical user interface. The second terminal device is a device controlled by a second user, the virtual character displayed in its graphical user interface is the player character controlled by the second user (i.e., a second virtual character), and the second user inputs character operation instructions through the graphical user interface to control the player character to perform corresponding operations in the virtual scene.
The server calculates data according to game data reported by the first terminal equipment and the second terminal equipment, and synchronizes the calculated game data to the first terminal equipment and the second terminal equipment, so that the first terminal equipment and the second terminal equipment control the graphical user interface to render corresponding game scenes and/or virtual roles according to the synchronous data issued by the game server.
In this embodiment, the first virtual character controlled by the first terminal device and the second virtual character controlled by the second terminal device are virtual characters in the same virtual combat. They may have the same role attribute or different role attributes, and they may belong to the same camp or to different camps.
In the virtual fight, two or more virtual roles may be included, and different virtual roles may correspond to different terminal devices, that is, in the virtual fight, there are two or more terminal devices that perform transmission and synchronization of game data with the game server, respectively.
The avatar generation method provided by the embodiment of the application can be applied to any one of a virtual reality application program, a three-dimensional map program, a military simulation program, a multi-person online tactical competition game (MOBA), a multi-person gunfight survival game, a third-person combat game and a first-person combat game.
The avatar generation method in one of the embodiments of the present application may be run on a local terminal device or a server. When the method is run on a server, the method can be implemented and executed based on a cloud interaction system, wherein the cloud interaction system comprises the server and the client device.
In an alternative embodiment, various cloud applications may run under the cloud interaction system, for example cloud games. Taking a cloud game as an example, a cloud game is a game mode based on cloud computing. In the cloud game mode, the entity that runs the game program is separated from the entity that presents the game picture: storage and execution of the information presentation method are completed on the cloud game server, while the client device only sends and receives data and presents game pictures. For example, the client device may be a display device near the user side with data transmission capability, such as a mobile terminal, a television, a computer, or a handheld computer, while the cloud game server that performs the information processing resides in the cloud. During play, the player operates the client device to send operation instructions to the cloud game server; the cloud game server runs the game according to the instructions, encodes and compresses data such as game pictures, and returns the data to the client device over the network, where it is decoded and the game pictures are output.
In an alternative embodiment, taking a game as an example, the local terminal device stores a game program and is used to present a game screen. The local terminal device is used for interacting with the player through the graphical user interface, namely, conventionally downloading and installing the game program through the electronic device and running. The manner in which the local terminal device provides the graphical user interface to the player may include a variety of ways, for example, it may be rendered for display on a display screen of the terminal, or provided to the player by holographic projection. For example, the local terminal device may include a display screen for presenting a graphical user interface including game visuals, and a processor for running the game, generating the graphical user interface, and controlling the display of the graphical user interface on the display screen.
In a possible implementation manner, the embodiment of the present invention provides a method for generating an avatar, and a graphical user interface is provided through a terminal device, where the terminal device may be the aforementioned local terminal device or the aforementioned client device in the cloud interaction system.
In order to facilitate understanding of the present application, detailed descriptions of the method, the apparatus, the electronic device, and the storage medium for generating an avatar are provided in the embodiments of the present application.
Referring to fig. 1, which shows a flowchart of an avatar generation method according to an exemplary embodiment of the present application, the method is generally applied to a game server, for example the cloud game server described above, although the present application is not limited thereto.
In the avatar creation process of the exemplary embodiments of the present application, at least one round of interaction may be included, and each round of interaction may include, but is not limited to: acquiring the description text of that round for creating the avatar, and then displaying the result of that round, for example the reply text and the avatar generated from the description text of that round. Based on the avatar and reply text displayed in the current round, the user may choose to end the interaction, or may input description text for the next round of interaction, after which the avatar and reply text for the newly input description text are displayed; this process repeats until the interaction ends and the finally created avatar is formed. This interactive avatar creation scheme allows the user to complete the creation and modification of a character through multiple rounds of natural-language dialogue, providing a better character-creation experience.
The following describes one round of interaction among the multiple rounds used to create an avatar, with reference to fig. 1:
step S101: descriptive text for creating an avatar is acquired.
In the embodiment of the present application, the above description text may be obtained through various approaches, which is not limited in this application.
In a preferred example of the present application, the descriptive text may be acquired through input performed on the avatar editing interface.
For example, a graphical user interface may be provided by the terminal device on which an avatar editing interface of the game is presented, which may be used to create and/or adjust the avatar.
For example, the avatar edit interface may include a text input area, and at this time, descriptive text for creating an avatar is acquired based on input information of a user in the text input area.
It should be understood that the above description text may be obtained through various input means at the avatar editing interface, which is not limited in this application. For example, voice data input by the user may also be collected based on a voice input control on the avatar editing interface, and text content corresponding to the voice data may be obtained by recognizing the collected voice data, and the text content may be determined as descriptive text.
Besides the above description text obtaining manner, the description text may be received from other terminal devices, or may be obtained from a network.
In the embodiment of the present application, the description text may be text for describing the overall or partial characteristics of the avatar, or may be a fuzzy or precise description for the avatar. For example, the above description text may be a short text for describing the appearance characteristics of the avatar, for example, a natural sentence, to reduce the input burden of the user.
Here, the avatar may refer to a figure presented in the virtual scene. Structurally, the avatar may be a three-dimensional model or a planar image. The avatar may be formed by simulating a human figure, by simulating an animal figure, or based on an image from a cartoon or comic.
Illustratively, the avatar generally belongs to a virtual character, which may be a person, an animal, or the like. The created avatar may be the overall image of the virtual character or a partial image of it; for example, the avatar may refer to the facial image of the virtual character.
Step S102: based on the acquired descriptive text, an editing intention for the avatar is tracked, and a reply text is generated.
Here, the corresponding editing intention and reply text may be determined by language recognition of the description text. For example, a large language model (LLM) may be trained in advance, and the description text is input into the LLM to obtain the editing intention and reply text corresponding to the description text.
Alternatively, another large language model may be trained in advance, the description text is input into that large language model to obtain the editing intention corresponding to the description text, and the corresponding reply text is then generated based on the determined editing intention.
The reply text is associated with the identified editing intention, or the intention understanding of the avatar generation algorithm for the descriptive text can be reflected through the reply text.
In a preferred embodiment of the present application, a large language model is introduced into the avatar creation system, and a memory mechanism is designed for it, i.e., a large language model with a memory mechanism is formed, which can track the editing process of the user, thereby realizing more accurate understanding and parsing of the complex dialog process.
The process of intent understanding based on a large language model with memory mechanism is described below in conjunction with FIG. 2.
Fig. 2 shows a flowchart of the language identification steps for descriptive text provided in an exemplary embodiment of the present application.
As shown in fig. 2, in step S201, interactive edit memories for creating an avatar are extracted.
Here, the above-described interactive edit memory is used to indicate an interactive history for the creation process of the avatar, i.e., an edit history of the avatar in a history interactive round before the present round of interaction. For the case of the first round of interaction, the interactive editing memory extracted in step S201 is empty at this time, in which case, the editing intention and the reply text are determined based on only the acquired descriptive text.
In a first embodiment, the interactive edit memory may include a history dialogue.
By way of example, a history dialog includes at least one dialog group, where a round of interactions may correspond to generating one dialog group, and each dialog group may include, but is not limited to, descriptive text and its corresponding reply text in a round of interactions.
In a second embodiment, the interactive editing memory may include editing attributes and associated editing strengths corresponding to the editing attributes.
Illustratively, the editing attributes include those preset character avatar parameters, among a plurality of character avatar parameters, that are in an editing state. A plurality of character avatar parameters characterizing the avatar's external appearance may be predefined, including, for example but not limited to, parameters characterizing face shape, body type, clothing, hair color, hairstyle, skin color, and so on; under any of these types there may be multiple character avatar parameters, for example face-shape parameters such as a round face or a long face.
Here, being in an editing state may refer to character avatar parameters that have been edited during the creation of the avatar, for example the character avatar parameters involved in the history interaction rounds before the current round, i.e., the parameters associated with the editing intentions of those rounds. The editing intention identified in each round of interaction characterizes the editing object of that round, which may refer to the character avatar parameters that need to be adjusted in the current round.
For example, the relevant editing strength is used to characterize the degree of association between the preset character image parameter and the descriptive text, or the relevant editing strength is used to characterize the degree of association between the preset character image parameter and the identified editing intent.
In a third embodiment, the interactive editing memory may include a history dialogue, editing attributes, and associated editing strengths corresponding to the editing attributes.
In step S202, editing intent and reply text for the avatar are obtained based on the description text and the interactive editing memory.
For example, interactive editorial memory and descriptive text for creating an avatar may be input into a pre-trained large language model to obtain editorial intent and reply text.
For the first embodiment of interactive edit memory described above, historical dialogs may be extracted from the dialog memory structure.
For example, a dialogue group in a history interaction round before the present round of interaction is stored in the dialogue memory structure, the initial state of the dialogue memory structure is empty, and the content stored in the dialogue memory structure is updated as the interaction round progresses.
Specifically, the dialog memory structure may be updated by: each time a round of interaction is finished, the descriptive text corresponding to the round of interaction and the corresponding reply text are added into the dialogue memory structure for storage.
In a preferred embodiment of the present application, the editing intention for the avatar and the reply text may be obtained together with a related editing strength, where the obtained related editing strength is used to characterize a degree of association between a target character avatar parameter determined according to the editing intention identified in the present round of interaction, that is, a character avatar parameter associated with the editing intention of the present round of interaction, and the descriptive text.
For example, interactive editorial memory and descriptive text for creating an avatar may be entered into a pre-trained large language model to obtain edit intent, associated edit strength, and reply text.
Here, the training of the large language model in the above embodiments may be performed in various manners, and this part of the disclosure will not be repeated herein.
For the second embodiment of interactive edit memorization described above, the edit property and the associated edit strength corresponding to the edit property may be extracted from the edit status memory structure.
For example, the editing attribute and the corresponding relevant editing intensity related to the historical interaction round before the current round of interaction are stored in the editing state memory structure, that is, the character image parameter edited in the historical interaction round and the corresponding relevant editing intensity are stored in the editing state memory structure, the initial state of the editing state memory structure is empty, and the content stored in the editing state memory structure is updated along with the progress of the interaction round.
Specifically, the edit state memory structure may be updated as follows: each time a round of interaction ends, the structure is searched for an editing intention corresponding to that round, i.e., for an editing attribute corresponding to the editing intention of that round. If one exists, the related editing strength of the found editing attribute is updated, for example by increasing its value to indicate a higher degree of association; if not, the editing intention of that round and its related editing strength are added to the edit state memory structure.
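For illustration only (this sketch is not part of the original disclosure), the two memory structures described above can be kept as very simple Python containers; the class name, field names, and the increment and upper bound used when an attribute is edited again are assumptions.

```python
class InteractiveEditMemory:
    """Illustrative sketch of the dialogue memory structure and the edit state
    memory structure; both start empty, matching the first-round behaviour."""

    def __init__(self):
        self.dialogue_memory = []     # history dialogue: one dict per finished round
        self.edit_state_memory = {}   # editing attribute -> related editing strength

    def update_dialogue(self, description_text, reply_text):
        # Called when a round of interaction ends.
        self.dialogue_memory.append({"user": description_text, "system": reply_text})

    def update_edit_state(self, edit_attribute, strength, step=0.1):
        if edit_attribute in self.edit_state_memory:
            # Attribute already in an editing state: raise its related editing strength.
            self.edit_state_memory[edit_attribute] = min(
                1.0, self.edit_state_memory[edit_attribute] + step)
        else:
            # First time this attribute is edited: store it with the parsed strength.
            self.edit_state_memory[edit_attribute] = strength
```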
Returning to fig. 1, step S103: the reply text and the avatar generated according to the editing intention are presented.
Here, the reply text is used to indicate an edit response of the generated avatar with respect to the description text, and the edit response is used to indicate, as an example, a description of creation or adjustment made with respect to the avatar in the present round of interaction. In other words, the creation or adjustment of the avatar in this round of interaction is consistent with what is described in the reply text, so that the user desiring to create the avatar can learn the creation and adjustment process of the algorithm in time. In addition, the method can also help to guide the user to input the description text in the next round of interaction to a certain extent, so that the processing efficiency of the avatar is improved.
For the case that the avatar editing interface is displayed through the terminal device, the avatar editing interface may further include an avatar display area and a system response area in addition to the text input area.
For example, an avatar generated according to the editing intention may be presented in the avatar presentation area, and the reply text may be presented in the system response area. In a preferred example, when the description text is input in the first round of interaction, only a text input area may be provided on the avatar editing interface for receiving the description text, and the avatar presentation area and the system response area are not displayed on the avatar editing interface, or the avatar presentation area and the system response area may be displayed while the text input area is provided in the first round of interaction, but the contents are not displayed in the two areas.
Here, description text, avatar, and reply text related to the present round of interaction may be presented in the text input area, avatar presentation area, and system response area. Preferably, a history input viewing control, a history image viewing control and a history editing viewing control may be further displayed on the avatar editing interface, and the history input viewing control may be displayed at a position associated with the text input area, and in response to an operation for the history input viewing control, descriptive text acquired in a history round may be displayed in the text input area, and descriptive text acquired in the current round of interaction and the history round may be displayed simultaneously in the text input area.
Accordingly, the historical image viewing control can be displayed at a position associated with the image display area, the historical editing viewing control can be displayed at a position associated with the system response area, and operations for the historical image viewing control and the historical editing viewing control are similar to those for the historical input viewing control, so that the content of the part is not repeated.
The multi-round interactive process of creating an avatar is described below in connection with fig. 3.
FIG. 3 illustrates a flow chart of multi-round interactions provided by an exemplary embodiment of the present application.
As shown in fig. 3, in step S301, a description text for creating an avatar is acquired.
The multi-round interaction process is illustrated below by way of a non-limiting example. In the current round of interaction, the user inputs, at the avatar editing interface, description text indicating that the girl's eyes do not look big enough.
In step S302, editing intent, related editing strength, and reply text are obtained based on the descriptive text and the interactive editing memory.
For the case that the present round of interaction is not the first round of interaction, for example, the following may be stored in the dialogue memory structure:
‘user’:Make the skin fair.
‘system’:……
……
The items listed above form the history dialogue stored in the dialogue memory structure, where 'user' denotes the description text of the corresponding interaction round and 'system' denotes the reply text of that round.
Illustratively, the edit status memory structure may have stored therein:
‘round face’:0.5
‘fair skin’:0.5
……
The items listed above are the editing attributes stored in the edit state memory structure and their corresponding related editing strengths, where 'round face' and 'fair skin' are the editing attributes "round face" and "fair skin", and the value attached to each editing attribute characterizes its related editing strength. Illustratively, the larger the value, the higher the degree of association it characterizes; the smaller the value, the lower the degree of association.
Continuing the above example, the history dialogue, the editing attributes in an editing state, and the related editing strengths may be extracted from the dialogue memory structure and the edit state memory structure and filled into a pre-designed Prompt together with the user-entered description text. Illustratively, the Prompt may include the following:
‘user input’:
‘history’:
‘state’:
The description text is filled into 'user input', the history dialogue into 'history', and the editing attributes and related editing strengths in the editing state into 'state'.
The filled Prompt is input into a pre-trained large language model to obtain, illustratively, a parsed instruction in JSON form, for example:
Target:Big Eyes
Strength:0.5
Response:“I increased the size of the eyes a bit. Does she look better?”
The content of Target corresponds to the parsed editing intention T_k; that is, "Big Eyes" characterizes the target character avatar parameters to be edited in the current round of interaction, as inferred from the description text and the interactive editing memory. Here, the editing intention may be a text instruction that the avatar editing should follow, for example an instruction for an editing operation to be performed on the avatar obtained in the previous round of interaction.
The content of Strength characterizes the strength of association between the inferred editing intention of the current round and the description text, corresponding to the parsed related editing strength s, i.e., 0.5.
The content of Response corresponds to the reply text R_k presented to the user by the avatar creation system, i.e., "I increased the size of the eyes a bit. Does she look better?".
After the current round of interaction ends, the dialogue memory structure can be updated according to the description text y_k and the reply text R_k, and the edit state memory structure can be updated according to the editing intention T_k and the related editing strength s. Here, the subscript k indicates that the current round is the k-th round of interaction, where 1 ≤ k ≤ M and M is a positive integer, for example a preset maximum number of interaction rounds.
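The following sketch (not taken from the patent) shows how one round of intent parsing could be assembled around the Prompt above, assuming a generic text-in/text-out LLM call (`call_llm` is a placeholder, not an API named in the disclosure) and reusing the InteractiveEditMemory sketch given earlier.

```python
import json

PROMPT_TEMPLATE = (
    "'user input': {user_input}\n"
    "'history': {history}\n"
    "'state': {state}\n"
    "Return a JSON object with the keys Target, Strength and Response."
)

def parse_round(description_text, memory, call_llm):
    """One round of intent tracking; returns (T_k, s, R_k)."""
    prompt = PROMPT_TEMPLATE.format(
        user_input=description_text,
        history=json.dumps(memory.dialogue_memory, ensure_ascii=False),
        state=json.dumps(memory.edit_state_memory, ensure_ascii=False),
    )
    parsed = json.loads(call_llm(prompt))   # e.g. {"Target": "Big Eyes", "Strength": 0.5, "Response": "..."}
    edit_intent = parsed["Target"]          # editing intention T_k
    strength = float(parsed["Strength"])    # related editing strength s
    reply_text = parsed["Response"]         # reply text R_k shown to the user

    # Update both memory structures once the round has finished.
    memory.update_dialogue(description_text, reply_text)
    memory.update_edit_state(edit_intent, strength)
    return edit_intent, strength, reply_text
```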
In step S303, an initial hidden variable is acquired.
Here, the optimized hidden variable may be obtained through a plurality of iterations of the initial hidden variable, and thus the character avatar parameters for generating the avatar may be obtained.
In a preferred embodiment of the present application, the initial hidden variable is used to characterize the projection of character image parameters in a low-dimensional space.
Here, the initial hidden variable z_k is the projection, in the low-dimensional space, of the optimized character avatar parameters obtained in the last iteration. For an iteration that is not the first, the last iteration is the iteration immediately preceding the current one; for the first iteration of an interaction round that is not the first, the last iteration is the final iteration of the previous interaction round.
For the first iteration of the first-round interaction, the initial hidden variable is the projection of the character avatar parameter corresponding to the reference avatar in the low-dimensional space, and the reference avatar includes an avatar matched with the description text of the first-round interaction, for example, an avatar with the highest matching degree with the description text of the first-round interaction can be selected from a plurality of candidate avatars created in advance to be determined as the reference avatar.
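How the "highest matching degree" between the first-round description text and the candidate avatars is measured is not fixed above; one plausible choice, assumed purely for illustration, is to score each pre-built candidate against the text with a cross-modal encoder such as CLIP, with the model, preprocessing, and tokenizer passed in by the caller.

```python
import torch

def select_reference_avatar(description_text, candidate_images, clip_model, preprocess, tokenize):
    """Return the index of the candidate avatar image that best matches the text (illustrative)."""
    with torch.no_grad():
        text_feat = clip_model.encode_text(tokenize([description_text]))
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
        scores = []
        for image in candidate_images:
            img_feat = clip_model.encode_image(preprocess(image).unsqueeze(0))
            img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
            scores.append((img_feat @ text_feat.T).item())   # cosine similarity
    return max(range(len(scores)), key=scores.__getitem__)
```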
For the case where the avatar is a facial image of a virtual character, the above-mentioned character avatar parameters may include face-pinching parameters, which may include parameters characterizing head features of the virtual character and may include, by way of example and not limitation, at least one of the following: parameters characterizing facial features (eye shape/size, nose shape/size, lip shape/size), and parameters characterizing head shape, hairstyle, and hair-color features.
In step S304, initial avatar parameters are obtained by decoding the initial hidden variables.
Here, the initial hidden variable may be decoded back to the original space (i.e., character avatar parameter space) through a back-projection matrix to obtain an initial avatar parameter. The above-mentioned projection and back-projection processes between the low-dimensional space and the original space are common knowledge in the art, and the contents of this section will not be repeated in this application.
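The projection into the low-dimensional space and the back-projection through a matrix can be realized, for example, with a PCA basis fitted on a bank of existing character avatar parameters; this concrete choice, the file name, and the latent dimensionality below are assumptions made only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Assumed file holding existing character avatar parameter vectors,
# shape [num_samples, num_params]; 32 latent dimensions is an arbitrary choice.
param_bank = np.load("avatar_param_bank.npy")
pca = PCA(n_components=32)
pca.fit(param_bank)

def project(avatar_params):
    """Character avatar parameters -> hidden variable z in the low-dimensional space."""
    return pca.transform(np.asarray(avatar_params).reshape(1, -1))[0]

def back_project(z):
    """Hidden variable z -> initial avatar parameters in the original parameter space."""
    return pca.inverse_transform(np.asarray(z).reshape(1, -1))[0]
```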
In an alternative embodiment, the initial avatar parameters obtained after the above decoding may be directly used as candidate avatar parameters to perform the subsequent step S307. In addition, a relevance vector may be introduced to determine candidate avatar parameters based on the initial avatar parameters and the relevance vector.
Specifically, in step S305, a correlation vector is obtained.
For example, the relevance of a plurality of character persona parameters to the edit intent determined by the present round of interaction may be predicted to obtain a relevance vector, where each element in the relevance vector is used to characterize the strength of the correlation between the corresponding character persona parameter and the edit intent.
Illustratively, a pre-trained relevance prediction model may be used: the parsed editing intention T_k is input into the relevance prediction model to predict the relevance between each character avatar parameter and the editing intention, forming a relevance vector r_k.
The training process for the correlation prediction model is described below with reference to fig. 4.
FIG. 4 shows a flowchart of the steps provided by exemplary embodiments of the present application for training a correlation prediction model.
As shown in fig. 4, in step S401, a first training sample is acquired.
For example, the first training sample may be obtained by having a large language model write multiple description text samples for avatars according to examples, and obtaining the text relevance label corresponding to each description text sample through coarse labeling by a large language model followed by fine manual labeling; the description text samples and their corresponding text relevance labels together form the first training sample.
In step S402, descriptive text samples of the current iteration are input to a relevance prediction model, obtaining a predictive relevance vector.
In step S403, a loss function is calculated.
For example, a loss function between the predicted relevance vector and the text relevance label corresponding to the description text sample may be calculated. Here, the loss function may be calculated in various ways, which the present application does not limit.
In step S404, network parameters are optimized.
For example, the network parameters in the correlation prediction model may be updated by gradient descent methods using a loss function.
In step S405, it is determined whether the current iteration reaches the maximum number of iterations.
If the maximum number of iterations is not reached, the number of iterations is increased by 1, and the step S402 is executed again to continue training the correlation prediction model.
If the maximum number of iterations is reached, step S406 is executed: the model parameters are saved. At this point, a trained relevance prediction model is obtained, i.e., a relevance prediction model constructed from the optimized network parameters described above.
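Steps S401 to S406 amount to an ordinary supervised training loop. The sketch below is illustrative only: it assumes the description text samples are already encoded as fixed-size feature vectors, and uses MSE as one possible loss, since the text above deliberately leaves the loss form open.

```python
import torch
import torch.nn as nn

def train_relevance_model(model, dataloader, max_iters, lr=1e-4, ckpt="relevance_model.pt"):
    """Sketch of steps S401-S406 for the relevance prediction model."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    iteration = 0
    while iteration < max_iters:
        for text_features, relevance_labels in dataloader:   # first training sample (S401)
            pred = model(text_features)                      # predicted relevance vector (S402)
            loss = loss_fn(pred, relevance_labels)           # loss function (S403)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                                 # optimize network parameters (S404)
            iteration += 1
            if iteration >= max_iters:                       # maximum iterations reached? (S405)
                break
    torch.save(model.state_dict(), ckpt)                     # save model parameters (S406)
```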
Returning to fig. 3, in step S306, candidate character parameters are obtained from the initial character parameters and the correlation vector.
For example, the relevance vector r_k and the initial avatar parameters are weighted: illustratively, each element of the relevance vector r_k is multiplied by the corresponding element of the initial avatar parameters to obtain the candidate avatar parameters x_k corresponding to the initial hidden variable z_k.
In the embodiments of the present application, introducing the relevance vector r_k enables fine-grained adjustment of the avatar and prevents irrelevant regions from being changed. For example, in the above process, the relevance of the character avatar parameters is predicted by the relevance prediction model according to the text instruction of the editing intention, for example the relevant facial-feature regions and makeup attributes are predicted, and editing is then performed on the predicted relevant regions and attributes.
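Steps S305 and S306 then reduce to a forward pass of the relevance prediction model followed by an element-wise product. The sketch below takes the literal reading of the weighting described above; `encode_text` is an assumed helper that maps the intent text to the model's input representation.

```python
import numpy as np
import torch

def candidate_avatar_parameters(edit_intent_text, initial_params, relevance_model, encode_text):
    """Predict r_k for the parsed editing intention and weight the decoded initial
    avatar parameters element-wise to obtain the candidate parameters x_k."""
    with torch.no_grad():
        r_k = relevance_model(encode_text(edit_intent_text)).squeeze(0).cpu().numpy()
    x0 = np.asarray(initial_params)
    return r_k * x0   # candidate avatar parameters x_k (literal element-wise weighting)
```

In practice one might instead blend the decoded parameters with the previous round's parameters using r_k so that irrelevant parameters are left unchanged rather than scaled down, but the element-wise product above follows the description as written.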
In the embodiments of the present application, the editing intention T_k, the related editing strength s, and the relevance vector r_k are input into the FR-T2P model to obtain the iteratively optimized (i.e., edited) character avatar parameters. In this process, the hidden variable, namely the projection of the character avatar parameters in the low-dimensional space, is iterated continuously so that the encoding distance between the image rendered from the hidden variable and the text instruction of the editing intention finally becomes as small as possible.
Specifically, in step S307, a rendered character image of the game engine is obtained based on the candidate character parameters.
For example, the candidate avatar parameters x_k may be input into a pre-trained neural renderer network to obtain the corresponding rendered avatar image, which may refer to an image of the avatar as rendered by the game engine.
By way of example, the neural renderer network may be a differentiable renderer, e.g., trained on pre-prepared data pairs (character avatar parameters, avatar images rendered by the game engine) to imitate the rendering process of the game engine, thereby making the rendering process differentiable.
In the process of generating character image parameters based on the editing intention (namely, the text instruction), a neural renderer network with makeup support is pre-trained, so that the character image parameters can be generated entirely through gradient descent. The process of training the differentiable renderer is described below with reference to fig. 5.
Fig. 5 shows a flowchart of the steps for training a differentiable renderer provided by an exemplary embodiment of the present application.
As shown in fig. 5, in step S501, a second training sample is acquired.
For example, the second training sample may include randomly sampled sample image parameters and the avatar images rendered by the game engine; the randomly sampled sample image parameters are used as input, and the avatar images rendered by the game engine are used as the target output, to train the differentiable renderer.
For the case where the avatar refers to a face image of a virtual character, continuous face parameters and discrete makeup parameters may be randomly sampled, and the corresponding face images of the game characters rendered by the game engine may be used together with them as training data.
In step S502, the sample image parameters are input to the differentiable renderer to obtain a predicted rendered image of the game engine.
For the face-pinching example described above, continuous face parameters may be input to the differentiable renderer, which renders a predicted face image of the character.
In step S503, a loss function is calculated.
For example, a loss function between the predicted rendered image produced by the differentiable renderer and the avatar image rendered by the game engine may be calculated. Here, the loss function may be calculated in various ways, and the present application is not limited in this respect.
In step S504, network parameters are optimized.
For example, the network parameters of the differentiable renderer may be updated by a gradient descent method using the loss function.
In step S505, it is determined whether the current iteration reaches the maximum number of iterations.
If the maximum number of iterations is not reached, the number of iterations is increased by 1, and step S502 is executed again to continue training the differentiable renderer.
If the maximum number of iterations is reached, step S506 is executed: the model parameters are saved. At this time, a trained differentiable renderer, i.e., a differentiable renderer constructed based on the above optimized network parameters, is obtained.
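For illustration only, the training loop of steps S501 to S506 might look like the following sketch; the simple MLP decoder, the L1 image loss, and all names are assumptions made for this sketch rather than the design prescribed by the present application.

```python
import torch
import torch.nn as nn

class DifferentiableRenderer(nn.Module):
    """A placeholder network that imitates the game engine's renderer, so that the
    mapping from character image parameters to images becomes differentiable."""
    def __init__(self, num_params: int = 300, image_size: int = 64):
        super().__init__()
        self.image_size = image_size
        self.decoder = nn.Sequential(
            nn.Linear(num_params, 1024), nn.ReLU(),
            nn.Linear(1024, 3 * image_size * image_size), nn.Sigmoid(),
        )

    def forward(self, params: torch.Tensor) -> torch.Tensor:
        img = self.decoder(params)
        return img.view(-1, 3, self.image_size, self.image_size)

def train_renderer(pairs, renderer, max_iters=10000, lr=1e-4):
    """pairs yields (sampled character image parameters, image rendered by the game engine)."""
    optimizer = torch.optim.Adam(renderer.parameters(), lr=lr)
    for it in range(max_iters):                         # S505: iterate up to the maximum count
        params, engine_image = pairs[it % len(pairs)]   # S501: second training sample
        predicted = renderer(params)                    # S502: predicted rendered image
        loss = nn.functional.l1_loss(predicted, engine_image)  # S503: any suitable image loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                # S504: optimize network parameters
    torch.save(renderer.state_dict(), "renderer.pt")    # S506: save model parameters
```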
In the embodiment of the application, reducing the deviation between the editing intention and the rendered avatar image is taken as the optimization objective, and the candidate image parameters are optimized through multiple iterations to obtain the optimized character image parameters.
Specifically returning to fig. 3, in step S308, the rendered avatar image is input to CLIP to obtain an image code.
For example, a cross-modality encoding model CLIP (Contrastive Language-Image Pre-training) may be pre-trained to determine the image encoding based on the rendered character image of the game engine corresponding to the candidate image parameters of the current iteration.
In step S309, text encoding is obtained based on the editing intention.
Illustratively, the editing intention T_k (i.e., the text instruction) may be input into the pre-trained CLIP to obtain the text encoding.
In step S310, an intended understanding loss is obtained from the determined image encoding and text encoding.
For example, a cosine distance between the image encoding and the text encoding may be calculated and determined as the intent understanding loss (CLIP Loss).
In the present embodiment, the initial hidden variable z_k is optimized based on the gradient descent method, and the candidate image parameters are optimized with the minimization of the above cosine distance, that is, with the intent understanding loss, as the optimization objective.
In a preferred embodiment of the present application, the related editing strength s obtained based on the description text also controls the editing strength by influencing the weight of the CLIP loss in the optimization process. For example, the product of the related editing strength s and the intent understanding loss (CLIP Loss) may be calculated, and the candidate image parameters are optimized with the intent understanding loss under the influence of the related editing strength (i.e., the above product) as the optimization objective.
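A minimal sketch of steps S308 to S310 together with the strength weighting, assuming the open-source OpenAI clip package is used as the cross-modality encoder (any equivalent encoder could be substituted); the resizing shortcut and all names are illustrative assumptions.

```python
import torch.nn.functional as F
import clip  # OpenAI's CLIP package is assumed here

def intent_understanding_loss(clip_model, rendered_image, edit_intent_text, edit_strength, device="cpu"):
    """Cosine distance between the CLIP encodings of the rendered avatar image (S308)
    and the text instruction of the editing intention (S309), weighted by the related
    editing strength s as described above."""
    # The rendered image (B, 3, H, W) is resized to CLIP's expected input resolution;
    # CLIP's input normalization is omitted here for brevity.
    image = F.interpolate(rendered_image, size=(224, 224), mode="bilinear", align_corners=False)
    image_code = clip_model.encode_image(image)                                       # S308
    text_code = clip_model.encode_text(clip.tokenize([edit_intent_text]).to(device))  # S309
    cosine_distance = 1.0 - F.cosine_similarity(image_code, text_code).mean()         # S310
    return edit_strength * cosine_distance  # intent understanding loss under the editing strength
```

In a typical setup the encoder would be loaded once, for example with `model, _ = clip.load("ViT-B/32", device=device)`, and passed in as `clip_model`.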
In addition to the above main optimization objective (CLIP Loss) in the gradient descent method, in a preferred embodiment of the present application, a priori regularization is also applied to the initial hidden variable z_k.
Specifically, in step S311, a priori losses are obtained.
Illustratively, in the single-round process of generating character image parameters from text, a prior distribution is introduced and a complete gradient descent framework is used, so that more robust and faster character image parameter generation is achieved.
Fig. 6 shows a flowchart of the steps for determining a priori loss provided by an exemplary embodiment of the present application.
As shown in fig. 6, in step S601, a plurality of groups of samples are acquired from the avatar sample set.
Here, a plurality of sets of samples may be acquired from a publicly available avatar image dataset, each set of samples including a plurality of character image parameters.
In step S602, a priori distribution statistics corresponding to the plurality of sets of samples are counted.
Illustratively, the prior distribution statistics of the character image parameters of the multiple sets of samples may include, but are not limited to, a mean μ_Z and a covariance A_Z. The method of calculating the mean and covariance is common general knowledge in the art, and the present application is not limited in this respect.
In step S603, a priori losses are determined from the a priori distribution statistics.
Illustratively, the a priori loss can be calculated by the following formula:
priorLoss = ||A_Z · (z - μ_Z)||^2
where priorLoss represents a priori loss, z represents a character image parameter of the current iteration, and illustratively, z may also represent an initial hidden variable of the current iteration.
In a preferred embodiment, a priori constraints may be applied during the optimization process, with the intent to understand loss and a priori loss as optimization objectives, to optimize the candidate avatar parameters.
In this process, a prior distribution is introduced so that the parameter optimization space is limited to the prior empirical distribution space, and the prior distribution constraint is additionally applied, thereby avoiding the generation of results that do not conform to the real face distribution.
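A minimal sketch of steps S601 to S603, following the formula above; how the matrix A_Z is derived from the covariance is not fixed by the present application, and here it is simply taken to be the covariance matrix returned by the statistics step (an assumption of this sketch).

```python
import torch

def prior_statistics(sample_groups):
    """Steps S601-S602: stack character image parameters from several sample groups
    and compute the prior distribution statistics (mean and covariance)."""
    params = torch.cat(list(sample_groups), dim=0)   # shape (N, num_params)
    mu = params.mean(dim=0)
    cov = torch.cov(params.T)                        # (num_params, num_params)
    return mu, cov

def prior_loss(z, mu, a_z):
    """Step S603: priorLoss = ||A_Z · (z - mu_Z)||^2, with a_z standing in for A_Z."""
    diff = z - mu
    return torch.sum((a_z @ diff) ** 2)
```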
Returning to fig. 3, in step S312, it is determined whether the optimization condition is satisfied, that is, it is determined whether the optimization process is ended.
By way of example, it may be determined whether the optimization process is finished by determining whether a maximum number of iterations is reached.
If the optimization condition is not satisfied (e.g., the maximum number of iterations is not reached), a hidden variable adjustment value is determined, the initial hidden variable is updated based on the hidden variable adjustment value, and the process returns to step S303 to iterate again based on the updated initial hidden variable.
Illustratively, the hidden variable adjustment value may be obtained as follows: the intent understanding loss and the prior loss are input into an optimizer, and the optimizer determines a hidden variable adjustment value that characterizes the degree and direction of adjustment for at least one character image parameter, which may be expressed as ±ΔZ.
If the optimization condition is satisfied (e.g., the maximum number of iterations is reached), step S313 is performed: and determining character image parameters.
For example, with the above optimization objective, the initial hidden variable z_k is iteratively optimized in a gradient descent manner until convergence, and the optimal hidden variable is obtained; back projection is then carried out to obtain the final character image parameters.
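Putting the pieces together, the iterative loop of steps S303 to S313 might be sketched as follows; the callables decode (hidden variable to image parameters), render (the differentiable renderer), clip_loss_fn (the cosine-distance loss for the fixed editing intention) and prior_loss_fn are placeholders standing in for the components described above, and the hyper-parameters are arbitrary.

```python
import torch

def optimize_latent(z0, decode, render, relevance, clip_loss_fn, prior_loss_fn,
                    edit_strength, max_iters=200, lr=0.05):
    """Gradient descent on the hidden variable z_k with the strength-weighted intent
    understanding loss plus the prior loss as the optimization objective."""
    z = z0.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([z], lr=lr)               # produces the adjustment ±ΔZ each step
    for _ in range(max_iters):                             # S312: optimization end condition
        candidate = relevance * decode(z)                  # S305/S306: candidate image parameters
        image = render(candidate)                          # S307: rendered character image
        loss = edit_strength * clip_loss_fn(image) + prior_loss_fn(z)  # S310 + S311
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                   # update the hidden variable
    return decode(z.detach())                              # S313: final character image parameters
```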
In step S314, the avatar and the reply text are presented.
For example, an avatar corresponding to the editing intention is generated according to the character image parameters determined above. For example, the generated avatar and the reply text may be presented simultaneously on the avatar editing interface in which the description text was input.
In step S315, it is determined whether the session is ended.
Here, a session end condition may be set in advance (e.g., a maximum number of interaction rounds is reached); the session is determined to be ended when the set session end condition is detected to be satisfied, and is determined not to be ended when the condition is not satisfied.
In addition to the above manner, a control button for triggering the end of the dialogue may be provided on the avatar editing interface; if an operation on the control button is detected, the dialogue is determined to be ended, and if no operation on the control button is detected, the dialogue is determined not to be ended.
If it is determined that the dialogue is not ended, the process returns to step S301, and the description text input by the user in the next round of interaction is continuously received.
If the conversation is determined to be ended, the avatar displayed in the current round of interaction is determined as the avatar created by the user.
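An illustrative outline of this multi-round flow (steps S301, S314 and S315); ui, track_intent and generate_avatar are hypothetical callables standing in for the interface and the components described above.

```python
def interaction_session(ui, track_intent, generate_avatar, max_rounds=20):
    """Run rounds of description -> intent tracking -> avatar generation until the
    session ends (maximum rounds reached or the end button is pressed)."""
    memory = []                                    # interactive editing memory (history dialogue)
    avatar = None
    for _ in range(max_rounds):                    # preset end condition: maximum interaction rounds
        text = ui.read_description_text()          # S301: description text of this round
        intent, strength, reply = track_intent(text, memory)
        avatar = generate_avatar(intent, strength, avatar)
        ui.show(avatar, reply)                     # S314: present the avatar and the reply text
        memory.append((text, reply))               # update the dialogue memory structure
        if ui.end_button_pressed():                # S315: user explicitly ends the dialogue
            break
    return avatar                                  # the avatar of the last round is the created avatar
```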
Based on the multi-round interaction, the creation or adjustment of the avatar can be completed through natural language dialogue, without manually and repeatedly adjusting parameters or knowing the specific meaning of each parameter; fine adjustment can be achieved while irrelevant areas are prevented from being changed, so that the time cost spent by the user in adjusting the avatar can be greatly reduced.
Based on the same inventive concept, the embodiment of the application also provides an avatar generating device corresponding to the method provided by the above embodiment. Since the principle by which the device solves the problem is similar to that of the avatar generation method of the embodiment of the application, the implementation of the device can refer to the implementation of the method, and repeated description is omitted.
Fig. 7 is a schematic structural view of an avatar generating apparatus provided in an exemplary embodiment of the present application.
As shown in fig. 7, the avatar generating apparatus 200 includes:
An acquisition module 210 acquiring descriptive text for creating an avatar;
the recognition module 220 tracking the editing intention for the avatar based on the acquired description text and generating a reply text;
and a presentation module 230 presenting the reply text and the avatar generated according to the editing intention, the reply text indicating an editing response of the generated avatar to the description text.
In one possible embodiment of the present application, the identification module 220 is further configured to: extracting an interactive edit memory for creating an avatar, the interactive edit memory indicating an interactive history for a creation process of the avatar; and obtaining the editing intention for the avatar and the reply text based on the description text and the interactive editing memory.
In one possible implementation manner of the application, the creation process includes at least one round of interaction, and in each round of interaction, corresponding reply texts and virtual images are generated based on descriptive texts of the round of interaction, wherein the interaction editing memory includes a history dialogue, the history dialogue includes at least one dialogue group, each dialogue group includes descriptive texts in one round of interaction and corresponding reply texts, and/or the interaction editing memory includes editing attributes and relevant editing intensity corresponding to the editing attributes, the editing attributes include preset character image parameters in an editing state in a plurality of character image parameters, and the relevant editing intensity is used for representing association degree between the preset character image parameters and the descriptive texts.
In one possible embodiment of the present application, the identification module 220 is further configured to: extracting a history dialogue from a dialogue memory structure, wherein the recognition module 220 updates the dialogue memory structure by: whenever a round of interaction ends, the descriptive text and its corresponding reply text corresponding to the round of interaction are added to the dialog memory structure.
In one possible embodiment of the present application, the presentation module 230 generates an avatar according to the editing intention by: obtaining a rendered character image of the game engine based on the candidate image parameters; taking reduction of the deviation between the editing intention and the rendered image as the optimization objective, and optimizing the candidate image parameters through multiple iterations to obtain optimized character image parameters; and generating an avatar corresponding to the editing intention according to the character image parameters.
In one possible implementation of the present application, the presentation module 230 obtains candidate image parameters by: acquiring an initial hidden variable of the current iteration, wherein the initial hidden variable represents the projection of the character image parameters in a low-dimensional space; and decoding the initial hidden variable to obtain the candidate image parameters of the current iteration; wherein the initial hidden variable is the projection, in the low-dimensional space, of the optimized character image parameters obtained in the previous iteration, and, in the first iteration of the first round of interaction, the initial hidden variable is the projection, in the low-dimensional space, of the character image parameters corresponding to a reference avatar, the reference avatar including an avatar matched with the description text of the first round of interaction.
In one possible implementation of the present application, the presentation module 230 obtains candidate image parameters by: predicting the correlation of a plurality of character image parameters with the editing intention to obtain a correlation vector, wherein each element in the correlation vector is used for representing the correlation strength between the corresponding character image parameter and the editing intention; decoding an initial hidden variable to obtain initial image parameters, wherein the initial hidden variable represents the projection of the character image parameters in a low-dimensional space; and obtaining the candidate image parameters according to the initial image parameters and the correlation vector; wherein the initial hidden variable is the projection, in the low-dimensional space, of the optimized character image parameters obtained in the previous iteration, and, in the first iteration of the first round of interaction, the initial hidden variable is the projection, in the low-dimensional space, of the character image parameters corresponding to a reference avatar, the reference avatar including an avatar matched with the description text of the first round of interaction.
In one possible implementation of the present application, the presentation module 230 optimizes the candidate image parameters by: determining an image code based on the rendered image of the game engine corresponding to the candidate image parameter of the current iteration; determining a text encoding based on the editing intent; obtaining an intended understanding loss according to the determined image code and text code; and optimizing the candidate image parameters by taking the intent understanding loss as an optimization target.
In one possible embodiment of the present application, the display module 230 is further configured to: obtaining a plurality of groups of samples from the avatar sample set, each group of samples including a plurality of character image parameters; counting prior distribution statistic values corresponding to the plurality of groups of samples; determining a priori loss according to the priori distribution statistic value; wherein the multiple iterations optimize the candidate image parameters with the intent understanding loss and the prior loss as optimization objectives.
In one possible embodiment of the present application, the identification module 220 is further configured to: obtaining an editing intention for the avatar based on the descriptive text, and simultaneously obtaining a related editing strength, wherein the related editing strength is used for representing the association degree between a target character image parameter and the descriptive text, and the target character image parameter is determined according to the editing intention; and optimizing the candidate image parameters by taking the intention understanding loss under the influence of the related editing intensity as an optimization target.
In one possible implementation of the present application, the recognition module 220 obtains the editing intent, the associated editing strength, and the reply text by: the interactive edit memory for creating the avatar and the descriptive text are input to a pre-trained large language model to obtain the edit intent, associated edit strength, and the reply text.
In one possible implementation of the present application, the interactive edit memory includes edit attributes and associated edit intensities corresponding to the edit attributes, wherein the identification module 220 extracts the interactive edit memory for creating the avatar by: extracting an edit property and a related edit strength corresponding to the edit property from an edit status memory structure, wherein the identification module 220 updates the edit status memory structure by: each time one round of interaction is finished, searching whether the editing intention corresponding to the round of interaction exists in the editing state memory structure, if so, updating the relevant editing strength corresponding to the searched editing intention, and if not, adding the editing intention corresponding to the round of interaction and the relevant editing strength into the editing state memory structure.
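A minimal sketch of such an editing state memory structure; the class and method names are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class EditStateMemory:
    """Editing attributes mapped to their related editing strengths."""
    strengths: Dict[str, float] = field(default_factory=dict)

    def update(self, edit_intent: str, strength: float) -> None:
        # Called at the end of each round: if the editing intention already exists,
        # its related editing strength is refreshed; otherwise a new entry is added.
        self.strengths[edit_intent] = strength

    def extract(self) -> Dict[str, float]:
        return dict(self.strengths)
```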
In one possible embodiment of the present application, the obtaining module 210 is further configured to: display an avatar editing interface of the game, the avatar editing interface including a text input area, an avatar display area, and a system response area, and obtain the description text based on input information of the user in the text input area; and/or the display module 230 is further configured to: present the avatar generated according to the editing intention in the avatar display area, and present the reply text in the system response area.
In one possible embodiment of the present application, the avatar includes a facial image of the avatar, and the character avatar parameters include a pinching face parameter.
Based on the above device, an avatar can be generated through interface interactions and timely respond to the interactions.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application. As shown in fig. 8, the electronic device 300 includes a processor 310, a memory 320, and a bus 330.
The memory 320 stores machine-readable instructions executable by the processor 310, wherein when the electronic device 300 is running, the processor 310 communicates with the memory 320 through the bus 330, and when the machine-readable instructions are executed by the processor 310, the steps of the avatar generation method in any of the above embodiments may be executed, as follows:
acquiring a description text for creating an avatar; tracking an editing intention for the avatar based on the acquired descriptive text, and generating a reply text; and displaying the reply text and the avatar generated according to the editing intention, wherein the reply text is used for indicating the editing response of the generated avatar to the descriptive text.
Based on the electronic device, the avatar can be generated through interface interaction and timely respond to the interaction.
The embodiment of the present application also provides a computer readable storage medium, on which a computer program is stored, where the computer program when executed by a processor may perform the steps of the avatar generation method in any one of the above embodiments, specifically as follows:
acquiring a description text for creating an avatar; tracking an editing intention for the avatar based on the acquired descriptive text, and generating a reply text; and displaying the reply text and the avatar generated according to the editing intention, wherein the reply text is used for indicating the editing response of the generated avatar to the descriptive text.
Based on the above computer-readable storage medium, an avatar can be generated through interface interactions and respond to the interactions in time.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again. In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on such understanding, the technical solutions of the present application may be embodied in essence or a part contributing to the prior art or a part of the technical solutions, or in the form of a software product, which is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes or substitutions are covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (17)

1. A method of avatar generation, the method comprising:
acquiring a description text for creating an avatar;
tracking an editing intention for the avatar based on the acquired descriptive text, and generating a reply text;
and displaying the reply text and the avatar generated according to the editing intention, wherein the reply text is used for indicating the editing response of the generated avatar to the descriptive text.
2. The method of claim 1, wherein the steps of tracking the editing intention for the avatar based on the acquired descriptive text, and generating the reply text include:
extracting an interactive edit memory for creating an avatar, the interactive edit memory indicating an interactive history for a creation process of the avatar;
And obtaining the editing intention for the avatar and the reply text based on the description text and the interactive editing memory.
3. The method of claim 2, wherein the creation process includes at least one round of interactions, during each round of interactions, generating corresponding reply text and avatar based on descriptive text for the round of interactions,
wherein the interactive editorial memory comprises a history dialogue including at least one dialogue group, each dialogue group including descriptive text and its corresponding reply text in a round of interaction,
and/or the interactive editing memory comprises editing attributes and relevant editing intensity corresponding to the editing attributes, wherein the editing attributes comprise preset character image parameters in an editing state in a plurality of character image parameters, and the relevant editing intensity is used for representing the association degree between the preset character image parameters and the descriptive text.
4. The method of claim 3, wherein the extracting the interactive edit memory for creating the avatar comprises:
a history dialogue is extracted from the dialogue memory structure,
wherein the dialog memory structure is updated by: whenever a round of interaction ends, the descriptive text and its corresponding reply text corresponding to the round of interaction are added to the dialog memory structure.
5. The method of claim 1, wherein the avatar is generated according to the editing intention by:
obtaining a rendered character image of the game engine based on the candidate character parameters;
taking reduction of the deviation between the editing intention and the rendered image as an optimization objective, and optimizing the candidate image parameters through multiple iterations to obtain optimized character image parameters;
and generating an avatar corresponding to the editing intention according to the character avatar parameters.
6. The method of claim 5, wherein the candidate image parameters are obtained by:
acquiring an initial hidden variable of the current iteration, wherein the initial hidden variable represents projection of character image parameters in a low-dimensional space;
decoding the initial hidden variable to obtain candidate image parameters of the current iteration;
wherein the initial hidden variable is the projection, in the low-dimensional space, of the optimized character image parameters obtained in the previous iteration, and, in the first iteration of the first round of interaction, the initial hidden variable is the projection, in the low-dimensional space, of the character image parameters corresponding to a reference avatar, wherein the reference avatar comprises an avatar matched with the description text of the first round of interaction.
7. The method of claim 5, wherein the candidate image parameters are obtained by:
predicting the correlation of a plurality of character image parameters and the editing intention to obtain a correlation vector, wherein each element in the correlation vector is used for representing the correlation strength between the corresponding character image parameter and the editing intention;
decoding an initial hidden variable to obtain an initial image parameter, wherein the initial hidden variable represents projection of the character image parameter in a low-dimensional space;
obtaining candidate image parameters according to the initial image parameters and the correlation vector;
wherein the initial hidden variable is the projection, in the low-dimensional space, of the optimized character image parameters obtained in the previous iteration, and, in the first iteration of the first round of interaction, the initial hidden variable is the projection, in the low-dimensional space, of the character image parameters corresponding to a reference avatar, wherein the reference avatar comprises an avatar matched with the description text of the first round of interaction.
8. The method according to any one of claims 5-7, wherein the candidate image parameters are optimized by:
determining an image code based on the rendered image of the game engine corresponding to the candidate image parameter of the current iteration;
Determining a text encoding based on the editing intent;
obtaining an intended understanding loss according to the determined image code and text code;
and optimizing the candidate image parameters by taking the intent understanding loss as an optimization target.
9. The method of claim 8, wherein the method further comprises:
obtaining a plurality of groups of samples from the avatar sample set, each group of samples including a plurality of character image parameters;
counting prior distribution statistic values corresponding to the plurality of groups of samples;
determining a priori loss according to the priori distribution statistic value;
wherein the multiple iterations optimize the candidate image parameters with the intent understanding loss and the prior loss as optimization objectives.
10. The method of claim 8, wherein the method further comprises:
obtaining an editing intention for the avatar based on the descriptive text, and simultaneously obtaining a related editing strength, wherein the related editing strength is used for representing the association degree between a target character image parameter and the descriptive text, and the target character image parameter is determined according to the editing intention;
and optimizing the candidate image parameters by taking the intention understanding loss under the influence of the related editing intensity as an optimization target.
11. The method of claim 10, wherein the editing intent, associated editing strength, and reply text are obtained by:
the interactive edit memory for creating the avatar and the descriptive text are input to a pre-trained large language model to obtain the edit intent, associated edit strength, and the reply text.
12. The method of claim 10, wherein the interactive edit memory comprises edit attributes and associated edit intensities corresponding to the edit attributes,
wherein the interactive edit memory for creating the avatar is extracted by:
extracting editing attribute and relative editing strength corresponding to the editing attribute from the editing state memory structure,
wherein the edit status memory structure is updated by:
each time a round of interaction ends, searching whether an editing intention corresponding to the round of interaction exists in the editing state memory structure,
if so, updating the relevant editing strength corresponding to the searched editing intention,
if not, adding the editing intention corresponding to the round of interaction and the related editing strength into the editing state memory structure.
13. The method of claim 1, wherein the step of obtaining descriptive text for creating the avatar comprises:
presenting an avatar editing interface of a game, the avatar editing interface including a text input area, an avatar presentation area and a system response area,
based on the input information of the user in the text input area, the descriptive text is acquired,
and/or, the step of presenting the reply text and the avatar generated according to the editing intention includes:
presenting an avatar generated according to the editing intention within the avatar presentation area,
the reply text is presented within the system response area.
14. The method of claim 5, wherein the avatar comprises a facial image of a avatar, and the character avatar parameters comprise a pinching face parameter.
15. An avatar generation apparatus, the apparatus comprising:
the acquisition module is used for acquiring a description text for creating the virtual image;
the recognition module is used for tracking the editing intention aiming at the virtual image based on the acquired description text and generating a reply text;
and the display module displays the reply text and the avatar generated according to the editing intention, wherein the reply text is used for indicating the editing response of the generated avatar to the descriptive text.
16. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the electronic device is running, the processor executing the machine-readable instructions to perform the steps of the method of any one of claims 1 to 14.
17. A computer-readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, performs the steps of the method according to any of claims 1 to 14.