CN112669422A - Simulated 3D digital human generation method and device, electronic equipment and storage medium


Info

Publication number
CN112669422A
Authority
CN
China
Prior art keywords
sample
image
digital human
simulated
information
Prior art date
Legal status
Granted
Application number
CN202110020070.3A
Other languages
Chinese (zh)
Other versions
CN112669422B (en)
Inventor
杨国基
刘致远
陈泷翔
王鑫宇
叶颖
吴悦
Current Assignee
Shenzhen Zhuiyi Technology Co Ltd
Original Assignee
Shenzhen Zhuiyi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Zhuiyi Technology Co Ltd
Priority to CN202110020070.3A
Publication of CN112669422A
Application granted
Publication of CN112669422B
Legal status: Active
Anticipated expiration

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The application discloses a simulated 3D digital person generation method and apparatus, an electronic device, and a storage medium. The simulated 3D digital person generation method includes: obtaining description parameters, where the description parameters include relative position information of a target object with respect to a reference position, and the target object includes an object used to calibrate the angle of the simulated digital person; determining, according to the relative position information, the presentation angle at which the simulated digital person is to be presented; obtaining, according to the presentation angle and a preset simulated digital person model, a simulated digital person image corresponding to the presentation angle; and outputting the simulated digital person image. With this method, simulated digital person images at various angles can be generated by the simulated digital person model even when the human model used to train the simulated digital person is unavailable, so there is no need to specially invite the model to take part in image shooting at a specific time and place; an environment in which a user communicates with the simulated digital person face to face in real time can also be simulated, optimizing the human-computer interaction experience.

Description

Simulated 3D digital human generation method and device, electronic equipment and storage medium
Technical Field
The present application relates to the technical field of virtual image construction, and more particularly, to a method and an apparatus for generating a simulated 3D digital person, an electronic device, and a storage medium.
Background
In recent years, with the continuous development and application of information technology, simulated digital persons are being presented in more and more scenarios. The traditional way of presenting a simulated digital person is generally to present it in a corresponding state for each scenario using a fixed set of actions. More recently, methods that present an avatar have appeared; such methods generally train a neural network so that the simulated digital person can present more diverse actions, giving a better presentation effect. Although the display effect is more realistic than that of the traditional presentation method, in the prior art the presentation angle of the avatar is fixed, so the presented picture is still not realistic.
Disclosure of Invention
In view of the above problems, embodiments of the present application provide a method, an apparatus, an electronic device, and a storage medium for generating a simulated 3D digital human, so as to solve the above problems.
In a first aspect, an embodiment of the present application provides a method for generating a simulated 3D digital person, where the method for generating a simulated 3D digital person includes: obtaining description parameters, wherein the description parameters comprise relative position information of a target object relative to a reference position, and the target object comprises an object for calibrating the angle of the simulated digital person; determining the presentation angle of the simulation digital person to be presented according to the relative position information; acquiring a simulation digital human image corresponding to the presentation angle according to the presentation angle and a preset simulation digital human model; and outputting the simulated digital human image.
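Purely for illustration, the claimed steps can be read as a four-step pipeline. The Python sketch below is not part of the disclosure; the function `generate_simulated_3d_image` and the stand-in model are hypothetical names invented here, and the description parameter is simplified to a single relative angle.

```python
# Minimal, self-contained reading of the first-aspect method as a four-step pipeline.
# All names below are placeholders invented for illustration.

def generate_simulated_3d_image(relative_position_deg, preset_model):
    """relative_position_deg: relative angle of the target object w.r.t. the
    reference position (the description parameter, simplified to one angle).
    preset_model: any callable mapping a presentation angle to an image."""
    presentation_angle = relative_position_deg          # step 2: determine presentation angle
    image = preset_model(presentation_angle)            # step 3: query the preset model
    return image                                        # step 4: output the image

# Stand-in "preset simulated digital human model" for demonstration only.
stand_in_model = lambda angle: f"simulated digital human frame at {angle:.0f} deg"
print(generate_simulated_3d_image(30.0, stand_in_model))
```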
Optionally, obtaining the description parameter includes: acquiring an image containing a target object, and determining spatial position information of the target object based on the image; acquiring reference position information of a reference position, wherein the reference position is used for representing the position of a reference object of the simulation digital person; and determining the relative position information of the target object relative to the reference position according to the spatial position information and the reference position information.
Optionally, determining the relative position information of the target object with respect to the reference position according to the spatial position information and the reference position information includes: acquiring a target coordinate parameter of a target object according to the spatial position information; acquiring a reference coordinate parameter of a reference object according to the reference position information; and comparing the target coordinate parameters with the reference coordinate parameters, and determining the relative distance and the relative angle between the target object and the reference object to obtain relative position information comprising the relative distance and the relative angle.
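As a concrete, non-authoritative illustration of this coordinate comparison, the sketch below derives a relative distance and relative angles (azimuth and elevation) from hypothetical 3-D target and reference coordinates; the coordinate convention and units are assumptions, not specified by the disclosure.

```python
import math

def relative_position_info(target_xyz, reference_xyz):
    """Compare target and reference coordinate parameters to obtain the
    relative distance and relative angle (azimuth/elevation in degrees).
    Coordinate frame and units are illustrative assumptions only."""
    dx = target_xyz[0] - reference_xyz[0]
    dy = target_xyz[1] - reference_xyz[1]
    dz = target_xyz[2] - reference_xyz[2]
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    azimuth = math.degrees(math.atan2(dy, dx))                     # horizontal angle
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))   # vertical angle
    return {"distance": distance, "azimuth": azimuth, "elevation": elevation}

# Example: a target 1 m to the side of, and level with, the reference position.
print(relative_position_info((1.0, 0.0, 0.0), (0.0, 0.0, 0.0)))
```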
Optionally, obtaining a simulated digital human image corresponding to the presentation angle according to the presentation angle and a preset simulated digital human model, including: acquiring a plurality of target images corresponding to the presentation angle from a preset simulation digital human model according to the presentation angle; and combining the plurality of target images to obtain the simulation digital human image corresponding to the description parameters.
Optionally, before obtaining the description parameters, the method further includes: acquiring a plurality of sample images and sample description parameters corresponding to each sample image; and constructing a simulation digital human model according to the sample image and the sample description parameters to obtain a preset simulation digital human model.
Optionally, the sample description parameters comprise camera parameters; constructing a simulation digital human model according to the sample image and the sample description parameters to obtain a preset simulation digital human model, wherein the method comprises the following steps: acquiring sample image configuration parameters corresponding to the camera parameters; acquiring angle information of the target model according to the sample image, and associating the angle information with the sample image configuration parameters; and constructing a simulation digital human model according to the sample image configuration parameters and the angle information to obtain a preset simulation digital human model.
Optionally, the sample description parameters further include sample input information; constructing a simulation digital human model according to the sample image and the sample description parameters to obtain a preset simulation digital human model, wherein the method comprises the following steps: acquiring sample semantic information corresponding to the sample input information; acquiring sample facial expression parameters of the target model according to the sample image, and associating sample semantic information with the sample facial expression parameters; and constructing a simulation digital human model according to the sample semantic information and the sample facial expression parameters to obtain a preset simulation digital human model.
Optionally, obtaining the sample facial expression parameters of the target model according to the sample image and associating the sample semantic information with the sample facial expression parameters includes: acquiring a face area of the target model in the sample image; acquiring face key points in the face area; and processing the face key points in the face area to determine the sample facial expression parameters of the target model in the sample image.
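To make the key-point step concrete, the following sketch derives two simple ratio-based expression parameters from named facial key points. The key-point names and the particular parameters (mouth and eye openness) are assumptions made here for illustration, not the parameterization defined by the patent.

```python
def expression_params_from_keypoints(kp):
    """Derive illustrative facial expression parameters from facial key points.
    `kp` maps names to (x, y) pixel coordinates; the naming scheme and the two
    ratio-based parameters are assumptions, not the patent's definition."""
    face_height = abs(kp["chin"][1] - kp["forehead"][1]) or 1.0
    mouth_open = abs(kp["lower_lip"][1] - kp["upper_lip"][1]) / face_height
    eye_open = abs(kp["left_eye_bottom"][1] - kp["left_eye_top"][1]) / face_height
    # Sample semantic information (e.g. the label "greeting") could then be
    # associated with the returned parameters.
    return {"mouth_open": mouth_open, "eye_open": eye_open}

sample = {
    "forehead": (100, 40), "chin": (100, 200),
    "upper_lip": (100, 150), "lower_lip": (100, 165),
    "left_eye_top": (80, 90), "left_eye_bottom": (80, 98),
}
print(expression_params_from_keypoints(sample))
```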
Optionally, the sample description parameters further include sample input information; constructing a simulation digital human model according to the sample image and the sample description parameters to obtain a preset simulation digital human model, wherein the method comprises the following steps: acquiring sample pronunciation information corresponding to the sample input information; acquiring sample mouth shape parameters of the target model according to the sample image, and associating the sample pronunciation information with the sample mouth shape parameters; and constructing a simulation digital human model according to the sample pronunciation information and the sample mouth shape parameters to obtain a preset simulation digital human model.
Optionally, the sample description parameters further include sample input information; constructing a simulation digital human model according to the sample image and the sample description parameters to obtain a preset simulation digital human model, wherein the method comprises the following steps: acquiring sample semantic information corresponding to the sample input information; obtaining a sample semantic category of sample semantic information; acquiring sample trunk action parameters of the target model according to the sample image, and associating the sample semantic category with the sample trunk action parameters; and constructing a simulation digital human model according to the semantic category and the trunk action parameters of the sample to obtain a preset simulation digital human model.
Optionally, obtaining a sample trunk action parameter of the target model according to the sample image, and associating the sample semantic category with the sample trunk action parameter, including: obtaining an effective area including a target model in each sample image; and performing semantic segmentation processing on the effective area, determining sample trunk action parameters of the target model in each sample image, and associating the sample semantic category with the sample trunk action parameters.
Optionally, after outputting the simulated digital human image, the method further comprises: acquiring a plurality of simulation digital human images; determining time sequence information of at least two pieces of simulation digital human image output; generating a simulation digital human video based on a plurality of simulation digital human images according to the time sequence information; configuring corresponding audio information for the simulated digital human video according to the time sequence information; and synchronously playing the video and audio information of the simulated digital person.
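As an illustrative reading of this video step, the sketch below orders simulated digital person images by their timing information and pairs each frame span with the audio that should play over it. No specific video library is implied; a real implementation would hand the synchronized timeline to an encoder.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    timestamp_ms: int   # timing information of the simulated digital human image
    image: bytes        # encoded simulated digital human image

def assemble_video(frames, audio_clips):
    """Order frames by timing information and group them under the audio clip
    that covers the same span (illustrative assumption about the data layout)."""
    timeline = sorted(frames, key=lambda f: f.timestamp_ms)
    synced = []
    for start_ms, end_ms, audio in audio_clips:       # audio timing information
        span = [f for f in timeline if start_ms <= f.timestamp_ms < end_ms]
        synced.append({"audio": audio, "frames": span})
    return synced

frames = [Frame(0, b"img0"), Frame(40, b"img1"), Frame(80, b"img2")]
audio = [(0, 60, b"hello"), (60, 120, b"world")]
print([len(seg["frames"]) for seg in assemble_video(frames, audio)])  # -> [2, 1]
```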
In a second aspect, an embodiment of the present application provides a simulated 3D digital human generation apparatus, where the simulated 3D digital human generation apparatus includes a description parameter obtaining module, a presentation angle obtaining module, a simulated digital human image obtaining module, and a first simulated digital human image output module. The description parameter acquisition module is used for acquiring description parameters, and the description parameters include relative position information of the target object relative to a reference position. The presentation angle acquisition module is used for determining the presentation angle of the simulated digital person to be presented according to the relative position information. The simulated digital human image acquisition module is used for acquiring a simulated digital human image corresponding to the presentation angle according to the presentation angle and a preset simulated digital human model. The first simulated digital human image output module is used for outputting the simulated digital human image.
Optionally, the description parameter acquiring module includes an image acquiring unit, a position acquiring unit, and a relative position information acquiring unit. The image acquisition unit is used for acquiring an image containing a target object and determining the spatial position information of the target object based on the image. The position acquisition unit is used for acquiring reference position information of a reference position, and the reference position is used for representing the position of a reference object simulating the digital person. The relative position information acquisition unit is used for determining the relative position information of the target object relative to the reference position according to the spatial position information and the reference position information.
Optionally, the relative position information obtaining unit includes a target coordinate parameter subunit, a reference coordinate parameter subunit, and a coordinate comparison subunit. The target coordinate parameter subunit is used for acquiring a target coordinate parameter of the target object according to the spatial position information. The reference coordinate parameter subunit is used for acquiring a reference coordinate parameter of the reference object according to the reference position information. The coordinate comparison subunit is used for comparing the target coordinate parameter with the reference coordinate parameter and determining the relative distance and the relative angle between the target object and the reference object, so as to obtain the relative position information including the relative distance and the relative angle.
Optionally, the simulated digital human image acquisition module includes a target image acquisition unit and a simulated digital human image acquisition unit. The target image acquisition unit is used for acquiring a plurality of target images corresponding to the presentation angles from a preset simulation digital human model according to the presentation angles. The simulation digital human image acquisition unit is used for combining the plurality of target images to acquire the simulation digital human image corresponding to the description parameters.
Optionally, the simulation 3D digital human generation apparatus further includes a sample description parameter acquisition unit and a simulation digital human model acquisition unit. The sample description parameter acquiring unit is used for acquiring a plurality of sample images and sample description parameters corresponding to each sample image. The simulation digital human model obtaining unit is used for constructing a simulation digital human model according to the sample image and the sample description parameters to obtain a preset simulation digital human model.
Optionally, the sample description parameters comprise camera parameters; the simulation digital human model obtaining unit comprises a sample image configuration parameter obtaining subunit, an image configuration parameter association subunit and a first simulation digital human model obtaining subunit. The sample image configuration parameter acquiring subunit is used for acquiring sample image configuration parameters corresponding to the camera parameters. The image configuration parameter association subunit is used for acquiring the angle information of the target model according to the sample image and associating the angle information with the sample image configuration parameters. The first simulation digital human model obtaining subunit is used for constructing a simulation digital human model according to the sample image configuration parameters and the angle information to obtain a preset simulation digital human model.
Optionally, the sample description parameters further include sample input information; the simulation digital human model obtaining unit comprises a first sample semantic information obtaining subunit, a sample facial expression parameter association subunit and a second simulation digital human model obtaining subunit. The first sample semantic information acquiring subunit is used for acquiring sample semantic information corresponding to the sample input information. And the sample facial expression parameter association subunit is used for acquiring sample facial expression parameters of the target model according to the sample image and associating the sample semantic information with the sample facial expression parameters. The second simulation digital human model obtaining subunit is used for constructing a simulation digital human model according to the sample semantic information and the sample facial expression parameters to obtain a preset simulation digital human model.
Optionally, the sample facial expression parameter association subunit includes a facial region acquisition component, a facial keypoint acquisition component, and a sample facial expression parameter acquisition component. Wherein the facial region acquisition component is used for acquiring the facial region of the target model in the sample image. The facial keypoint acquisition component is for acquiring facial keypoints in a facial region. The sample facial expression parameter acquisition component is used for processing the facial key points in the facial area and determining sample facial expression parameters of the target model in the sample image.
Optionally, the sample description parameters further include sample input information; the simulated digital human model acquisition unit comprises a sample pronunciation information acquisition subunit, a sample mouth shape parameter association subunit and a third simulated digital human model acquisition subunit. The sample pronunciation information acquisition subunit is used for acquiring sample pronunciation information corresponding to the sample input information. The sample mouth shape parameter association subunit is used for acquiring sample mouth shape parameters of the target model according to the sample image and associating the sample pronunciation information with the sample mouth shape parameters. And the third simulated digital human model obtaining subunit is used for constructing a simulated digital human model according to the sample pronunciation information and the sample mouth shape parameters to obtain a preset simulated digital human model.
Optionally, the sample description parameters further include sample input information; the simulation digital human model obtaining unit comprises a second sample semantic information obtaining subunit, a sample semantic category obtaining subunit, a sample trunk action parameter association subunit and a fourth simulation digital human model obtaining subunit. The second sample semantic information acquiring subunit is used for acquiring sample semantic information corresponding to the sample input information. The sample semantic category acquiring subunit is used for acquiring the sample semantic category of the sample semantic information. The sample trunk action parameter association subunit is used for acquiring the sample trunk action parameters of the target model according to the sample images and associating the sample semantic categories with the sample trunk action parameters. And the fourth simulation digital human model obtaining subunit is used for constructing a simulation digital human model according to the sample semantic category and the sample trunk action parameters to obtain a preset simulation digital human model.
Optionally, the sample trunk action parameter association subunit includes an effective region acquisition component and a sample trunk action parameter association component. The effective area acquisition component is used for acquiring an effective area including the target model in each sample image. The sample trunk action parameter association component is used for performing semantic segmentation processing on the effective area, determining sample trunk action parameters of the target model in each sample image, and associating the sample semantic categories with the sample trunk action parameters.
Optionally, the simulation 3D digital human generation apparatus further includes a second simulation digital human image output module, a timing information determination module, a simulation digital human video generation module, an audio information configuration module, and a playing module. The second simulation digital human image output module is used for acquiring a plurality of simulation digital human images. The time sequence information determining module is used for determining the time sequence information of the output of at least two simulated digital human images. The simulation digital human video generation module is used for generating simulation digital human videos based on the plurality of simulation digital human images according to the time sequence information. And the audio information configuration module is used for configuring corresponding audio information for the simulated digital human video according to the time sequence information. The playing module is used for synchronously playing the video and audio information of the simulation digital person.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a memory; one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the steps of the simulated 3D digital person generation method provided in the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, where a computer program is stored, and the computer program is executed by a processor to perform the steps of the simulated 3D digital human generation method provided in the first aspect.
Compared with the prior art, the simulated 3D digital person generation method and apparatus, the electronic device, and the storage medium provided by the embodiments of the present application can obtain description parameters, determine the presentation angle of the simulated digital person based on the description parameters, and obtain and output a simulated digital person image corresponding to that presentation angle. Consequently, even when the human model used to train the simulated digital person is unavailable, simulated digital person images at various angles can be generated by the trained simulated digital person model, and the model does not need to be specially invited to participate in image shooting at a specific time and place, which reduces the cost of generating images; at the same time, an environment in which the user communicates with the simulated digital person face to face in real time can be simulated, which improves the realism of the presented simulated digital person image and optimizes the human-computer interaction experience.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram illustrating an application environment of a method for generating a simulated 3D digital human according to an embodiment of the present application.
Fig. 2 is a schematic diagram illustrating another application environment of a method for simulating 3D digital human generation according to an embodiment of the present application.
Fig. 3 shows a flow diagram of a simulation 3D digital human generation method provided by an embodiment of the present application.
Fig. 4 shows a flow chart of the method of fig. 2 for constructing the simulated digital human model.
FIG. 5 is a flow chart illustrating a process of constructing a simulated digital human model based on sample image configuration parameters in the method shown in FIG. 4.
FIG. 6 is a flow chart illustrating a process for determining relative position information based on images in the method of FIG. 2.
Fig. 7 is a flow chart illustrating a process of acquiring relative position information based on coordinates in the method shown in fig. 6.
Fig. 8 is a flow chart illustrating a process of acquiring an image of a simulated digital person based on image combination in the method shown in fig. 2.
FIG. 9 is a flow chart illustrating a process of obtaining a simulated digital human model based on sample semantic information in the method shown in FIG. 4.
Fig. 10 is a flow chart illustrating a process of obtaining sample facial expression parameters based on facial key points in the method shown in fig. 9.
FIG. 11 is a flow chart illustrating a process of obtaining a simulated digital human model based on sample pronunciation information in the method shown in FIG. 4.
FIG. 12 is a flow chart illustrating the process of obtaining a simulated digital human model based on sample semantic categories in the method shown in FIG. 4.
FIG. 13 illustrates a flow diagram for correlating sample torso-motion parameters in the method illustrated in FIG. 12.
Fig. 14 shows another flowchart of a method for generating a simulated 3D digital human provided by the embodiment of the present application.
Fig. 15 shows a functional block diagram of an emulated 3D digital human generation apparatus proposed in the embodiment of the present application.
Fig. 16 shows a functional block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Definition of terms
3D digital human: and (3) realizing digital human by computer graphics technologies such as 3D modeling and rendering.
Simulating a digital person: and generating a vivid image with the image quality of each frame being similar to that of the camera through the deep learning model, wherein the digital person has the effect like a real person shot by the camera. Alternatively, a video digital person may be generated from a coherent realistic image.
Simulating a 3D digital person: the digital man is generated by simulating the digital man technology, and the three-dimensional vivid effect is realized by simulating the digital man in consideration of the presentation angle of the digital man. Alternatively, a stereorealistic video digital person may be generated from a sequence of multiple simulated digital person images.
At present, a simulated digital person is generally presented by training a neural network to build a model that outputs simulated digital person images. To improve the fidelity of the presented picture, various actions are usually designed for the simulated digital person and matched with the voice fed back to the user, giving the user a better viewing experience. Although matching actions with voice noticeably improves the fidelity of the presented picture, this approach only relates voice to actions and establishes no relation between the user's posture and the simulated digital person. In practical applications, the simulated digital person is therefore usually shown at a fixed presentation angle, which does not accord with the way users are accustomed to viewing other people, so the fidelity of the presented picture remains low.
To solve the above problems, the inventors studied how to adjust the presentation angle of a simulated digital person in the presented picture. On this basis, the inventors propose the simulated 3D digital person generation method and apparatus, electronic device, and storage medium of the embodiments of the present application, which can obtain description parameters, determine the presentation angle of the simulated digital person based on the description parameters, and obtain and output a simulated digital person image corresponding to that presentation angle. In this way, even when the human model used to train the simulated digital person is unavailable, simulated digital person images at various angles can be generated by the trained simulated digital person model, and the model does not need to be specially invited to participate in image shooting at a specific time and place, which reduces the cost of generating images; at the same time, an environment in which the user communicates with the simulated digital person face to face in real time can be simulated, improving the realism of the presented image and optimizing the human-computer interaction experience.
In order to better understand the method, the apparatus, the electronic device, and the storage medium for generating the simulated 3D digital human provided in the embodiments of the present application, an application environment suitable for the embodiments of the present application is described below.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an application environment suitable for the embodiment of the present application. The simulation 3D digital human generation method provided by the embodiment of the present application can be applied to the simulation 3D digital human generation system 100 shown in fig. 1. The simulation 3D digital human generation system 100 comprises an intelligent terminal 101 and a server 102, wherein the server 102 is in communication connection with the intelligent terminal 101. The server 102 may be implemented by an independent server or a server cluster composed of a plurality of servers. In addition, the server may be a cloud server, and may also be a traditional machine room server, which is not specifically limited herein.
In some embodiments, the smart terminal 101 may be any of various electronic devices having a display screen and supporting data input, including but not limited to smartphones, tablets, laptop computers, desktop computers, wearable electronic devices, and the like. Specifically, the data input may be voice input through a voice module, characters input through a character input module, images input through an image input module, or videos input through a video input module provided on the intelligent terminal 101, or may be gesture input based on a gesture recognition module installed on the intelligent terminal 101, so that the user can interact through gestures and other input modes.
The intelligent terminal 101 may have a client application program installed, and the user may communicate with the server 102 through the client application program (e.g., an APP or a WeChat applet). Specifically, the server 102 runs a corresponding server application; the user may register a user account with the server 102 through the client application and communicate with the server 102 based on that account. For example, the user logs into the user account in the client application and, based on that account, inputs text information, voice information, image information, video information, or the like through the client application. After receiving the information input by the user, the client application sends it to the server 102, so that the server 102 can receive, process, and store it; the server 102 may also return corresponding output information to the intelligent terminal 101 according to the received information.
In some embodiments, the client application may be used to provide customer service to the user and, in customer service communication with the user, may interact with the user based on a simulated digital person. In particular, the client application may receive information input by the user and respond to it via the simulated digital person. The simulated digital person is a visual-graphics-based software program that, when executed, presents to the user a robot figure simulating biological behaviors or thoughts. The simulated digital person may imitate a real person, for example a likeness built according to the user's own form or the form of another person, or may have an animated appearance, for example the form of an animal or a cartoon character.
In some embodiments, as shown in fig. 2, after acquiring reply information corresponding to the information input by the user, the smart terminal 101 may display a simulated digital human image corresponding to the reply information on its display screen or on another image output device connected to it. As one approach, while the simulated digital human image is played, the audio corresponding to it may be played through a speaker of the intelligent terminal 101 or another connected audio output device, and the text or graphics corresponding to the reply information may also be displayed on the display screen of the intelligent terminal 101, thereby realizing multi-modal interaction with the user across image, voice, text, and other aspects.
In some embodiments, the device for processing the information input by the user may also be disposed on the intelligent terminal 101, so that the intelligent terminal 101 can realize the interaction with the user without establishing communication with the server 102, and at this time, the simulation 3D digital human generation system 100 may only include the intelligent terminal 101.
The above application environments are only examples for facilitating understanding, and it is to be understood that the embodiments of the present application are not limited to the above application environments.
The following describes in detail the method, apparatus, electronic device and storage medium for generating simulated 3D digital human provided by the embodiments of the present application with specific embodiments.
Referring to fig. 3, an embodiment of the present application provides a method for generating a simulated 3D digital person, where the method for generating a simulated 3D digital person can be applied to the simulated 3D digital person generation system 100, can also be applied to the intelligent terminal 101 in the simulated 3D digital person generation system 100, and can also be applied to the server 102 in the simulated 3D digital person generation system 100. Specifically, the simulation 3D digital person generation method may include the following steps S11 to S14.
Step S11: and acquiring description parameters, wherein the description parameters comprise relative position information of the target object relative to a reference position, and the target object comprises an object for calibrating the angle of the simulated digital person.
In this embodiment, the reference position may be a position set in advance for obtaining the relative position information. For example, the reference position may be a camera for acquiring an image of the target object, or may be a frame of the intelligent terminal 101 for presenting a 3D digital person, and the setting of the reference position is not particularly limited herein.
In this embodiment, the description parameters may be obtained directly or indirectly. When the description parameters are acquired directly, an engineer can directly input them to adjust the presentation angle of the simulated digital person. This is particularly suitable for producing simulated digital person videos: the description parameters are entered directly and the presentation angle is adjusted in real time, so that the presentation angle is in a dynamically changing state. When the simulated digital person is generated based on a real person, the produced video is then closer to a recorded real-person video, which improves the fidelity of the simulated digital person's presented picture in the video.
In some examples, when the description parameters are obtained indirectly, the position of the target object may be acquired, and the relative position information derived from the position of the target object and the reference position. The position of the target object may be obtained by infrared ranging, acoustic ranging, image-based ranging, and the like; the method of obtaining it is not specifically limited here. Obtaining the description parameters indirectly is particularly suitable for scenarios in which a user interacts with the simulated digital person: by detecting the relative position information of the target object in real time and adjusting the presentation angle of the simulated digital person in real time, a simulated digital person picture corresponding to the user's orientation can be presented, simulating an environment in which the user communicates with the simulated digital person face to face and improving the user's interaction experience.
It should be noted that each way of acquiring the description parameters is preferentially applied to the scenario that suits it. For example, directly inputting the description parameters is preferentially applied to the simulated 3D digital human generation system 100 shown in fig. 1, while obtaining the description parameters indirectly is preferentially applied to the system shown in fig. 2. The two ways may also be combined; in that case, a priority may be preset so that directly acquired description parameters outrank indirectly acquired ones. In some examples, the priority of description parameters acquired from information input by the user may be set higher than that of description parameters acquired by measurement; that is, if the relative position information acquired from user input is A and the relative position information acquired by measurement is B, A is used as the orientation parameter. By setting priorities in this way, the practical application scenario of the simulated 3D digital person generation method of this embodiment can be adjusted accordingly and the relative position information determined flexibly, so that the method suits some special scenarios (for example, when the intelligent terminal 101 presenting the simulated digital person is a large screen in a meeting hall and the interacting user wants the simulated digital person to face a certain position in the hall rather than the user). This widens the application scenarios of the method and better meets the user's need to control the simulated digital person.
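A minimal sketch of this priority rule, under the assumption that a directly input value simply overrides a measured one, could look as follows; the function name and the single-value representation are invented for illustration.

```python
def choose_relative_position(user_input_position, measured_position):
    """Select the description parameter according to a preset priority:
    information input by the user (direct) outranks measured information
    (indirect). Falling back to the measured value only when no direct
    input exists is an illustrative reading of the priority rule."""
    if user_input_position is not None:   # direct input has higher priority
        return user_input_position
    return measured_position

# Example: the user asks the digital person to face position A, while the
# measured position of the user is B; A wins.
print(choose_relative_position("A", "B"))   # -> "A"
print(choose_relative_position(None, "B"))  # -> "B"
```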
Step S12: and determining the presentation angle of the simulated digital person to be presented according to the relative position information.
In this embodiment, the relative position information may correspond to the presentation angle of the simulated digital person. The presentation angle may include the angle at which the eyeballs, face, and so on of the simulated digital person are presented on the carrier that displays it (e.g., a meeting-hall large screen, a projector, a smartphone, and the like). For example, when it is determined, based on the relative position information, that the face orientation of the simulated digital person deviates from the axis direction by 15 degrees, the current presentation angle of the simulated digital person is determined and adjusted so that its face orientation substantially coincides with the axis direction. The face orientation can be taken as the direction of a ray that lies in the same plane as the face midline and is perpendicular to the face, where the face midline is the line connecting the midpoint between the two eyes and the tip of the nose. The axis direction is a preset reference direction for the face orientation.
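As a hedged illustration of step S12, the sketch below computes how far the simulated digital person's face must turn so that its orientation matches the desired direction; the sign convention, degree units, and single-yaw simplification are assumptions, not the disclosure's formulation.

```python
def presentation_yaw(target_azimuth_deg, current_face_yaw_deg, axis_yaw_deg=0.0):
    """Determine the yaw correction so that the simulated digital person's face
    orientation (measured against a preset axis direction) matches the
    direction indicated by the relative position information."""
    desired_yaw = axis_yaw_deg + target_azimuth_deg
    correction = desired_yaw - current_face_yaw_deg
    # Normalize to (-180, 180] so the person turns the short way round.
    correction = (correction + 180.0) % 360.0 - 180.0
    return desired_yaw, correction

# Example matching the text: the face currently deviates by 15 degrees from the axis.
print(presentation_yaw(target_azimuth_deg=0.0, current_face_yaw_deg=15.0))
# -> (0.0, -15.0): rotate the face back by 15 degrees.
```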
Step S13: and acquiring a simulation digital human image corresponding to the presentation angle according to the presentation angle and a preset simulation digital human model.
In this embodiment, the simulated digital human model may be a pre-constructed model for outputting simulated digital human images, and the way in which it outputs the simulated digital human image corresponding to the presentation angle is not specifically limited here. For example, the simulated digital human model may be a simulated digital human image construction model: the presentation angle is input, and the model constructs the simulated digital human image at that angle. For instance, when the presentation angle is such that the angle between the face orientation and the axis direction is zero, the model constructs only the image corresponding to the picture to be presented (a front view of the simulated digital person) and need not construct anything outside the presented picture. In this way, when generating the simulated digital human image, there is no need to construct and then drive a 3D digital person.
Further, when the simulated digital human model is a 3D digital person, the simulated digital human image may be obtained by adjusting the angle of the 3D digital person, by adjusting the angle at which the 3D digital person is captured, or by combining the two.
When the simulated digital person image is acquired by adjusting the angle of the 3D digital person, the 3D digital person can be rotated so that its angle matches the presentation angle, and the simulated digital person image corresponding to the presentation angle is then acquired. For example, when the face orientation of the 3D digital person deviates by 15 degrees from the expected presentation angle, the 3D digital person is controlled to rotate by 15 degrees so that its face orientation matches the expected presentation angle, and the corresponding simulated digital person image is acquired. In this case, the capture direction of the acquisition module that collects the simulated digital person image can remain unchanged and only the angle of the 3D digital person needs to be adjusted. This approach is particularly suitable for a simulated digital human model in which the 3D digital person has been constructed in advance and a complete scene exists for every orientation of the 3D digital person, so no large amount of computing power is consumed to process the image.
When the simulated digital person image is acquired by adjusting the angle of acquiring the 3D digital person, the direction of acquiring the image by the acquisition module for acquiring the simulated digital person image can be adjusted so that the angle of the 3D digital person conforms to the presentation angle, and then the simulated digital person image corresponding to the presentation angle is acquired. For example, when the face orientation of the 3D digital person is deviated by 15 degrees from the expected presentation angle, the direction of the image captured by the capturing module is controlled to be deflected by 15 degrees, so that the face orientation of the 3D digital person is matched with the expected presentation angle, and then the simulated digital person image corresponding to the presentation angle is captured. At the moment, the angle of the 3D digital person can be kept unchanged, and only the direction of the acquisition module for acquiring the image needs to be adjusted.
When the two are combined, the 3D digital person is rotated while the capture direction of the acquisition module is also adjusted, so that the angle of the 3D digital person matches the presentation angle, and the simulated digital person image corresponding to the presentation angle is then acquired. For example, when the front of the 3D digital person deviates by 15 degrees from the expected presentation angle, the 3D digital person is controlled to rotate by 7.5 degrees and the capture direction of the acquisition module is deflected by 7.5 degrees, so that the face orientation of the 3D digital person matches the expected presentation angle, after which the corresponding simulated digital person image is acquired. In this case, the capture direction and the angle of the 3D digital person are adjusted simultaneously, which combines the effects of the two approaches above: the presentation angle of the 3D digital person in the simulated digital person image better matches expectations, the action changes of the 3D digital person look more lifelike, and the human-computer interaction experience is improved.
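A small sketch of this combined adjustment, assuming the required correction is simply split between the 3D digital person and the capture direction by a configurable ratio (the disclosure's example corresponds to a 50/50 split):

```python
def split_rotation(required_deg, model_share=0.5):
    """Split the required angular correction between rotating the 3D digital
    person and deflecting the capture direction of the acquisition module.
    The split ratio is an illustrative policy, not fixed by the disclosure."""
    model_rotation = required_deg * model_share
    camera_rotation = required_deg - model_rotation
    return model_rotation, camera_rotation

print(split_rotation(15.0))        # -> (7.5, 7.5), as in the example above
print(split_rotation(15.0, 1.0))   # -> (15.0, 0.0): rotate only the 3D digital person
print(split_rotation(15.0, 0.0))   # -> (0.0, 15.0): deflect only the capture direction
```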
Further, when the simulated digital human model is a simulated digital human image construction model, the presentation angle may be associated with the image output by the model. Specifically, when the presentation angle is acquired, the sub-images related to that angle (images representing the form of the simulated digital person at various presentation angles) are retrieved from the simulated digital human model and spliced together, and the simulated digital person image is output. It should be noted that, because the simulated digital human image output by the model is spliced from real images of the real person used to train the model, the displayed picture of the simulated digital person is essentially a photographed real image, so the simulated digital person in the output image is more lifelike.
It can be understood that obtaining a stereoscopic 3D digital person through 3D modeling depends heavily on the prior experience of the modeler: a 3D digital person close to a real person is achieved through a great deal of manual adjustment, and obtaining 3D digital persons for different human models requires repeating the modeling process, which consumes a great deal of labor. The preset simulated digital human model, in contrast, is a deep learning model obtained through training; obtaining the target simulated digital human image through it requires no 3D modeling, and the resulting simulated digital person is closer to the real human model and more lifelike, which suits practical applications where different real human models may need to be modeled to obtain simulated digital persons.
Step S14: and outputting the simulated digital human image.
In this embodiment, the simulated digital human image may be output to a device for presenting it, so that the device presents the simulated digital human image. For example, the device may be an intelligent terminal 101 such as a meeting-hall large screen, a projector, or a smartphone. In some examples, when the user faces the smart terminal 101, the terminal may acquire the face orientation of the user and present a 3D digital human picture corresponding to that orientation, simulating an environment in which the user communicates with the 3D digital person face to face. In other examples, when the simulated 3D digital person generation method of this embodiment is applied to video recording, a simulated digital human image that meets the user's expectations can be obtained from the description parameters without specially filming a real person.
In this embodiment, through steps S11 to S14 above, description parameters can be obtained, the presentation angle of the simulated digital person determined from them, and the simulated digital person image corresponding to that angle obtained and output. In this way, even when the human model used to train the simulated digital person is unavailable, simulated digital person images at various angles can be generated by the trained simulated digital person model, without specially inviting the model to participate in image shooting at a specific time and place, which reduces the cost of generating images; at the same time, an environment in which the user communicates with the simulated digital person face to face in real time can be simulated, which improves the realism of the presented image and optimizes the human-computer interaction experience.
In order to be able to output the simulation digital human image, a simulation digital human model may be constructed in advance, and to this end, the embodiment of the present application further provides a simulation 3D digital human generation method, as shown in fig. 4, the simulation 3D digital human generation method may include the following steps S21 to S26. The method for generating a simulated 3D digital person provided in this embodiment may include the same or similar steps as those in the above embodiments, and for the execution of the same or similar steps, reference may be made to the foregoing description, and details are not repeated in this specification.
Step S21: a plurality of sample images and sample description parameters corresponding to each sample image are acquired.
In this embodiment, the sample image may include images generated by photographing the object at different angles. Specifically, the sample image may be an image acquired when the target object makes various motions, sounds, expressions, and the like at various angles. The sample description parameters may include photographing parameters of a photographing device for photographing a sample image when the sample image is photographed, image parameters of the sample image, and the like. For example, the photographing parameters may include a focal length, a distance from the target object when the photographing device photographs the sample image, a direction and an angle when the photographing device photographs the sample image, and the like. The image parameters may be pixel size, contrast, saturation, etc. of the sample image.
In this embodiment, the target object may be a human model associated with the simulated digital person. For example, when the target object is a particular performer, the simulated digital person may have substantially the same form as that performer. The target object may also be a person whose face, bone structure, or figure is similar to that of the person being simulated, or a dummy (e.g., a wax figure) with a similar face, bone structure, or figure.
In some examples, cameras for capturing images of the object may be arranged annularly around the object periphery, wherein cameras of different focal length sizes may be provided with respect to the same orientation of the object. When the target object makes a sound, changes an action, changes a facial expression, or the like, images including the target object may be simultaneously acquired by using the respective image pickup devices, thereby obtaining a plurality of sample images.
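Purely as an illustration of such an annular capture rig (the radius, camera count, and focal lengths below are invented, not specified by the disclosure), one could lay out the camera placements like this:

```python
import math

def ring_camera_positions(radius_m, num_cameras, focal_lengths_mm):
    """Lay out capture cameras annularly around the target object; for each
    azimuth, one camera per focal length. Illustrative geometry only."""
    rig = []
    for i in range(num_cameras):
        azimuth = 360.0 * i / num_cameras
        x = radius_m * math.cos(math.radians(azimuth))
        y = radius_m * math.sin(math.radians(azimuth))
        for f in focal_lengths_mm:
            rig.append({"azimuth_deg": azimuth, "position": (x, y), "focal_mm": f})
    return rig

# Example: 8 directions around a 2 m ring, two focal lengths per direction.
print(len(ring_camera_positions(2.0, 8, [35, 85])))   # -> 16 camera placements
```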
Step S22: and constructing a simulation digital human model according to the sample image and the sample description parameters to obtain a preset simulation digital human model.
In this embodiment, morphological information of the target object at different angles may be acquired based on the sample images and the sample description parameters. The morphological information may include information related to changes in the target object's body, for example a drooping mouth corner, eyes turned to the right, a raised head, a raised right hand, and the like.
In some examples, the angle at which a sample image was captured may be obtained from the sample description parameters, or the sample image may be calibrated to obtain the angle of the target object in it. When the angle is obtained by calibration, the target object in the sample image can be recognized to obtain the presentation angle of the sample image, and the sample image is then labeled with that angle; alternatively, each sample image can be manually labeled with its angle.
In some examples, when the target object in the sample image is identified, each part of the target object may be acquired from the sample image through a target detection algorithm, and the morphological information of the part is determined based on the change states of the same part in the multiple continuous sample images, so as to obtain the morphological information of each part of the target object. For example, the target detection algorithm may be a sliding window target detection, a two stage target detection algorithm, a one stage target detection algorithm, or the like.
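As a rough illustration of the per-part tracking described above, the following Python sketch compares the detected bounding boxes of the same part across consecutive sample images and produces a coarse motion description. The part name, the (x, y, width, height) box format, and the 5-pixel threshold are assumptions made for the example, not values specified in this application.

```python
# Illustrative sketch only: derive coarse morphological information for one part
# by comparing its detected bounding boxes across consecutive sample images.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)

def part_motion(part_name: str, boxes: List[Box], threshold: float = 5.0) -> str:
    """Describe how a detected part moves over a run of consecutive frames."""
    if len(boxes) < 2:
        return f"{part_name}: not enough frames to judge motion"
    (x0, y0, _, _), (x1, y1, _, _) = boxes[0], boxes[-1]
    dx, dy = x1 - x0, y1 - y0
    changes = []
    if abs(dx) > threshold:
        changes.append("moves right" if dx > 0 else "moves left")
    if abs(dy) > threshold:
        # Image coordinates grow downward, so a negative dy means the part rises.
        changes.append("drops" if dy > 0 else "lifts")
    return f"{part_name}: " + (", ".join(changes) if changes else "remains still")

# Example: the right hand detected in four consecutive sample images.
print(part_motion("right hand", [(120, 300, 40, 60), (121, 280, 40, 60),
                                 (119, 250, 40, 60), (120, 220, 40, 60)]))
```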
In this embodiment, when the morphological information of the target object at different angles is acquired, each angle may be placed in one-to-one correspondence with a different form of the target object. When a description parameter including the relative position information is input, the presentation angle may be obtained based on the description parameter and input to the simulated digital human model; the sub-images corresponding to the presentation angle are acquired in the simulated digital human model, each sub-image including the form of the target object at that angle, and the simulated digital human image is output after the sub-images are spliced. The method for generating the simulated 3D digital person provided by this embodiment is therefore particularly suitable for the case where the human model used to train the simulated digital person is unavailable: simulated digital human images at various angles can be generated by the simulated digital human model obtained by training on that model's images, so there is no need to specially invite the model to participate in image shooting at a specific time and place, thereby reducing the cost of generating the images.
In this embodiment, when the simulated digital human model is a 3D digital person, the original 3D digital person and the modeling information of the original 3D digital person may be acquired, and the 3D digital person may be generated according to the morphological information and the modeling information. The original 3D digital person may comprise an already-constructed 3D digital human model. For example, the original 3D digital person may be an average human face model of a certain region, or may be a 3D animation model from the animation industry; the type of the original 3D digital person is not particularly limited herein. In addition, the modeling information may include the parameter information used to construct the original 3D digital person, from which the original 3D digital person can be restored and presented.
In this embodiment, the morphological information of the target object may be combined with the modeling information so that morphological features of the target object are added to the original 3D digital person, thereby obtaining a 3D digital person that includes the morphological information of the target object. In addition, when a description parameter including the relative position information is input, a presentation angle may be obtained based on the description parameter and input to the simulated digital human model; the presentation angle of the 3D digital person in the simulated digital human model is then controlled, and the simulated digital human image including the 3D digital person at the presentation angle is acquired. The method for generating the simulated 3D digital person provided by this embodiment is therefore particularly suitable for building a relatively complete view of the 3D digital person in every direction within the simulated digital human model: it does not need to consume a large amount of computing power to process images, and only needs the description parameters to drive the 3D digital person so that the angle of the 3D digital person matches the presentation angle.
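The following minimal Python sketch illustrates one way the morphological information could be blended into the modeling information of an original 3D digital person and the resulting mesh rotated to the presentation angle. The per-vertex offset representation and the yaw-only rotation are simplifying assumptions for illustration; they are not the modeling scheme defined by this application.

```python
# Illustrative sketch only: add morphological offsets from the target object to the
# vertices of an already-built ("original") 3D digital person and set its yaw so that
# it faces the requested presentation angle.  Vertex data and offsets are made up.
import numpy as np

def apply_morphology(base_vertices: np.ndarray, morph_offsets: np.ndarray,
                     weight: float = 1.0) -> np.ndarray:
    """Blend per-vertex offsets (the target object's morphological features) into the base mesh."""
    return base_vertices + weight * morph_offsets

def rotate_to_presentation_angle(vertices: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate the mesh about the vertical (y) axis so its yaw equals the presentation angle."""
    a = np.radians(angle_deg)
    rot_y = np.array([[np.cos(a), 0.0, np.sin(a)],
                      [0.0,       1.0, 0.0      ],
                      [-np.sin(a), 0.0, np.cos(a)]])
    return vertices @ rot_y.T

base = np.zeros((4, 3))                      # a toy "original 3D digital person"
offsets = np.array([[0.0, 0.01, 0.0]] * 4)   # e.g. a slight lift of some facial region
posed = rotate_to_presentation_angle(apply_morphology(base, offsets), angle_deg=30.0)
print(posed.round(4))
```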
In this embodiment, through the implementation of the above steps S21 to S22, the simulated digital human model may be obtained either by image stitching or by building a 3D digital person, and the simulated digital human image is then generated according to the simulated digital human model.
Further, in order to enable the presentation angle of the simulated digital person to be in accordance with expectation in the simulated digital person image output by the simulated digital person model, camera parameters can be obtained in advance and combined with the sample image; the sample description parameters include camera parameters, and as shown in fig. 5, the above step S22 may include the following steps S221 to S223.
Step S221: sample image configuration parameters corresponding to the camera parameters are acquired.
In the present embodiment, the camera parameters may include the parameters employed by the photographing device when photographing the target model to generate the sample image. For example, the camera parameters may be the focal length, the aperture size, and the like. The sample image configuration parameters may include parameters of the sample image generated by that photographing device. For example, the sample image configuration parameters may be the pixel size, the image exposure, the proportion of the image occupied by the target model, the position where the target model contacts the ground, and the like.
Step S222: and obtaining angle information of the target model according to the sample image, and associating the angle information with the sample image configuration parameters.
In this embodiment, the angle information may include an angle at which the target model is presented in the sample image. For example, when the angle between the face orientation of the target model in the sample image and the preset axis direction is 15 degrees, 15 degrees may be taken as the angle information. In some examples, the sample images may be identified to obtain an angle of the target model. Specifically, each part of the target model can be acquired from the sample image through a target detection algorithm, the angle of the part is determined based on the change state of the same part in the multiple continuous sample images, so that the angle of each part of the target model is obtained, and the angle of each part is used as angle information. For example, the target detection algorithm may be a sliding window target detection, a two stage target detection algorithm, a one stage target detection algorithm, or the like.
Step S223: and constructing a simulation digital human model according to the sample image configuration parameters and the angle information to obtain a preset simulation digital human model.
In this embodiment, the sample image may be regarded as being composed of a plurality of regions and a plurality of points, the states of the plurality of regions and the plurality of point locations of the target model at each angle are obtained based on the sample image configuration parameters and the angle information, and the regions and the point locations at each angle are combined to construct the simulated digital human model, so that the simulated digital human model can output images including the target model at different angles.
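A minimal sketch of one possible data organization for steps S221 to S223 is given below: each sample is bucketed by its angle information, and the sample image configuration parameters and region/point states are stored under that angle. The field names and the whole-degree bucketing are assumptions made for the example.

```python
# Illustrative sketch only: associate angle information with sample image configuration
# parameters and store, per angle, the states of the regions and points of the target model.
from collections import defaultdict

def build_angle_table(samples):
    """samples: iterable of dicts with 'angle', 'image_config', and 'region_states' keys."""
    table = defaultdict(list)
    for sample in samples:
        key = round(sample["angle"]) % 360          # bucket angles to whole degrees
        table[key].append({
            "image_config": sample["image_config"],  # e.g. pixel size, exposure, subject ratio
            "region_states": sample["region_states"],
        })
    return dict(table)

samples = [
    {"angle": 29.7, "image_config": {"pixels": (1920, 1080)}, "region_states": {"mouth": "closed"}},
    {"angle": 30.2, "image_config": {"pixels": (1920, 1080)}, "region_states": {"mouth": "open"}},
]
model = build_angle_table(samples)
print(sorted(model.keys()))   # -> [30]
```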
In this embodiment, through the implementation of the above steps S221 to S223, the camera parameters may be obtained in advance, and the camera parameters are combined with the sample image, so that the presentation angle of the simulated digital person in the simulated digital person image output by the simulated digital person model can be better matched with the expected angle.
Step S23: and acquiring description parameters, wherein the description parameters comprise relative position information of the target object relative to a reference position, and the target object comprises an object for calibrating the angle of the simulated digital person.
Further, in order to obtain the relative position information, the position of the determined target object and the reference position may be obtained first to determine the relative position information of the target object with respect to the reference position; as shown in fig. 6, the above step S23 may include the following steps S231 to S233.
Step S231: an image including a target object is acquired, and spatial position information of the target object is determined based on the image.
In this embodiment, the image including the target object may be an image of the target object captured at a certain angle (images may be captured at different angles), or may be a sound vibration image generated based on sound wave feedback; the representation form of the image including the target object is not particularly limited herein. In addition, the spatial position information may include information for characterizing the position of the target object in space. For example, the spatial position information may be the position of the target object in the image, or a segment of preset amplitude and frequency in the sound vibration image.
Step S232: reference position information of a reference position is acquired, and the reference position is used for representing the position of a reference object simulating the digital person.
In some examples, the reference position information may include information characterizing a position of the reference object. In order to reduce the amount of calculation for calculating the orientation parameters, the reference position information may be stored in advance and may be extracted directly when calculating the reference position information. The expression form of the reference position information may be an image, a digital signal, or the like, and the expression form of the reference position information is not particularly limited herein.
Step S233: and determining the relative position information of the target object relative to the reference position according to the spatial position information and the reference position information.
In this embodiment, the spatial position information may be compared with the reference position information to determine the relative position information.
In the present embodiment, through implementation of the above steps S231 to S233, the relative position information of the target object with respect to the reference position can be acquired, and the relative position information can be detected and acquired in real time, so that the presentation angle of the presented simulated digital person can be determined in real time based on the relative position information.
Further, in order to acquire more accurate relative position information, the distance and the relative angle between the target object and the reference object can be calculated; as shown in fig. 7, the above step S233 may include the following steps S2331 to S2333.
Step S2331: and acquiring a target coordinate parameter of the target object according to the spatial position information.
In this embodiment, the target coordinate parameter may be a coordinate parameter of the head, eyes, mouth, or the like of the target object. For example, when images of the target object are acquired at different angles, the camera parameters used by the photographing device for each image and the target object information in each image may be acquired, and the target coordinate parameters of the target object may then be determined based on the camera parameters and the target object information of the different images. The photographing device may include a device for photographing the target object to form an image. The camera parameters may include parameters used when the photographing device photographs the target object. For example, the camera parameters may include the photographing angle, the focal length, the aperture, and the like of the photographing device. The target object information may include presentation information of the target object in the image. For example, the target object information may be the presentation angle of the target object in the image, the proportion of the image occupied by the target object, the position where the target object contacts the ground, and the like.
Step S2332: and acquiring the reference coordinate parameters of the reference object according to the reference position information.
In this embodiment, the manner of obtaining the reference coordinate parameters of the reference object according to the reference position information is similar to the manner of obtaining the target coordinate parameters of the target object according to the spatial position information in step S2331, and is not repeated herein.
In addition, it should be noted that the reference coordinate parameters may also be stored in advance, and the reference coordinate parameters may be extracted subsequently.
Step S2333: and comparing the target coordinate parameters with the reference coordinate parameters, and determining the relative distance and the relative angle between the target object and the reference object to obtain relative position information comprising the relative distance and the relative angle.
In this embodiment, the target coordinate parameter and the reference coordinate parameter may be placed in the same coordinate system for comparison, and the relative distance and relative angle between the target object and the reference object may then be calculated, so as to obtain the relative position information including the relative distance and the relative angle.
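The comparison described above can be illustrated with a short Python sketch that places both coordinate parameters in the same planar coordinate system and computes the relative distance and relative angle. Restricting the computation to two dimensions is an assumption made for brevity.

```python
# Illustrative sketch only: compute the relative distance and relative angle between the
# target object and the reference object from coordinates in a shared 2-D coordinate system.
import math

def relative_position(target_xy, reference_xy):
    dx = target_xy[0] - reference_xy[0]
    dy = target_xy[1] - reference_xy[1]
    distance = math.hypot(dx, dy)
    # Angle of the target object as seen from the reference object, in degrees.
    angle = math.degrees(math.atan2(dy, dx))
    return {"relative_distance": distance, "relative_angle": angle}

print(relative_position(target_xy=(2.0, 1.0), reference_xy=(0.0, 0.0)))
# -> {'relative_distance': 2.236..., 'relative_angle': 26.56...}
```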
In this embodiment, through the implementation of the above steps S2331 to S2333, the distance and the relative angle between the target object and the reference object can be obtained, so as to obtain the relative position information, and the real-time detection and the acquisition of the relative position information are realized, so that the presenting angle of the presented simulated digital person can be determined in real time based on the relative position information.
Step S24: and determining the presentation angle of the simulated digital person to be presented according to the relative position information.
Step S25: and acquiring a simulation digital human image corresponding to the presentation angle according to the presentation angle and a preset simulation digital human model.
Further, in order to be able to obtain a simulated digital human image corresponding to the presentation angle, a plurality of target images may be combined; as shown in fig. 8, the above step S25 may include the following steps S251 to S252.
Step S251: and acquiring a plurality of target images corresponding to the presentation angle from a preset simulation digital human model according to the presentation angle.
In this embodiment, the target images may include the images presented at the presentation angle in each preset area of the presentation screen used to present the simulated digital person. The presentation screen may include the picture presented by a display device (for example, a large conference screen, a projector, or a smartphone) used to present the simulated digital person. The preset areas may be set in advance according to experience, and the sizes of the preset areas may be the same or different. For example, when the presentation screen is a large conference screen, the picture used to present the simulated digital person on the large conference screen may be equally divided into two hundred, four hundred, one thousand, or more parts.
In this embodiment, based on the presentation angle, the target image corresponding to each preset area under that presentation angle may be queried in the simulated digital human model. For example, if the presentation angle is 30 degrees and the number of preset regions is two hundred, the target images presented by the two hundred preset regions at 30 degrees may be acquired.
Step S252: and combining the plurality of target images to obtain the simulation digital human image corresponding to the description parameters.
In this embodiment, each target image may be combined and spliced according to the corresponding relationship between the target image and the preset region, so as to obtain the simulated digital human image corresponding to the description parameter.
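The following Python sketch illustrates the splicing of steps S251 to S252 under the assumption that the presentation screen is divided into a regular grid of two hundred equally sized preset areas (10 rows by 20 columns) and that the simulated digital human model can be queried like a dictionary; both assumptions are made only for the example.

```python
# Illustrative sketch only: query one target image per preset area for a given presentation
# angle and splice them back into a single simulated digital human image.
import numpy as np

def splice_target_images(model, presentation_angle, grid=(10, 20), tile_hw=(54, 96)):
    """model[(angle, row, col)] -> np.ndarray tile of shape tile_hw + (3,)."""
    rows, cols = grid
    th, tw = tile_hw
    frame = np.zeros((rows * th, cols * tw, 3), dtype=np.uint8)
    for r in range(rows):
        for c in range(cols):
            tile = model.get((presentation_angle, r, c),
                             np.zeros((th, tw, 3), dtype=np.uint8))
            frame[r * th:(r + 1) * th, c * tw:(c + 1) * tw] = tile
    return frame

# A toy model holding grey tiles for every preset area at 30 degrees.
toy_model = {(30, r, c): np.full((54, 96, 3), 128, dtype=np.uint8)
             for r in range(10) for c in range(20)}
image = splice_target_images(toy_model, presentation_angle=30)
print(image.shape)   # -> (540, 1920, 3)
```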
In this embodiment, through the implementation of the steps S251 to S252, a plurality of target images corresponding to the presentation angle may be obtained, and the plurality of target images may be spliced, without constructing a complete 3D digital person, only a picture presented by the simulated digital person at the presentation angle needs to be spliced and output, so as to reduce the time for training the simulated digital person model.
Step S26: and outputting the simulated digital human image.
In order to enable the presentation effect of the subsequently output simulation digital human image to meet the expectation of a user, a simulation digital human model can be trained based on sample semantic information; the sample description parameters further include sample input information, and the simulation 3D digital person generation method may include the following steps S31 to S36. The method for generating a simulated 3D digital person provided in this embodiment may include the same or similar steps as those in the above embodiments, and for the execution of the same or similar steps, reference may be made to the foregoing description, and details are not repeated in this specification.
Step S31: a plurality of sample images and sample description parameters corresponding to each sample image are acquired.
Step S32: and constructing a simulation digital human model according to the sample image and the sample description parameters to obtain a preset simulation digital human model.
As shown in fig. 9, the above step S32 may include the following steps S321 to S323.
Step S321: and acquiring sample semantic information corresponding to the sample input information.
In this embodiment, a processing manner matching the type of the sample input information may be adopted to obtain the intention represented by the sample input information, and thus the corresponding semantic information. For example, when the sample input information is voice, the voice may be subjected to voice recognition processing to obtain the corresponding text, and an intention recognition model is then used to recognize the text, obtain the intention represented by the sample input information, and thereby obtain the corresponding semantic information. When the sample input information is text, the text may be recognized directly by the intention recognition model to obtain the intention and the corresponding semantic information. When the sample input information is an image, the image may be subjected to image recognition processing to obtain the intention and the corresponding semantic information. It should be noted that the sample input information may combine voice, text, and images; in that case, each component may be processed in its corresponding manner to obtain the intention represented by the sample input information and then the corresponding semantic information.
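A minimal sketch of this type-dependent processing is given below. The functions speech_to_text, recognize_intent, and describe_image are hypothetical placeholders standing in for the voice recognition, intention recognition, and image recognition steps; they are not APIs defined in this application.

```python
# Illustrative sketch only: route the sample input information to a handler that matches
# its type before extracting the intention.  All helpers are hypothetical placeholders.
def speech_to_text(audio: bytes) -> str:
    return "placeholder transcript"        # stand-in for a real voice recognition step

def recognize_intent(text: str) -> str:
    return f"intent({text})"               # stand-in for a real intention recognition model

def describe_image(image: bytes) -> str:
    return "placeholder description"       # stand-in for a real image recognition step

def sample_semantic_info(sample_input) -> str:
    kind, payload = sample_input           # e.g. ("voice", b"..."), ("text", "hello"), ("image", b"...")
    if kind == "voice":
        return recognize_intent(speech_to_text(payload))
    if kind == "text":
        return recognize_intent(payload)
    if kind == "image":
        return recognize_intent(describe_image(payload))
    raise ValueError(f"unsupported sample input type: {kind}")

print(sample_semantic_info(("text", "please introduce the product")))
```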
In this embodiment, the technician may predetermine various intentions and randomly derive different sample input information based on each intention for training of the simulated digital human model.
Step S322: and acquiring sample facial expression parameters of the target model according to the sample image, and associating sample semantic information with the sample facial expression parameters.
In the present embodiment, the sample facial expression parameters may be obtained by performing image recognition on the sample image. The sample facial expression parameters may include parameters representing the changing state of each region of the face of the target model. For example, the sample facial expression parameters may include parameters of changing states of the left corner of the eye, the right corner of the eye, the corner of the mouth, facial contours, eyebrows, nasal wings, and the like. In some examples, the sample image may be subjected to image recognition, the state of each region of the target model, such as the left corner of the eye, the right corner of the eye, the mouth corner, the facial contour, the eyebrows, and the nose wing, at each time node is obtained, and the state is combined with the time attribute to obtain the parameters of the change state of each region of the face of the target model.
In this embodiment, sample semantic information may be associated with sample facial expressions. For example, when the sample semantic information represents a happy event, a sample facial expression at a time period when the sample semantic information is a happy event is associated with the sample semantic information. Thus, a plurality of sample facial expressions under the same sample semantic information may be obtained and associated with the sample semantic information.
It should be noted that, since the process of the target model expressing the sample semantic information has a time attribute, the target model is also a dynamic process when making facial expressions. Therefore, in the process of identifying the sample image, the dynamic change process of each area of the target model face at each moment can be acquired, so as to obtain the sample facial expression parameters.
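The following Python sketch illustrates, under assumed region names and states, how the per-region states observed at each time node might be collected into time series (the sample facial expression parameters) and attached to the sample semantic information they accompany.

```python
# Illustrative sketch only: build time-stamped facial expression parameters per facial region
# and associate them with the sample semantic information observed over the same period.
from collections import defaultdict

def build_expression_parameters(frames):
    """frames: list of (timestamp, {region: state}) tuples taken from recognized sample images."""
    series = defaultdict(list)
    for timestamp, regions in frames:
        for region, state in regions.items():
            series[region].append((timestamp, state))
    return dict(series)

expression_by_semantic = {}
frames = [(0.0, {"mouth_corner": "neutral", "eyebrow": "neutral"}),
          (0.5, {"mouth_corner": "raised", "eyebrow": "raised"})]
expression_by_semantic["happy_event"] = build_expression_parameters(frames)
print(expression_by_semantic["happy_event"]["mouth_corner"])
# -> [(0.0, 'neutral'), (0.5, 'raised')]
```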
Step S323: and constructing a simulation digital human model according to the sample semantic information and the sample facial expression parameters to obtain a preset simulation digital human model.
In this embodiment, the sample semantic information may be classified according to the emotion classification of the target model. Wherein the emotion classification may be predefined, for example, the emotion classification may be happy, excited, angry, depressed, and the like.
In this embodiment, through the implementation of the above steps S321 to S323, the sample facial expression parameters of the target model can be specifically obtained, and the sample facial expression parameters are associated with the sample semantic information, so that the facial expression of the target model can be specifically defined, and when the simulated digital person image is subsequently output, the facial expression of the simulated digital person in the simulated digital person image can be matched with the semantic represented by the output feedback information, thereby effectively improving the fidelity of the output simulated digital person.
In order to improve the degree of association between the sample semantic information and the sample facial expression parameters, facial key points may be set and the facial key points may be corresponding to the sample semantic information, as shown in fig. 10, and the above step S322 may include the following steps S3221 to S3223.
Step S3221: a face region of a target model in a sample image is acquired.
In this embodiment, the face region of the target model may be acquired from the sample image by a target detection algorithm. Wherein the face region may be a face contour of the target model in the sample image. For example, the target detection algorithm may be a sliding window target detection, a two stage target detection algorithm, a one stage target detection algorithm, or the like.
Step S3222: facial keypoints in a facial region are acquired.
In this embodiment, the facial key points in the facial region may be obtained by manual labeling, or may be obtained by machine learning and automatic labeling.
In addition, in order to accurately acquire the facial key points in the facial region, the labeled facial key points may be matched against the facial key points of the target model by using the fact that the sample facial expression parameters change dynamically over a period of time. Specifically, a model image of the target model in which the facial key points have already been labeled may be acquired, and the change differences between the facial key points in the model image and the facial key points in the facial region within a preset time period may be calculated. If the change difference is larger than a preset difference, the facial key points in the facial region are corrected. If the change difference is smaller than or equal to the preset difference, it is determined that the facial key points in the current facial region are as expected.
Illustratively, model images of the target model in which the facial key points have already been labeled may be obtained; the same facial key point in consecutive model images is associated to obtain the dynamic change track of each facial key point of the model images within a preset time period, and the face change amplitude of each facial key point of the model images at each moment is obtained based on the dynamic change tracks. The same facial key point in consecutive sample images is likewise associated to obtain the dynamic change track of each facial key point of the sample images within the preset time period, and the face change amplitude of each facial key point of the sample images at each moment is obtained based on these dynamic change tracks. The face change amplitude of each facial key point of the model images at each moment is then compared with that of the sample images at the same moment to obtain a change difference value. If the change difference value at a given moment is greater than a preset amplitude threshold, it is determined that the facial key points in the sample image need to be corrected; if the change difference value at that moment is smaller than or equal to the preset amplitude threshold, it is determined that the facial key points in the current sample image are as expected.
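A minimal Python sketch of this amplitude comparison is given below; the key-point names, coordinates, and the amplitude threshold of 3 pixels are invented for the example.

```python
# Illustrative sketch only: compare, per facial key point, the change amplitude measured in
# the labeled model images with the amplitude measured in the sample images, and flag the
# key points whose difference exceeds a preset amplitude threshold.
def change_amplitudes(track):
    """track: list of (x, y) positions of one key point over consecutive moments."""
    return [((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
            for (x0, y0), (x1, y1) in zip(track, track[1:])]

def keypoints_needing_correction(model_tracks, sample_tracks, threshold=3.0):
    flagged = []
    for name, model_track in model_tracks.items():
        diffs = [abs(a - b) for a, b in zip(change_amplitudes(model_track),
                                            change_amplitudes(sample_tracks[name]))]
        if any(d > threshold for d in diffs):
            flagged.append(name)
    return flagged

model_tracks = {"left_mouth_corner": [(100, 200), (101, 202), (102, 205)]}
sample_tracks = {"left_mouth_corner": [(100, 200), (108, 214), (102, 205)]}
print(keypoints_needing_correction(model_tracks, sample_tracks))
# -> ['left_mouth_corner']  (this key point should be corrected)
```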
Step S3223: and processing the key points of the face in the face area, and determining sample facial expression parameters of the target model in the sample image.
In this embodiment, sample facial expression parameters may be defined, the sample facial expression parameters are classified based on the result of defining the sample facial expression parameters, and the sample facial expression parameters are associated with facial key points in the facial region. So that when the simulated digital human image is subsequently output, the corresponding image can be output based on the facial key points.
In this embodiment, through implementation of the above-mentioned steps S3221 to S3223, the sample facial expression parameters of the target model in the sample image may be predetermined by the facial key points, so that in outputting the simulated digital human image, the output expression may be determined based on the facial key points, so as to implement management of the simulated digital human facial expression in the simulated digital human image.
Step S33: and acquiring description parameters, wherein the description parameters comprise relative position information of the target object relative to a reference position, and the target object comprises an object for calibrating the angle of the simulated digital person.
Step S34: and determining the presentation angle of the simulated digital person to be presented according to the relative position information.
Step S35: and acquiring a simulation digital human image corresponding to the presentation angle according to the presentation angle and a preset simulation digital human model.
Step S36: and outputting the simulated digital human image.
In order to enable the presentation effect of the subsequently output simulated digital human image to be in accordance with the expectation of a user, the simulated digital human model can be trained on the basis of the sample pronunciation information; the sample description parameters further include sample input information, and the simulation 3D digital person generation method may include the following steps S41 to S46. The method for generating a simulated 3D digital person provided in this embodiment may include the same or similar steps as those in the above embodiments, and for the execution of the same or similar steps, reference may be made to the foregoing description, and details are not repeated in this specification.
Step S41: a plurality of sample images and sample description parameters corresponding to each sample image are acquired.
Step S42: and constructing a simulation digital human model according to the sample image and the sample description parameters to obtain a preset simulation digital human model.
As shown in fig. 11, the above step S42 may include the following steps S421 to S423.
Step S421: and acquiring sample pronunciation information corresponding to the sample input information.
In this embodiment, the sample pronunciation information may include the speech generated by the target model in describing the sample input information.
Step S422: and acquiring sample mouth shape parameters of the target model according to the sample image, and associating the sample pronunciation information with the sample mouth shape parameters.
In this embodiment, the changes of the mouth key points of the target model when the target model emits the sound corresponding to the sample pronunciation information may be obtained, so as to obtain mouth shape parameters characterizing those changes. The mouth key points may include locations for identifying and locating various portions of the mouth. For example, the mouth key points may include the left corner of the mouth, the right corner of the mouth, the groove between the lower lip and the chin, the base of the nose, and so forth.
In this embodiment, the sound emitted by the target model may be associated with the sample mouth shape parameters, specifically, the respective phonemes of the target model are associated with the various states of the mouth key points.
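The association of phonemes with mouth key-point states can be illustrated with the following sketch, in which the phoneme labels, time spans, and key-point coordinates are invented for the example.

```python
# Illustrative sketch only: associate each phoneme of the sample pronunciation information
# with the mouth key-point states observed while that phoneme was spoken.
def associate_phonemes_with_mouth_shapes(phoneme_intervals, mouth_frames):
    """phoneme_intervals: list of (phoneme, start, end); mouth_frames: list of (t, {keypoint: (x, y)})."""
    mouth_shape_by_phoneme = {}
    for phoneme, start, end in phoneme_intervals:
        frames_in_span = [kp for t, kp in mouth_frames if start <= t < end]
        mouth_shape_by_phoneme.setdefault(phoneme, []).extend(frames_in_span)
    return mouth_shape_by_phoneme

phonemes = [("a", 0.0, 0.2), ("o", 0.2, 0.4)]
frames = [(0.05, {"left_corner": (100, 200), "right_corner": (140, 200)}),
          (0.25, {"left_corner": (105, 202), "right_corner": (135, 202)})]
print(associate_phonemes_with_mouth_shapes(phonemes, frames)["a"])
```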
Step S423: and constructing a simulation digital human model according to the sample pronunciation information and the sample mouth shape parameters to obtain a preset simulation digital human model.
In the present embodiment, the mouth shape change of the simulated digital person in the simulated digital person image output can be controlled by controlling the position of the key point of the mouth. It should be noted that, because the mouth shape parameter may have a time attribute, each time node may control the mouth shape change of the simulated digital person in the simulated digital person image, so that the change process of the mouth shape of the simulated digital person may be presented accurately.
In this embodiment, through the implementation of the above steps S421 to S423, the sample mouth shape parameter of the target model can be specifically obtained, and the sample mouth shape parameter is associated with the sample pronunciation information, so that the sample mouth shape of the target model can be specifically defined, and when the simulated digital human image is subsequently output, the mouth shape of the simulated digital human in the simulated digital human image can be matched with the output voice, thereby effectively improving the fidelity of the output simulated digital human.
Step S43: and acquiring description parameters, wherein the description parameters comprise relative position information of the target object relative to a reference position, and the target object comprises an object for calibrating the angle of the simulated digital person.
Step S44: and determining the presentation angle of the simulated digital person to be presented according to the relative position information.
Step S45: and acquiring a simulation digital human image corresponding to the presentation angle according to the presentation angle and a preset simulation digital human model.
Step S46: and outputting the simulated digital human image.
In order to enable the presentation effect of the subsequently output simulation digital human image to meet the expectation of a user, a simulation digital human model can be trained based on sample semantic information; the sample description parameters further include sample input information, and the simulation 3D digital person generation method may include the following steps S51 to S56. The method for generating a simulated 3D digital person provided in this embodiment may include the same or similar steps as those in the above embodiments, and for the execution of the same or similar steps, reference may be made to the foregoing description, and details are not repeated in this specification.
Step S51: a plurality of sample images and sample description parameters corresponding to each sample image are acquired.
Step S52: and constructing a simulation digital human model according to the sample image and the sample description parameters to obtain a preset simulation digital human model.
As shown in fig. 12, the above step S52 may include the following steps S521 to S524.
Step S521: and acquiring sample semantic information corresponding to the sample input information.
Step S522: and acquiring the sample semantic category of the sample semantic information.
In this embodiment, the semantic category may include a result for semantically classifying the feedback information. For example, the same semantic meaning may have a plurality of different expression modes, and feedback information with different expression modes and the same semantic meaning may be divided into the same semantic category.
Step S523: and acquiring a sample trunk action parameter of the target model according to the sample image, and associating the sample semantic category with the sample trunk action parameter.
In this embodiment, image recognition may be performed on the sample image to obtain the sample torso motion parameter. The trunk key points in the sample image can be obtained first, and then the sample trunk action parameters of the target model are determined based on the trunk key points. Wherein the torso keypoints may include locations for identifying and locating various parts of the simulated digital human torso. For example, torso key points may include toes, knee joints, hand joints, and the like. The sample torso-motion parameters may be used to characterize the course of changes over time for various torso key points.
In this embodiment, the trunk action parameter of the target model may be associated with the sample semantic category, that is, when receiving the sample semantic category, the trunk action parameter corresponding to the sample semantic category may be output, so that the trunk action of the simulated digital person in the output simulated digital person image corresponds to the sample semantic category.
Further, in order to improve the degree of association between the sample semantic category and the sample trunk motion parameter, the sample image may be subjected to semantic segmentation processing to associate the sample semantic category with the sample trunk motion parameter, and as shown in fig. 13, the step S523 may include the following steps S5231 to S5232.
Step S5231: and acquiring an effective area including the target model in each sample image.
In this embodiment, the effective region may include a region in which the torso of the target model is located in the sample image. In some examples, the position of the torso of the target model may be detected by a target detection algorithm, and a rectangular box may be output, and the area framed by the rectangular box may be regarded as the valid area.
Step S5232: and performing semantic segmentation processing on the effective area, determining sample trunk action parameters of the target model in each sample image, and associating the sample semantic category with the sample trunk action parameters.
In this embodiment, the boundary between the trunk of the target model and the environment in the sample image may be obtained through a semantic segmentation algorithm, so that the trunk of the target model is extracted from the sample image, trunk key points are obtained on the basis of the trunk of the target model, further, the sample trunk action parameters are obtained, and the sample semantic category is related to the sample trunk action parameters. The semantic segmentation algorithm may include region-based semantic segmentation, full-convolution network semantic segmentation, weak supervision semantic segmentation, and the like, and the type of the semantic segmentation algorithm is not particularly limited herein.
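The following sketch illustrates the pipeline of steps S5231 to S5232 with hypothetical stand-ins: detect_effective_area and segment_torso are placeholders for a real target detection algorithm and a real semantic segmentation model, and the single centroid key point is a simplification of the torso key points described above.

```python
# Illustrative sketch only: crop the effective area, isolate the torso with a (placeholder)
# semantic segmentation step, derive a toy torso key-point track, and file the resulting
# sample torso action parameters under the sample semantic category.
import numpy as np

def detect_effective_area(image: np.ndarray):
    return (0, 0, image.shape[0], image.shape[1])        # stand-in: whole image as (y, x, h, w)

def segment_torso(crop: np.ndarray) -> np.ndarray:
    return np.ones(crop.shape[:2], dtype=bool)           # stand-in: everything is "torso"

def torso_action_parameters(images, timestamps):
    tracks = []
    for t, image in zip(timestamps, images):
        y, x, h, w = detect_effective_area(image)
        mask = segment_torso(image[y:y + h, x:x + w])
        ys, xs = np.nonzero(mask)
        centroid = (float(xs.mean()), float(ys.mean()))  # a single toy "torso key point"
        tracks.append((t, centroid))
    return tracks

actions_by_semantic_category = {}
images = [np.zeros((120, 80, 3), dtype=np.uint8)] * 2
actions_by_semantic_category["greeting"] = torso_action_parameters(images, [0.0, 0.5])
print(actions_by_semantic_category["greeting"])
```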
Step S524: and constructing a simulation digital human model according to the semantic category and the trunk action parameters of the sample to obtain a preset simulation digital human model.
In this embodiment, the trunk variation of the simulated digital person in the simulated digital person image output can be controlled by controlling the positions of the trunk key points. It should be noted that, because the trunk motion parameter may have a time attribute, each time node may control the trunk change of the simulated digital person in the simulated digital person image, so that the change process of the trunk motion of the simulated digital person may be presented accurately.
In this embodiment, through the implementation of the above steps S521 to S524, the sample trunk action parameter of the target model may be specifically obtained, and the sample trunk action parameter is associated with the sample semantic category, so that the trunk action of the target model may be specially defined, when the simulated digital human image is subsequently output, information to be output may be subjected to semantic classification first, and then the trunk action of the simulated digital human in the simulated digital human image is determined based on the result of the semantic classification, so that the trunk action of the simulated digital human in the simulated digital human image may be matched with the semantic of the output information, and the fidelity of the output simulated digital human is effectively improved.
Step S53: and acquiring description parameters, wherein the description parameters comprise relative position information of the target object relative to a reference position, and the target object comprises an object for calibrating the angle of the simulated digital person.
Step S54: and determining the presentation angle of the simulated digital person to be presented according to the relative position information.
Step S55: and acquiring a simulation digital human image corresponding to the presentation angle according to the presentation angle and a preset simulation digital human model.
Step S56: and outputting the simulated digital human image.
In order to clearly understand the content fed back by the 3D digital person, corresponding audio can be configured for the video of the simulated digital person; as shown in fig. 14, the method for generating a simulated 3D digital person provided by this embodiment may further include the following steps S61 to S69. The method for generating a simulated 3D digital person provided in this embodiment may include the same or similar steps as those in the above embodiments, and for the execution of the same or similar steps, reference may be made to the foregoing description, and details are not repeated in this specification.
Step S61: and acquiring description parameters, wherein the description parameters comprise relative position information of the target object relative to a reference position, and the target object comprises an object for calibrating the angle of the simulated digital person.
Step S62: and determining the presentation angle of the simulated digital person to be presented according to the relative position information.
Step S63: and acquiring a simulation digital human image corresponding to the presentation angle according to the presentation angle and a preset simulation digital human model.
Step S64: and outputting the simulated digital human image.
Step S65: acquiring a plurality of simulation digital human images;
step S66: and determining the time sequence information of the output of at least two simulated digital human images.
Step S67: and generating the simulation digital human video based on the plurality of simulation digital human images according to the time sequence information.
In this embodiment, the time sequence of the plurality of simulated digital human images may be obtained from the time sequence information, and the plurality of simulated digital human images may be sequentially ordered according to the time sequence, so as to synthesize the simulated digital human video.
Step S68: and configuring corresponding audio information for the simulated digital human video according to the time sequence information.
In this embodiment, the audio information may be a voice fed back to the user based on the input information of the user, or a voice configured for simulating a digital person by itself, where the source of the audio information is not particularly limited. In addition, the duration of the audio information may be the same as the duration of the simulated digital human video.
Step S69: and synchronously playing the video and audio information of the simulated digital person.
In the present embodiment, through the implementation of the above steps S61 to S69, the device for presenting the simulated digital human picture can be used to play the simulated digital human video and audio information.
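The following Python sketch illustrates steps S65 to S69 at the level of bookkeeping only: the simulated digital human images are ordered by their time sequence information, the resulting frame list is treated as the video, and the configured audio is checked for a matching duration before playback. The frame rate and the tolerance are assumptions; no real codec or player is involved.

```python
# Illustrative sketch only: order frames by timing information, treat them as the simulated
# digital human video, and verify the configured audio matches its duration before playback.
def build_video(frames_with_time):
    """frames_with_time: list of (timestamp_seconds, image) pairs in arbitrary order."""
    return sorted(frames_with_time, key=lambda item: item[0])

def play_synchronously(video, audio_duration, fps=25):
    video_duration = len(video) / fps
    if abs(video_duration - audio_duration) > 1.0 / fps:
        raise ValueError("audio and simulated digital human video durations do not match")
    for timestamp, image in video:
        pass  # hand each frame (and the matching slice of audio) to the display device here

video = build_video([(0.08, "frame_3"), (0.00, "frame_1"), (0.04, "frame_2")])
play_synchronously(video, audio_duration=3 / 25)
print([name for _, name in video])   # -> ['frame_1', 'frame_2', 'frame_3']
```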
In this embodiment, by the method for generating a simulated 3D digital person provided in this embodiment, description parameters may be obtained, the presentation angle of the simulated digital person may be determined based on the description parameters, and the simulated digital human image corresponding to the presentation angle may be obtained and output. In this way, when the human model used to train the simulated digital person is unavailable, simulated digital human images at various angles can still be generated by the simulated digital human model obtained by training on that model's images, and there is no need to specially invite the model to participate in image shooting at a specific time and place, thereby reducing the cost of generating images. At the same time, camera parameters may be obtained in advance and combined with the sample images, so that in the simulated digital human image output by the simulated digital human model, the presentation angle of the simulated digital person better matches the expected angle. Moreover, a plurality of target images corresponding to the presentation angle may be obtained and spliced, so that a complete 3D digital person does not need to be constructed; only the picture presented by the simulated digital person at the presentation angle needs to be spliced and output, which reduces the time required to train the simulated digital human model. The method is particularly suitable for scenarios such as broadcast hosting, recreating deceased persons, and privately customized customer service, and can also simulate a real-time environment in which the user communicates face-to-face with the simulated digital person, thereby improving the realistic effect of the presented simulated digital human images and optimizing the human-computer interaction experience.
Referring to fig. 15, a block diagram of a simulated 3D digital human generation apparatus provided in an embodiment of the present application is shown, which may include a description parameter obtaining module 41, a rendering angle obtaining module 42, a simulated digital human image obtaining module 43, and a first simulated digital human image output module 44. The description parameter obtaining module 41 is configured to obtain description parameters, where the description parameters may include relative position information of a target object with respect to a reference position, and the target object includes an object for calibrating an angle of the simulated digital person. And a presentation angle obtaining module 42, configured to determine a presentation angle of the simulated digital person to be presented according to the relative position information. And the simulated digital human image obtaining module 43 is configured to obtain a simulated digital human image corresponding to the presentation angle according to the presentation angle and a preset simulated digital human model. A first emulated digital person image output module 44 for outputting an emulated digital person image.
Further, as an embodiment of the present embodiment, the description parameter acquiring module 41 may include an image acquiring unit, a position acquiring unit, and a relative position information acquiring unit. The image acquisition unit is used for acquiring an image containing a target object and determining the spatial position information of the target object based on the image. The position acquisition unit is used for acquiring reference position information of a reference position, and the reference position is used for representing the position of a reference object simulating the digital person. The relative position information acquisition unit is used for determining the relative position information of the target object relative to the reference position according to the spatial position information and the reference position information.
Further, as an implementation manner of this embodiment, the relative position information acquiring unit may include a target coordinate parameter subunit, a reference coordinate parameter subunit, and a comparison subunit. The target coordinate parameter subunit is used for acquiring a target coordinate parameter of the target object according to the spatial position information. The reference coordinate parameter subunit is used for acquiring reference coordinate parameters of the reference object according to the reference position information. The comparison subunit is used for comparing the target coordinate parameter with the reference coordinate parameter, and determining the relative distance and the relative angle between the target object and the reference object so as to obtain the relative position information, which may include the relative distance and the relative angle.
Further, as an implementation manner of the present embodiment, the emulated digital human image capturing module 43 may include a target image capturing unit and an emulated digital human image capturing unit. The target image acquisition unit is used for acquiring a plurality of target images corresponding to the presentation angles from a preset simulation digital human model according to the presentation angles. The simulation digital human image acquisition unit is used for combining the plurality of target images to acquire the simulation digital human image corresponding to the description parameters.
Further, as an implementation manner of this embodiment, the simulation 3D digital human generation apparatus may further include a sample description parameter obtaining unit and a simulation digital human model obtaining unit. The sample description parameter acquiring unit is used for acquiring a plurality of sample images and sample description parameters corresponding to each sample image. The simulation digital human model obtaining unit is used for constructing a simulation digital human model according to the sample image and the sample description parameters to obtain a preset simulation digital human model.
Further, as an implementation manner of this embodiment, the sample description parameters may include camera parameters; the simulation digital human model obtaining unit can comprise a sample image configuration parameter obtaining subunit, an image configuration parameter associating subunit and a first simulation digital human model obtaining subunit. The sample image configuration parameter acquiring subunit is used for acquiring sample image configuration parameters corresponding to the camera parameters. The image configuration parameter association subunit is used for acquiring the angle information of the target model according to the sample image and associating the angle information with the sample image configuration parameters. The first simulation digital human model obtaining subunit is used for constructing a simulation digital human model according to the sample image configuration parameters and the angle information to obtain a preset simulation digital human model.
Further, as an implementation manner of this embodiment, the sample description parameter may further include sample input information; the simulated digital human model obtaining unit can comprise a first sample semantic information obtaining subunit, a sample facial expression parameter association subunit and a second simulated digital human model obtaining subunit. The first sample semantic information acquiring subunit is used for acquiring sample semantic information corresponding to the sample input information. And the sample facial expression parameter association subunit is used for acquiring sample facial expression parameters of the target model according to the sample image and associating the sample semantic information with the sample facial expression parameters. The second simulation digital human model obtaining subunit is used for constructing a simulation digital human model according to the sample semantic information and the sample facial expression parameters to obtain a preset simulation digital human model.
Further, as an implementation manner of the present embodiment, the sample facial expression parameter association subunit may include a facial region acquisition component, a facial key point acquisition component, and a sample facial expression parameter acquisition component. Wherein the facial region acquisition component is used for acquiring the facial region of the target model in the sample image. The facial keypoint acquisition component is for acquiring facial keypoints in a facial region. The sample facial expression parameter acquisition component is used for processing the facial key points in the facial area and determining sample facial expression parameters of the target model in the sample image.
Further, as an implementation manner of this embodiment, the sample description parameter may further include sample input information; the simulated digital human model obtaining unit can comprise a sample pronunciation information obtaining subunit, a sample mouth shape parameter correlation subunit and a third simulated digital human model obtaining subunit. The sample pronunciation information acquisition subunit is used for acquiring sample pronunciation information corresponding to the sample input information. The sample mouth shape parameter association subunit is used for acquiring sample mouth shape parameters of the target model according to the sample image and associating the sample pronunciation information with the sample mouth shape parameters. And the third simulated digital human model obtaining subunit is used for constructing a simulated digital human model according to the sample pronunciation information and the sample mouth shape parameters to obtain a preset simulated digital human model.
Further, as an implementation manner of this embodiment, the sample description parameter may further include sample input information; the simulation digital human model obtaining unit can comprise a second sample semantic information obtaining subunit, a sample semantic category obtaining subunit, a sample trunk action parameter association subunit and a fourth simulation digital human model obtaining subunit. The second sample semantic information acquiring subunit is used for acquiring sample semantic information corresponding to the sample input information. The sample semantic category acquiring subunit is used for acquiring the sample semantic category of the sample semantic information. The sample trunk action parameter association subunit is used for acquiring the sample trunk action parameters of the target model according to the sample images and associating the sample semantic categories with the sample trunk action parameters. And the fourth simulation digital human model obtaining subunit is used for constructing a simulation digital human model according to the sample semantic category and the sample trunk action parameters to obtain a preset simulation digital human model.
Further, as an implementation manner of the present embodiment, the sample trunk action parameter association subunit may include an effective region acquisition component and a sample trunk action parameter association component. The effective area acquisition component is used for acquiring an effective area, which may include the target model, in each sample image. The sample trunk action parameter association component is used for performing semantic segmentation processing on the effective area, determining sample trunk action parameters of the target model in each sample image, and associating the sample semantic categories with the sample trunk action parameters.
Further, as an implementation manner of this embodiment, the simulation 3D digital human generating apparatus may further include a second simulation digital human image output module, a timing information determining module, a simulation digital human video generating module, an audio information configuring module, and a playing module. The second simulation digital human image output module is used for acquiring a plurality of simulation digital human images. The time sequence information determining module is used for determining the time sequence information of the output of at least two simulated digital human images. The simulation digital human video generation module is used for generating simulation digital human videos based on the plurality of simulation digital human images according to the time sequence information. And the audio information configuration module is used for configuring corresponding audio information for the simulated digital human video according to the time sequence information. The playing module is used for synchronously playing the video and audio information of the simulation digital person.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the modules/units/sub-units/components in the above-described apparatus may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, the coupling or direct coupling or communication connection between the modules shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or modules may be in an electrical, mechanical or other form.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
Referring to fig. 16, an electronic device provided in an embodiment of the present application is shown, which includes a processor 810, a communication module 820, a memory 830, and a bus. The bus may be an ISA bus, PCI bus, EISA bus, CAN bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. Wherein:
and a memory 830 for storing programs. In particular, the memory 830 may be used to store software programs as well as various data. The memory 830 may mainly include a program storage area and a data storage area, wherein the program storage area may store a program required to operate at least one function and may include a program code including computer operation instructions. In addition to storing programs, the memory 830 may temporarily store messages or the like that the communication module 820 needs to send. The memory 830 may include a high-speed RAM memory, and may further include a non-volatile memory (non-volatile memory), such as at least one Solid State Disk (SSD).
The processor 810 is configured to execute the programs stored in the memory 830. When executed by the processor, the programs implement the steps of the simulated 3D digital human generation method of the embodiments described above.
An embodiment of the present application further provides a computer-readable storage medium. A computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements each process of the simulated 3D digital human generation method of the above embodiments and can achieve the same technical effects; to avoid repetition, the details are not repeated here. The computer-readable storage medium includes, for example, a read-only memory (ROM), a random access memory (RAM), an SSD, an electrically erasable programmable read-only memory (EEPROM), or a flash memory (Flash).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although the former is the preferred implementation in many cases. Based on such an understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as a ROM/RAM, an SSD, or a Flash memory) and includes several instructions for causing a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods of the embodiments of the present application.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit them; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or equivalently replace some of the technical features; and such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (15)

1. A method of generating a simulated 3D digital person, comprising:
obtaining description parameters, wherein the description parameters comprise relative position information of a target object relative to a reference position, and the target object comprises an object for calibrating the angle of the simulated digital person;
determining the presentation angle of the simulated digital person to be presented according to the relative position information;
acquiring a simulation digital human image corresponding to the presentation angle according to the presentation angle and a preset simulation digital human model; and
outputting the simulated digital human image.
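For illustration only, and without limiting the claim, the overall flow can be sketched as follows, assuming the preset simulated digital human model is approximated by a lookup table of pre-rendered images keyed by a quantized presentation angle; the dictionary keys and the 5-degree quantization step are assumptions.

```python
import math

def presentation_angle(relative_position):
    """Derive the angle (degrees) at which the digital human should be presented."""
    dx, dy = relative_position  # offset of the target object from the reference position
    return math.degrees(math.atan2(dy, dx)) % 360.0

def generate_simulated_digital_human_image(description_parameters, preset_model, step=5):
    """Look up the pre-rendered image closest to the computed presentation angle."""
    angle = presentation_angle(description_parameters["relative_position"])
    key = int(round(angle / step)) * step % 360
    return preset_model[key]  # the simulated digital human image to be output
```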
2. The method of claim 1, wherein obtaining the description parameters comprises:
acquiring an image containing a target object, and determining spatial position information of the target object based on the image;
acquiring reference position information of the reference position, wherein the reference position is used for representing the position of a reference object of the simulation digital person; and
determining the relative position information of the target object relative to the reference position according to the spatial position information and the reference position information.
3. The method of claim 2, wherein determining the relative position information of the target object with respect to the reference position based on the spatial position information and the reference position information comprises:
acquiring a target coordinate parameter of the target object according to the spatial position information;
acquiring a reference coordinate parameter of the reference object according to the reference position information; and
comparing the target coordinate parameter with the reference coordinate parameter, and determining a relative distance and a relative angle between the target object and the reference object to obtain the relative position information including the relative distance and the relative angle.
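A minimal worked example of this comparison step, assuming both coordinate parameters are (x, y, z) positions in the same coordinate system; the axis convention, units, and key names are illustrative only.

```python
import math

def relative_position_info(target_xyz, reference_xyz):
    """Compare the two coordinate parameters and return relative distance and angle."""
    dx = target_xyz[0] - reference_xyz[0]
    dy = target_xyz[1] - reference_xyz[1]
    dz = target_xyz[2] - reference_xyz[2]
    relative_distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Horizontal angle of the target object around the reference object.
    relative_angle = math.degrees(math.atan2(dx, dz))
    return {"relative_distance": relative_distance, "relative_angle": relative_angle}

# Example: relative_position_info((0.5, 0.0, 2.0), (0.0, 0.0, 0.0))
# returns roughly {'relative_distance': 2.06, 'relative_angle': 14.0}
```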
4. The method according to claim 1, wherein the obtaining of the simulated digital human image corresponding to the presentation angle according to the presentation angle and a preset simulated digital human model comprises:
acquiring a plurality of target images corresponding to the presentation angle from the preset simulation digital human model according to the presentation angle; and
combining the plurality of target images to obtain the simulated digital human image corresponding to the description parameters.
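One simple, illustrative way to combine several target images (for example background, body, and face layers rendered for the same presentation angle) is plain alpha compositing; the RGBA layout and layer ordering below are assumptions, not the claimed combination method.

```python
import numpy as np

def combine_target_images(layers_rgba):
    """Composite RGBA layers (background first) into one simulated digital human image."""
    out = layers_rgba[0][..., :3].astype(np.float32)
    for layer in layers_rgba[1:]:
        rgb = layer[..., :3].astype(np.float32)
        alpha = layer[..., 3:4].astype(np.float32) / 255.0
        out = alpha * rgb + (1.0 - alpha) * out
    return out.astype(np.uint8)
```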
5. The method of claim 1, wherein prior to obtaining the description parameters, the method further comprises:
acquiring a plurality of sample images and sample description parameters corresponding to each sample image; and
constructing the simulation digital human model according to the sample image and the sample description parameters to obtain the preset simulation digital human model.
6. The method of claim 5, wherein the sample description parameters comprise camera parameters; the step of constructing the simulation digital human model according to the sample image and the sample description parameters to obtain the preset simulation digital human model comprises the following steps:
acquiring sample image configuration parameters corresponding to the camera parameters;
acquiring angle information of the target model according to the sample image, and associating the angle information with the sample image configuration parameters; and
constructing the simulation digital human model according to the sample image configuration parameters and the angle information to obtain the preset simulation digital human model.
7. The method of claim 5, wherein the sample description parameters further include sample input information; the step of constructing the simulation digital human model according to the sample image and the sample description parameters to obtain the preset simulation digital human model comprises the following steps:
acquiring sample semantic information corresponding to the sample input information;
acquiring sample facial expression parameters of the target model according to the sample image, and associating the sample semantic information with the sample facial expression parameters; and
constructing the simulation digital human model according to the sample semantic information and the sample facial expression parameters to obtain the preset simulation digital human model.
8. The method of claim 7, wherein the step of acquiring sample facial expression parameters of the target model according to the sample image and associating the sample semantic information with the sample facial expression parameters comprises:
acquiring a face region of the target model in the sample image;
acquiring face key points in the face region; and
processing the face key points in the face region, and determining the sample facial expression parameters of the target model in the sample image.
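As a hedged illustration of this step, the sketch below derives two simple facial expression parameters from face key points supplied by any landmark detector; the key point names and the normalisation by face height are invented for illustration and are not taken from the disclosure.

```python
def facial_expression_parameters(keypoints, face_region):
    """Derive simple expression parameters from named (x, y) face key points.

    face_region: (x, y, w, h) of the detected face; used only for normalisation.
    """
    _, _, _, face_height = face_region
    mouth_open = abs(keypoints["lower_lip"][1] - keypoints["upper_lip"][1]) / face_height
    eye_open = abs(keypoints["left_eye_bottom"][1] - keypoints["left_eye_top"][1]) / face_height
    return {"mouth_open": float(mouth_open), "eye_open": float(eye_open)}
```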
9. The method of claim 5, wherein the sample description parameters further include sample input information; the step of constructing the simulation digital human model according to the sample image and the sample description parameters to obtain the preset simulation digital human model comprises the following steps:
acquiring sample pronunciation information corresponding to the sample input information;
acquiring a sample mouth shape parameter of the target model according to the sample image, and associating the sample pronunciation information with the sample mouth shape parameter; and
constructing the simulated digital human model according to the sample pronunciation information and the sample mouth shape parameters to obtain the preset simulated digital human model.
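A toy illustration of associating sample pronunciation information with sample mouth shape parameters is a phoneme-to-viseme lookup table; the phoneme set and parameter values below are placeholders invented for illustration and are not taken from the disclosure.

```python
# Invented phoneme-to-mouth-shape table; values are placeholders only.
PHONEME_TO_MOUTH_SHAPE = {
    "a": {"jaw_open": 0.9, "lip_round": 0.1},
    "o": {"jaw_open": 0.6, "lip_round": 0.8},
    "m": {"jaw_open": 0.0, "lip_round": 0.3},
}

NEUTRAL = {"jaw_open": 0.0, "lip_round": 0.0}

def mouth_shape_sequence(phonemes, table=PHONEME_TO_MOUTH_SHAPE):
    """Map sample pronunciation information (a phoneme list) to mouth shape parameters."""
    return [table.get(p, NEUTRAL) for p in phonemes]

# mouth_shape_sequence(["m", "a", "o"]) -> three mouth shape parameter dicts
```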
10. The method of claim 5, wherein the sample description parameters further include sample input information; the step of constructing the simulation digital human model according to the sample image and the sample description parameters to obtain the preset simulation digital human model comprises the following steps:
acquiring sample semantic information corresponding to the sample input information;
obtaining a sample semantic category of the sample semantic information;
acquiring a sample trunk action parameter of the target model according to the sample image, and associating the sample semantic category with the sample trunk action parameter; and
constructing the simulation digital human model according to the sample semantic category and the sample trunk action parameters to obtain the preset simulation digital human model.
11. The method of claim 10, wherein the step of acquiring a sample trunk action parameter of the target model according to the sample image and associating the sample semantic category with the sample trunk action parameter comprises:
acquiring an effective region including the target model in each sample image; and
performing semantic segmentation processing on the effective region, determining the sample trunk action parameter of the target model in each sample image, and associating the sample semantic category with the sample trunk action parameter.
12. The method of any one of claims 1 to 11, wherein after said outputting said simulated digital human image, said method further comprises:
acquiring a plurality of simulation digital human images;
determining the time sequence information of at least two simulated digital human images;
generating a simulation digital human video based on the plurality of simulation digital human images according to the time sequence information;
configuring corresponding audio information for the simulated digital human video according to the time sequence information; and
synchronously playing the simulated digital human video and the audio information.
13. An apparatus for generating a simulated 3D digital person, comprising:
the description parameter acquisition module is used for acquiring description parameters, wherein the description parameters comprise relative position information of a target object relative to a reference position, and the target object comprises an object used for calibrating the angle of the simulated digital person;
the presentation angle acquisition module is used for determining the presentation angle of the simulation digital person to be presented according to the relative position information;
the simulation digital human image acquisition module is used for acquiring a simulation digital human image corresponding to the presentation angle according to the presentation angle and a preset simulation digital human model; and
the simulation digital human image output module is used for outputting the simulation digital human image.
14. An electronic device, comprising:
one or more processors;
a memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to perform the simulated 3D digital person generation method of any of claims 1-12.
15. A computer-readable storage medium having program code stored therein, wherein the program code can be called by a processor to perform the simulated 3D digital person generation method according to any one of claims 1 to 12.
CN202110020070.3A 2021-01-07 2021-01-07 Simulated 3D digital person generation method and device, electronic equipment and storage medium Active CN112669422B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110020070.3A CN112669422B (en) 2021-01-07 2021-01-07 Simulated 3D digital person generation method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112669422A true CN112669422A (en) 2021-04-16
CN112669422B CN112669422B (en) 2024-06-14

Family

ID=75413507

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110020070.3A Active CN112669422B (en) 2021-01-07 2021-01-07 Simulated 3D digital person generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112669422B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107765856A (en) * 2017-10-26 2018-03-06 北京光年无限科技有限公司 Visual human's visual processing method and system based on multi-modal interaction
US20200234480A1 (en) * 2019-01-18 2020-07-23 Snap Inc. Systems and methods for realistic head turns and face animation synthesis on mobile device
CN110827378A (en) * 2019-10-31 2020-02-21 北京字节跳动网络技术有限公司 Virtual image generation method, device, terminal and storage medium
CN110969682A (en) * 2019-11-27 2020-04-07 深圳追一科技有限公司 Virtual image switching method and device, electronic equipment and storage medium
CN111443852A (en) * 2020-03-25 2020-07-24 北京百度网讯科技有限公司 Digital human action control method and device, electronic equipment and storage medium
CN111695471A (en) * 2020-06-02 2020-09-22 北京百度网讯科技有限公司 Virtual image generation method, device, equipment and storage medium
CN111861873A (en) * 2020-07-20 2020-10-30 北京航天飞行控制中心 Method and device for generating simulation image
CN111880659A (en) * 2020-07-31 2020-11-03 北京市商汤科技开发有限公司 Virtual character control method and device, equipment and computer readable storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115148187A (en) * 2022-07-01 2022-10-04 南京硅基智能科技有限公司 System implementation method of intelligent figure repeated engraving terminal
CN115148187B (en) * 2022-07-01 2023-08-22 南京硅基智能科技有限公司 System implementation method of intelligent character re-engraving terminal
CN117215416A (en) * 2023-11-08 2023-12-12 北京烽火万家科技有限公司 Holographic communication method and device for mobile terminal, computer equipment and storage medium
CN117221503A (en) * 2023-11-08 2023-12-12 北京烽火万家科技有限公司 Holographic projection system of digital personal mobile terminal
CN117215416B (en) * 2023-11-08 2024-05-07 北京烽火万家科技有限公司 Holographic communication method and device for mobile terminal, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112669422B (en) 2024-06-14

Similar Documents

Publication Publication Date Title
US11605193B2 (en) Artificial intelligence-based animation character drive method and related apparatus
KR102503413B1 (en) Animation interaction method, device, equipment and storage medium
CN112669422B (en) Simulated 3D digital person generation method and device, electronic equipment and storage medium
EP3885965B1 (en) Image recognition method based on micro facial expressions, apparatus and related device
CN110163054B (en) Method and device for generating human face three-dimensional image
CN112379812B (en) Simulation 3D digital human interaction method and device, electronic equipment and storage medium
Le et al. Live speech driven head-and-eye motion generators
US20190082211A1 (en) Producing realistic body movement using body Images
US20180260643A1 (en) Verification method and system
CN105518708A (en) Method and equipment for verifying living human face, and computer program product
CN107831902B (en) Motion control method and device, storage medium and terminal
CN110555507B (en) Interaction method and device for virtual robot, electronic equipment and storage medium
KR102491140B1 (en) Method and apparatus for generating virtual avatar
CN111432267B (en) Video adjusting method and device, electronic equipment and storage medium
CN110418095B (en) Virtual scene processing method and device, electronic equipment and storage medium
US11514638B2 (en) 3D asset generation from 2D images
WO2020211347A1 (en) Facial recognition-based image modification method and apparatus, and computer device
CN110794964A (en) Interaction method and device for virtual robot, electronic equipment and storage medium
CN112669846A (en) Interactive system, method, device, electronic equipment and storage medium
CN114202615A (en) Facial expression reconstruction method, device, equipment and storage medium
US20190050881A1 (en) Method and apparatus for rewarding reaction of simulation participant
CN112767520A (en) Digital human generation method and device, electronic equipment and storage medium
CN112637692A (en) Interaction method, device and equipment
EP3872768A1 (en) Method for processing two-dimensional image and device for executing method
KR20200134623A (en) Apparatus and Method for providing facial motion retargeting of 3 dimensional virtual character

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant