CN117409119A - Image display method and device based on virtual image and electronic equipment


Info

Publication number: CN117409119A
Authority: CN (China)
Prior art keywords: target, avatar, picture, target object, item
Legal status: Pending
Application number: CN202210798176.0A
Other languages: Chinese (zh)
Inventors: 赵瑞超 (Zhao Ruichao), 陈莉莉 (Chen Lili), 简伟华 (Jian Weihua)
Current Assignee: Beijing Dajia Internet Information Technology Co., Ltd.
Original Assignee: Beijing Dajia Internet Information Technology Co., Ltd.
Application filed by Beijing Dajia Internet Information Technology Co., Ltd.
Priority application: CN202210798176.0A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00: Animation
    • G06T 13/20: 3D [Three Dimensional] animation
    • G06T 13/40: 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G06T 15/00: 3D [Three Dimensional] image rendering
    • G06T 19/00: Manipulating 3D models or images for computer graphics
    • G06T 19/006: Mixed reality

Abstract

The disclosure relates to an avatar-based picture display method and apparatus, and an electronic device, belonging to the technical field of image processing. The method includes: photographing a target object through a first photographing device of a terminal to obtain a first picture, and photographing a target scene through a second photographing device of the terminal to obtain a second picture; driving a target avatar model, based on the first picture, to make the expression and action of the target object in the first picture; and rendering the target avatar model so that a target avatar synchronized with the expression and action of the target object is displayed in the second picture. Real-time synchronization between the avatar and the expression and action of a real person, and real-time fusion between the avatar and the real scene, are thereby achieved, presenting a display effect of real-time interaction between the avatar and the real scene and improving the user experience.

Description

Image display method and device based on virtual image and electronic equipment
Technical Field
The disclosure relates to the technical field of image processing, and in particular to an avatar-based picture display method and apparatus, and an electronic device.
Background
With the development of internet technology and the widespread adoption of mobile terminals, generating avatars by rendering human motion and facial expressions has received increasing attention and is widely used in various scenarios.
For example, by capturing the actions and facial expressions of a real person, an avatar model is driven to make the same actions and facial expressions and then rendered to obtain the corresponding avatar, so that a user can record video or live-stream based on the picture in which the avatar appears, meeting the user's personalized needs.
However, most avatars currently generated through motion-capture and face-capture rendering exist in a virtual space; that is, the background of the picture in which the avatar appears is usually also virtual, which limits the avatar's display effect and degrades the user experience.
Disclosure of Invention
The present disclosure provides an avatar-based picture display method and apparatus, and an electronic device, which can present a display effect of real-time interaction between the avatar and a real scene, thereby improving the user experience. The technical solutions of the present disclosure are as follows:
according to a first aspect of embodiments of the present disclosure, there is provided an avatar-based screen display method performed by a terminal configured with a first photographing apparatus and a second photographing apparatus, the method including:
acquiring expression information and action information of a target object based on a first picture of the target object captured by the first photographing device;
driving a target avatar model to make a corresponding expression and action based on the expression information and the action information of the target object;
rendering the target avatar model to display a target avatar of the target object in a second picture of a target scene captured by the second photographing device.
With this method, the first photographing device of the terminal photographs the target object to obtain the first picture, and the second photographing device of the terminal photographs the target scene to obtain the second picture. Based on the first picture, the target avatar model is driven to make the expression and action of the target object in the first picture, and the target avatar model is rendered so that a target avatar synchronized with the expression and action of the target object is displayed in the second picture. Real-time synchronization between the avatar and the expression and action of a real person, and real-time fusion between the avatar and the real scene, are thereby achieved, presenting a display effect of real-time interaction between the avatar and the real scene and improving the user experience. Moreover, in this process, the real-time interaction between the avatar and the real scene is achieved with the terminal's own photographing devices, without acquiring the expression information and action information of the target object through expensive motion-capture and expression-capture devices, which greatly saves cost.
In some embodiments, rendering the target avatar model to display a target avatar of the target object in a second picture of the target scene photographed by the second photographing apparatus, including:
rendering the target avatar model based on a position of the target object in the second screen to display the target avatar in the second screen, the target avatar not occluding the target object in the second screen.
In this way, on the basis of displaying the target avatar in the second picture, the target avatar does not occlude the target object, presenting the display effect of real-time interaction between the avatar and the real scene while ensuring the display effect of the target object, further improving the user experience.
In some embodiments, the method further comprises:
identifying at least one item in the second picture;
determining the target item from the at least one item based on the items indicated by the item list, the item list being associated with the target scene, and the degree of match between the target item and the items indicated by the item list meeting a target condition.
In this way, the terminal can automatically identify the target item in the second picture based on the item list associated with the target scene, providing a basis for subsequent rendering of the avatar model.
In some embodiments, the method further comprises:
acquiring scene information of the target scene based on the second picture;
based on the scene information, the target avatar model corresponding to the scene information is determined from at least one avatar model.
By the method, the terminal can automatically determine the target avatar model according to the real scene, so that the target avatar rendered based on the target avatar model can be matched with the real scene, and the display effect of the avatar is improved.
In some embodiments, the determining, based on the scene information, the target avatar model corresponding to the scene information from at least one avatar model includes at least one of:
determining the target avatar model from the at least one avatar model based on the item type of the target item indicated by the scene information;
determining the target avatar model from the at least one avatar model based on the weather type indicated by the scene information;
determining the target avatar model from the at least one avatar model based on the sight type indicated by the scene information.
In some embodiments, the method further comprises:
displaying an avatar list indicating a plurality of avatar models;
determining the target avatar model from the plurality of avatar models in response to an avatar model selection operation on the avatar list.
In this way, the target object is able to select a target avatar model from the avatar list displayed by the terminal to meet the personalized demand.
In some embodiments, the determining the target avatar model from the plurality of avatar models in response to an avatar model selection operation for the avatar list includes:
in a case that a first avatar model in the avatar list is in a selected state, displaying the next avatar model after the first avatar model in the avatar list as the selected state in response to identifying, from the first picture, a target action made by the target object, the first avatar model being any one of the plurality of avatar models.
In this way, based on the avatar list displayed by the terminal, the target object can perform the avatar model selection operation by making the target action to select the avatar model to be driven, without needing to manually operate the terminal, improving human-computer interaction efficiency while meeting personalized needs.
In some embodiments, the method further comprises:
matting out the area other than the target object in the first picture to obtain a first background picture;
matting out, based on the area where the green curtain is located in the second picture, the area where the target item is located in the second picture to obtain a target item picture corresponding to the target item;
rendering the target avatar model based on the first background picture and the target item picture to display the target avatar and the target item in the first background picture.
In this way, on the basis of displaying the target avatar in the background in which the target object is located, the target item is clearly displayed, presenting the display effect of real-time interaction between the avatar and the real scene while ensuring the display effect of the target item and improving the user experience.
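To make the matting step concrete, below is a minimal chroma-key sketch in Python with OpenCV. The patent does not specify an algorithm; the HSV bounds for the green curtain are illustrative assumptions, and matte_out_green is a hypothetical helper name.

```python
import cv2
import numpy as np

def matte_out_green(frame_bgr: np.ndarray):
    """Split a green-curtain frame into a foreground cut-out and its mask."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Pixels inside this hue range are treated as the green curtain (assumed bounds).
    curtain = cv2.inRange(hsv, (35, 60, 60), (85, 255, 255))
    foreground_mask = cv2.bitwise_not(curtain)        # area where the item is located
    cutout = cv2.bitwise_and(frame_bgr, frame_bgr, mask=foreground_mask)
    return cutout, foreground_mask
```

Applied to the second picture, this routine would yield the target item picture; the first background picture would need a person-segmentation mask rather than a colour key, since the first picture is matted around the target object instead of a green curtain.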
In some embodiments, the method further comprises:
in a case that the first picture includes a first item, matting out the area other than the target object and the first item in the first picture to obtain a second background picture;
matting out the area where the first item is located in the first picture to obtain a first item picture corresponding to the first item;
rendering the target avatar model based on the second background picture and the first item picture to display the target avatar and the first item in the second background picture.
In this way, on the basis of displaying the target avatar in the background in which the target object is located, the degree of fit between the target avatar and the first item is ensured, presenting the display effect of real-time interaction between the avatar and the real scene while ensuring the display effect of the first item and improving the user experience.
In some embodiments, the method further comprises:
acquiring contour information of the target object based on the first picture, the contour information indicating a head-to-body ratio of the target object;
the rendering the target avatar model based on the second background picture and the first item picture to display the target avatar and the first item in the second background picture includes:
rendering the target avatar model based on the second background picture, the first item picture, and the contour information to display the target avatar and the first item in the second background picture, the head-to-body ratio of the target avatar being the same as the head-to-body ratio of the target object.
In this way, the terminal renders the target avatar model based on the head-to-body ratio of the target object to obtain a target avatar with the same head-to-body ratio as the target object, further ensuring the display effect of the first item.
In some embodiments, the method further comprises at least one of:
in a case that a first expression is recognized based on the expression information of the target object, displaying first media information corresponding to the first expression in the second picture;
in a case that a first action is recognized based on the action information of the target object, displaying second media information corresponding to the first action in the second picture.
In this way, the terminal displays corresponding media information in the second picture based on the expression information and action information of the target object, making the picture display more interesting.
According to a second aspect of embodiments of the present disclosure, there is provided an avatar-based screen display apparatus applied to a terminal configured with a first photographing device and a second photographing device, the apparatus comprising:
An acquisition unit configured to perform acquisition of expression information and motion information of a target object based on a first screen of the target object captured by the first capturing device;
a driving unit configured to perform driving of the target avatar model to make a corresponding expression and action based on the expression information and the action information of the target object;
and a display unit configured to perform rendering of the target avatar model to display a target avatar of the target object in a second picture of the target scene photographed by the second photographing apparatus.
In some embodiments, the display unit is configured to perform:
rendering the target avatar model based on a position of the target object in the second screen to display the target avatar in the second screen, the target avatar not occluding the target object in the second screen.
In some embodiments, the apparatus further comprises a target item determination unit configured to perform:
identifying at least one item in the second picture;
determining the target item from the at least one item based on the items indicated by the item list, the item list being associated with the target scene, and the degree of match between the target item and the items indicated by the item list meeting a target condition.
In some embodiments, the apparatus further comprises an adjustment unit configured to perform at least one of:
adjusting a display position of the target avatar in the second screen in response to a moving operation on the target avatar;
adjusting a display size of the target avatar in the second screen in response to a zoom operation on the target avatar.
In some embodiments, the apparatus further comprises a model determination unit configured to perform:
acquiring scene information of the target scene based on the second picture;
based on the scene information, the target avatar model corresponding to the scene information is determined from at least one avatar model.
In some embodiments, the model determination unit is configured to perform at least one of:
determining the target avatar model from the at least one avatar model based on the item type of the target item indicated by the scene information;
determining the target avatar model from the at least one avatar model based on the weather type indicated by the scene information;
determining the target avatar model from the at least one avatar model based on the sight type indicated by the scene information.
In some embodiments, the model determination unit is further configured to perform:
displaying an avatar list indicating a plurality of avatar models;
determining the target avatar model from the plurality of avatar models in response to an avatar model selection operation on the avatar list.
In some embodiments, the model determination unit is configured to perform:
in a case that a first avatar model in the avatar list is in a selected state, displaying the next avatar model after the first avatar model in the avatar list as the selected state in response to identifying, from the first picture, a target action made by the target object, the first avatar model being any one of the plurality of avatar models.
In some embodiments, the apparatus further comprises:
the first matting unit is configured to perform matting on the area except the target object in the first picture to obtain a first background picture;
the second image matting unit is configured to perform image matting on the area where the target object is located in the second image based on the area where the green screen is located in the second image, so as to obtain a target object image corresponding to the target object;
The display unit is further configured to perform rendering of the target avatar model based on the first background screen and the target item screen to display the target avatar and the target item in the first background screen.
In some embodiments, the apparatus further comprises:
a third matting unit configured to matte out, in a case that the first picture includes a first item, the area other than the target object and the first item in the first picture to obtain a second background picture;
a fourth matting unit configured to matte out the area where the first item is located in the first picture to obtain a first item picture corresponding to the first item;
the display unit is further configured to render the target avatar model based on the second background picture and the first item picture to display the target avatar and the first item in the second background picture.
In some embodiments, the apparatus further comprises:
a contour information acquisition unit configured to acquire contour information of the target object based on the first picture, the contour information indicating a head-to-body ratio of the target object;
the display unit is configured to render the target avatar model based on the second background picture, the first item picture, and the contour information to display the target avatar and the first item in the second background picture, the head-to-body ratio of the target avatar being the same as that of the target object.
In some embodiments, the display unit is further configured to perform at least one of:
in a case that a first expression is recognized based on the expression information of the target object, displaying first media information corresponding to the first expression in the second picture;
in a case that a first action is recognized based on the action information of the target object, displaying second media information corresponding to the first action in the second picture.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device comprising:
one or more processors;
a memory for storing the processor-executable program code;
wherein the processor is configured to execute the program code to implement the avatar-based picture display method described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium. When the program code in the computer-readable storage medium is executed by a processor of an electronic device, the electronic device is enabled to perform the avatar-based picture display method described above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the above-described avatar-based picture display method.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
fig. 1 is a schematic view of an implementation environment of an avatar-based picture display method according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a picture display method based on an avatar provided in an embodiment of the present disclosure;
fig. 3 is a schematic view of a picture display method based on an avatar according to an embodiment of the present disclosure;
fig. 4 is a schematic view showing a target avatar provided by an embodiment of the present disclosure;
fig. 5 is a flowchart of another avatar-based picture display method provided in an embodiment of the present disclosure;
fig. 6 is a schematic diagram of a picture display method based on an avatar according to an embodiment of the present disclosure;
fig. 7 is a schematic view showing a target avatar provided by an embodiment of the present disclosure;
fig. 8 is a flowchart of another avatar-based picture display method provided by an embodiment of the present disclosure;
fig. 9 is a schematic view of a picture display method based on an avatar provided in an embodiment of the present disclosure;
fig. 10 is a schematic view showing a target avatar provided in an embodiment of the present disclosure;
fig. 11 is a flowchart of another avatar-based picture display method provided by an embodiment of the present disclosure;
fig. 12 is a schematic view of a picture display method based on an avatar provided in an embodiment of the present disclosure;
fig. 13 is a schematic view showing a target avatar provided by an embodiment of the present disclosure;
fig. 14 is a block diagram of a picture display device based on an avatar provided in an embodiment of the present disclosure;
fig. 15 is a block diagram of a terminal provided in an embodiment of the present disclosure.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be noted that, the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals related to the present disclosure are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of relevant data is required to comply with relevant laws and regulations and standards of relevant countries and regions. For example, expression information, motion information, and the like of the target object involved in the embodiments of the present disclosure are acquired with sufficient authorization. In some embodiments, the present disclosure provides a permission query page for querying whether to grant the right to acquire the above information, in which a grant authorization control and a deny authorization control are displayed, and in case a trigger operation of the grant authorization control is detected, the above information is acquired using the avatar-based screen display method provided by the present disclosure.
Fig. 1 is a schematic view of an implementation environment of an avatar-based screen display method according to an embodiment of the present disclosure, and referring to fig. 1, the implementation environment includes: a terminal 101 and a server 102. The terminal 101 and the server 102 are directly or indirectly connected through wired or wireless communication, which is not limited by the embodiment of the present disclosure.
The terminal 101 is at least one of a smart phone, a smart watch, a desktop computer, a laptop computer, a virtual reality terminal, an augmented reality terminal, and a wireless terminal. In the embodiments of the present disclosure, the terminal 101 is configured with a first photographing device and a second photographing device. The first photographing device is configured to photograph a target object to obtain a first picture corresponding to the target object; for example, the first photographing device is a front-facing camera of the terminal 101, with which the target object can take a selfie. The second photographing device is configured to photograph a target scene to obtain a second picture corresponding to the target scene; for example, the second photographing device is a rear-facing camera of the terminal 101, and the target scene refers to a real scene. In some embodiments, the first photographing device and the second photographing device can be activated simultaneously to photograph the target object and the target scene synchronously.
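As a rough illustration of this dual-camera setup (not part of the original text), the sketch below opens two cameras with OpenCV; device indices 0 and 1 are assumptions, since real front/rear camera selection is platform-specific.

```python
import cv2

front = cv2.VideoCapture(0)  # first photographing device: films the target object
rear = cv2.VideoCapture(1)   # second photographing device: films the real scene

while front.isOpened() and rear.isOpened():
    ok_front, first_picture = front.read()
    ok_rear, second_picture = rear.read()
    if not (ok_front and ok_rear):
        break
    # first_picture -> expression/action capture; second_picture -> scene into
    # which the rendered avatar is composited (see the embodiments below).

front.release()
rear.release()
```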
Illustratively, an application providing an avatar-based picture display function is installed and run on the terminal 101, such as a photographing application, a live-streaming application, or a game application, which is not limited. Taking a photographing application as an example, the target object can invoke the first photographing device and the second photographing device of the terminal 101 through the application: the first photographing device photographs the target object to obtain a first picture, and the second photographing device photographs the real scene to obtain a second picture. The terminal then drives an avatar model to make the corresponding expression and action based on the expression information and action information of the target object in the first picture, and renders the avatar model to display the avatar of the target object in the second picture. The avatar is thus driven in real time by the expression and action of the target object and fused with the real scene, presenting a display effect of real-time interaction between the avatar and the real scene and making it convenient for the target object to record video based on the picture displayed by the terminal. This process is described in detail in the following method embodiments and is not repeated here.
In addition, terminal 101 may be referred to generally as one of a plurality of terminals, with embodiments of the disclosure being illustrated only by terminal 101. Those skilled in the art will recognize that the number of terminals may be greater or lesser.
The server 102 may be an independent physical server, a server cluster or a distributed file system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (Content Delivery Network, CDN), basic cloud computing services such as big data and an artificial intelligence platform. Illustratively, the server 102 is configured to provide a background service for an application program executed by the terminal 101, for example, taking the terminal 101 running a shooting application program as an example, the target object can record a video based on a picture in which an avatar displayed by the terminal 101 and a real scene are fused, and the recorded video is distributed by the server 102. Of course, the server 102 can also include other functional servers to provide more comprehensive and diverse services. It should be appreciated that the number of servers 102 described above may be greater or lesser, and embodiments of the present disclosure are not limited in this regard.
Fig. 2 is a flowchart of a picture display method based on an avatar according to an embodiment of the present disclosure. As shown in fig. 2, the method is performed by a terminal configured with a first photographing apparatus and a second photographing apparatus. Illustratively, the method includes steps 201 through 204 described below.
In step 201, the terminal photographs a target object through the first photographing device to obtain a first picture of the target object, and photographs a target scene through the second photographing device to obtain a second picture of the target scene.
In the embodiments of the present disclosure, the terminal is the terminal used by the target object, and a target application providing the avatar-based picture display function runs on it. The target scene refers to a real scene. The terminal displays an application interface of the target application, and the application interface includes a target control indicating avatar-based picture display. In response to the target object's trigger operation on the target control, the terminal photographs the target object through the first photographing device to obtain the first picture, and photographs the target scene through the second photographing device to obtain the second picture.
For example, taking the target application being a photographing application, when the target object visits a scenic spot, the target object triggers the avatar-based picture display function provided by the target application; the terminal photographs the target object through the first photographing device to obtain a selfie picture of the target object, and photographs the scenic spot through the second photographing device to obtain a scenic-spot picture.
For another example, taking the target application being a live-streaming application, when the target object introduces an item through a live room, the target object triggers the avatar-based picture display function provided by the target application; the terminal photographs the target object through the first photographing device to obtain a selfie picture of the target object, and photographs the environment where the item is located through the second photographing device to obtain an item picture.
In some embodiments, the terminal displays the first screen in the first area and displays the second screen in the second area, so that the target object can view the photographed screen in real time, and the target object can conveniently adjust the photographing angle, the photographing focal length and the like in time, which is not limited. For example, the application interface of the target application program is divided into an upper area and a lower area, the upper area is used for displaying a first screen, and the lower area is used for displaying a second screen.
In step 202, the terminal obtains expression information and motion information of the target object based on the first picture of the target object captured by the first capturing device.
In the embodiments of the present disclosure, the terminal recognizes the expression and action of the target object based on the area where the target object is located in the first picture, obtaining the expression information and action information of the target object. Schematically, based on the area where the target object is located in the first picture, the terminal obtains position information of a plurality of feature points of the target object (such as feature points corresponding to the facial features and feature points corresponding to the joints of the limbs) and recognizes the expression and action of the target object to obtain the corresponding expression information and action information. For example, the expression information indicates the spacing between facial feature points of the target object, the expression type of the target object, and the like; the action information indicates the distance between joint feature points of the target object, the action type of the target object, and the like. It should be noted that the embodiments of the present disclosure do not limit the specific algorithm the terminal uses to acquire the expression information and action information of the target object. In some embodiments, the target object is positioned in front of a green screen, so that the first picture obtained by photographing the target object includes a green-screen area and the area where the target object is located. In this way, the accuracy of the action information and expression information can be improved, thereby improving the display effect of the subsequent avatar.
In some embodiments, the terminal recognizes the expression and action of the target object in each video frame corresponding to the first picture, obtaining the corresponding expression information and action information. For example, the terminal photographs the target object through the first photographing device and acquires the corresponding video frames (for example, at 30 frames per second, which is not limited) to obtain the first picture; in this process, every time the terminal acquires a video frame, it recognizes the expression and action of the target object in that frame. By recognizing the expression and action of the target object frame by frame, the terminal can provide fine-grained expression information and action information for subsequently driving the avatar model, so that the avatar rendered based on the avatar model achieves fine-grained synchronization with the expression and action of the target object, improving the display effect of the avatar. Of course, in other embodiments, the terminal recognizes the expression and action of the target object in the corresponding video frame every target number of video frames (for example, every 2 frames, which is not limited), saving computing resources while still providing expression information and action information for subsequently driving the avatar model.
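As one possible realization of this per-frame capture (the patent names no library), the sketch below uses MediaPipe's face-mesh and pose solutions as stand-ins for the feature-point extraction described above.

```python
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=False)
pose = mp.solutions.pose.Pose(static_image_mode=False)

def capture_expression_and_action(frame_bgr):
    """Return facial feature points (expression info) and joint points (action info)."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    face = face_mesh.process(rgb)
    body = pose.process(rgb)
    expression = (face.multi_face_landmarks[0].landmark
                  if face.multi_face_landmarks else None)
    action = body.pose_landmarks.landmark if body.pose_landmarks else None
    return expression, action
```

Calling this routine on every frame corresponds to the fine-grained mode above; calling it every target number of frames corresponds to the resource-saving mode.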
In step 203, the terminal drives the target avatar model to make corresponding expression and action based on the expression information and action information of the target object.
In the embodiments of the present disclosure, the target avatar model is a two-dimensional or three-dimensional model of the avatar, which is not limited. Illustratively, the target avatar model may be constructed based on an original character; based on an authorized animation, cartoon, film, or game character; or based on the target object itself (which may also be understood as a hyper-realistic avatar of the target object), which the embodiments of the present disclosure do not limit. In some embodiments, the terminal configures corresponding expression parameters and action parameters for the head model and body model in the target avatar model based on the expression information and action information of the target object, so that the target avatar model makes the corresponding expression (such as smiling, laughing, crying, or sticking out the tongue) and action (such as nodding, raising a hand, lifting a leg, or dancing).
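The mapping from feature points to model parameters is not spelled out in the patent. The hedged sketch below derives one illustrative expression parameter (mouth openness) from two facial feature points, assuming MediaPipe-style normalised landmarks (indices 13 and 14 are the inner lips) and a hypothetical AvatarModel API.

```python
def mouth_open_weight(landmarks) -> float:
    """Map the gap between the inner lips to a 0..1 blendshape weight."""
    upper_lip, lower_lip = landmarks[13], landmarks[14]     # assumed lip indices
    return min(1.0, abs(lower_lip.y - upper_lip.y) * 20.0)  # scale is a tuned guess

# Hypothetical driving calls -- the real parameter names depend on the model:
# avatar_model.set_blendshape("jawOpen", mouth_open_weight(expression))
# avatar_model.set_joint_rotation("head", head_rotation_from(action))
```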
In some embodiments, the terminal provides a selection function for the avatar model, and the target object can select the target avatar model from a plurality of avatar models provided by the terminal to meet personalized needs. Schematically, the terminal displays an avatar selection control; in response to a trigger operation on the avatar selection control, it displays an avatar list indicating a plurality of avatar models for the target object to select from; in response to an avatar model selection operation on the avatar list, it determines the target avatar model from the plurality of avatar models and drives the target avatar model to make the corresponding expression and action based on the expression information and action information of the target object.
Illustratively, the above-described process of determining the target avatar model based on the avatar model selection operation is described below in several alternative implementations.
(1) The avatar model selection operation is implemented based on a click operation of the target object on an avatar model. Illustratively, the terminal determines an avatar model as the target avatar model in response to a selection operation on any avatar model in the avatar list. For example, the terminal displays thumbnails of the plurality of avatar models in the avatar list and determines an avatar model as the target avatar model in response to a click operation on its thumbnail. It should be understood that this example is only illustrative; the terminal may also display related information such as profiles of the avatar models in the avatar list, so that the target object can learn the details of each avatar model, improving human-computer interaction efficiency.
In this way, the target object is able to select a target avatar model from the avatar list displayed by the terminal to meet the personalized demand.
(2) The avatar model selection operation is implemented based on a target action made by the target object. Illustratively, in a case that a first avatar model in the avatar list is in the selected state, the terminal displays the next avatar model after the first avatar model as the selected state in response to identifying, from the first picture, a target action made by the target object. The target action is a preset action, for example, nodding, turning the head from left to right, turning the head from right to left, swinging an arm up and down or left and right, lifting the left or right leg, or drawing a circle with the left or right hand, which is not limited. The first avatar model is any one of the plurality of avatar models. For example, the first avatar model is the first-ranked avatar model in the avatar list; that is, when displaying the avatar list, the terminal displays the first-ranked avatar model as selected by default. For another example, the first avatar model is the avatar model currently synchronized in real time with the expression and action of the target object, which is not limited. It should be noted that, when the next avatar model is displayed as selected, the terminal may directly determine it as the target avatar model, so that the target object can quickly switch avatar models by making the target action, improving human-computer interaction efficiency. Alternatively, the terminal may decide whether to determine the next avatar model as the target avatar model according to the next action of the target object, which is not limited.
Illustratively, take the case that the terminal decides, according to the next action of the target object, whether to determine the next avatar model as the target avatar model. In some embodiments, if the terminal does not recognize the target action again within a target duration, the next avatar model is determined as the target avatar model; if the terminal recognizes the target action again within the target duration, the avatar model after the next one is displayed as the selected state, and so on, which is not repeated here. The target duration is a preset duration (e.g., 3 seconds). For another example, the terminal displays a prompt message indicating whether to determine the next avatar model as the target avatar model, and determines the next avatar model as the target avatar model in response to a confirmation operation on the prompt message. For example, the confirmation operation is an action of the target object such as nodding or waving an arm downward, which is not limited.
In this way, based on the avatar list displayed by the terminal, the target object can perform the avatar model selection operation by making the target action to select the avatar model to be driven, without needing to manually operate the terminal, improving human-computer interaction efficiency while meeting personalized needs.
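A minimal sketch of this gesture-driven selection follows, assuming a recognise_target_action callback (e.g., a nod detector over the first picture) and a 3-second confirmation window; both are assumptions rather than details from the patent.

```python
import time

def select_avatar_model(avatar_list, recognise_target_action, target_duration=3.0):
    """Cycle the selected entry on each target action; confirm after a quiet period."""
    selected = 0                          # first-ranked model selected by default
    last_action = time.monotonic()
    while True:
        if recognise_target_action():     # target action seen in the first picture
            selected = (selected + 1) % len(avatar_list)
            last_action = time.monotonic()
        elif time.monotonic() - last_action > target_duration:
            return avatar_list[selected]  # no further target action: confirm
```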
In some embodiments, the terminal determines the target avatar model from at least one avatar model based on the second picture. Schematically, the terminal acquires scene information of the target scene based on the second picture, and determines, based on the scene information, the target avatar model corresponding to the scene information from the at least one avatar model. That is, the terminal can automatically determine the target avatar model according to the real scene, so that the target avatar finally rendered based on the target avatar model matches the real scene, improving the display effect of the avatar. For example, the scene information indicates a weather type, a building type, a sight type, a vegetation type, an item type, and the like, which the embodiments of the present disclosure do not limit.
Illustratively, the manner in which the terminal determines the target avatar model based on the scene information is illustrated below.
(1) The terminal determines the target avatar model from the at least one avatar model based on the item type of the target item indicated by the scene information. For example, when the item type of the target item is kitchenware, the target avatar model is constructed based on a chef-like character; when the item type of the target item is books, the target avatar model is constructed based on a teacher-like character; and so on, which is not limited.
(2) The terminal determines the target avatar model from the at least one avatar model based on the weather type indicated by the scene information. For example, for rainy weather, the target avatar model is constructed based on a character wearing a raincoat; for sunny weather, the target avatar model is constructed based on a character wearing sunglasses; and so on, which is not limited.
(3) The terminal determines the target avatar model from the at least one avatar model based on the sight type indicated by the scene information. For example, for ancient architecture, the target avatar model is constructed based on a historical character; for an art gallery, the target avatar model is constructed based on an artist character; and so on, which is not limited.
It should be understood that the above examples are only illustrative. In some embodiments, when the scene information indicates multiple kinds of content, the terminal determines the target avatar model from the at least one avatar model based on all of them. For example, if the scene information indicates a historical building on a rainy day, the target avatar model is constructed based on a historical character wearing a raincoat.
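The examples above amount to a lookup from scene attributes to avatar models; a minimal table-driven sketch follows, with invented keys and model names mirroring the examples.

```python
# (kind, value) -> avatar model identifier; all entries are illustrative.
SCENE_TO_MODEL = {
    ("item", "kitchenware"): "chef_avatar",
    ("item", "books"): "teacher_avatar",
    ("weather", "rain"): "raincoat_avatar",
    ("weather", "sunny"): "sunglasses_avatar",
    ("sight", "ancient_architecture"): "historical_figure_avatar",
    ("sight", "art_gallery"): "artist_avatar",
}

def pick_avatar_model(scene_info: dict) -> str:
    """scene_info example: {"weather": "rain", "sight": "ancient_architecture"}."""
    for kind, value in scene_info.items():
        model = SCENE_TO_MODEL.get((kind, value))
        if model:
            return model
    return "default_avatar"
```

Combining multiple indicated contents, as in the rainy-day historical-building example, would require a richer rule than this first-match lookup.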
In addition, in some embodiments, the terminal determines, based on the scene information, at least one first avatar model corresponding to the scene information from the at least one avatar model, and displays the at least one first avatar model for the target object to select from, meeting personalized needs.
In step 204, the terminal renders the target avatar model to display the target avatar of the target object in the second picture of the target scene photographed by the second photographing apparatus.
In the embodiments of the present disclosure, the expression of the target avatar is synchronized with the expression of the target object, and the action of the target avatar is synchronized with the action of the target object. The terminal renders the target avatar with the corresponding expression and action to obtain a target layer, and superimposes the target layer on the layer corresponding to the second picture, achieving real-time synchronization between the avatar and the expression and action of the real person and real-time fusion between the avatar and the real scene, presenting a display effect of real-time interaction between the avatar and the real scene and improving the user experience.
In some embodiments, the terminal renders the target avatar model at a default size and displays the target avatar at a default position in the second picture; for example, the default position is the lower-left corner of the second picture, which is not limited. In other embodiments, the terminal provides an adjustment function for the target avatar, and the target object can adjust the display position and display size of the target avatar in the second picture as needed. Illustratively, for the display position, the terminal adjusts the display position of the target avatar in the second picture in response to a moving operation (e.g., a drag operation) on the target avatar; for the display size, the terminal adjusts the display size of the target avatar in the second picture in response to a zoom operation (e.g., a two-finger sliding operation) on the target avatar.
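Below is a hedged sketch of the layer superposition described above: alpha-blending a rendered RGBA avatar layer onto the second picture at a chosen position. The renderer itself is out of scope; the lower-left default placement comes from the text, and in-bounds coordinates are assumed.

```python
import numpy as np

def overlay_avatar(second_picture: np.ndarray, avatar_rgba: np.ndarray,
                   x: int, y: int) -> np.ndarray:
    """Alpha-blend the avatar layer onto the scene picture at (x, y)."""
    h, w = avatar_rgba.shape[:2]          # layer assumed to fit inside the frame
    alpha = avatar_rgba[:, :, 3:4].astype(np.float32) / 255.0
    region = second_picture[y:y + h, x:x + w].astype(np.float32)
    blended = alpha * avatar_rgba[:, :, :3] + (1.0 - alpha) * region
    second_picture[y:y + h, x:x + w] = blended.astype(second_picture.dtype)
    return second_picture

# Default placement, e.g. the lower-left corner of the scene frame:
# overlay_avatar(frame, layer, x=0, y=frame.shape[0] - layer.shape[0])
```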
In some embodiments, the terminal can also display corresponding media information in the second picture based on the expression information and action information of the target object, improving the display effect of the avatar and making the picture display more interesting.
Schematically, in a case that a first expression is recognized based on the expression information of the target object, the terminal displays first media information corresponding to the first expression in the second picture. The first media information may be a still picture, an animated picture, text, a pendant, a dynamic special effect, or the like, which is not limited. For example, if a smiling expression is recognized based on the expression information of the target object, the terminal displays a smiling-face sticker near the target avatar in the second picture.
Schematically, in a case that a first action is recognized based on the action information of the target object, the terminal displays second media information corresponding to the first action in the second picture. The second media information may be a still picture, an animated picture, text, a pendant, a dynamic special effect, or the like, which is not limited. For example, if a dance action is recognized based on the action information of the target object, the terminal displays a stage-lighting dynamic special effect near the target avatar in the second picture.
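A small trigger-table sketch of this behaviour follows; the recognised labels and asset names are invented for illustration.

```python
# (kind, label) -> media asset displayed near the avatar in the second picture.
MEDIA_TRIGGERS = {
    ("expression", "smile"): "smile_sticker.png",
    ("action", "dance"): "stage_lights_effect.webm",
}

def media_for(kind: str, label: str):
    """Return the asset to overlay, or None when nothing should be shown."""
    return MEDIA_TRIGGERS.get((kind, label))
```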
It should be noted that in steps 201 to 204 above, while photographing the target object through the first photographing device and photographing the target scene through the second photographing device, the terminal acquires the expression information and action information of the target object, drives the target avatar model to make the corresponding expression and action, and renders the target avatar model to display the target avatar in the second picture, achieving real-time synchronization between the target object and the target avatar, that is, a display effect of real-time fusion between the target avatar and the real scene. For example, in the case that the terminal recognizes the expression and action of the target object in each video frame corresponding to the first picture, every time the terminal acquires a video frame corresponding to the target object, it recognizes the expression and action of the target object in that frame to obtain the corresponding expression information and action information, drives the target avatar model to make the corresponding expression and action, and renders the target avatar model to display the target avatar in the second picture.
In the related art, in a scheme of driving an avatar model, the expression information and action information of the target object usually have to be acquired through expensive motion-capture and expression-capture devices. The target object is typically positioned in front of a green screen and cannot interact with the real scene; it can only make the corresponding actions and expressions from imagination or by watching a video, and a virtual background is added to the rendered avatar through later image processing, so the effect of real-time interaction between the avatar and the real scene cannot be presented. With the method provided by the embodiments of the present disclosure, real-time fusion of the avatar and the real scene is achieved through the two photographing devices of a single device, greatly saving cost, presenting the effect of real-time interaction between the avatar and the real scene, and meeting the needs of target objects in various scenarios.
For example, in a scene where the target object is in front of a green screen, the target object is photographed by the front-facing photographing device to obtain highly accurate expression information and action information, the real scene is photographed by the rear-facing photographing device, and the avatar of the target object is displayed in the picture corresponding to the real scene, which greatly reduces the cost of displaying the avatar while ensuring the accuracy of motion capture and expression capture. For another example, in a landscape-photographing scene, the front-facing photographing device is better suited to photographing the target object, while the rear-facing photographing device produces a sharper picture better suited to photographing the landscape; with this method, the avatar of the target object is displayed in the landscape picture captured by the rear-facing photographing device, so that the landscape is clearly displayed and the effect of real-time interaction between the avatar and the landscape is presented, making full use of the different photographing devices while saving the cost of displaying the avatar. For another example, in a live-streamed item-introduction scene, for items whose details need to be shown, the picture captured by the rear-facing photographing device is usually sharper; displaying the avatar of the target object in the item picture captured by the rear-facing photographing device therefore saves the cost of displaying the avatar, clearly shows the details of the item, presents the effect of real-time interaction between the avatar and the item, and makes the live stream more interesting.
In addition, in some embodiments, when the terminal displays the first picture and the second picture synchronously in different areas, the terminal continues to display the first picture in the first area and displays the second picture containing the target avatar in the second area, so that the target object can view both the photographed picture and the fused picture in real time and adjust its own expression, action, and the like in time, improving human-computer interaction efficiency. Of course, in other embodiments, the terminal may display the first picture, the second picture, and the second picture containing the target avatar simultaneously in multiple areas, or display only the second picture containing the target avatar according to the needs of the target object, which the embodiments of the present disclosure do not limit.
Steps 201 to 204 above are schematically described below with reference to fig. 3, which is a schematic diagram of an avatar-based picture display method provided by an embodiment of the present disclosure. As shown in fig. 3, the first photographing device photographs the target object to obtain the first picture, and the terminal obtains the expression information and action information of the target object based on the first picture and drives the target avatar model to make the corresponding expression and action; at the same time, the second photographing device photographs the target scene to obtain the second picture; the terminal renders the target avatar model with that expression and action in real time to display the target avatar in the second picture, presenting the effect of real-time interaction between the avatar and the real scene.
Referring schematically to fig. 4, which shows the display of a target avatar provided by an embodiment of the present disclosure: the terminal photographs the target object through the first photographing device to obtain a first picture 401, photographs the target scene (e.g., natural scenery such as mountains and rivers) through the second photographing device to obtain a second picture 402, fuses the target avatar with the second picture 402 based on steps 202 to 204, and displays a second picture 403 containing the target avatar, in which the expression and action of the target avatar are consistent with those of the target object.
In summary, in the avatar-based picture display method provided by the embodiments of the present disclosure, the first photographing device of the terminal photographs the target object to obtain the first picture, and the second photographing device photographs the target scene to obtain the second picture. Based on the first picture, the target avatar model is driven to make the expression and action of the target object, and the target avatar model is rendered so that a target avatar synchronized with the expression and action of the target object is displayed in the second picture. This achieves real-time synchronization between the avatar and the expression and action of a real person and real-time fusion between the avatar and the real scene, presenting a display effect of real-time interaction between the avatar and the real scene and improving the user experience. Moreover, in this process, the real-time interaction between the avatar and the real scene is achieved with the terminal's own photographing devices, without acquiring the expression information and action information of the target object through expensive motion-capture and expression-capture devices, greatly saving cost.
The embodiment shown in fig. 2 introduced the basic flow of the avatar-based picture display method provided by the embodiments of the present disclosure. The embodiments shown in fig. 5 to 13 below introduce several other avatar-based picture display methods, taking the case where the target application running on the terminal is a live broadcast application as an example.
Fig. 5 is a flowchart of another avatar-based picture display method provided in an embodiment of the present disclosure. As shown in fig. 5, the method is performed by a terminal configured with a first photographing apparatus and a second photographing apparatus. Illustratively, the method includes steps 501 through 506 described below.
In step 501, the terminal photographs a target object through a first photographing device to obtain a first picture of the target object, photographs a target scene through a second photographing device to obtain a second picture of the target scene.
In an embodiment of the present disclosure, the target scene includes at least one item. Schematically, the target application running on the terminal is a live broadcast application, the target object is an anchor object, and the target scene is the live broadcast room where the anchor object is located. It should be understood that the "live broadcast room" referred to in this disclosure is not limited to being indoors; in some scenarios, an anchor may also broadcast live outdoors, which is not limited. In addition, the process by which the terminal obtains the first picture and the second picture through the first photographing device and the second photographing device is the same as step 201 in the embodiment shown in fig. 2 and is not repeated here.
In step 502, the terminal obtains expression information and motion information of the target object based on a first picture of the target object obtained by the first photographing device.
In the embodiment of the present disclosure, step 502 is the same as step 202 in the embodiment shown in fig. 2, and is not described herein.
In step 503, the terminal drives the target avatar model to make a corresponding expression and action based on the expression information and action information of the target object.
In the embodiment of the present disclosure, step 503 is the same as step 203 in the embodiment shown in fig. 2, and is not described herein.
In step 504, the terminal identifies at least one item in the second screen.
In the embodiment of the present disclosure, the target scene includes at least one item, and the terminal identifies the items in each video frame corresponding to the second picture to obtain item information of the at least one item. In other embodiments, the terminal identifies the items in a video frame once every target number of video frames to obtain the corresponding item information, which is not limited.
In some embodiments, the item information indicates an item name of the item. Schematically, the terminal identifies the item name of the at least one item based on the area in which the at least one item is located in the second picture. In other embodiments, the item information also indicates an item picture of the item. Schematically, the terminal crops an item picture of the at least one item based on the area where the at least one item is located in the second picture.
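As an illustrative aid only, the following Python sketch shows one way the per-frame identification and cropping described above could be organized; detect_items is a hypothetical object detector (any detection model could fill this role), and the frame-sampling interval is an assumption.

```python
def identify_items(frames, detect_items, sample_interval=1):
    """Identify items on every `sample_interval`-th video frame.

    `detect_items` is a hypothetical detector returning, per frame, a list
    of (item_name, (x, y, w, h)) tuples; any detection model could be used.
    """
    item_info = {}
    for idx, frame in enumerate(frames):
        if idx % sample_interval:
            continue                               # skip non-sampled frames
        for name, (x, y, w, h) in detect_items(frame):
            picture = frame[y:y + h, x:x + w]      # crop the item picture
            item_info[name] = {"box": (x, y, w, h), "picture": picture}
    return item_info
```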
In step 505, the terminal determines a target item from the at least one item based on the items indicated by the item list.
In an embodiment of the present disclosure, the item list is associated with the target scene, and the degree of matching between the target item and the items indicated by the item list meets a target condition. Illustratively, the terminal obtains an item list associated with the target scene, the item list indicating at least one live item published within the live broadcast room; for any one of the live items, the terminal matches the live item with the at least one item, and determines the target item from the at least one item based on the degree of matching between the live item and the at least one item. In some embodiments, the terminal matches the at least one item with the at least one live item in the item list based on the item information of the at least one item identified from the second picture, and determines an item, among the at least one item, that meets the target condition as the target item, where the target condition is that the matching degree between the item and any live item is greater than a preset threshold. It should be understood that the number of target items may be one or more, which is not limited.
In some embodiments, the item list indicates an item name of the at least one live item. Illustratively, taking any one of the items in the second picture as an example, the terminal obtains at least one name matching degree between the item and the at least one live item based on the item name of the item and the item name of the at least one live item, and determines the item as the target item when any of the at least one name matching degree is greater than a preset threshold.
In some embodiments, the item list further indicates an item picture of the at least one live item. Illustratively, taking any one of the items in the second picture as an example, the terminal obtains at least one picture matching degree between the item and the at least one live item based on the item picture of the item and the item picture of the at least one live item, and determines the item as the target item when any of the at least one picture matching degree is greater than a preset threshold.
In other embodiments, the terminal can also determine the target item by combining the two approaches above. Illustratively, taking any one of the items in the second picture as an example, the terminal obtains at least one name matching degree between the item and the at least one live item based on their item names, obtains at least one picture matching degree between the item and the at least one live item based on their item pictures, obtains at least one target matching degree between the item and the at least one live item based on the at least one name matching degree and the at least one picture matching degree, and determines the item as a target item when any of the at least one target matching degree is greater than a preset threshold. This improves the accuracy of identifying the target item, so that the avatar rendered based on the avatar model will not occlude the target item in the second picture, ensuring the display effect of the target item.
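The fused matching described above can be sketched as follows. This Python snippet is illustrative only: the string-similarity and cosine-similarity measures, the 50/50 weighting, and the 0.8 threshold are all assumptions, since the disclosure only requires some name matching degree, picture matching degree, fused target matching degree, and preset threshold. Each entry in items and live_items is assumed to carry a "name" string and a precomputed "feat" feature vector.

```python
from difflib import SequenceMatcher
import numpy as np

def name_match(a: str, b: str) -> float:
    # Simple string similarity in [0, 1]; any text-matching model would do.
    return SequenceMatcher(None, a, b).ratio()

def picture_match(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    # Cosine similarity between image feature vectors (assumed precomputed).
    return float(np.dot(feat_a, feat_b) /
                 (np.linalg.norm(feat_a) * np.linalg.norm(feat_b) + 1e-8))

def find_target_items(items, live_items, threshold=0.8, w_name=0.5):
    """Return items whose best fused matching degree exceeds the threshold.

    The 50/50 weighting and the 0.8 threshold are illustrative assumptions.
    """
    targets = []
    for item in items:
        best = max(
            w_name * name_match(item["name"], live["name"]) +
            (1 - w_name) * picture_match(item["feat"], live["feat"])
            for live in live_items
        )
        if best > threshold:
            targets.append(item)
    return targets
```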
Through the above steps 504 and 505, the terminal can automatically identify the target item in the second picture based on the item list associated with the target scene, providing a basis for the subsequent rendering of the avatar model. In other embodiments, the terminal may instead determine the target item as the item specified by the target object; for example, the terminal provides a target item selection function for the target object to select from, which is not limited by the embodiments of the present disclosure.
It should be noted that, in some embodiments, the terminal performs steps 504 and 505 first, and then performs steps 502 and 503. In other embodiments, the terminal performs step 502 and step 503 simultaneously with performing step 504 and step 505, which is not limited.
In step 506, the terminal renders the target avatar model based on the position of the target item in the second screen to display the target avatar in the second screen, the target avatar not occluding the target item in the second screen.
In the embodiment of the present disclosure, the terminal determines a display size and a display position of the target avatar based on the position of the target item in the second picture, and renders the driven target avatar model based on the determined display size and display position to display the target avatar in the second picture. The expression of the target avatar is synchronized with the expression of the target object, and the action of the target avatar is synchronized with the action of the target object; moreover, on the basis of displaying the target avatar in the second picture, the target avatar does not occlude the target item, so that a display effect of real-time interaction between the avatar and the real scene is presented, the display effect of the target item is ensured, and the user experience is improved.
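One possible placement heuristic is sketched below, purely for illustration; the disclosure does not prescribe any particular policy, so the margin-based rule, the default avatar height, and the bottom anchoring are all assumptions.

```python
def place_avatar(frame_w, frame_h, item_box, avatar_aspect=0.4):
    """Pick a display position/size so the avatar box avoids the item box.

    Deliberately simple heuristic: put the avatar in the wider free margin
    beside the target item and cap its width to that margin.
    """
    x, y, w, h = item_box
    left_margin, right_margin = x, frame_w - (x + w)
    avatar_h = int(frame_h * 0.6)                 # illustrative default size
    avatar_w = int(avatar_h * avatar_aspect)
    if right_margin >= left_margin:
        avatar_w = min(avatar_w, right_margin)    # fit into the right margin
        avatar_x = frame_w - avatar_w
    else:
        avatar_w = min(avatar_w, left_margin)     # fit into the left margin
        avatar_x = 0
    avatar_y = frame_h - avatar_h                 # anchor to the bottom edge
    return avatar_x, avatar_y, avatar_w, avatar_h
```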
It should be understood that, through the same process as step 204 in the embodiment shown in fig. 2, the terminal provides an adjustment function for the target avatar, and the target object can also adjust the position and size of the target avatar in real time as needed so as to avoid occluding the target item, which is not repeated here.
In addition, it should be noted that, based on the embodiment shown in fig. 5, in some embodiments the terminal can automatically determine the target avatar model according to the real scene, so that the target avatar rendered based on the target avatar model matches the real scene, improving the display effect of the avatar. In a live broadcast scene, multiple items are often introduced in sequence in the live broadcast room. Based on this, the terminal can identify the items in the second picture in real time and automatically determine the target avatar model based on the identified item information, realizing automatic switching of the target avatar during the live broadcast so that the target avatar always matches the real scene. This increases the interest of the live broadcast room and can effectively improve its retention rate and conversion rate.
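A minimal sketch of such automatic switching follows, under the assumption of a simple item-type-to-model lookup table; the actual correspondence rules are left open by the disclosure, and load_model is a hypothetical model loader.

```python
# Illustrative mapping from identified item type to an avatar model name;
# the real correspondence rules are not specified by the disclosure.
ITEM_TYPE_TO_AVATAR = {
    "furniture": "home_assistant_avatar",
    "pet_supplies": "animal_avatar",
    "clothing": "fashion_avatar",
}

def switch_avatar_model(current_model, item_type, load_model):
    """Swap in the avatar model for the newly identified item type.

    `load_model` is a hypothetical loader for the named avatar model.
    """
    wanted = ITEM_TYPE_TO_AVATAR.get(item_type)
    if wanted is not None and wanted != current_model.name:
        return load_model(wanted)   # automatic switch during the live stream
    return current_model
```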
The above steps 501 to 506 are described schematically below with reference to fig. 6, which is a schematic diagram of an avatar-based picture display method provided in an embodiment of the present disclosure. As shown in fig. 6, the first photographing device photographs the target object to obtain a first picture, and the terminal obtains expression information and action information of the target object based on the first picture and drives the target avatar model to make the corresponding expression and action; meanwhile, the second photographing device photographs the target scene (which comprises the target item) to obtain a second picture. The terminal renders the driven target avatar model in real time based on the position of the target item in the second picture to display the target avatar in the second picture such that the target avatar does not occlude the target item, presenting a display effect of real-time interaction between the avatar and the real scene.
Referring to fig. 7, fig. 7 is a schematic view showing a target avatar provided by an embodiment of the present disclosure. As shown in fig. 7 (a), taking a target scene in which furniture is introduced in the live broadcast room as an example, the terminal displays the target avatar at a smaller size in the picture corresponding to the target scene, thereby avoiding occluding the furniture. As shown in fig. 7 (b), taking a target scene in which a pet is introduced in the live broadcast room as an example, the terminal displays the target avatar at a smaller size in the picture corresponding to the target scene, thereby avoiding occluding the pet. It should be understood that these examples are merely illustrative; in some embodiments, the target scene may also be one introducing the production process of an item in the live broadcast room, and the like, which is not limited.
In such cases, the terminal displays the target avatar of the target object in the picture according to the embodiment shown in fig. 5, so that the item is clearly displayed while the display effect of real-time interaction between the avatar and the real scene is presented, increasing the interest of the live broadcast room and improving its conversion rate.
It should be understood that the embodiment shown in fig. 5 takes a live broadcast scene as an example; in other scenes where the target avatar needs to avoid occluding an item, the method may likewise be used for picture display, which is not repeated here.
In summary, in the avatar-based picture display method provided by the embodiments of the present disclosure, a first picture is obtained by photographing a target object through a first photographing device of a terminal, and a second picture is obtained by photographing a target scene through a second photographing device of the terminal. Based on the first picture, the target avatar model is driven to make the expression and action corresponding to those of the target object, and the driven target avatar model is rendered in real time based on the position of the target item in the second picture, so that a target avatar synchronized with the expression and action of the target object is displayed in the second picture without occluding the target item. This not only presents a display effect of real-time interaction between the avatar and the real scene, but also ensures the display effect of the target item, further improving the user experience.
Fig. 8 is a flowchart of another avatar-based picture display method provided in an embodiment of the present disclosure. As shown in fig. 8, the method is performed by a terminal configured with a first photographing apparatus and a second photographing apparatus. Illustratively, the method includes steps 801 through 806 described below.
In step 801, a terminal photographs a target object through a first photographing device to obtain a first picture of the target object, photographs a target scene through a second photographing device to obtain a second picture of the target scene.
In the embodiment of the present disclosure, the target scene includes a green screen and a target item, i.e., an item introduced within the live broadcast room. This step is similar to step 501 in the embodiment shown in fig. 5 and is not repeated here.
In step 802, the terminal obtains expression information and motion information of the target object based on the first screen.
In the embodiment of the present disclosure, step 802 is the same as step 502 in the embodiment shown in fig. 5 and is not described herein.
In step 803, the terminal drives the target avatar model to make a corresponding expression and action based on the expression information and action information of the target object.
In the embodiment of the present disclosure, step 803 is the same as step 503 in the embodiment shown in fig. 5, and is not described herein.
In step 804, the terminal performs matting on the area except the target object in the first picture to obtain a first background picture.
In the embodiment of the present disclosure, the terminal identifies the target object in the first picture and, based on the area where the target object is located, performs matting on the area of the first picture other than the target object to obtain the first background picture. It should be understood that the terminal may obtain the first background picture by matting frame by frame based on the video frames corresponding to the first picture, or by matting once every target number of video frames, which is not limited.
In some embodiments, a target region exists in the first background picture, and the contour of the target region is identical to the contour of the target object, so that the terminal can subsequently render the target avatar model based on the contour of the target region. For example, the target region is a blank region, a region filled with a shadow, or the like, which is not limited. In other embodiments, the first background picture has no target region; that is, after matting out the area other than the target object, the terminal fills in the area where the target object was located based on the scene information of the first picture, so that the target avatar model can be rendered according to user requirements (such as display size and display position).
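Both variants of step 804 can be sketched as follows, assuming a binary person_mask from some person-segmentation model; the segmentation method itself is not specified by the disclosure.

```python
import cv2
import numpy as np

def first_background(frame: np.ndarray, person_mask: np.ndarray,
                     fill: bool = False) -> np.ndarray:
    """Build the first background picture from the first picture.

    `person_mask` is a binary mask of the target object (assumed to come
    from any person-segmentation model). With fill=False the target region
    is left blank, preserving the object's contour; with fill=True it is
    inpainted from the surrounding scene so the avatar can be placed freely.
    """
    mask = (person_mask > 0).astype(np.uint8)
    if fill:
        # Complete the region from the surrounding scene information.
        return cv2.inpaint(frame, mask * 255, 3, cv2.INPAINT_TELEA)
    background = frame.copy()
    background[mask > 0] = 0          # blank target region, same contour
    return background
```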
In step 805, the terminal performs matting on the area where the target item is located in the second picture based on the area where the green screen is located in the second picture, so as to obtain a target item picture corresponding to the target item.
In the embodiment of the present disclosure, the terminal identifies the area where the target item is located and the area where the green screen is located in the second picture, and performs matting on the area where the target item is located based on these two areas to obtain the target item picture. It should be understood that the terminal may obtain the target item picture by matting frame by frame based on the video frames corresponding to the second picture, or by matting once every target number of video frames, which is not limited.
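As one illustrative realization of the green-screen matting in step 805, a basic HSV chroma-key could be used; the color bounds below are assumptions that would be tuned to the actual backdrop and lighting.

```python
import cv2
import numpy as np

def matte_item_from_green_screen(frame_bgr: np.ndarray) -> np.ndarray:
    """Extract the target item in front of a green screen as a BGRA picture.

    Basic chroma-key: pixels inside an HSV green range are treated as
    backdrop, everything else as the item. The bounds are illustrative.
    """
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    green = cv2.inRange(hsv, (35, 60, 60), (85, 255, 255))   # green backdrop
    item_alpha = cv2.bitwise_not(green)                      # item = not green
    item_alpha = cv2.medianBlur(item_alpha, 5)               # clean mask edges
    b, g, r = cv2.split(frame_bgr)
    return cv2.merge([b, g, r, item_alpha])                  # BGRA item picture
```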
It should be noted that, in some embodiments, the terminal performs step 805 first and then step 804. In other embodiments, the terminal performs step 804 and step 805 simultaneously, which is not limited.
In step 806, the terminal renders the target avatar model based on the first background screen and the target item screen to display the target avatar and the target item in the first background screen.
In the embodiment of the present disclosure, when the target region exists in the first background picture, the terminal renders the target avatar model based on the target region to display the target avatar at the target region in the first background picture, and superimposes the target item picture on the first background picture. In this way, on the basis of displaying the target avatar against the background where the target object is located, the target item is clearly displayed, presenting a display effect of real-time interaction between the avatar and the real scene, ensuring the display effect of the target item, and improving the user experience.
When no target region exists in the first background picture, the terminal renders the target avatar model at a default size and displays the target avatar at a default position in the first background picture. It should be understood that, through the same process as step 204 in the embodiment shown in fig. 2, the terminal provides an adjustment function for the target avatar, and the target object can also adjust the position and size of the target avatar in real time as needed, which is not repeated here.
It should be noted that, in other embodiments, the terminal does not perform the above step 804 to matte the first picture, but directly renders the target avatar model based on the first picture and the target item picture to display the target avatar and the target item in the first picture. This process may also be understood as directly superimposing the layer corresponding to the target avatar and the layer corresponding to the target item picture on the first picture, which saves the computing resources of the terminal; the embodiments of the present disclosure are not limited in this regard.
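The direct layer superposition can be sketched as straightforward alpha blending; the helper below is illustrative only, assuming both layers are scene-sized BGRA images (the item layer e.g. produced by the green-screen matte sketched above).

```python
import numpy as np

def overlay_bgra(base: np.ndarray, layer_bgra: np.ndarray) -> np.ndarray:
    """Alpha-blend one BGRA layer onto a BGR frame of the same size."""
    alpha = layer_bgra[..., 3:4].astype(np.float32) / 255.0
    blended = base.astype(np.float32) * (1 - alpha) + \
              layer_bgra[..., :3].astype(np.float32) * alpha
    return blended.astype(np.uint8)

def compose_without_matting(first_frame, avatar_layer, item_layer):
    # Superimpose the avatar layer and the target item layer directly on the
    # first picture, skipping step 804; both layers are scene-sized BGRA.
    out = overlay_bgra(first_frame, avatar_layer)
    return overlay_bgra(out, item_layer)
```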
The above steps 801 to 806 are described schematically below with reference to fig. 9, which is a schematic view illustrating an avatar-based picture display method according to an embodiment of the present disclosure. As shown in fig. 9, the first photographing device photographs the target object to obtain a first picture; the terminal obtains expression information and action information of the target object based on the first picture, drives the target avatar model to make the corresponding expression and action, and performs matting on the first picture to obtain a first background picture. Meanwhile, the second photographing device photographs the target scene to obtain a second picture, and matting is performed on the second picture to obtain the target item picture. The terminal renders the driven target avatar model in real time to display the target avatar in the first background picture, presenting the display effect of real-time interaction between the avatar and the real scene.
Referring to fig. 10, fig. 10 is a schematic view showing a target avatar provided by an embodiment of the present disclosure. As shown in fig. 10, the terminal photographs the target object through the first photographing device to obtain a first picture 1001, photographs the target scene (e.g., a table placed in front of a green screen) through the second photographing device to obtain a second picture 1002, fuses the target avatar with the first background picture based on the above steps 802 to 806, and displays a first background picture 1003 containing the target avatar. In the first background picture 1003, the expression and action of the target avatar are consistent with those of the target object.
It should be understood that the embodiment shown in fig. 8 takes a live broadcast scene as an example; in other scenes where items need to be clearly displayed, the method may likewise be used for picture display, which is not repeated here.
In summary, in the avatar-based picture display method provided by the embodiments of the present disclosure, a first picture is obtained by photographing a target object through a first photographing device of a terminal, and a second picture is obtained by photographing a target scene through a second photographing device of the terminal. Based on the first picture, the target avatar model is driven to make the expression and action corresponding to those of the target object, and the driven target avatar model is rendered in real time based on the first background picture corresponding to the target object and the target item picture corresponding to the target item, so that a target avatar synchronized with the expression and action of the target object is displayed in the first background picture. This realizes real-time synchronization between the avatar and the expression and action of a real person as well as real-time fusion between the avatar and the real scene, presenting a display effect of real-time interaction between the avatar and the real scene and improving the user experience.
The embodiments shown in fig. 5 to 10 provide several avatar-based picture display methods for a live broadcast scene, in which the first photographing device and the second photographing device configured on the terminal are used to realize real-time synchronization between the avatar and the expression and action of a real person as well as real-time fusion between the avatar and the real scene, presenting a display effect of real-time interaction between the avatar and the real scene, which can effectively improve the user experience and the retention rate and conversion rate of the live broadcast room. In other live broadcast scenes, the terminal can realize real-time fusion of the avatar and the real scene based only on the first picture captured by the first photographing device, for example, a scene in which the target object introduces wearable items (such as clothes and accessories) in the live broadcast room. This picture display method is described in the embodiment shown in fig. 11 below.
Fig. 11 is a flowchart of another avatar-based picture display method provided in an embodiment of the present disclosure. As shown in fig. 11, the method is performed by a terminal configured with a first photographing apparatus. Illustratively, the method includes steps 1101-1106 described below.
In step 1101, the terminal photographs a target object through a first photographing apparatus, and obtains a first screen of the target object.
In an embodiment of the present disclosure, the first picture includes the target object and a first item. For example, the first item is a wearable item. The process of obtaining the first picture through the first photographing device is the same as in the foregoing embodiments and is not repeated here.
In step 1102, the terminal obtains expression information and motion information of the target object based on the first picture of the target object captured by the first capturing device.
In the embodiment of the present disclosure, step 1102 is the same as step 502 in the embodiment shown in fig. 5 and is not described herein.
In step 1103, the terminal drives the target avatar model to make a corresponding expression and action based on the expression information and action information of the target object.
In the embodiment of the present disclosure, step 1103 is the same as step 503 in the embodiment shown in fig. 5, and will not be described herein.
In step 1104, the terminal performs matting on the area of the first picture other than the target object and the first item to obtain a second background picture.
In the embodiment of the present disclosure, the terminal identifies the target object and the first item in the first picture and, based on the area where the target object is located and the area where the first item is located, performs matting on the area of the first picture other than the target object and the first item to obtain the second background picture. It should be understood that the terminal may obtain the second background picture by matting frame by frame based on the video frames corresponding to the first picture, or by matting once every target number of video frames, which is not limited.
In step 1105, the terminal performs matting on the area where the first item is located in the first picture to obtain a first item picture corresponding to the first item.
The second background picture and the first item picture are obtained by matting the first picture, providing a basis for the subsequent rendering of the target avatar model, so that the target avatar rendered based on the target avatar model can fit seamlessly with the first item, ensuring the display effect of the target avatar.
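For illustration, steps 1104 and 1105 together amount to splitting the first picture by two masks; the sketch below assumes binary masks for the target object and the first item obtained from some segmentation model, which the disclosure does not specify.

```python
import numpy as np

def split_first_picture(frame, person_mask, item_mask):
    """Split the first picture into the second background picture and the
    first item picture, given binary masks for the target object and the
    first item (assumed to come from any segmentation model)."""
    keep = (person_mask > 0) | (item_mask > 0)
    background = frame.copy()
    background[keep] = 0                              # matte out object + item
    item_alpha = ((item_mask > 0) * 255).astype(np.uint8)
    item_picture = np.dstack([frame, item_alpha])     # BGRA first item picture
    return background, item_picture
```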
It should be noted that, in some embodiments, the terminal performs step 1105 first and then step 1104. In other embodiments, the terminal performs step 1104 and step 1105 simultaneously, which is not limited.
In step 1106, the terminal renders the target avatar model based on the second background picture and the first item picture to display the target avatar and the first item in the second background picture.
In the embodiment of the present disclosure, the terminal renders the target avatar model based on the size of the first item in the first item picture to display the target avatar and the first item in the second background picture, thereby ensuring the degree of fit between the target avatar and the first item on the basis of displaying the target avatar against the background of the target object, presenting a display effect of real-time interaction between the avatar and the real scene, ensuring the display effect of the first item, and improving the user experience.
In some embodiments, the terminal can further render the target avatar model based on the head-to-body ratio of the target object to obtain a target avatar with the same head-to-body ratio as the target object, further ensuring the display effect of the first item. Schematically, the terminal acquires contour information of the target object based on the first picture, the contour information indicating the head-to-body ratio of the target object, and renders the target avatar model based on the second background picture, the first item picture, and the contour information to display the target avatar and the first item in the second background picture, where the head-to-body ratio of the target avatar is the same as that of the target object.
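A hedged sketch of this proportion matching follows; the keypoints used to estimate the head-to-body ratio and the avatar_model rig interface (head_height, body_height, scale_head) are hypothetical, since the disclosure only requires that the ratios end up equal.

```python
def head_body_ratio(landmarks):
    """Estimate the target object's head-to-body ratio from contour keypoints.

    `landmarks` is assumed to provide top-of-head, chin and feet y-coordinates
    (e.g. from any pose-estimation model). Returns the fraction of the total
    height occupied by the head, one possible convention for this ratio.
    """
    head = landmarks["chin_y"] - landmarks["head_top_y"]
    body = landmarks["feet_y"] - landmarks["head_top_y"]
    return head / body

def match_avatar_proportions(avatar_model, target_ratio):
    # Scale the avatar's head relative to its body so the rendered avatar has
    # the same head-to-body ratio as the target object; `avatar_model` is a
    # hypothetical rig exposing per-part scaling.
    current = avatar_model.head_height / avatar_model.body_height
    avatar_model.scale_head(target_ratio / current)
```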
It should be noted that, in other embodiments, the terminal may render only the head model of the target avatar model based on the expression information and head action information of the target object, so that the target avatar and the first item are displayed in the second background picture. In this scenario, the target avatar in the second background picture can be understood as a head-covering avatar: only the head is virtual, while the other body parts are those of the target object itself. This not only presents a display effect of real-time interaction between the avatar and the real scene but also clearly presents the first item, improving the user experience.
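One way such a head-covering composite could work is sketched below, purely as an illustration; face_box is assumed to come from a face detector, and the 20% margin is an arbitrary illustrative choice.

```python
import cv2
import numpy as np

def head_cover_composite(frame, head_layer_bgra, face_box):
    """Overlay a rendered avatar head onto the target object's head region,
    leaving the body and the first item untouched.

    `face_box` is an (x, y, w, h) head region assumed to come from a face
    detector; the head layer is resized to cover it with a small margin.
    """
    x, y, w, h = face_box
    m = int(0.2 * h)                                   # illustrative margin
    x0, y0 = max(x - m, 0), max(y - m, 0)
    x1 = min(x + w + m, frame.shape[1])
    y1 = min(y + h + m, frame.shape[0])
    head = cv2.resize(head_layer_bgra, (x1 - x0, y1 - y0))
    alpha = head[..., 3:4].astype(np.float32) / 255.0
    roi = frame[y0:y1, x0:x1].astype(np.float32)
    frame[y0:y1, x0:x1] = (roi * (1 - alpha) +
                           head[..., :3] * alpha).astype(np.uint8)
    return frame
```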
The above steps 1101 to 1106 are described schematically below with reference to fig. 12, which is a schematic view illustrating an avatar-based picture display method according to an embodiment of the present disclosure. As shown in fig. 12, the first photographing device photographs the target object and the first item to obtain a first picture; the terminal obtains expression information and action information of the target object based on the first picture, drives the target avatar model to make the corresponding expression and action, and performs matting on the first picture to obtain a second background picture and a first item picture. The terminal then renders the driven target avatar model in real time to display the target avatar and the first item in the second background picture, presenting a display effect of real-time interaction between the avatar and the real scene, with the head-to-body ratio of the target avatar the same as that of the target object.
Referring to fig. 13, fig. 13 is a schematic view showing a target avatar provided in an embodiment of the present disclosure. As shown in fig. 13, the terminal photographs the target object and the first item (e.g., a watch) through the first photographing device to obtain a first picture 1301, fuses the target avatar, the first item, and the second background picture based on the above steps 1102 to 1106, and displays a second background picture 1302 containing the target avatar and the first item. In the second background picture 1302, the expression and action of the target avatar are consistent with those of the target object.
It should be understood that the embodiment shown in fig. 11 takes a live broadcast scene as an example; in other scenes where an item needs to fit seamlessly with the avatar, the method may likewise be used for picture display, which is not repeated here.
In summary, in the avatar-based picture display method provided by the embodiments of the present disclosure, the first photographing device of the terminal photographs the target object and the first item to obtain a first picture. Based on the first picture, the target avatar model is driven to make the expression and action corresponding to those of the target object, and the driven target avatar model is rendered in real time based on the second background picture corresponding to the target object and the first item picture corresponding to the first item, so that a target avatar synchronized with the expression and action of the target object is displayed in the second background picture. This fuses the avatar in real time with the real scene where the target object is located and with the real item, presenting a display effect of real-time interaction between the avatar and the real scene and improving the user experience.
Fig. 14 is a block diagram of a picture display device based on an avatar provided in an embodiment of the present disclosure. Referring to fig. 14, the apparatus is applied to a terminal configured with a first photographing device and a second photographing device, and includes an acquisition unit 1401, a driving unit 1402, and a display unit 1403.
An acquisition unit 1401 configured to perform acquisition of expression information and motion information of a target object based on a first screen of the target object captured by the first capturing device;
a driving unit 1402 configured to perform driving of the target avatar model to make a corresponding expression and action based on the expression information and action information of the target object;
a display unit 1403 configured to perform rendering of the target avatar model to display a target avatar of the target object in a second screen of the target scene photographed by the second photographing apparatus.
The method comprises the steps of shooting a target object through first shooting equipment of a terminal to obtain a first picture, shooting a target scene through second shooting equipment of the terminal to obtain a second picture, driving a target avatar model to make corresponding expression and action according to the expression and action of the target object in the first picture based on the first picture, rendering the target avatar model to display a target avatar synchronous with the expression and action of the target object in the second picture, and therefore real-time synchronization between the avatar and the expression and action of a real person is achieved, real-time fusion between the avatar and the real scene is achieved, a real-time interactive display effect of the avatar and the real scene is presented, and user experience is improved.
In some embodiments, the display unit 1403 is configured to perform:
rendering the target avatar model based on a position of the target item in the second screen to display the target avatar in the second screen, the target avatar not occluding the target item in the second screen.
In some embodiments, the apparatus further comprises a target item determination unit configured to perform:
identifying at least one item in the second screen;
determining the target item from the at least one item based on the items indicated by the item list, the item list being associated with the target scene, the degree of matching between the target item and the items indicated by the item list meeting a target condition.
In some embodiments, the apparatus further comprises an adjustment unit configured to perform at least one of:
adjusting a display position of the target avatar in the second screen in response to a moving operation of the target avatar;
adjusting a display size of the target avatar in the second screen in response to a zoom operation on the target avatar.
In some embodiments, the apparatus further comprises a model determination unit configured to perform:
acquiring scene information of the target scene based on the second picture;
determining, based on the scene information, the target avatar model corresponding to the scene information from at least one avatar model.
In some embodiments, the model determination unit is configured to perform at least one of:
determining the target avatar model from the at least one avatar model based on the item type of the target item indicated by the scene information;
determining the target avatar model from the at least one avatar model based on the weather type indicated by the scene information;
determining the target avatar model from the at least one avatar model based on the sight type indicated by the scene information.
In some embodiments, the model determination unit is further configured to perform:
displaying an avatar list indicating a plurality of avatar models;
determining the target avatar model from the plurality of avatar models in response to an avatar model selection operation for the avatar list.
In some embodiments, the model determination unit is configured to perform:
in a case that a first avatar model in the avatar list is in a selected state, in response to identifying a target action made by the target object from the first picture, displaying the next avatar model after the first avatar model in the avatar list as the selected state, the first avatar model being any one of the plurality of avatar models.
In some embodiments, the apparatus further comprises:
the first matting unit is configured to perform matting on the area except the target object in the first picture to obtain a first background picture;
the second image matting unit is configured to perform image matting on the area where the target item is located in the second image based on the area where the green screen is located in the second image, so as to obtain a target item image corresponding to the target item;
the display unit 1403 is further configured to perform rendering of the target avatar model based on the first background screen and the target item screen to display the target avatar and the target item in the first background screen.
In some embodiments, the apparatus further comprises:
a third matting unit configured to perform matting on the area of the first picture other than the target object and the first item to obtain a second background picture, in a case where the first picture includes the first item;
a fourth matting unit configured to perform matting on the area where the first item is located in the first picture, so as to obtain a first item picture corresponding to the first item;
the display unit 1403 is further configured to perform rendering of the target avatar model based on the second background screen and the first item screen to display the target avatar and the first item in the second background screen.
In some embodiments, the apparatus further comprises:
a contour information acquisition unit configured to perform acquisition of contour information of the target object based on the first screen, the contour information indicating a head-to-body ratio of the target object;
the display unit 1403 is configured to perform rendering of the target avatar model based on the second background screen, the first item screen, and the contour information to display the target avatar and the first item in the second background screen, the head-to-body ratio of the target avatar being the same as the head-to-body ratio of the target object.
In some embodiments, the display unit 1403 is further configured to perform at least one of:
in the case that a first expression is recognized based on the expression information of the target object, displaying first media information corresponding to the first expression in the second screen;
in the case that a first motion is recognized based on the motion information of the target object, displaying second media information corresponding to the first motion in the second screen.
It should be noted that: in the image display device based on the avatar provided in the above embodiment, only the division of the above functional modules is used for illustration, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the image display device based on the avatar provided in the above embodiment and the image display method based on the avatar belong to the same concept, and the detailed implementation process of the image display device based on the avatar is detailed in the method embodiment, which is not described herein.
In an exemplary embodiment, there is also provided an electronic device including a processor and a memory for storing at least one computer program loaded and executed by the processor to implement the above avatar-based picture display method.
Fig. 15 is a block diagram of a terminal according to an embodiment of the present disclosure when the electronic device is configured as a terminal. The terminal 1500 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 1500 may also be referred to as a user device, a portable terminal, a laptop terminal, a desktop terminal, and the like.
In general, the terminal 1500 includes: a processor 1501 and a memory 1502.
The processor 1501 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1501 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1501 may also include a main processor and a coprocessor: the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1501 may be integrated with a GPU (Graphics Processing Unit) responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1501 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1502 may include one or more computer-readable storage media, which may be non-transitory. Memory 1502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1502 is used to store at least one program code for execution by processor 1501 to implement processes performed by a terminal in embodiments of the methods of the present disclosure.
In some embodiments, the terminal 1500 may further optionally include: a peripheral interface 1503 and at least one peripheral device. The processor 1501, memory 1502 and peripheral interface 1503 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 1503 via a bus, signal lines, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1504, a display screen 1505, a camera assembly 1506, audio circuitry 1507, a positioning assembly 1508, and a power supply 1509.
A peripheral interface 1503 may be used to connect I/O (Input/Output) related at least one peripheral device to the processor 1501 and the memory 1502. In some embodiments, processor 1501, memory 1502, and peripheral interface 1503 are integrated on the same chip or circuit board; in some other embodiments, either or both of the processor 1501, the memory 1502, and the peripheral interface 1503 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.
The Radio Frequency circuit 1504 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 1504 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 1504 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1504 includes: antenna systems, RF transceivers, one or more amplifiers, tuners, oscillators, digital signal processors, codec chipsets, subscriber identity module cards, and so forth. The radio frequency circuit 1504 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity ) networks. In some embodiments, the radio frequency circuit 1504 may also include NFC (Near Field Communication, short range wireless communication) related circuits, which are not limited in this application.
The display screen 1505 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1505 is a touch display screen, it also has the ability to collect touch signals on or above its surface. The touch signals may be input to the processor 1501 as control signals for processing. At this point, the display screen 1505 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1505, disposed on the front panel of the terminal 1500; in other embodiments, there may be at least two display screens 1505, disposed respectively on different surfaces of the terminal 1500 or in a folded design; in still other embodiments, the display screen 1505 may be a flexible display screen disposed on a curved surface or a folded surface of the terminal 1500. The display screen 1505 may even be arranged as a non-rectangular irregular figure, i.e., a specially-shaped screen. The display screen 1505 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 1506 is used to capture images or video. Optionally, the camera assembly 1506 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fused shooting functions. In some embodiments, the camera assembly 1506 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuitry 1507 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and the environment, converting the sound waves into electric signals, inputting the electric signals to the processor 1501 for processing, or inputting the electric signals to the radio frequency circuit 1504 for voice communication. For purposes of stereo acquisition or noise reduction, a plurality of microphones may be respectively disposed at different portions of the terminal 1500. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 1501 or the radio frequency circuit 1504 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 1507 may also include a headphone jack.
The positioning component 1508 is used to determine the current geographic location of the terminal 1500 to enable navigation or LBS (Location Based Service).
The power supply 1509 is used to power the various components in the terminal 1500. The power supply 1509 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 1509 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal 1500 also includes one or more sensors 1510. The one or more sensors 1510 include, but are not limited to: acceleration sensor 1511, gyroscope sensor 1512, pressure sensor 1513, fingerprint sensor 1514, optical sensor 1515, and proximity sensor 1516.
The acceleration sensor 1511 may detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the terminal 1500. For example, the acceleration sensor 1511 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 1501 may control the display screen 1505 to display the user interface in a landscape view or a portrait view based on the gravitational acceleration signal acquired by the acceleration sensor 1511. The acceleration sensor 1511 may also be used for the acquisition of motion data of a game or user.
The gyro sensor 1512 may detect the body direction and rotation angle of the terminal 1500, and may cooperate with the acceleration sensor 1511 to collect the user's 3D motion on the terminal 1500. Based on the data collected by the gyro sensor 1512, the processor 1501 may implement the following functions: motion sensing (e.g., changing the UI according to a tilting operation by the user), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 1513 may be disposed on a side frame of the terminal 1500 and/or under the display 1505. When the pressure sensor 1513 is disposed on the side frame of the terminal 1500, a grip signal of the user on the terminal 1500 may be detected, and the processor 1501 performs left-right hand recognition or quick operation according to the grip signal collected by the pressure sensor 1513. When the pressure sensor 1513 is disposed at the lower layer of the display screen 1505, the processor 1501 realizes control of the operability control on the UI interface according to the pressure operation of the user on the display screen 1505. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 1514 is used for collecting the fingerprint of the user, and the processor 1501 recognizes the identity of the user according to the collected fingerprint of the fingerprint sensor 1514, or the fingerprint sensor 1514 recognizes the identity of the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 1501 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying for and changing settings, etc. The fingerprint sensor 1514 may be disposed on the front, back, or side of the terminal 1500. When a physical key or vendor Logo is provided on the terminal 1500, the fingerprint sensor 1514 may be integrated with the physical key or vendor Logo.
The optical sensor 1515 is used to collect the ambient light intensity. In one embodiment, processor 1501 may control the display brightness of display screen 1505 based on the intensity of ambient light collected by optical sensor 1515. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1505 is turned up; when the ambient light intensity is low, the display luminance of the display screen 1505 is turned down. In another embodiment, the processor 1501 may also dynamically adjust the shooting parameters of the camera assembly 1506 based on the ambient light intensity collected by the optical sensor 1515.
A proximity sensor 1516, also referred to as a distance sensor, is typically provided on the front panel of the terminal 1500. The proximity sensor 1516 is used to collect the distance between the user and the front of the terminal 1500. In one embodiment, when the proximity sensor 1516 detects a gradual decrease in the distance between the user and the front of the terminal 1500, the processor 1501 controls the display 1505 to switch from the on-screen state to the off-screen state; when the proximity sensor 1516 detects that the distance between the user and the front surface of the terminal 1500 gradually increases, the processor 1501 controls the display screen 1505 to switch from the off-screen state to the on-screen state.
Those skilled in the art will appreciate that the structure shown in fig. 15 is not limiting and that more or fewer components than shown may be included or certain components may be combined or a different arrangement of components may be employed.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (16)

1. A picture display method based on an avatar, the method being performed by a terminal configured with a first photographing apparatus and a second photographing apparatus, the method comprising:
acquiring expression information and action information of a target object based on a first picture of the target object obtained by shooting by the first shooting equipment;
based on the expression information and the action information of the target object, driving the target virtual image model to make corresponding expression and action;
rendering the target avatar model to display the target avatar of the target object in a second picture of the target scene shot by the second shooting equipment.
2. The avatar-based picture display method of claim 1, wherein the rendering the target avatar model to display the target avatar of the target object in the second picture of the target scene photographed by the second photographing apparatus, comprises:
rendering the target avatar model based on a position of a target item in the second screen to display the target avatar in the second screen, the target avatar not obscuring the target item in the second screen.
3. The avatar-based picture display method of claim 2, wherein the method further comprises:
identifying at least one item in the second screen;
determining the target item from the at least one item based on the items indicated by the item list, the item list being associated with the target scene, a degree of matching between the target item and the items indicated by the item list meeting a target condition.
4. The avatar-based picture display method of claim 1, wherein the method further comprises at least one of:
adjusting a display position of the target avatar in the second screen in response to a moving operation of the target avatar;
and adjusting a display size of the target avatar in the second screen in response to a zoom operation of the target avatar.
5. The avatar-based picture display method of claim 1, wherein the method further comprises:
acquiring scene information of the target scene based on the second picture; and
determining, based on the scene information, the target avatar model corresponding to the scene information from at least one avatar model.
6. The avatar-based picture display method of claim 5, wherein the determining the target avatar model corresponding to the scene information from at least one avatar model based on the scene information comprises at least one of:
determining the target avatar model from the at least one avatar model based on an item type of a target item indicated by the scene information;
determining the target avatar model from the at least one avatar model based on a weather type indicated by the scene information; and
determining the target avatar model from the at least one avatar model based on a scenic spot type indicated by the scene information.
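One way claim 6's three selection rules could be tabulated, shown as a hedged sketch; the dictionary keys and model names are invented for illustration, since the patent claims only the mappings, not their contents.

```python
AVATAR_BY_ITEM = {"guitar": "musician_avatar", "wok": "chef_avatar"}
AVATAR_BY_WEATHER = {"snow": "winter_avatar", "rain": "raincoat_avatar"}
AVATAR_BY_SIGHT = {"beach": "swimwear_avatar", "temple": "guide_avatar"}

def select_avatar(scene_info: dict, default: str = "base_avatar") -> str:
    # The claim allows any one of the three rules; try them in order.
    for table, key in ((AVATAR_BY_ITEM, "item_type"),
                       (AVATAR_BY_WEATHER, "weather_type"),
                       (AVATAR_BY_SIGHT, "sight_type")):
        model = table.get(scene_info.get(key))
        if model:
            return model
    return default
```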
7. The avatar-based picture display method of claim 1, wherein the method further comprises:
displaying an avatar list, the avatar list indicating a plurality of avatar models; and
determining the target avatar model from the plurality of avatar models in response to an avatar model selection operation on the avatar list.
8. The avatar-based picture display method of claim 7, wherein the determining the target avatar model from the plurality of avatar models in response to an avatar model selection operation on the avatar list comprises:
in the case that a first avatar model in the avatar list is in a selected state, displaying the next avatar model after the first avatar model in the avatar list in the selected state in response to identifying, from the first picture, a target action made by the target object, the first avatar model being any one of the plurality of avatar models.
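A sketch of claim 8's gesture-driven cycling: each recognized target action advances the selection to the next model in the avatar list, wrapping at the end. The action recognizer itself is out of scope here and assumed to exist upstream.

```python
def advance_selection(avatar_list: list, selected_index: int, action_detected: bool) -> int:
    """Return the index of the avatar model now in the selected state."""
    if action_detected:
        # Wrap around so repeated target actions cycle through the whole list.
        return (selected_index + 1) % len(avatar_list)
    return selected_index
```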
9. The avatar-based picture display method of claim 1, wherein the method further comprises:
matting out the region other than the target object in the first picture to obtain a first background picture;
matting out, based on the region where the green screen is located in the second picture, the region where the target item is located in the second picture to obtain a target item picture corresponding to the target item; and
rendering the target avatar model based on the first background picture and the target item picture to display the target avatar and the target item in the first background picture.
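A hedged sketch of the green-screen matte in claim 9, written as OpenCV-style HSV chroma keying with NumPy; the hue and saturation bounds are assumptions that a real system would tune to its lighting.

```python
import numpy as np

def green_screen_mask(frame_hsv: np.ndarray) -> np.ndarray:
    """Boolean mask, True where a pixel belongs to the green-screen background.

    Assumes OpenCV HSV conventions (hue in [0, 179]); the bounds are illustrative.
    """
    h, s, v = frame_hsv[..., 0], frame_hsv[..., 1], frame_hsv[..., 2]
    return (h > 35) & (h < 85) & (s > 80) & (v > 80)

def matte_item(second_frame_hsv: np.ndarray, second_frame_bgr: np.ndarray):
    """Keep only the foreground (the target item) from the second picture."""
    background = green_screen_mask(second_frame_hsv)
    item = second_frame_bgr.copy()
    item[background] = 0           # zero out the keyed green region
    return item, ~background       # item picture plus its alpha mask
```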
10. The avatar-based picture display method of claim 1, wherein the method further comprises:
in the case that the first picture includes a first item, matting out the region other than the target object and the first item in the first picture to obtain a second background picture;
matting out the region where the first item is located in the first picture to obtain a first item picture corresponding to the first item; and
rendering the target avatar model based on the second background picture and the first item picture to display the target avatar and the first item in the second background picture.
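Once the cutouts from claim 10 exist, the display step reduces to layer pasting; a minimal NumPy sketch, assuming all arrays share one resolution and each layer carries a boolean mask:

```python
import numpy as np

def composite(background: np.ndarray, layers) -> np.ndarray:
    """Paste layers over the background in order.

    layers: iterable of (pixels, mask) pairs, e.g. the first-item picture and
    the rendered avatar, each with a boolean mask of its occupied pixels.
    """
    out = background.copy()
    for pixels, mask in layers:
        out[mask] = pixels[mask]   # later layers draw over earlier ones
    return out
```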
11. The avatar-based picture display method of claim 10, wherein the method further comprises:
acquiring contour information of the target object based on the first picture, the contour information indicating a head-to-body ratio of the target object;
wherein the rendering the target avatar model based on the second background picture and the first item picture to display the target avatar and the first item in the second background picture comprises:
rendering the target avatar model based on the second background picture, the first item picture, and the contour information to display the target avatar and the first item in the second background picture, wherein the head-to-body ratio of the target avatar is the same as the head-to-body ratio of the target object.
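A sketch of claim 11's ratio matching, assuming head-top, chin, and feet landmarks are available from the first picture; the landmark source and the avatar's state layout are not specified by the disclosure and are placeholders here.

```python
def head_to_body_ratio(head_top_y: float, chin_y: float, feet_y: float) -> float:
    """Head height divided by total height, from landmark y-coordinates."""
    head = chin_y - head_top_y
    total = feet_y - head_top_y
    return head / total

def scale_avatar_head(avatar: dict, subject_ratio: float) -> dict:
    """Adjust the avatar's head so its head-to-body ratio equals the subject's."""
    avatar["head_height"] = subject_ratio * avatar["total_height"]
    return avatar
```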
12. The avatar-based picture display method of claim 1, wherein the method further comprises at least one of:
in the case that a first expression is recognized based on the expression information of the target object, displaying first media information corresponding to the first expression in the second picture; and
in the case that a first action is recognized based on the action information of the target object, displaying second media information corresponding to the first action in the second picture.
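Claim 12 reduces to trigger tables at its simplest; in this hedged sketch the expression and action labels, and the media they map to, are invented for illustration only.

```python
from typing import List, Optional

MEDIA_BY_EXPRESSION = {"smile": "confetti_effect"}
MEDIA_BY_ACTION = {"thumbs_up": "like_sticker"}

def media_for_frame(expression: Optional[str], action: Optional[str]) -> List[str]:
    """Collect the media overlays to display in the second picture."""
    overlays = []
    if expression in MEDIA_BY_EXPRESSION:
        overlays.append(MEDIA_BY_EXPRESSION[expression])
    if action in MEDIA_BY_ACTION:
        overlays.append(MEDIA_BY_ACTION[action])
    return overlays
```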
13. An avatar-based picture display apparatus, the apparatus being applied to a terminal configured with a first photographing device and a second photographing device, the apparatus comprising:
an acquiring unit configured to acquire expression information and action information of a target object based on a first picture of the target object captured by the first photographing device;
a driving unit configured to drive a target avatar model to make a corresponding expression and action based on the expression information and the action information of the target object; and
a display unit configured to render the target avatar model to display a target avatar of the target object in a second picture of a target scene captured by the second photographing device.
14. An electronic device, comprising:
one or more processors; and
a memory for storing program code executable by the one or more processors;
wherein the one or more processors are configured to execute the program code to implement the avatar-based picture display method of any one of claims 1 to 12.
15. A computer-readable storage medium, wherein program code in the computer-readable storage medium, when executed by a processor of an electronic device, causes the electronic device to perform the avatar-based picture display method of any one of claims 1 to 12.
16. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the avatar-based picture display method of any one of claims 1 to 12.
CN202210798176.0A 2022-07-06 2022-07-06 Image display method and device based on virtual image and electronic equipment Pending CN117409119A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210798176.0A CN117409119A (en) 2022-07-06 2022-07-06 Image display method and device based on virtual image and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210798176.0A CN117409119A (en) 2022-07-06 2022-07-06 Image display method and device based on virtual image and electronic equipment

Publications (1)

Publication Number Publication Date
CN117409119A true CN117409119A (en) 2024-01-16

Family

ID=89489551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210798176.0A Pending CN117409119A (en) 2022-07-06 2022-07-06 Image display method and device based on virtual image and electronic equipment

Country Status (1)

Country Link
CN (1) CN117409119A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117596420A (en) * 2024-01-18 2024-02-23 江西拓世智能科技股份有限公司 Fusion live broadcast method, system, medium and electronic equipment based on artificial intelligence

Similar Documents

Publication Publication Date Title
WO2018153267A1 (en) Group video session method and network device
CN109191549B (en) Method and device for displaying animation
CN111726536A (en) Video generation method and device, storage medium and computer equipment
CN110929651A (en) Image processing method, image processing device, electronic equipment and storage medium
CN111541907B (en) Article display method, apparatus, device and storage medium
CN110865754B (en) Information display method and device and terminal
WO2019174628A1 (en) Photographing method and mobile terminal
KR20180112599A (en) Mobile terminal and method for controlling the same
CN108712603B (en) Image processing method and mobile terminal
CN112533017B (en) Live broadcast method, device, terminal and storage medium
CN110300274B (en) Video file recording method, device and storage medium
CN112287852B (en) Face image processing method, face image display method, face image processing device and face image display equipment
CN107087137B (en) Method and device for presenting video and terminal equipment
CN112907725B (en) Image generation, training of image processing model and image processing method and device
CN112337105B (en) Virtual image generation method, device, terminal and storage medium
CN105427369A (en) Mobile terminal and method for generating three-dimensional image of mobile terminal
CN110827195B (en) Virtual article adding method and device, electronic equipment and storage medium
CN109474786A (en) A kind of preview image generation method and terminal
CN112565806B (en) Virtual gift giving method, device, computer equipment and medium
CN112052897A (en) Multimedia data shooting method, device, terminal, server and storage medium
CN108848405B (en) Image processing method and device
CN108881721A (en) A kind of display methods and terminal
WO2020238454A1 (en) Photographing method and terminal
CN117409119A (en) Image display method and device based on virtual image and electronic equipment
CN108551562A (en) A kind of method and mobile terminal of video communication

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination