CN112750186B - Virtual image switching method, device, electronic equipment and storage medium - Google Patents

Virtual image switching method, device, electronic equipment and storage medium

Info

Publication number
CN112750186B
Authority
CN
China
Prior art keywords
image
characteristic parameter
similarity
virtual
real
Prior art date
Legal status
Active
Application number
CN202110069031.2A
Other languages
Chinese (zh)
Other versions
CN112750186A (en)
Inventor
杨国基
常向月
刘云峰
Current Assignee
Shenzhen Zhuiyi Technology Co Ltd
Original Assignee
Shenzhen Zhuiyi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Zhuiyi Technology Co Ltd
Priority to CN202110069031.2A
Publication of CN112750186A
Application granted
Publication of CN112750186B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00: Animation
    • G06T 13/20: 3D [Three Dimensional] animation
    • G06T 13/40: 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168: Feature extraction; Face representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174: Facial expression recognition

Abstract

The application discloses an avatar switching method and device, an electronic device, and a storage medium, relating to the technical field of electronic devices. The avatar switching method includes: acquiring a displayed real image of the current frame and an action intention of a target person in the real image; acquiring, according to the action intention, a virtual video corresponding to the action intention and a virtual image of the first frame of the virtual video; and if the real image and the virtual image do not match, generating a transition image based on the real image and the virtual image, and switching the real image to the transition image. In this way, when the real image corresponding to a human customer service agent in a video is switched to the virtual image corresponding to a robot customer service agent, a transition image can be displayed to link the two, ensuring that the real image transitions smoothly to the virtual image, so that the switch is imperceptible to the user and the user experience is improved.

Description

Virtual image switching method, device, electronic equipment and storage medium
Technical Field
The present invention relates to the technical field of electronic devices, and in particular, to an avatar switching method, an apparatus, an electronic device, and a storage medium.
Background
At present, mobile terminal devices such as mobile phones are widely adopted, and smartphones have become indispensable personal items. With the rapid development of the mobile internet, a wide variety of applications have appeared on mobile terminals, many of which provide a customer service function so that users can, for example, consult about products through customer service.
In general, the customer service functions provided to a user within a mobile terminal application include both visual robot customer service and human customer service. The robot customer service can answer simple or common questions, while complex or special questions are handled by switching to human customer service, so using the customer service function often involves switching back and forth between the two.
However, in current customer service videos, when the picture of the robot customer service and the picture of the human customer service are switched, the switch is made directly and the two pictures are not well connected, so the picture switching looks unnatural and degrades the user experience.
Disclosure of Invention
In view of the above problems, the present application proposes an avatar switching method, apparatus, electronic device, and storage medium.
In a first aspect, an embodiment of the present application provides an avatar switching method, including: acquiring a displayed real image of the current frame and an action intention of a target person in the real image; acquiring, according to the action intention, a virtual video corresponding to the action intention and a virtual image of the first frame of the virtual video; if the real image and the virtual image do not match, generating a transition image based on the real image and the virtual image, and switching the real image to the transition image, wherein the transition image includes an interpolated frame image of the target person, and the similarity between the interpolated frame image and the real image is greater than the similarity between the real image and the virtual image; and if the transition image matches the virtual image, switching the transition image to the virtual video.
Further, before generating the transition image based on the real image and the virtual image and switching the real image to the transition image if the real image and the virtual image do not match, the method further includes: extracting a first characteristic parameter from the real image and a second characteristic parameter from the virtual image, wherein the first characteristic parameter and the second characteristic parameter are the same characteristic parameter of the target person; and if the similarity between the first characteristic parameter and the second characteristic parameter is smaller than a similarity threshold, determining that the real image and the virtual image do not match.
Further, the first characteristic parameter includes a first feature point, the second characteristic parameter includes a second feature point, and the first feature point and the second feature point are the same feature point of the target person. Before determining that the real image and the virtual image do not match if the similarity between the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, the method further includes: determining whether the distance between the first feature point and the second feature point is not less than a distance threshold; and if so, determining that the similarity between the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold.
Further, generating a transition image based on the real image and the virtual image and switching the real image to the transition image includes: generating a third characteristic parameter based on the first characteristic parameter of the real image and the second characteristic parameter of the virtual image, wherein the third characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity between the third characteristic parameter and the first characteristic parameter is greater than the similarity between the first characteristic parameter and the second characteristic parameter; generating an interpolated frame image based on the third characteristic parameter; and generating a transition image based on the interpolated frame image and switching the real image to the transition image.
Further, generating an interpolated frame image based on the third characteristic parameter includes: taking the similarity between the third characteristic parameter and the first characteristic parameter as a first similarity and the similarity between the first characteristic parameter and the second characteristic parameter as a second similarity, and calculating the difference between the first similarity and the second similarity; and if the difference between the first similarity and the second similarity is not smaller than a specified value, generating the interpolated frame image based on the third characteristic parameter.
Further, generating a transition image based on the interpolated frame image includes: if the similarity between the third characteristic parameter and the first characteristic parameter is smaller than the similarity threshold, determining the interpolated frame image as the first frame of the transition image, and determining whether the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold; if the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, generating a fourth characteristic parameter based on the second characteristic parameter and the third characteristic parameter, wherein the fourth characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity between the fourth characteristic parameter and the second characteristic parameter is greater than the similarity between the second characteristic parameter and the third characteristic parameter; generating a target interpolated frame image based on the fourth characteristic parameter; and generating the transition image based on the interpolated frame image and the target interpolated frame image.
Further, generating the target interpolated frame image based on the fourth characteristic parameter includes: taking the similarity between the fourth characteristic parameter and the second characteristic parameter as a third similarity and the similarity between the third characteristic parameter and the second characteristic parameter as a fourth similarity, and calculating the difference between the third similarity and the fourth similarity; and if the difference between the third similarity and the fourth similarity is not smaller than the specified value, generating the target interpolated frame image based on the fourth characteristic parameter.
Further, generating the transition image based on the interpolated frame image and the target interpolated frame image includes: determining whether the similarity between the fourth characteristic parameter and the second characteristic parameter is smaller than the similarity threshold; and if the similarity between the fourth characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, determining the target interpolated frame image as the last frame of the transition image, and generating the transition image from the interpolated frame image and the target interpolated frame image.
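Taken together, the paragraphs above describe an iterative interpolation over characteristic parameters. The following is a minimal illustrative sketch of that loop in Python, not the patented implementation: it assumes characteristic parameters are numeric vectors, and the distance-based similarity helper and the step size are invented for illustration.

```python
import numpy as np

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Hypothetical measure: map Euclidean distance into (0, 1].
    return 1.0 / (1.0 + float(np.linalg.norm(a - b)))

def transition_parameters(first: np.ndarray, second: np.ndarray,
                          sim_threshold: float = 0.8, step: float = 0.3) -> list:
    """Generate interpolated characteristic parameters (third, fourth, ...)
    that start near the first (real-image) parameter and approach the
    second (virtual-image) parameter until the similarity threshold is met."""
    chain, current = [], first
    while similarity(current, second) < sim_threshold:
        # Each new parameter moves a fraction of the remaining way toward the
        # virtual image, so its similarity to the second parameter is greater
        # than that of its predecessor.
        current = current + step * (second - current)
        chain.append(current)
    return chain
```

Rendering each parameter set in the chain as an interpolated frame image (for example with the avatar model described next) would yield the transition image sequence.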
Further, generating an interpolated frame image based on the third characteristic parameter includes: inputting the third characteristic parameter into a pre-trained avatar model, and obtaining the interpolated frame image corresponding to the third characteristic parameter.
Further, inputting the third characteristic parameter into the pre-trained avatar model and obtaining the interpolated frame image corresponding to the third characteristic parameter includes: acquiring a currently recorded real video that includes the real image, and fine-tuning the pre-trained avatar model with the real video; and inputting the third characteristic parameter into the fine-tuned avatar model and obtaining the interpolated frame image corresponding to the third characteristic parameter.
Further, before generating the interpolated frame image based on the third characteristic parameter, the method further includes: acquiring a sample image of the target person; extracting sample characteristic parameters and a sample interpolated frame image of the target person from the sample image; and inputting the sample characteristic parameters and the sample interpolated frame image into a machine learning model for training, to obtain the pre-trained avatar model.
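The patent does not specify the avatar model's architecture, loss, or training schedule; the PyTorch sketch below only illustrates the claimed data flow, in which sample characteristic parameters and sample interpolated frame images are fed into a machine learning model for training. The tiny decoder, the 64x64 image size, and the MSE loss are all assumptions.

```python
import torch
import torch.nn as nn

class AvatarModel(nn.Module):
    """Toy decoder: characteristic-parameter vector -> 64x64 RGB frame."""
    def __init__(self, n_params: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_params, 256), nn.ReLU(),
            nn.Linear(256, 3 * 64 * 64), nn.Sigmoid(),
        )

    def forward(self, params: torch.Tensor) -> torch.Tensor:
        return self.net(params).view(-1, 3, 64, 64)

def train_avatar_model(sample_params: torch.Tensor,
                       sample_frames: torch.Tensor,
                       epochs: int = 100) -> AvatarModel:
    # sample_params: (N, n_params) sample characteristic parameters
    # sample_frames: (N, 3, 64, 64) sample interpolated frame images
    model = AvatarModel(sample_params.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(sample_params), sample_frames)
        loss.backward()
        opt.step()
    return model
```

Fine-tuning on a currently recorded real video, as in the preceding paragraph, would amount to running a few more such training epochs on frames of that video.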
Further, before generating the transition image based on the real image and the virtual image and switching the real image to the transition image if the real image and the virtual image do not match, the method further includes: determining, by an optical flow method, whether the real image and the virtual image match.
Further, before acquiring the virtual video corresponding to the action intention and the virtual image of the first frame of the virtual video, the method further includes: determining whether a virtual video corresponding to the action intention exists; if so, executing the step of acquiring the virtual video corresponding to the action intention; and if not, acquiring an answer template corresponding to the action intention, extracting characteristic parameters of the target person from the real image, and generating a virtual video corresponding to the action intention based on the characteristic parameters and the answer template.
Further, before acquiring the displayed real image of the current frame and the action intention of the target person in the real image, the method further includes: determining, while a real image including the target person is played, whether the real image satisfies a switching condition; and if the real image satisfies the switching condition, executing the step of acquiring the displayed real image of the current frame and the action intention of the target person in the real image.
Further, determining whether the real image satisfies the switching condition includes: determining whether a switching instruction has been received; and if a switching instruction has been received, determining that the played real image including the target person satisfies the switching condition.
In a second aspect, an embodiment of the present application provides an avatar switching device, including a first acquisition module, a second acquisition module, a first switching module, and a second switching module. The first acquisition module is used for acquiring the real image of the current frame and the action intention of the target person in the real image. The second acquisition module is used for acquiring, according to the action intention, a virtual video corresponding to the action intention and a virtual image of the first frame of the virtual video. The first switching module is used for generating a transition image based on the real image and the virtual image if the two do not match, and switching the real image to the transition image, wherein the transition image includes an interpolated frame image of the target person, and the similarity between the interpolated frame image and the real image is greater than the similarity between the real image and the virtual image. The second switching module is used for switching the transition image to the virtual video if the transition image matches the virtual image.
Further, the avatar switching device further includes:
the feature parameter extraction module is used for extracting a first feature parameter from the real image and extracting a second feature parameter from the virtual image, wherein the first feature parameter and the second feature parameter are the same feature parameter of the target person.
The matching determination module is used for determining that the real image and the virtual image do not match if the similarity between the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold.
Further, the first feature parameter includes a first feature point, the second feature parameter includes a second feature point, the first feature point and the second feature point are the same feature point of the target person, and the avatar switching device further includes:
and the distance judging module is used for determining whether the distance between the first characteristic point and the second characteristic point is not smaller than a distance threshold value.
And the similarity determining module is used for determining that the similarity between the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold value if the distance between the first characteristic point and the second characteristic point is not smaller than the distance threshold value.
Further, the first switching module includes:
and the third characteristic parameter sub-module is used for generating a third characteristic parameter based on the first characteristic parameter of the real image and the second characteristic parameter of the virtual image, wherein the third characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity of the third characteristic parameter and the first characteristic parameter is larger than the similarity of the first characteristic parameter and the second characteristic parameter.
And the frame inserting image generating sub-module is used for generating frame inserting images based on the third characteristic parameters.
And the first switching sub-module is used for generating a transition image based on the frame inserting image and switching the real image into the transition image.
Further, the interpolated frame image generation sub-module is specifically configured to take the similarity between the third characteristic parameter and the first characteristic parameter as a first similarity and the similarity between the first characteristic parameter and the second characteristic parameter as a second similarity, calculate the difference between the first similarity and the second similarity, and, if the difference is not smaller than the specified value, generate the interpolated frame image based on the third characteristic parameter.
Further, the first switching module further includes:
and the similarity comparison sub-module is used for determining the frame inserting image as the first frame image of the transition image if the similarity between the third characteristic parameter and the first characteristic parameter is smaller than a similarity threshold value, and determining whether the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold value.
And the fourth characteristic parameter sub-module is used for generating a fourth characteristic parameter based on the second characteristic parameter and the third characteristic parameter if the similarity of the third characteristic parameter and the second characteristic parameter is smaller than a similarity threshold, wherein the fourth characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity of the fourth characteristic parameter and the second characteristic parameter is smaller than the similarity of the second characteristic parameter and the third characteristic parameter.
And the target frame inserting image generating sub-module is used for generating a target frame inserting image based on the fourth characteristic parameters.
And the transition image generation sub-module is used for generating a transition image based on the frame inserting image and the target frame inserting image.
Further, the target interpolated frame image generation sub-module is specifically configured to take the similarity between the fourth characteristic parameter and the second characteristic parameter as a third similarity and the similarity between the third characteristic parameter and the second characteristic parameter as a fourth similarity, calculate the difference between the third similarity and the fourth similarity, and, if the difference is not smaller than the specified value, generate the target interpolated frame image based on the fourth characteristic parameter.
Further, the transition image generation sub-module is specifically configured to determine whether the similarity between the fourth characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, and, if it is, to determine the target interpolated frame image as the last frame of the transition image and generate the transition image from the interpolated frame image and the target interpolated frame image.
Further, the interpolated frame image generation sub-module is specifically configured to input the third characteristic parameter into the pre-trained avatar model and obtain the interpolated frame image corresponding to the third characteristic parameter.
Further, the avatar switching device further includes:
and the sample image acquisition module is used for acquiring a sample image of the target person.
And the sample extraction module is used for extracting sample characteristic parameters and sample plug-in frame images of the target person from the sample images.
And the training module is used for inputting the sample characteristic parameters and the sample frame inserting images into the machine learning model for training to obtain a pre-trained virtual image model.
Further, the avatar switching device further includes:
and the virtual video detection module is used for determining whether a virtual video corresponding to the action intention exists.
And the first execution module is used for executing the virtual video corresponding to the action intention if the virtual video exists.
And the answer template acquisition module is used for acquiring an answer template corresponding to the action intention if the virtual video does not exist.
And the virtual video generation module is used for extracting the characteristic parameters of the target person from the real image and generating a virtual video corresponding to the action intention based on the characteristic parameters and the answer template.
Further, the avatar switching device further includes:
and the switching detection module is used for determining whether the real image meets the switching condition when the real image comprising the target person is played.
And the second execution module is used for executing the acquisition of the displayed real image of the current frame and the action intention of the target person in the real image if the real image meets the switching condition.
Further, the switching detection module is specifically configured to determine whether a switching instruction has been received, and, if one has, to determine that the played real image including the target person satisfies the switching condition.
In a third aspect, embodiments of the present application provide an electronic device including a memory, one or more processors, and one or more applications. The one or more processors are coupled to the memory; the one or more applications are stored in the memory, configured to be executed by the one or more processors, and configured to perform the method described in the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having program code stored therein, the program code being callable by a processor to perform a method as described above in the first aspect.
According to the avatar switching method and device, electronic device, and storage medium provided by the embodiments of the present application, the displayed real image of the current frame and the action intention of the target person in the real image are acquired; according to the action intention, a virtual video corresponding to the action intention and a virtual image of the first frame of the virtual video are acquired; if the real image and the virtual image do not match, a transition image is generated based on the real image and the virtual image, and the real image is switched to the transition image, the transition image including an interpolated frame image of the target person whose similarity to the real image is greater than the similarity between the real image and the virtual image; and if the transition image matches the virtual image, the transition image is switched to the virtual video. This achieves a smooth and natural transition from the real image to the virtual image: when a user watches the customer service video, the human customer service picture is smoothly switched to the virtual customer service picture, the user does not perceive the switching process, and the user experience is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 illustrates an application environment of an avatar switching method provided in a first embodiment of the present application.
Fig. 2 is a flowchart illustrating an avatar switching method according to a first embodiment of the present application.
Fig. 3 is a flowchart illustrating an avatar switching method according to a second embodiment of the present application.
Fig. 4 is a flowchart illustrating a procedure of S240 in the avatar switching method provided in the second embodiment of the present application.
Fig. 5 is a flowchart illustrating an avatar switching method according to a third embodiment of the present application.
Fig. 6 is a flowchart illustrating a procedure of S360 in the avatar switching method provided in the third embodiment of the present application.
Fig. 7 is a flowchart illustrating a procedure of S370 in the avatar switching method provided in the third embodiment of the present application.
Fig. 8 is a flowchart illustrating a procedure of S373 in the avatar switching method provided in the third embodiment of the present application.
Fig. 9 is a flowchart illustrating a procedure of S374 in the avatar switching method provided in the third embodiment of the present application.
Fig. 10 is a flowchart illustrating an avatar switching method according to a fourth embodiment of the present application.
Fig. 11 is a flowchart illustrating an avatar switching method according to a fifth embodiment of the present application.
Fig. 12 is a flowchart illustrating an avatar switching method according to a sixth embodiment of the present application.
Fig. 13 is a flowchart illustrating an avatar switching method according to a seventh embodiment of the present application.
Fig. 14 is a block diagram illustrating an avatar switching device according to an eighth embodiment of the present application.
Fig. 15 is a block diagram, provided in a ninth embodiment of the present application, of an electronic device for performing the avatar switching method of the embodiments of the present application.
Fig. 16 shows a storage unit, provided in a tenth embodiment of the present application, for storing or carrying program code that implements the avatar switching method of the embodiments of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
At present, mobile terminal devices such as mobile phones are widely adopted, and smartphones have become indispensable personal items. With the rapid development of the mobile internet, a wide variety of applications have appeared on mobile terminals, many of which provide a customer service function so that users can, for example, consult about products through customer service.
As technology develops, people increasingly expect a humanized experience from intelligent products. When communicating with customer service, users hope not merely to receive text or voice replies, but to interact in a more natural way that resembles interpersonal communication in real life. Current intelligent products therefore communicate with the user by playing a video containing the avatar of the robot customer service, so as to meet the user's visual expectations.
In actual use of the customer service function, when the robot customer service encounters a question it cannot answer, the session needs to be switched to human customer service; after the human customer service has answered, the session can be switched back to the robot customer service. Correspondingly, the virtual image of the customer service robot displayed in the customer service video is replaced by the real image of the human customer service, and is switched back once the answer is complete.
However, current switching schemes generally switch the displayed current frame directly to the virtual image of the robot customer service or the real image of the human customer service. If the person in the real image differs substantially from the person in the virtual image at the moment of switching, the picture change feels abrupt and unnatural to the user, degrading the user experience.
The inventor found that if, during switching, the avatar of the customer service robot in the virtual image is kept as consistent as possible with the actions, expression, posture, and other features of the human customer service in the real image, the switch between the two images becomes smoother and the user experience improves.
In actual research, however, the inventor also found that because the picture of the virtual customer service robot and the picture of the real human customer service are both continuously changing, it is difficult to find a moment at which the two pictures are synchronized, that is, a moment at which the avatar in the virtual image is consistent with the actions, expression, posture, and other features of the human customer service in the real image. This is especially true when switching from the real image to the virtual image, since the actions, expressions, and postures of the real person shown in the real image cannot be predicted, making it difficult to switch at a synchronized moment.
To address these problems, the embodiments of the present application provide an avatar switching method, apparatus, electronic device, and storage medium that display a linking transition image when the real image corresponding to the human customer service in a video is switched to the virtual image corresponding to the robot customer service. This ensures that the real image transitions smoothly to the virtual image, so that the user does not perceive the switching action and the user experience is improved.
The method, the device, the electronic equipment and the storage medium for switching the avatar, which are provided by the embodiment of the application, are described in detail below through specific embodiments.
First embodiment
Referring to fig. 1, fig. 1 shows a schematic view of an application environment suitable for use in an embodiment of the present application. The avatar switching method provided in the embodiment of the present application may be applied to the interactive system 100 as shown in fig. 1. The interactive system 100 comprises a terminal device 101 and a server 102, the server 102 being in communication connection with the terminal device 101. The server 102 may be a conventional server or a cloud server, which is not specifically limited herein.
The terminal device 101 may be any of various electronic devices that have a display screen, a data processing module, a camera, and audio input/output, and that support data input, including but not limited to smartphones, tablet computers, laptop computers, projectors, desktop computers, self-service terminals, and wearable electronic devices. Specifically, data input may be voice input through a voice module provided on the electronic device, character input through a character input module, and the like.
A client application program may be installed on the terminal device 101, and the user can interact through the client application program (for example, an APP or a WeChat applet); the conversation robot of this embodiment may likewise be a client application program configured on the terminal device 101. A user can register a user account with the server 102 through the client application program and communicate with the server 102 based on that account, for example by logging in to the account and entering text or voice information. After receiving the information entered by the user, the client application program sends it to the server 102, which receives, processes, and stores it; the server 102 may also return corresponding output information to the terminal device 101. The output information may be a virtual video pre-stored on the server 102, corresponding to the robot customer service and used to answer the customer's question, or a real video corresponding to the human customer service, acquired by the server 102 in real time.
In some embodiments, the device performing the avatar switching may instead be deployed on the terminal device 101, so that the terminal device 101 can interact with the user without relying on a communication link to the server 102. In this case, the terminal device 101 stores the virtual video corresponding to the robot customer service and receives or captures the real video corresponding to the human customer service in real time, and the interaction system 100 may include only the terminal device 101.
Referring to fig. 2, fig. 2 is a flowchart illustrating an avatar switching method according to an embodiment of the present application. The method may include:
s110, acquiring a real image of the displayed current frame and an action intention of a target person in the real image.
The real image may be an image captured in real time by the camera of the terminal device, or an image captured in real time by another device and sent to the terminal device, for example during a video call or online video. The real image may or may not include the real person; for example, a real image may show an object that the real person is demonstrating.
The target person may be a person actually existing in reality, and the information such as appearance, identity, etc. of the target person is determined. For example, in real person customer service a and real person customer service B, the target person may be real person customer service a.
In some embodiments, the terminal device may display a real image including the target person. For example, a real video including the target person may be played on the display screen of the terminal device, and the terminal device may take the currently displayed frame of that real video as the real image of the current frame and acquire it.
The action intention may be the intention of the target person in the real image to switch the real image displayed by the terminal device to a virtual image; for example, after the user's problem has been solved, the human customer service shown on the terminal intends to hand the current manual service over to the robot customer service. As another example, the action intention may be the intention of the target person to switch the displayed real image to one of a plurality of virtual videos, which may include a virtual video explaining service A, a virtual video explaining service B, a virtual video ending the conversation, and so on.
The virtual video may be a video generated in advance from the character features of the target person, and may be composed of multiple frames of virtual images.
As one way, when acquiring the action intention of the target person in the real image, the action intention may be determined by recognizing the gesture of the target person. Specifically, mappings between the target person's gestures and various action intentions may be stored in the terminal device in advance; when gesture A is recognized in the real image, the action intention corresponding to gesture A is obtained. In the above example, the action intention corresponding to gesture A may be the intention to switch the displayed real image to the virtual video explaining service A.
Alternatively, when the real image belongs to a real video that includes audio information, the audio information may be extracted from the real video and the action intention of the target person recognized from it, which amounts to predicting the next action of the human customer service from their speech. The voice information may be converted into text and the text fed into a pre-trained intent recognition model, which outputs the action intention corresponding to the text; the intent recognition model can be trained in advance on a set of sample texts paired with sample action intentions. Alternatively, dialogue information may be assembled from the audio in the real video together with the user's audio collected by the terminal device, and the action intention determined from that dialogue in the same manner as from text. Since dialogue information accurately reflects what the target person in the real video intends to do next, determining the action intention from it allows the intention to be recognized accurately.
As still another mode, the pronunciation of the target person can be identified from the lip movements of the target person in the real image, text information can be generated from the pronunciation, and the text information can then be input into the pre-trained intention recognition model to obtain the corresponding action intention output by the model. In this way, the action intention of the target person in the real image can be identified accurately even when the audio information cannot be obtained reliably.
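Both of the last two modes funnel into the same text-to-intent step. The sketch below stands in for the pre-trained intent recognition model with simple keyword matching over the transcribed text; the intent names and keyword lists are hypothetical placeholders, not part of the patent.

```python
from typing import Optional

# Hypothetical intent table; a trained intent recognition model would
# replace this lookup in practice.
INTENT_KEYWORDS = {
    "explain service A": ["service a"],
    "explain service B": ["service b"],
    "end the conversation": ["goodbye", "thanks for your patience"],
}

def recognize_action_intent(transcribed_text: str) -> Optional[str]:
    """Map text transcribed from the agent's speech (or lip movements)
    to an action intention."""
    lowered = transcribed_text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return intent
    return None
```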
Alternatively, the real image may be an image of the real target person, such as a photograph or video of the target person captured by the terminal device through its camera. Both the virtual image and the real image include at least the face of the target person, and may optionally also include the target person's body shape, gestures, actions, and so on.
Alternatively, the virtual image of the target person may be an image generated from the person characteristics of the target person, and thus, the target person (hereinafter, referred to as an avatar) displayed in the virtual image may be very similar to the features of the target person in reality, such as appearance, body shape, expression, and the like. The character features may include facial features, body type features, gesture features, and the like, among others.
S120, according to the action intention, obtaining a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video.
In some embodiments, a mapping relationship between a plurality of action intents and a plurality of virtual videos may be pre-established, and optionally, the plurality of action intents may correspond to the plurality of virtual videos one by one. As an example, a mapping relationship table of a plurality of action intents and a plurality of virtual videos may be as shown in table 1:
TABLE 1
Action intention      Virtual video
Action intention a1   Virtual video a1
Action intention a2   Virtual video a2
Action intention a3   Virtual video a3
It can be seen that after determining the action intention, the corresponding virtual video may be queried in combination with the action intention and table 1, for example, when the action intention is the action intention a2, the corresponding virtual video a2 may be queried from table 1, thereby obtaining the virtual video corresponding to the action intention.
Alternatively, Table 1 may be stored locally on the terminal device or in a cloud server in communication with the terminal device, so that it can be retrieved from the cloud server when needed.
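Functionally, Table 1 is a key-value lookup. A minimal sketch, with placeholder identifiers:

```python
# Mirror of Table 1; keys are recognized action intentions, values
# identify pre-stored virtual videos (placeholder names).
INTENT_TO_VIDEO = {
    "action intention a1": "virtual video a1",
    "action intention a2": "virtual video a2",
    "action intention a3": "virtual video a3",
}

def lookup_virtual_video(action_intent: str):
    # None means no pre-stored virtual video exists; a later embodiment
    # then generates one from an answer template.
    return INTENT_TO_VIDEO.get(action_intent)
```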
When the virtual video corresponding to the action intention has been obtained, the first frame extracted from the virtual video may be used as the virtual image of the first frame of the virtual video (hereinafter simply the virtual image).
S130, if the real image and the virtual image do not match, generating a transition image based on the real image and the virtual image, and switching the real image to the transition image, wherein the transition image includes an interpolated frame image of the target person, and the similarity between the interpolated frame image and the real image is greater than the similarity between the real image and the virtual image.
In some embodiments, the terminal device may compare the similarity between the real image and the virtual image to determine whether they match. Specifically, the similarity may be calculated from the characteristic parameters of the target person in the real image (such as action features, expression features, and facial features) and the corresponding characteristic parameters of the target person in the virtual image; if the similarity exceeds the similarity threshold, the real image and the virtual image are determined to match, and if it does not, they are determined not to match.
It will be appreciated that the greater the similarity between two images, the closer the two images are.
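A minimal sketch of this matching decision, assuming the characteristic parameters of both images have already been collected into equal-length numeric vectors; the distance-to-similarity mapping is an assumption, since the patent does not fix one.

```python
import numpy as np

def images_match(real_params: np.ndarray, virtual_params: np.ndarray,
                 sim_threshold: float = 0.8) -> bool:
    """Match decision of S130: compare the same characteristic parameters
    of the target person in the real image and in the virtual image."""
    distance = float(np.linalg.norm(real_params - virtual_params))
    sim = 1.0 / (1.0 + distance)  # assumed mapping, not specified by the patent
    return sim > sim_threshold
```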
When the real image and the virtual image do not match, a transition image may be generated based on them, specifically by means of frame interpolation. The transition image may include a plurality of interpolated frame images containing the target person, in which case the transition image is a transition video; alternatively, it may include only one interpolated frame image. The number of interpolated frame images in the transition image can be determined from the similarity between the real image and the virtual image: when the similarity is high, one interpolated frame image may suffice, and when it is low, several are needed. Because each interpolated frame image lies between the real image and the virtual image, its similarity to the real image is greater than the similarity between the real image and the virtual image, while it is also closer to the virtual image than the real image is.
In other embodiments, whether the real image and the virtual image match may also be determined by optical flow. As an example, an optical flow vector for a specified point of the target person (which may be an edge point) can be obtained for the real image and for the virtual image, and the two optical flow vectors compared: if the difference is large, the images are determined not to match, and if it is small, they are determined to match.
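The patent names optical flow but no particular algorithm. The sketch below uses OpenCV's dense Farneback flow as one concrete choice, averaging each flow field into a single motion vector per stream; the frame pairing, the averaging, and the matching threshold are assumptions.

```python
import cv2
import numpy as np

def mean_flow(prev_bgr: np.ndarray, next_bgr: np.ndarray) -> np.ndarray:
    """Dense Farneback optical flow between two consecutive frames,
    averaged into a single motion vector."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return flow.reshape(-1, 2).mean(axis=0)

def flows_match(real_prev, real_cur, virt_prev, virt_cur,
                max_diff: float = 2.0) -> bool:
    # Compare the motion in the real stream with the motion in the
    # virtual stream; a small vector difference is taken to mean the
    # two pictures are in sync.
    diff = np.linalg.norm(mean_flow(real_prev, real_cur)
                          - mean_flow(virt_prev, virt_cur))
    return float(diff) <= max_diff
```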
And S140, switching the transition image into a virtual video if the transition image is matched with the virtual image.
In some embodiments, determining whether the transition image and the virtual image match can follow the same approach as determining whether the real image and the virtual image match, that is, comparing the similarity of the two images, so the details are not repeated here.
If the transition image and the virtual image match, the picture displayed by the terminal device to the user can be switched to the virtual video.
It can be seen that this embodiment acquires the displayed real image of the current frame and the action intention of the target person in the real image; acquires, according to the action intention, the virtual video corresponding to the action intention and the virtual image of the first frame of the virtual video; if the real image and the virtual image do not match, generates a transition image based on them and switches the real image to the transition image, the transition image including an interpolated frame image of the target person whose similarity to the real image is greater than the similarity between the real image and the virtual image; and if the transition image matches the virtual image, switches the transition image to the virtual video. This achieves a smooth and natural transition from the real image to the virtual image: when the user watches the customer service video, the human customer service picture is smoothly switched to the virtual customer service picture, the user does not perceive the switching process, and the user experience is improved.
Second embodiment
Referring to fig. 3, fig. 3 is a flowchart illustrating an avatar switching method according to an embodiment of the present disclosure. The method may be applied to the interactive system 100 provided in the first embodiment, and in particular, may be applied to the terminal device 101 or the server 102 in the interactive system, where the method may include:
s210, acquiring a real image of the displayed current frame and action intention of a target person in the real image.
S220, according to the action intention, obtaining a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video.
For the specific implementation of S210 to S220, refer to S110 to S120; the details are not repeated here.
S230, extracting a first characteristic parameter from the real image and extracting a second characteristic parameter from the virtual image, wherein the first characteristic parameter and the second characteristic parameter are the same characteristic parameter of the target person.
The first feature parameter and the second feature parameter may include one or more combinations of feature points (which may be called key points), actions, gestures, expressions, sizes, angles, and other feature parameters of the target person. It will be appreciated that the angle characteristic parameter may characterize the angle at which the target person is displayed in the real or virtual image, e.g., the angle of the side, front, etc., of the target person. The size characteristic parameter may characterize a display size of the target person in the real image or the virtual image. The motion and gesture feature parameters may characterize the location of each portion of the target person in the image of the current frame in the video.
As one way, when the first characteristic parameter is extracted from the real image: if the first characteristic parameter is a key point, the coordinates of the key point in the real image may be extracted as the characteristic parameter. If the first characteristic parameter is a size, the ratio of the outline of the target person to the size of the real image may be extracted as the characteristic parameter. If the first characteristic parameter is an expression, the expression parameter of the target person in the real image may be extracted and compared with pre-labeled expression parameters; for example, if expression parameters for the four expressions of happiness, anger, sadness, and joy are labeled in advance and the extracted expression parameter matches that of happiness, the expression is determined to be happiness, and if the label corresponding to happiness is 1, the characteristic parameter is determined to be 1. The extraction of action and posture characteristic parameters may follow the same approach as that of expressions. Similarly, the second characteristic parameter may be extracted from the virtual image in the same way that the first characteristic parameter is extracted from the real image.
Wherein the first characteristic parameter and the second characteristic parameter are the same characteristic parameter of the target person, for example, when the first characteristic parameter is an angle characteristic parameter of the target person, the second characteristic parameter is also an angle characteristic parameter of the target person. For another example, when the first characteristic parameter is a key point corresponding to an eye portion of the target person, the second characteristic parameter is also a key point corresponding to an eye portion of the target person.
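One way to make the like-for-like comparison concrete is to assemble both characteristic parameters with the same extraction routine, so the vector entries line up dimension by dimension. In the sketch below, detect_landmarks and classify_expression are hypothetical helpers (a face-landmark detector and an expression classifier); the patent does not prescribe specific models.

```python
import numpy as np

def landmark_bbox_area(landmarks: np.ndarray) -> float:
    x_span = landmarks[:, 0].max() - landmarks[:, 0].min()
    y_span = landmarks[:, 1].max() - landmarks[:, 1].min()
    return float(x_span * y_span)

def extract_feature_params(image: np.ndarray, detect_landmarks,
                           classify_expression) -> np.ndarray:
    """Assemble one characteristic-parameter vector for the target person:
    key-point coordinates, a size ratio, and an expression label."""
    landmarks = detect_landmarks(image)            # (K, 2) key-point coords
    h, w = image.shape[:2]
    size_ratio = landmark_bbox_area(landmarks) / float(h * w)
    expression_label = classify_expression(image)  # e.g. 1 for "happiness"
    return np.concatenate([landmarks.ravel(), [size_ratio, expression_label]])
```

Running this on the real image gives the first characteristic parameter and on the virtual image the second, ready for the similarity comparison in S240.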
And S240, if the similarity between the first characteristic parameter and the second characteristic parameter is smaller than a similarity threshold value, determining that the real image and the virtual image are not matched.
In some embodiments, a plurality of characteristic parameters in the first characteristic parameters may be used as a first vector, and a plurality of characteristic parameters in the second characteristic parameters may be used as a second vector, wherein the number and types of characteristic parameters in the first vector are the same as the number and types of characteristic parameters in the second vector. And then, according to the first vector and the second vector, the distance between the first characteristic parameter and the second characteristic parameter is obtained, and the distance can represent the similarity between the first characteristic parameter and the second characteristic parameter, and the smaller the distance is, the larger the similarity is.
As an example, assume the first characteristic parameter comprises n characteristic parameters, which may be called the frame parameters of the current frame and represented as a first vector x[1], x[2], x[3], ..., x[n], extracted from the real image. Each entry of the vector represents a feature value of one dimension: for example, x[1] may be the coordinates of a key point of the target person, x[2] the expression parameter of the target person, x[3] the action parameter of the target person, and so on, giving characteristic parameters of n dimensions. Similarly, the second characteristic parameter, extracted from the virtual image, may be represented as a vector y[1], y[2], y[3], ..., y[n]; for example, x[1] may be the key-point coordinates of the target person in the real image and y[1] the key-point coordinates of the target person in the virtual image. The two vectors are then passed to a distance function f, and the distance between the first and second characteristic parameters is computed as f(x[1], ..., x[n], y[1], ..., y[n]), whose output may be a float value. Finally, the comparison between the similarity and the similarity threshold can be read off from this distance: the similarity threshold corresponds in advance to a distance threshold, so if the distance between the first and second characteristic parameters is smaller than the distance threshold, the similarity between them is determined to be greater than the similarity threshold; otherwise, the similarity is determined to be smaller than the similarity threshold.
When the comparison result shows that the similarity between the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, it is determined that the real image and the virtual image do not match.
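For illustration only, a minimal sketch of this distance-based comparison is given below; the Euclidean distance, the flattened float vectors, and the threshold names are assumptions of this sketch, since the embodiment does not prescribe a particular distance function f.

```python
import math

def feature_distance(x, y):
    # x, y: equal-length feature vectors (first and second characteristic
    # parameters flattened to floats); Euclidean distance is one possible
    # choice for the distance function f described above.
    assert len(x) == len(y)
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def parameters_match(x, y, distance_threshold):
    # The similarity threshold corresponds in advance to a distance
    # threshold: a distance below the distance threshold means the
    # similarity exceeds the similarity threshold, i.e. the images match.
    return feature_distance(x, y) < distance_threshold
```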
In some embodiments, the first characteristic parameter includes a first feature point and the second characteristic parameter includes a second feature point. As shown in fig. 4, S240 may specifically include the following steps:
S241, it is determined whether the distance between the first feature point and the second feature point is not less than a distance threshold.
And S242, if the distance between the first characteristic point and the second characteristic point is not smaller than the distance threshold value, determining that the similarity between the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold value.
As an example, assume a distance threshold of 3mm and a similarity threshold of 80. When the distance between the first feature point and the second feature point is smaller than the distance threshold, the similarity between the corresponding first characteristic parameter and second characteristic parameter is greater than 80; when the distance is not smaller than the distance threshold, the similarity is smaller than 80. Therefore, when the distance between the first feature point and the second feature point is 2mm, the similarity between the corresponding first and second characteristic parameters is greater than 80, and it may be determined that the similarity between the first characteristic parameter and the second characteristic parameter is not less than the similarity threshold. When the distance between the first feature point and the second feature point is 4mm, the similarity between the corresponding first and second characteristic parameters is smaller than 80, and it may be determined that the similarity between the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold.
Considering that a person's feature points reflect the person's posture more directly than other features do, the similarity between the first characteristic parameter and the second characteristic parameter can be judged effectively from the distance between the first feature point of the first characteristic parameter and the second feature point of the second characteristic parameter.
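A hedged illustration of this per-keypoint check follows; the 3mm threshold reuses the example value above, and the millimetre coordinate units are an assumption of this sketch.

```python
def keypoints_close(p1, p2, distance_threshold_mm=3.0):
    # p1, p2: (x, y) coordinates of the same keypoint of the target person
    # in the real image and the virtual image, assumed in millimetres.
    dx, dy = p1[0] - p2[0], p1[1] - p2[1]
    distance = (dx * dx + dy * dy) ** 0.5
    # Distance below the threshold stands in for a similarity above the
    # similarity threshold (80 in the example above).
    return distance < distance_threshold_mm
```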
S250, if the real image and the virtual image are not matched, generating a transition image based on the real image and the virtual image, and switching the real image into the transition image, wherein the transition image comprises an inserted frame image of the target person, and the similarity between the inserted frame image and the real image is larger than that between the real image and the virtual image.
And S260, if the transition image is matched with the virtual image, switching the transition image into the virtual video.
The specific embodiments of S250 to S260 can refer to S130 to S140, and are not described herein.
In the present embodiment, the first characteristic parameter is extracted from the real image and the second characteristic parameter is extracted from the virtual image; if the similarity between the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, the real image and the virtual image are determined not to match. This makes it possible to judge accurately whether the features of the target person in the virtual image are close enough to those of the target person in the real image: a similarity below the similarity threshold indicates a large difference between the target person's features in the two images, so the real image and the virtual image are determined not to match.
Third embodiment
Referring to fig. 5, fig. 5 is a flowchart illustrating an avatar switching method according to an embodiment of the present disclosure. The method may be applied to the interactive system 100 provided in the first embodiment, and in particular, may be applied to the terminal device 101 or the server 102 in the interactive system, where the method may include:
S310, acquiring a real image of the displayed current frame and the action intention of the target person in the real image.
S320, according to the action intention, obtaining a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video.
S330, extracting a first characteristic parameter from the real image and a second characteristic parameter from the virtual image, wherein the first characteristic parameter and the second characteristic parameter are the same characteristic parameter of the target person.
And S340, if the similarity between the first characteristic parameter and the second characteristic parameter is smaller than a similarity threshold value, determining that the real image and the virtual image are not matched.
The specific embodiments of S310 to S340 can refer to S210 to S240, and are not described herein.
S350, if the real image and the virtual image are not matched, generating a third characteristic parameter based on the first characteristic parameter of the real image and the second characteristic parameter of the virtual image, wherein the third characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity of the third characteristic parameter and the first characteristic parameter is larger than the similarity of the first characteristic parameter and the second characteristic parameter.
Wherein the type and the number of the characteristic parameters in the third characteristic parameters are the same as the type and the number of the first characteristic parameters.
In some embodiments, the similarity between the first characteristic parameter and the second characteristic parameter may be represented by the distance between them, since a smaller distance corresponds to a greater similarity; whether the similarity is less than the similarity threshold can therefore be determined from the distance between the first and second characteristic parameters. When the similarity between the first characteristic parameter and the second characteristic parameter is determined to be smaller than the similarity threshold, a third characteristic parameter can be obtained from the first and second characteristic parameters, where the similarity between the third characteristic parameter and the second characteristic parameter is greater than the similarity between the first and second characteristic parameters.
In some embodiments, a preset number of characteristic parameters may be obtained from a local database of the electronic device; the first characteristic parameter is then compared in turn with each of these, and the target characteristic parameter whose similarity to the first characteristic parameter is the greatest among the preset number is selected. If the similarity between the target characteristic parameter and the first characteristic parameter is greater than the similarity between the first characteristic parameter and the second characteristic parameter, the target characteristic parameter is taken as the third characteristic parameter.
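A minimal sketch of this database-lookup variant, assuming a hypothetical `similarity` function and a non-empty candidate list:

```python
def select_third_parameter(first, second, candidates, similarity):
    # candidates: a preset number of characteristic parameters fetched
    # from the electronic device's local database (assumed non-empty).
    target = max(candidates, key=lambda c: similarity(c, first))
    # Keep the target only if it is more similar to the first parameter
    # than the first and second parameters are to each other.
    if similarity(target, first) > similarity(first, second):
        return target  # use as the third characteristic parameter
    return None
```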
In other embodiments, when the similarity between the first feature parameter and the second feature parameter is less than the similarity threshold, the first feature parameter and the second feature parameter may be input into a pre-trained prediction model, and a third feature parameter output by the pre-trained prediction model may be obtained. The similarity between the third characteristic parameter and the second characteristic parameter is greater than that between the first characteristic parameter and the second characteristic parameter.
The predictive model may be a neural network model. It is used to obtain, from the first characteristic parameter and the second characteristic parameter, a third characteristic parameter that is more similar to the second characteristic parameter than the first characteristic parameter is. As an example, if one of the first characteristic parameters is the coordinates (3, 4) of a key point of the eye region and the corresponding second characteristic parameter is the coordinates (1, 0) of that key point, the prediction model may predict that the abscissa of the target key point lies between 1 and 3 and the ordinate between 0 and 4, thereby obtaining a coordinate range; one coordinate is then taken from this range as the coordinates of the target key point and determined as the eye-region key point coordinates of the third characteristic parameter.
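A minimal sketch of this in-between prediction is shown below, assuming plain linear interpolation of keypoint coordinates; the blend factor alpha is an assumption, and the embodiment's neural prediction model would learn this mapping rather than hard-code it.

```python
def predict_third_keypoint(p_real, p_virtual, alpha=0.5):
    # p_real: eye-region keypoint from the first characteristic
    # parameter, e.g. (3, 4); p_virtual: the corresponding keypoint from
    # the second characteristic parameter, e.g. (1, 0). alpha in (0, 1)
    # selects one coordinate out of the predicted range; a larger alpha
    # lands closer to the virtual image.
    x = p_real[0] + alpha * (p_virtual[0] - p_real[0])
    y = p_real[1] + alpha * (p_virtual[1] - p_real[1])
    return (x, y)

# predict_third_keypoint((3, 4), (1, 0)) -> (2.0, 2.0)
```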
And S360, generating an interpolation frame image based on the third characteristic parameters.
In some embodiments, since the third characteristic parameter includes features of the target person such as expression, key points, action, and posture, it may be used to generate an image of the target person. Optionally, there may be one or more third characteristic parameters, and each third characteristic parameter may be used to generate one interpolated frame image. Specifically, the third characteristic parameter may be input into a pre-trained machine learning model, and the interpolated frame image corresponding to the third characteristic parameter acquired.
As shown in fig. 6, in some embodiments, S360 includes:
S361, taking the similarity between the third characteristic parameter and the first characteristic parameter as a first similarity, taking the similarity between the first characteristic parameter and the second characteristic parameter as a second similarity, and calculating the difference between the first similarity and the second similarity.
As an example, for example, the first similarity is 95, the second similarity is 85, and the difference between the first similarity and the second similarity is 10.
S362, if the difference between the first similarity and the second similarity is not smaller than the specified value, generating an interpolated image based on the third feature parameter.
Continuing the above example, if the specified value is 5, it may be determined that the difference between the first similarity and the second similarity is not smaller than the specified value. This indicates that the obtained third characteristic parameter is closer to the second characteristic parameter than the first characteristic parameter is, and by a margin that is not too small. An interpolated frame image may then be generated from the third characteristic parameter, so that the degree to which the target person in that image approaches the target person in the virtual image is not too small.
As another example, if the first similarity is 95 and the second similarity is 94, the difference between them is 1; with a specified value of 5, the difference is smaller than the specified value. This indicates that although the obtained third characteristic parameter is closer to the second characteristic parameter than the first characteristic parameter is, the margin is very small. If an interpolated frame image were generated from this third characteristic parameter, the tendency of the target person in the interpolated frame image of the transition image to approach the target person in the real image would be barely visible when the video switches from the real image to the transition image, resulting in a poor-quality interpolated frame image.
In this embodiment, the similarity between the third and first characteristic parameters is taken as the first similarity, the similarity between the first and second characteristic parameters as the second similarity, and the difference between them is calculated; an interpolated frame image is generated from the third characteristic parameter only if this difference is not smaller than the specified value. Because the difference between the first and second similarities then lies within the specified range, the degree of change of the target person in the generated interpolated frame image is neither too large nor too small, avoiding both an unsmooth transition caused by too large a change and a low-quality interpolated frame image caused by too small a change.
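A hedged sketch of this gating step follows; the 0-100 similarity scale and the specified value of 5 reuse the example above, and `generate_frame` is a hypothetical stand-in for the image-generation model.

```python
SPECIFIED_VALUE = 5  # minimum useful gap between the two similarities

def maybe_generate_interpolated_frame(first_similarity, second_similarity,
                                      third_params, generate_frame):
    # first_similarity: similarity(third, first), e.g. 95
    # second_similarity: similarity(first, second), e.g. 85
    if first_similarity - second_similarity >= SPECIFIED_VALUE:
        # The change carried by the third characteristic parameter is
        # visible enough to justify rendering an interpolated frame.
        return generate_frame(third_params)
    return None  # change too small; skip this candidate frame
```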
And S370, generating a transition image based on the interpolated image, and switching the real image into the transition image.
In some embodiments, as shown in fig. 7, S370 may include:
S371, if the similarity between the third characteristic parameter and the first characteristic parameter is not less than the similarity threshold, determining the interpolated frame image as the first frame image of the transition image, and determining whether the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold.
In some embodiments, if the similarity between the third characteristic parameter and the first characteristic parameter is not less than the similarity threshold, this indicates that the target person in the interpolated frame image generated from the third characteristic parameter is very close to the target person in the real image, so this interpolated frame image can be determined as the first frame image of the transition image; switching from the real image to the transition image is then equivalent to switching directly from the real image to the interpolated frame image, and the user perceives essentially no change in the displayed target person during the switch. In addition, since the transition image must in turn be switched to the virtual image, whether the target person in the transition image is close to the target person in the virtual image can be judged by determining whether the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold: if it is smaller, the difference between the target person in the transition image and the target person in the virtual image is still large.
And S372, if the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, generating a fourth characteristic parameter based on the second characteristic parameter and the third characteristic parameter, wherein the fourth characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity between the fourth characteristic parameter and the second characteristic parameter is greater than the similarity between the second characteristic parameter and the third characteristic parameter.
In some embodiments, when the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, the difference between the target person in the transition image and the target person in the virtual image is still large, so an additional interpolated frame image needs to be inserted after the first frame image of the transition image, making the transition image approach the virtual image and achieving a smooth transition from the transition image to the virtual image. For the specific implementation of interpolating frames between the first interpolated frame image of the transition image and the virtual image, reference may be made to the implementation in S350 of generating the third characteristic parameter based on the first characteristic parameter of the real image and the second characteristic parameter of the virtual image, which is not repeated here.
S373, generating a target frame interpolation image based on the fourth feature parameter.
The specific implementation of S373, generating the target interpolated frame image based on the fourth characteristic parameter, may refer to the implementation in S360 of generating the interpolated frame image based on the third characteristic parameter, and is not repeated here.
As shown in fig. 8, in some embodiments, generating the target interpolated image based on the fourth feature parameter S373 may include:
And S3731, taking the similarity between the fourth characteristic parameter and the second characteristic parameter as a third similarity, taking the similarity between the third characteristic parameter and the second characteristic parameter as a fourth similarity, and calculating the difference between the third similarity and the fourth similarity.
And S3732, if the difference value between the third similarity and the fourth similarity is not smaller than a specified value, generating a target frame inserting image based on the fourth characteristic parameter.
The specific embodiments of S3731 to S3732 may refer to S361 to S362 and are not repeated here. As with S361 to S362, S3731 to S3732 ensure that the degree of change of the target person in the target interpolated frame image generated based on the fourth characteristic parameter is neither too large nor too small, avoiding both an unsmooth transition caused by too large a change and a low-quality target interpolated frame image caused by too small a change.
S374, a transition image is generated based on the inter-frame image and the target inter-frame image.
As shown in fig. 9, S374, generating a transition image based on the inter-frame image and the target inter-frame image may include:
S3741, it is determined whether the similarity between the fourth characteristic parameter and the second characteristic parameter is not less than the similarity threshold.
When the similarity between the fourth characteristic parameter and the second characteristic parameter is not less than the similarity threshold, the fourth characteristic parameter is very close to the second characteristic parameter.
And S3742, if the similarity between the fourth characteristic parameter and the second characteristic parameter is not less than the similarity threshold, determining the target interpolated frame image as the last frame image of the transition image, and generating the transition image from the interpolated frame image and the target interpolated frame image.
When the similarity between the fourth characteristic parameter and the second characteristic parameter is not less than the similarity threshold, the fourth characteristic parameter is very close to the second characteristic parameter, so the target person in the target interpolated frame image is very close to the target person in the virtual image. The target interpolated frame image can therefore be determined as the last frame image of the transition image, and the transition image generated from the interpolated frame image and the target interpolated frame image, so that the last frame image of the transition image transitions smoothly to the virtual image, and the transition image can in turn be switched smoothly to the virtual video.
It can be understood that, in this embodiment, the similarity between characteristic parameters may also be compared by comparing the distances between corresponding feature points, as in the foregoing embodiment.
In this embodiment, if the similarity between the third characteristic parameter and the first characteristic parameter is not less than the similarity threshold, the interpolated frame image is determined as the first frame image of the transition image, and it is determined whether the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold; if it is, a fourth characteristic parameter is generated based on the second characteristic parameter and the third characteristic parameter, where the fourth characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person and the similarity between the fourth characteristic parameter and the second characteristic parameter is greater than the similarity between the second characteristic parameter and the third characteristic parameter; a target interpolated frame image is generated based on the fourth characteristic parameter; and the transition image is generated based on the interpolated frame image and the target interpolated frame image. This ensures that the generated transition image transitions smoothly both from the real image and to the virtual image.
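The S371 to S374 flow can be sketched as a loop that keeps generating in-between parameters until one is close enough to the virtual image. This is a hedged sketch: `similarity`, `predict_between`, and `render_frame` are hypothetical stand-ins for the similarity measure, the prediction model, and the avatar model, and the threshold and frame cap are assumptions.

```python
def build_transition(first_params, second_params, similarity,
                     predict_between, render_frame,
                     similarity_threshold=80, max_frames=10):
    frames, current = [], first_params
    for _ in range(max_frames):
        # Generate the next in-between parameter (third, fourth, ...),
        # each closer to the virtual image's parameters than the last.
        current = predict_between(current, second_params)
        frames.append(render_frame(current))
        # Stop once the latest parameter is close enough to the second
        # characteristic parameter of the virtual image.
        if similarity(current, second_params) >= similarity_threshold:
            break
    return frames  # first frame near the real image, last near the virtual
```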
And S380, switching the transition image into a virtual video if the transition image is matched with the virtual image.
The specific embodiment of S380 may refer to S260, and thus will not be described herein.
In this embodiment, if the real image and the virtual image are not matched, a third feature parameter is generated based on the first feature parameter of the real image and the second feature parameter of the virtual image, an interpolation frame image is generated based on the third feature parameter, a transition image is generated based on the interpolation frame image, and the real image is switched to the transition image, so that smooth transition between the real image and the transition image can be ensured.
Fourth embodiment
Referring to fig. 10, fig. 10 is a flowchart illustrating an avatar switching method according to an embodiment of the present disclosure. The method may be applied to the interactive system 100 provided in the first embodiment, and in particular, may be applied to the terminal device 101 or the server 102 in the interactive system, where the method may include:
S410, acquiring a real image of the displayed current frame and the action intention of the target person in the real image.
S420, according to the action intention, obtaining a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video.
S430, extracting a first characteristic parameter from the real image and extracting a second characteristic parameter from the virtual image, wherein the first characteristic parameter and the second characteristic parameter are the same characteristic parameter of the target person.
S440, if the similarity between the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, determining that the real image and the virtual image are not matched.
S450, if the real image and the virtual image are not matched, generating a third characteristic parameter based on the first characteristic parameter of the real image and the second characteristic parameter of the virtual image, wherein the third characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity of the third characteristic parameter and the first characteristic parameter is larger than the similarity of the first characteristic parameter and the second characteristic parameter.
The specific embodiments of S410 to S450 may refer to S310 to S350, and are not described herein.
S460, inputting the third characteristic parameters into the pre-trained avatar model, and acquiring the frame inserting images corresponding to the third characteristic parameters.
As an example, when the third characteristic parameter is a facial feature point of the target person and this facial feature point is input into the pre-trained avatar model, an interpolated frame image corresponding to the facial feature point may be obtained, where the interpolated frame image includes the face of the target person. Similarly, when the third characteristic parameter is a skeletal feature point of the target person and this skeletal feature point is input into the pre-trained avatar model, an interpolated frame image corresponding to the skeletal feature point may be obtained, where the interpolated frame image includes the posture of the target person.
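As a minimal sketch of this inference step (the `predict` wrapper and tensor shapes are assumptions; any image-generation network trained as described below could stand behind `avatar_model`):

```python
import numpy as np

def generate_interpolated_frame(avatar_model, face_keypoints):
    # face_keypoints: (K, 2) array of facial feature point coordinates
    # forming the third characteristic parameter.
    x = np.asarray(face_keypoints, dtype=np.float32).reshape(1, -1)
    # The pre-trained avatar model maps feature points to an image of the
    # target person, e.g. an (H, W, 3) RGB frame; predict() is assumed.
    return avatar_model.predict(x)[0]
```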
In some embodiments, before S460 (inputting the third characteristic parameter into the pre-trained avatar model and acquiring the corresponding interpolated frame image) is performed, the avatar model may be trained in advance. The pre-training of the avatar model may include the following steps:
first, a sample image of a target person is acquired.
In some embodiments, a sample image of the target person may be captured by a camera, where the sample image may be a picture, a video, or the like. When the sample image of the target person is already stored locally on the terminal device or in the cloud, the terminal device can retrieve it directly from local storage or the cloud.
And secondly, extracting sample characteristic parameters and sample frame inserting images of the target person from the sample image.
In some embodiments, the sample characteristic parameters of the target person may be extracted from the sample image by a face recognition technology, a motion recognition technology, an expression recognition technology, or the like, and then the image including the target person in the sample image is taken as a sample frame-inserted image.
And finally, inputting the sample characteristic parameters and the sample frame inserting images into a machine learning model for training to obtain a pre-trained avatar model.
Optionally, in practical applications, after the avatar model is trained, a segment of real video including the target person may be recorded on site, for example a roughly one-minute real video of the real customer service person, and the avatar model may then be fine-tuned (finetuned) with this real video to improve the quality of the images it generates.
In some embodiments, the machine learning model may be a GAN (Generative Adversarial Network) model, which continuously optimizes its output through the adversarial game between a Generator and a Discriminator. Given a sufficiently large number of training samples, the GAN model can produce face images that approach a real face arbitrarily closely, achieving a lifelike effect. Further, the face image may be a two-dimensional face image; that is, inputting the facial feature points into the GAN model yields a two-dimensional face image corresponding to those feature points.
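A heavily condensed sketch of such a conditional GAN setup follows; the network sizes, the 68-keypoint conditioning, and the random stand-in data are assumptions, and a production avatar model would be far larger and trained on the sample images described above.

```python
import torch
import torch.nn as nn

KPTS, IMG = 68 * 2, 64 * 64 * 3  # keypoint vector and flattened image sizes

G = nn.Sequential(nn.Linear(KPTS, 256), nn.ReLU(),
                  nn.Linear(256, IMG), nn.Tanh())    # Generator
D = nn.Sequential(nn.Linear(KPTS + IMG, 256), nn.ReLU(),
                  nn.Linear(256, 1))                 # Discriminator

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(100):                    # toy loop on random stand-ins
    kpts = torch.randn(8, KPTS)            # sample characteristic parameters
    real = torch.rand(8, IMG) * 2 - 1      # sample interpolated-frame images
    fake = G(kpts)
    # Discriminator: tell real (keypoints, image) pairs from generated ones.
    d_loss = (bce(D(torch.cat([kpts, real], 1)), torch.ones(8, 1)) +
              bce(D(torch.cat([kpts, fake.detach()], 1)), torch.zeros(8, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: fool the discriminator into scoring generated pairs as real.
    g_loss = bce(D(torch.cat([kpts, fake], 1)), torch.ones(8, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```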
Optionally, the process of pre-training the avatar model may be performed before any one of the steps preceding S460, for example before S450 or before S410, which is not limited herein.
And S470, generating a transition image based on the interpolated image, and switching the real image into the transition image.
And S480, switching the transition image into a virtual video if the transition image is matched with the virtual image.
In this embodiment, the virtual image is generated by extracting the characteristic parameters of the target person, so the appearance of the virtual image can be highly similar to that of the target person; the user therefore does not easily perceive the switch between the virtual image and the real target person, the switch appears more natural, and the user experience is improved. Moreover, inputting the third characteristic parameter into the pre-trained avatar model to obtain the corresponding interpolated frame image improves the generation efficiency of interpolated frame images, which in turn helps ensure that the picture displayed by the terminal device plays smoothly while switching between the interpolated frame image and the real image.
Fifth embodiment
Referring to fig. 11, fig. 11 is a flowchart illustrating an avatar switching method according to an embodiment of the present disclosure. The method may be applied to the interactive system 100 provided in the first embodiment, and in particular, may be applied to the terminal device 101 or the server 102 in the interactive system, where the method may include:
S510, acquiring a real image of the displayed current frame and action intention of a target person in the real image.
The specific embodiment of S510 may refer to S110, and thus will not be described herein.
S520, determining whether a virtual video corresponding to the action intention exists.
The terminal device may detect whether a virtual video corresponding to the action intention exists in a virtual video library, in which a plurality of action intentions and the virtual videos corresponding to them may be stored in advance. Specifically, the relationship between action intentions and virtual videos may be as shown in Table 1. Optionally, the virtual video library may be stored locally on the terminal device or in a server.
And S530, if the virtual video corresponding to the action intention exists, acquiring the virtual video.
If the virtual video corresponding to the action intention is detected, the virtual video corresponding to the action intention can be obtained from the virtual video library.
S540, if no virtual video corresponding to the action intention exists, acquiring an answer template corresponding to the action intention.
If no virtual video corresponding to the action intention is detected in the virtual video library, an answer template corresponding to the action intention can be obtained from an answer template library. The answer template is a dialogue template for conversing with a customer and includes the text information that the customer service needs to reply with during the conversation. Different reply templates may be associated with different action intentions in advance; for example, when the action intention is the customer service's intention to end the dialogue, the corresponding reply template may include text information such as "this service ends here" and "welcome to use it next time". The reply template library may be stored locally on the terminal device or in a server, which is not limited herein.
S550, extracting characteristic parameters of the target person from the real image, generating a virtual video corresponding to the action intention based on the characteristic parameters and the answer template, and acquiring a virtual image of a first frame of the virtual video.
In some embodiments, an avatar generation model may be trained in advance from sample characteristic parameters, sample images, and sample text information of the target person, so that after training, different sample texts correspond to different sample characteristic parameters, and the sample images yield basic face images, basic action images, basic gesture images, and the like of the target person. The characteristic parameters and the answer template are input into the avatar generation model to obtain the characteristic parameters corresponding to the answer template; the basic face, action, and gesture images of the target person are then updated with these characteristic parameters, producing the virtual video corresponding to the answer template. Finally, the first frame image is taken from the virtual video as the virtual image of the first frame of the virtual video.
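A hedged sketch of this S520 to S550 fallback follows; the library contents, the template text, and the helper functions are illustrative placeholders, not part of the embodiment.

```python
VIRTUAL_VIDEO_LIBRARY = {"end_dialogue": "videos/end_dialogue.mp4"}
ANSWER_TEMPLATES = {"end_dialogue": "This service ends here."}

def extract_features(real_image):
    return {"keypoints": [], "expression": None}   # placeholder extractor

def generate_virtual_video(features, template):
    # Placeholder for the avatar generation model described above.
    return {"features": features, "script": template}

def get_virtual_video(action_intent, real_image):
    # S520/S530: use a pre-stored virtual video when one exists.
    video = VIRTUAL_VIDEO_LIBRARY.get(action_intent)
    if video is not None:
        return video
    # S540/S550: otherwise fetch the answer template and generate the
    # virtual video from the target person's characteristic parameters.
    template = ANSWER_TEMPLATES[action_intent]
    return generate_virtual_video(extract_features(real_image), template)
```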
S560, if the real image and the virtual image are not matched, generating a transition image based on the real image and the virtual image, and switching the real image into the transition image, wherein the transition image comprises an inserted frame image of the target person, and the similarity between the inserted frame image and the real image is larger than that between the real image and the virtual image.
S570, if the transition image and the virtual image match, switching the transition image to the virtual video.
The specific embodiments of S560 to S570 refer to S130 to S140, and are not described herein.
Considering that a virtual image corresponding to the action intention may not have been preset, this embodiment determines whether a virtual video corresponding to the action intention exists; if so, the virtual video corresponding to the action intention is obtained; if not, an answer template corresponding to the action intention is obtained, the characteristic parameters of the target person are extracted from the real image, a virtual video corresponding to the action intention is generated based on the characteristic parameters and the answer template, and the virtual image of the first frame of that virtual video is acquired. In this way, when no virtual image corresponding to the action intention is available, one can be generated on the spot, ensuring that a virtual image corresponding to the action intention is always obtained.
Sixth embodiment
Referring to fig. 12, fig. 12 is a flowchart illustrating an avatar switching method according to an embodiment of the present disclosure. The method may be applied to the interactive system 100 provided in the first embodiment, and in particular, may be applied to the terminal device 101 or the server 102 in the interactive system, where the method may include:
S610, when a real image including the target person is displayed, it is determined whether the real image satisfies a switching condition.
In some embodiments, the switching condition may be a condition for determining whether to switch the real image to the virtual image. As an example, when the real customer service person finishes an explanation to a customer and the conversation needs to switch to the virtual robot customer service, the terminal device recognizes that the explanation has ended and determines that the switching condition is satisfied. As further examples, if the customer service person presses a switch button at the customer service end, makes a switching gesture that the terminal device recognizes, or speaks a switching sentence that the terminal device recognizes, it may likewise be determined that the switching condition is satisfied.
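For illustration only, such checks might be combined as below; the event field names and trigger values are assumptions of this sketch.

```python
SWITCH_GESTURE, SWITCH_PHRASE = "switch_gesture", "switch to assistant"

def switching_condition_met(event):
    # event: a dict describing what the terminal device just observed.
    return (event.get("explanation_finished")              # speech ended
            or event.get("button") == "switch"             # button pressed
            or event.get("gesture") == SWITCH_GESTURE      # gesture seen
            or event.get("utterance") == SWITCH_PHRASE)    # phrase heard
```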
S620, if the real image meets the switching condition, acquiring the real image of the displayed current frame and the action intention of the target person in the real image.
S630, according to the action intention, obtaining a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video.
And S640, if the real image and the virtual image are not matched, generating a transition image based on the real image and the virtual image, and switching the real image into the transition image, wherein the transition image comprises an inserted frame image of the target person, and the similarity between the inserted frame image and the real image is larger than that between the real image and the virtual image.
And S650, switching the transition image into a virtual video if the transition image and the virtual image are matched.
The specific embodiments of S620 to S650 can refer to S110 to S140, and are not described herein.
In this embodiment, when a real image including the target person is displayed, it is determined whether the real image satisfies the switching condition; if it does, the real image of the displayed current frame and the action intention of the target person in the real image are acquired, so that the real image can be switched at an appropriate time.
Seventh embodiment
Referring to fig. 13, fig. 13 is a flowchart illustrating an avatar switching method according to an embodiment of the present disclosure. The method may be applied to the interactive system 100 provided in the first embodiment, and in particular, may be applied to the terminal device 101 or the server 102 in the interactive system, where the method may include:
S710, when a real image including the target person is displayed, it is determined whether a switching instruction has been received.
In some embodiments, the switching instruction may be an instruction for instructing the terminal device to switch the displayed real image to the virtual image. The terminal device can detect in real time whether a switching instruction has been received. Optionally, the switching instruction may be generated by a touch operation of the real customer service person at the customer service end, generated automatically based on certain specific information (for example, time information or statement information), or generated based on statement information and action information of the customer collected on site, which is not limited herein.
S720, if a switching instruction is received, determining that the displayed real image including the target person satisfies the switching condition.
As an example, the switching instruction may be statement information or gesture information expressing an intention to end the dialogue, given by the customer service person or the customer; when the terminal device receives (or recognizes) this statement or gesture information, it may determine that the displayed real image including the target person satisfies the switching condition.
And S730, if the real image satisfies the switching condition, performing the step of acquiring the real image of the displayed current frame and the action intention of the target person in the real image.
S740, acquiring the real image of the displayed current frame and the action intention of the target person in the real image.
S750, according to the action intention, obtaining a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video.
S760, if the real image and the virtual image are not matched, generating a transition image based on the real image and the virtual image, and switching the real image into the transition image, wherein the transition image comprises an inserted frame image of the target person, and the similarity between the inserted frame image and the real image is larger than that between the real image and the virtual image.
And S770, switching the transition image into a virtual video if the transition image and the virtual image are matched.
The specific embodiments of S740 to S770 may refer to S110 to S140, and are not described herein.
In this embodiment, when a real image including the target person is displayed, it is determined whether a switching instruction has been received; if so, it is determined that the displayed real image including the target person satisfies the switching condition, and the real image of the displayed current frame and the action intention of the target person in the real image are acquired, so that switching between the real image and the virtual image can be performed accurately and flexibly.
Eighth embodiment
Referring to fig. 14, fig. 14 illustrates an avatar switching device provided in an embodiment of the present application, and the avatar switching device 800 includes: a first acquisition module 810, a second acquisition module 820, a first switching module 830, and a second switching module 840. Wherein: the first obtaining module 810 is configured to obtain a real image of a current frame displayed and an action intention of a target person in the real image; the second obtaining module 820 is configured to obtain, according to the action intention, a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video; the first switching module 830 is configured to generate a transition image based on the real image and the virtual image if the real image and the virtual image are not matched, and switch the real image to the transition image, where the transition image includes an interpolated image of the target person, and a similarity between the interpolated image and the real image is greater than a similarity between the real image and the virtual image; the second switching module 840 is configured to switch the transition image to the virtual video if the transition image and the virtual image match.
Optionally, the avatar switching device 800 further includes:
the feature parameter extraction module is used for extracting a first feature parameter from the real image and extracting a second feature parameter from the virtual image, wherein the first feature parameter and the second feature parameter are the same feature parameter of the target person.
And the matching determining module is used for determining that the real image and the virtual image are not matched if the similarity of the first characteristic parameter and the second characteristic parameter is smaller than a similarity threshold value.
Optionally, the first feature parameter includes a first feature point, the second feature parameter includes a second feature point, the first feature point and the second feature point are the same feature point of the target person, and the avatar switching device 800 further includes:
and the distance judging module is used for determining whether the distance between the first characteristic point and the second characteristic point is not smaller than a distance threshold value.
And the similarity determining module is used for determining that the similarity between the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold value if the distance between the first characteristic point and the second characteristic point is not smaller than the distance threshold value.
Optionally, the first switching module 830 includes:
and the third characteristic parameter sub-module is used for generating a third characteristic parameter based on the first characteristic parameter of the real image and the second characteristic parameter of the virtual image, wherein the third characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity of the third characteristic parameter and the first characteristic parameter is larger than the similarity of the first characteristic parameter and the second characteristic parameter.
And the frame inserting image generating sub-module is used for generating frame inserting images based on the third characteristic parameters.
And the first switching sub-module is used for generating a transition image based on the frame inserting image and switching the real image into the transition image.
Optionally, the frame-inserting image generating submodule is specifically configured to take a similarity between the third feature parameter and the first feature parameter as a first similarity, take a similarity between the first feature parameter and the second feature parameter as a second similarity, and calculate a difference between the first similarity and the second similarity; and if the difference value between the first similarity and the second similarity is not smaller than the specified value, generating an interpolated frame image based on the third characteristic parameter.
Optionally, the first switching module 830 further includes:
And the similarity comparison sub-module is used for determining the interpolated frame image as the first frame image of the transition image if the similarity between the third characteristic parameter and the first characteristic parameter is not less than the similarity threshold, and determining whether the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold.
And the fourth characteristic parameter sub-module is used for generating a fourth characteristic parameter based on the second characteristic parameter and the third characteristic parameter if the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, wherein the fourth characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity between the fourth characteristic parameter and the second characteristic parameter is greater than the similarity between the second characteristic parameter and the third characteristic parameter.
And the target frame inserting image generating sub-module is used for generating a target frame inserting image based on the fourth characteristic parameters.
And the transition image generation sub-module is used for generating a transition image based on the frame inserting image and the target frame inserting image.
Optionally, the target frame-inserted image generating sub-module is specifically configured to take a similarity between the fourth feature parameter and the second feature parameter as a third similarity, take the similarity between the third feature parameter and the second feature parameter as a fourth similarity, and calculate a difference between the third similarity and the fourth similarity; and if the difference value between the third similarity and the fourth similarity is not smaller than the specified value, generating a target frame inserting image based on the fourth characteristic parameter.
Optionally, the transition image generation sub-module is specifically configured to determine whether the similarity between the fourth characteristic parameter and the second characteristic parameter is not less than the similarity threshold; and if it is not less than the similarity threshold, determine the target interpolated frame image as the last frame image of the transition image and generate the transition image from the interpolated frame image and the target interpolated frame image.
Optionally, the frame-inserting image generating submodule is specifically configured to input a third feature parameter into the pre-trained avatar model, and obtain a frame-inserting image corresponding to the third feature parameter.
Optionally, the interpolated-frame image generation sub-module is specifically configured to: acquire a currently recorded real video including the real image, fine-tune the pre-trained avatar model with the real video, input the third characteristic parameter into the fine-tuned avatar model, and acquire the interpolated frame image corresponding to the third characteristic parameter.
Optionally, the matching determination module is further configured to determine whether the real image and the virtual image match through an optical flow method.
Optionally, the avatar switching device 800 further includes:
and the sample image acquisition module is used for acquiring a sample image of the target person.
And the sample extraction module is used for extracting sample characteristic parameters and sample interpolated frame images of the target person from the sample images.
And the training module is used for inputting the sample characteristic parameters and the sample frame inserting images into the machine learning model for training to obtain a pre-trained virtual image model.
Optionally, the avatar switching device 800 further includes:
and the virtual video detection module is used for determining whether a virtual video corresponding to the action intention exists.
And the first execution module is used for executing the virtual video corresponding to the action intention if the virtual video exists.
And the answer template acquisition module is used for acquiring an answer template corresponding to the action intention if the virtual video does not exist.
And the virtual video generation module is used for extracting the characteristic parameters of the target person from the real image and generating a virtual video corresponding to the action intention based on the characteristic parameters and the answer template.
Optionally, the avatar switching device 800 further includes:
and the switching detection module is used for determining whether the real image meets the switching condition when the real image comprising the target person is played.
And the second execution module is used for executing the acquisition of the displayed real image of the current frame and the action intention of the target person in the real image if the real image meets the switching condition.
Optionally, the switching detection module is specifically configured to determine whether a switching instruction is received; if a switching instruction is received, determining that the real image including the target person is played to meet the switching condition.
The avatar switching device 800 provided in the embodiment of the present application is configured to implement the corresponding avatar switching method in the foregoing method embodiment, and has the beneficial effects of the corresponding method embodiment, which is not described herein again.
It can be clearly understood by those skilled in the art that the avatar switching device provided in the embodiments of the present application can implement each process in the foregoing method embodiments; for convenience and brevity of description, the specific working processes of the device and modules described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the embodiments provided herein, the coupling between the modules shown or discussed may be indirect coupling or a communication connection through certain interfaces, apparatuses, or modules, and may be electrical, mechanical, or in other forms.
In addition, each functional module in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
Ninth embodiment
Referring to fig. 15, a block diagram of an electronic device 900 according to an embodiment of the present application is shown. The electronic device 900 may be an electronic device capable of running applications, such as a smart phone, tablet computer, etc. The electronic device 900 in this application may include one or more of the following components: a processor 910, a memory 920, and one or more applications, wherein the one or more applications may be stored in the memory 920 and configured to be executed by the one or more processors 910, the one or more applications configured to perform the method as described in the foregoing method embodiments.
Processor 910 may include one or more processing cores. The processor 910 uses various interfaces and lines to connect the various parts of the electronic device 900, and performs the various functions of the electronic device 900 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 920 and invoking data stored in the memory 920. Alternatively, the processor 910 may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 910 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem is used to handle wireless communications. It will be appreciated that the modem may also not be integrated into the processor 910 and may instead be implemented by a separate communication chip.
The memory 920 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). The memory 920 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 920 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (for example, a touch function, a sound playing function, an image playing function, and the like), instructions for implementing the various method embodiments described above, and the like. The data storage area may also store data created by the electronic device 900 in use (for example, phonebook, audio and video data, and chat log data) and the like. The electronic device may be the terminal device in the foregoing embodiments, or may be the server in the foregoing embodiments.
Tenth embodiment
Referring to fig. 16, a block diagram of a computer readable storage medium according to an embodiment of the present application is shown. The computer readable storage medium 1000 has stored therein program code that can be invoked by a processor to perform the methods described in the method embodiments described above.
The computer readable storage medium 1000 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer readable storage medium 1000 includes a non-transitory computer-readable storage medium. The computer readable storage medium 1000 has storage space for program code 1010 that performs any of the method steps described above. The program code can be read from or written to one or more computer program products. The program code 1010 may, for example, be compressed in a suitable form.
In summary, in the avatar switching method, apparatus, electronic device, and storage medium provided by the embodiments of the present application, the real image of the displayed current frame and the action intention of the target person in the real image are acquired; the virtual video corresponding to the action intention and the virtual image of the first frame of the virtual video are obtained according to the action intention; if the real image and the virtual image do not match, a transition image is generated based on the real image and the virtual image and the real image is switched to the transition image, where the transition image includes an interpolated frame image of the target person whose similarity to the real image is greater than the similarity between the real image and the virtual image; and if the transition image matches the virtual image, the transition image is switched to the virtual video. In this way, the displayed picture transitions smoothly when the real image is switched to the virtual video, making the switch more natural and improving the user experience.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will appreciate that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (18)

1. An avatar switching method, comprising:
acquiring a real image of a displayed current frame and an action intention of a target person in the real image;
according to the action intention, obtaining a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video;
if the real image and the virtual image do not match, generating a transition image based on the real image and the virtual image, and switching the real image to the transition image, wherein the transition image comprises an interpolated frame image of the target person, and the similarity between the interpolated frame image and the real image is greater than the similarity between the real image and the virtual image;
and if the transition image matches the virtual image, switching the transition image to the virtual video.
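For illustration only (not part of the claims), the following Python sketch shows the switching flow of claim 1. It assumes the characteristic parameters are numeric feature vectors, uses cosine similarity and linear interpolation as stand-ins for the unspecified similarity measure and interpolated-frame generation, and takes render as a hypothetical callback covering the avatar model and the display.

    import numpy as np

    SIM_THRESHOLD = 0.95  # assumed value; the claims leave the threshold open

    def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
        # Similarity between two characteristic-parameter vectors.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def switch_with_transition(real_params, virtual_video_params, render):
        # Claim 1: if the real image and the first virtual frame do not match,
        # show interpolated transition frames, then play the virtual video.
        target = virtual_video_params[0]
        current = real_params
        while cosine_sim(current, target) < SIM_THRESHOLD:
            # Each step stays closer to the real image than the virtual image
            # is, then moves toward the virtual side.
            current = 0.75 * current + 0.25 * target
            render(current)
        for params in virtual_video_params:
            render(params)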
2. The method of claim 1, further comprising, before the generating a transition image based on the real image and the virtual image and switching the real image to the transition image if the real image and the virtual image do not match:
extracting a first characteristic parameter from the real image and a second characteristic parameter from the virtual image, wherein the first characteristic parameter and the second characteristic parameter are the same characteristic parameter of the target person;
and if the similarity between the first characteristic parameter and the second characteristic parameter is smaller than a similarity threshold, determining that the real image and the virtual image do not match.
3. The method of claim 2, wherein the first characteristic parameter comprises a first feature point and the second characteristic parameter comprises a second feature point, the first feature point and the second feature point being the same feature point of the target person, and further comprising, before determining that the real image and the virtual image do not match if the similarity between the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold:
determining whether a distance between the first feature point and the second feature point is not smaller than a distance threshold;
and if the distance between the first feature point and the second feature point is not smaller than the distance threshold, determining that the similarity between the first characteristic parameter and the second characteristic parameter is smaller than the similarity threshold.
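As a hedged sketch of claims 2–3 (the distance test is from the claims; the landmark shapes and the threshold value are assumptions):

    import numpy as np

    DISTANCE_THRESHOLD = 5.0  # assumed pixel threshold; the claims fix no value

    def images_match(first_points: np.ndarray, second_points: np.ndarray) -> bool:
        # first_points / second_points: corresponding feature points of the
        # target person, e.g. facial landmarks of shape (N, 2).
        distances = np.linalg.norm(first_points - second_points, axis=1)
        # Any distance not smaller than the threshold means the similarity is
        # below the similarity threshold, i.e. the images do not match.
        return bool((distances < DISTANCE_THRESHOLD).all())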
4. The method of claim 2, wherein the generating a transition image based on the real image and the virtual image and switching the real image to the transition image comprises:
generating a third characteristic parameter based on the first characteristic parameter of the real image and the second characteristic parameter of the virtual image, wherein the third characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity between the third characteristic parameter and the first characteristic parameter is greater than the similarity between the first characteristic parameter and the second characteristic parameter;
generating the interpolated frame image based on the third characteristic parameter;
and generating the transition image based on the interpolated frame image, and switching the real image to the transition image.
5. The method of claim 4, wherein the generating the interpolated frame image based on the third characteristic parameter comprises:
taking the similarity between the third characteristic parameter and the first characteristic parameter as a first similarity, taking the similarity between the first characteristic parameter and the second characteristic parameter as a second similarity, and calculating the difference between the first similarity and the second similarity;
and if the difference between the first similarity and the second similarity is not smaller than a specified value, generating the interpolated frame image based on the third characteristic parameter.
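A minimal sketch of claims 4–5, reusing cosine_sim from the earlier sketch; the interpolation weight alpha and the specified value are assumptions:

    def third_parameter(first, second, alpha=0.25):
        # Biased toward the real image's parameters, so that the similarity of
        # (third, first) exceeds the similarity of (first, second).
        return (1.0 - alpha) * first + alpha * second

    def emit_interpolated_frame(first, second, third, specified_value=0.01):
        # Claim 5: generate the interpolated frame image only when the first
        # similarity exceeds the second similarity by at least a specified value.
        first_similarity = cosine_sim(third, first)
        second_similarity = cosine_sim(first, second)
        return (first_similarity - second_similarity) >= specified_value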
6. The method of claim 4, wherein the generating the transition image based on the interpolated frame image comprises:
if the similarity between the third characteristic parameter and the first characteristic parameter is smaller than the similarity threshold, determining the interpolated frame image as the first frame image of the transition image, and determining whether the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold;
if the similarity between the third characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, generating a fourth characteristic parameter based on the second characteristic parameter and the third characteristic parameter, wherein the fourth characteristic parameter and the first characteristic parameter are the same characteristic parameter of the target person, and the similarity between the fourth characteristic parameter and the second characteristic parameter is greater than the similarity between the second characteristic parameter and the third characteristic parameter;
generating a target interpolated frame image based on the fourth characteristic parameter;
and generating the transition image based on the interpolated frame image and the target interpolated frame image.
7. The method of claim 6, wherein the generating a target interpolated frame image based on the fourth characteristic parameter comprises:
taking the similarity between the fourth characteristic parameter and the second characteristic parameter as a third similarity, taking the similarity between the third characteristic parameter and the second characteristic parameter as a fourth similarity, and calculating the difference between the third similarity and the fourth similarity;
and if the difference between the third similarity and the fourth similarity is not smaller than a specified value, generating the target interpolated frame image based on the fourth characteristic parameter.
8. The method of claim 7, wherein the generating the transition image based on the interpolated frame image and the target interpolated frame image comprises:
determining whether the similarity between the fourth characteristic parameter and the second characteristic parameter is smaller than the similarity threshold;
and if the similarity between the fourth characteristic parameter and the second characteristic parameter is smaller than the similarity threshold, determining the target interpolated frame image as the last frame image of the transition image, and generating the transition image from the interpolated frame image and the target interpolated frame image.
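The first/last-frame logic of claims 6–8 can be sketched as follows (same assumed similarity measure and interpolation as above):

    def transition_frame_parameters(first, second, sim_threshold=0.95,
                                    alpha=0.25, specified_value=0.01):
        frames = []
        third = (1.0 - alpha) * first + alpha * second
        if cosine_sim(third, first) < sim_threshold:
            frames.append(third)  # claim 6: first frame of the transition image
            if cosine_sim(third, second) < sim_threshold:
                # Fourth parameter biased toward the virtual image (claim 6).
                fourth = (1.0 - alpha) * second + alpha * third
                # Claim 7: require a minimum similarity gain on the virtual side.
                if cosine_sim(fourth, second) - cosine_sim(third, second) >= specified_value:
                    frames.append(fourth)  # claim 8: last frame of the transition image
        return frames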
9. The method of claim 4, wherein the generating the interpolated frame image based on the third characteristic parameter comprises:
inputting the third characteristic parameter into a pre-trained avatar model, and obtaining the interpolated frame image corresponding to the third characteristic parameter.
10. The method of claim 9, wherein the inputting the third characteristic parameter into a pre-trained avatar model and obtaining the interpolated frame image corresponding to the third characteristic parameter comprises:
acquiring a currently recorded real video comprising the real image, and fine-tuning the pre-trained avatar model with the real video;
and inputting the third characteristic parameter into the fine-tuned avatar model, and obtaining the interpolated frame image corresponding to the third characteristic parameter.
11. The method of claim 9, further comprising, before the generating the interpolated frame image based on the third characteristic parameter:
acquiring a sample image of the target person;
extracting sample characteristic parameters and sample interpolated frame images of the target person from the sample image;
and inputting the sample characteristic parameters and the sample interpolated frame images into a machine learning model for training, to obtain the pre-trained avatar model.
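Claim 11's training step could look like the following PyTorch sketch; the network architecture, loss, and hyperparameters are all assumptions, since the patent does not disclose the avatar model's internals. The same loop, run with a smaller learning rate on the currently recorded real video, corresponds to the fine-tuning of claim 10.

    import torch
    from torch import nn

    class AvatarModel(nn.Module):
        # Toy stand-in: maps a characteristic-parameter vector to a
        # flattened interpolated frame image.
        def __init__(self, param_dim: int, frame_pixels: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(param_dim, 512), nn.ReLU(),
                nn.Linear(512, frame_pixels), nn.Sigmoid(),
            )

        def forward(self, params):
            return self.net(params)

    def train_avatar_model(model, sample_params, sample_frames,
                           epochs: int = 10, lr: float = 1e-4):
        # Fit on (sample characteristic parameter, sample interpolated
        # frame image) pairs.
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(model(sample_params), sample_frames)
            loss.backward()
            opt.step()
        return model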
12. The method of any one of claims 1 to 11, further comprising, before the generating a transition image based on the real image and the virtual image and switching the real image to the transition image if the real image and the virtual image do not match:
determining whether the real image and the virtual image match by an optical flow method.
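One way to realize claim 12's optical-flow matching, sketched with OpenCV's Farneback dense flow (the mean-displacement criterion and its threshold are assumptions, as the claim does not specify them):

    import cv2
    import numpy as np

    def match_by_optical_flow(real_frame, virtual_frame, max_mean_flow=2.0):
        g1 = cv2.cvtColor(real_frame, cv2.COLOR_BGR2GRAY)
        g2 = cv2.cvtColor(virtual_frame, cv2.COLOR_BGR2GRAY)
        # Dense optical flow between the two frames, shape (H, W, 2).
        flow = cv2.calcOpticalFlowFarneback(g1, g2, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        magnitude = np.linalg.norm(flow, axis=2)
        # A small mean displacement is taken to mean the images match.
        return float(magnitude.mean()) < max_mean_flow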
13. The method of any one of claims 1 to 11, further comprising, before the acquiring the virtual video corresponding to the action intention and the virtual image of the first frame of the virtual video:
determining whether a virtual video corresponding to the action intention exists;
if so, performing the step of acquiring the virtual video corresponding to the action intention;
if not, acquiring an answer template corresponding to the action intention;
and extracting characteristic parameters of the target person from the real image, and generating the virtual video corresponding to the action intention based on the characteristic parameters and the answer template.
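Claim 13 amounts to a lookup with a generation fallback; a hedged sketch, in which video_library, answer_templates, and generate_video are hypothetical stand-ins rather than names from the patent:

    def virtual_video_for(intent, real_params, video_library,
                          answer_templates, generate_video):
        if intent in video_library:
            # A virtual video for this action intention already exists.
            return video_library[intent]
        # Otherwise synthesize one from the answer template and the target
        # person's characteristic parameters (e.g. via the avatar model).
        template = answer_templates[intent]
        return generate_video(real_params, template)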
14. The method of any one of claims 1 to 11, further comprising, before the acquiring a real image of the displayed current frame and an action intention of a target person in the real image:
determining, when a real image including the target person is displayed, whether the real image satisfies a switching condition;
and if the real image satisfies the switching condition, performing the step of acquiring the real image of the current frame and the action intention of the target person in the real image.
15. The method of claim 14, wherein the determining whether the real image satisfies a switching condition comprises:
determining whether a switching instruction is received;
and if a switching instruction is received, determining that the displayed real image including the target person satisfies the switching condition.
16. An avatar switching device, comprising:
the first acquisition module is used for acquiring a real image of the displayed current frame and an action intention of a target person in the real image;
the second acquisition module is used for acquiring a virtual video corresponding to the action intention and a virtual image of a first frame of the virtual video according to the action intention;
the first switching module is used for generating a transition image based on the real image and the virtual image if the real image and the virtual image do not match, and switching the real image to the transition image, wherein the transition image comprises an interpolated frame image of the target person, and the similarity between the interpolated frame image and the real image is greater than the similarity between the real image and the virtual image;
and the second switching module is used for switching the transition image to the virtual video if the transition image matches the virtual image.
17. An electronic device, comprising:
a memory;
one or more processors coupled with the memory;
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs being configured to perform the method of any one of claims 1 to 15.
18. A computer readable storage medium having stored therein program code which is callable by a processor to perform the method of any one of claims 1 to 15.
CN202110069031.2A 2021-01-19 2021-01-19 Virtual image switching method, device, electronic equipment and storage medium Active CN112750186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110069031.2A CN112750186B (en) 2021-01-19 2021-01-19 Virtual image switching method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112750186A CN112750186A (en) 2021-05-04
CN112750186B true CN112750186B (en) 2024-02-23

Family

ID=75652487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110069031.2A Active CN112750186B (en) 2021-01-19 2021-01-19 Virtual image switching method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112750186B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116091661A (en) * 2021-11-04 2023-05-09 中兴通讯股份有限公司 Multi-mode face driving method and device, electronic equipment and storage medium
CN116129006A (en) * 2021-11-12 2023-05-16 腾讯科技(深圳)有限公司 Data processing method, device, equipment and readable storage medium
CN114117157B (en) * 2021-11-19 2024-04-09 招联消费金融股份有限公司 Session processing method, apparatus, computer device and storage medium
CN114422647A (en) * 2021-12-24 2022-04-29 上海浦东发展银行股份有限公司 Digital person-based agent service method, apparatus, device, medium, and product
CN115665507B (en) * 2022-12-26 2023-03-21 海马云(天津)信息技术有限公司 Method, apparatus, medium, and device for generating video stream data including avatar

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018033143A1 (en) * 2016-08-19 2018-02-22 北京市商汤科技开发有限公司 Video image processing method, apparatus and electronic device
CN110245638A (en) * 2019-06-20 2019-09-17 北京百度网讯科技有限公司 Video generation method and device
CN110942501A (en) * 2019-11-27 2020-03-31 深圳追一科技有限公司 Virtual image switching method and device, electronic equipment and storage medium
CN110969682A (en) * 2019-11-27 2020-04-07 深圳追一科技有限公司 Virtual image switching method and device, electronic equipment and storage medium
CN111696029A (en) * 2020-05-22 2020-09-22 平安普惠企业管理有限公司 Virtual image video generation method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112750186A (en) 2021-05-04

Similar Documents

Publication Publication Date Title
CN112750186B (en) Virtual image switching method, device, electronic equipment and storage medium
CN110807388B (en) Interaction method, interaction device, terminal equipment and storage medium
CN110390704B (en) Image processing method, image processing device, terminal equipment and storage medium
CN111432267B (en) Video adjusting method and device, electronic equipment and storage medium
CN110688008A (en) Virtual image interaction method and device
CN110969682B (en) Virtual image switching method and device, electronic equipment and storage medium
US8725507B2 (en) Systems and methods for synthesis of motion for animation of virtual heads/characters via voice processing in portable devices
CN110942501B (en) Virtual image switching method and device, electronic equipment and storage medium
CN111045639B (en) Voice input method, device, electronic equipment and storage medium
CN111597828B (en) Translation display method, device, head-mounted display equipment and storage medium
CN111541908A (en) Interaction method, device, equipment and storage medium
CN111126009A (en) Form filling method and device, terminal equipment and storage medium
CN106157956A (en) The method and device of speech recognition
CN111147880A (en) Interaction method, device and system for live video, electronic equipment and storage medium
CN108920640B (en) Context obtaining method and device based on voice interaction
CN107911643B (en) Method and device for showing scene special effect in video communication
CN113067953A (en) Customer service method, system, device, server and storage medium
CN110794964A (en) Interaction method and device for virtual robot, electronic equipment and storage medium
CN111538456A (en) Human-computer interaction method, device, terminal and storage medium based on virtual image
KR20120120858A (en) Service and method for video call, server and terminal thereof
CN112330533A (en) Mixed blood face image generation method, model training method, device and equipment
US20230343011A1 (en) Realtime ai sign language recognition with avatar
CN115423908A (en) Virtual face generation method, device, equipment and readable storage medium
CN112949689A (en) Image recognition method and device, electronic equipment and storage medium
CN110858291A (en) Character segmentation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant