WO2015090147A1 - Virtual video call method and terminal - Google Patents


Info

Publication number
WO2015090147A1
Authority
WO
WIPO (PCT)
Prior art keywords
terminal
facial expression
expression information
facial
video image
Prior art date
Application number
PCT/CN2014/093187
Other languages
French (fr)
Chinese (zh)
Inventor
李刚 (Li Gang)
Original Assignee
百度在线网络技术(北京)有限公司 (Baidu Online Network Technology (Beijing) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baidu Online Network Technology (Beijing) Co., Ltd. (百度在线网络技术(北京)有限公司)
Priority to JP2016543309A priority Critical patent/JP2016537922A/en
Priority to KR1020157036602A priority patent/KR101768980B1/en
Publication of WO2015090147A1 publication Critical patent/WO2015090147A1/en

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • H04N7/157Conference systems defining a virtual conference space and using avatars or agents
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a virtual video calling method and terminal.
  • In the background art, the main virtual video call method is to capture an image at the transmitting end, determine the face region in the image, extract facial feature information from that region, and send the extracted facial feature information to the receiving end, where it is used to reproduce the facial expression of the corresponding user.
  • The drawback of this approach is that, because every person's facial features differ, the extracted facial feature information is still very large, and the receiving end must also reconstruct a specific target face model from it (for example, a model of the transmitting-end user's face). The amount of video data transmitted in the prior art is therefore very large: it consumes substantial data traffic, can make video calls stutter, and is unsuitable for mobile networks with limited bandwidth or metered traffic, seriously hindering the popularization and promotion of video calls.
  • the present invention aims to solve at least one of the above technical problems.
  • a first object of the present invention is to propose a virtual video calling method.
  • the method greatly reduces the amount of data transmitted during the video call, saves data traffic, thereby making the video call more smooth, reducing the impact of limited network bandwidth or limited traffic on the video call, and improving the user experience.
  • a second object of the present invention is to propose another virtual video calling method.
  • a third object of the present invention is to propose a terminal.
  • a fourth object of the present invention is to propose another terminal.
  • a fifth object of the present invention is to provide a terminal device.
  • a sixth object of the present invention is to propose another terminal device.
  • According to a first aspect, a virtual video calling method includes: collecting a video image of a user of a first terminal; performing facial recognition on the video image to obtain facial expression information; and transmitting the facial expression information to a second terminal that establishes a call with the first terminal, the facial expression information being used to cause the second terminal to synthesize a video image from the facial expression information and a face image model preset in the second terminal and display it.
  • The virtual video calling method of this embodiment of the present invention uses facial recognition technology to extract facial expression information at the transmitting end (for example, the first terminal), while the receiving end (for example, the second terminal) synthesizes and restores the face image simply from the transmitted facial expression information and the preset face image model. Because the information transmitted between the transmitting end and the receiving end is limited to facial expression information, which does not need to describe a complete facial image, the amount of information is small, and the encoded facial expression information can occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced, data traffic is saved, the video call is smoother, and the impact of limited network bandwidth or metered traffic on the video call is reduced, making the method especially suitable for transmission over mobile networks and improving the user experience.
  • In addition, the second terminal does not need to reconstruct a face image model of the first terminal user; it only needs to display the corresponding facial expression on its preset face image model according to the facial expression information, which keeps the second terminal's processing simple.
  • The virtual video calling method of the second aspect of the present invention includes: receiving facial expression information of a video image sent by a first terminal that establishes a call with the second terminal; and synthesizing and displaying a video image according to the facial expression information and a face image model preset in the second terminal.
  • The virtual video calling method of this embodiment of the present invention uses facial recognition technology to extract facial expression information at the transmitting end (for example, the first terminal), while the receiving end (for example, the second terminal) synthesizes and restores the face image simply from the transmitted facial expression information and the preset face image model. Because the information transmitted between the transmitting end and the receiving end is limited to facial expression information, which does not need to describe a complete facial image, the amount of information is small, and the encoded facial expression information can occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced, data traffic is saved, the video call is smoother, and the impact of limited network bandwidth or metered traffic on the video call is reduced, making the method especially suitable for transmission over mobile networks and improving the user experience.
  • In addition, the second terminal does not need to reconstruct a face image model of the first terminal user; it only needs to display the corresponding facial expression on its preset face image model according to the facial expression information, which keeps the second terminal's processing simple.
  • The terminal of the third aspect of the present invention includes: an acquisition module configured to collect a video image of a user; a recognition module configured to perform facial recognition on the video image to obtain facial expression information; and a sending module configured to send the facial expression information to a second terminal that establishes a call with the terminal, the facial expression information being used to cause the second terminal to synthesize a video image from the facial expression information and a face image model preset in the second terminal and display it.
  • The terminal of this embodiment of the present invention extracts facial expression information using facial recognition technology, so that the second terminal that establishes a call with it can synthesize and restore the face image simply from the transmitted facial expression information and the preset face image model. Because the transmitted information is limited to facial expression information, which does not need to describe a complete facial image, the amount of information is small, and the encoded facial expression information can occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced, data traffic is saved, the video call is smoother, and the impact of limited network bandwidth or metered traffic on the video call is reduced, making the terminal especially suitable for transmission in mobile networks and improving the user experience.
  • In addition, the second terminal only needs to display the corresponding facial expression on its preset face image model according to the facial expression information, which keeps the second terminal's processing simple.
  • The terminal of the fourth aspect of the present invention includes: a receiving module configured to receive facial expression information of a video image sent by a first terminal that establishes a call with the terminal; and a synthesizing module configured to synthesize and display a video image according to the facial expression information and a face image model preset in the terminal.
  • With the terminal of this embodiment of the present invention, the first terminal that establishes a call with it extracts facial expression information using facial recognition technology, and the terminal synthesizes and restores the face image simply from the received facial expression information and its preset face image model. Because the information transmitted between the sending and receiving ends is limited to facial expression information, which does not need to describe a complete facial image, the amount of information is small, and the encoded facial expression information can occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced, data traffic is saved, the video call is smoother, and the impact of limited network bandwidth or metered traffic on the video call is reduced.
  • According to a fifth aspect, a terminal device includes: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations: collecting a video image of the user of the terminal device; performing facial recognition on the video image to obtain facial expression information; and transmitting the facial expression information to a second terminal that establishes a call with the terminal device, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
  • The terminal device of the sixth aspect of the present invention includes: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations: receiving facial expression information of a video image sent by a first terminal that establishes a call with the terminal device; and synthesizing and displaying a video image according to the facial expression information and a face image model preset in the terminal device.
  • FIG. 1 is a flow chart of a virtual video call method according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of a virtual video call method according to another embodiment of the present invention.
  • FIG. 3 is a flowchart of a virtual video call method according to still another embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a terminal according to another embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a terminal according to still another embodiment of the present invention.
  • the present invention provides a virtual video calling method and terminal.
  • a virtual video call method and terminal according to an embodiment of the present invention are described below with reference to the accompanying drawings.
  • In one embodiment, a virtual video calling method comprises the steps of: collecting a video image of a first terminal user; performing facial recognition on the video image to obtain facial expression information; and sending the facial expression information to a second terminal that establishes a call with the first terminal, the facial expression information being used to enable the second terminal to synthesize the facial expression information with a face image model preset in the second terminal into a video image and display it.
  • FIG. 1 is a flow chart of a virtual video call method in accordance with one embodiment of the present invention.
  • the virtual video calling method includes the following steps:
  • S101: Collect a video image of the first terminal user. Specifically, the first terminal may capture video using its built-in camera or a peripheral camera to collect the video image of the first terminal user.
  • S102 Perform facial recognition on the video image to obtain facial expression information.
  • Specifically, the first terminal may perform facial recognition on the video image using various computer image-processing techniques to obtain facial expression information, for example face recognition based on genetic algorithms, neural-network face recognition, and the like.
  • the amount of data on facial expressions is very small. The process of acquiring facial expressions will be described in detail in the subsequent embodiments.
  • S103 Send the facial expression information to the second terminal that establishes a call with the first terminal, where the facial expression information is used to cause the second terminal to synthesize and display the video image according to the facial expression information and the facial image model preset in the second terminal.
  • Specifically, the first terminal sends a video call request to the second terminal through the server, or the second terminal sends a video call request to the first terminal through the server. If the second terminal accepts the first terminal's video call request, or the first terminal accepts the second terminal's video call request, the server establishes the video call between the first terminal and the second terminal.
  • The first terminal may encode the facial expression information of the first terminal user into a digital expression and send it to the second terminal through the video call established by the server.
  • After receiving the facial expression information, the second terminal may synthesize it with the preset face image model to reproduce the facial image of the first terminal user, and display the result in the second terminal's video call interface.
  • the preset face image model can be set by the user himself or by default.
  • The user of the second terminal may also use his or her own photo, or a photo of the first terminal user, to synthesize with the facial expression information and reproduce the facial image of the first terminal user.
  • A video can be regarded as a sequence of frame images: facial expression information is acquired for each frame image at the first terminal, and at the second terminal the facial expression information is synthesized into each frame image, thereby realizing the virtual video call. The synthesis process itself is prior art and is not described here.
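The frame-by-frame loop described above can be sketched as follows. This is a hypothetical illustration only: `recognize_expression` and `synthesize_frame` are invented stand-ins for the facial recognition and synthesis steps the patent treats as known techniques, and the dict-based "frames" are placeholders for real images.

```python
def recognize_expression(frame):
    """Sender side (first terminal): reduce a captured frame to a tiny
    expression record -- the only data that crosses the network."""
    return {"mouth_open": frame.get("mouth_open", False),
            "smile": frame.get("smile", 0.0)}

def synthesize_frame(expression, face_model):
    """Receiver side (second terminal): apply the expression to the
    preset face image model to produce a displayable frame."""
    return {"model": face_model, **expression}

def virtual_call(frames, face_model="cartoon"):
    """Extract an expression per frame, then synthesize each received
    expression onto the preset model, as in the virtual video call."""
    sent = [recognize_expression(f) for f in frames]
    shown = [synthesize_frame(e, face_model) for e in sent]
    return sent, shown
```

Note that only the small `sent` records would travel over the network; the full frames never leave the first terminal.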
  • The virtual video calling method of this embodiment of the present invention uses facial recognition technology to extract facial expression information at the transmitting end (for example, the first terminal), while the receiving end (for example, the second terminal) synthesizes and restores the face image simply from the transmitted facial expression information and the preset face image model. Because the information transmitted between the transmitting end and the receiving end is limited to facial expression information, which does not need to describe a complete facial image, the amount of information is small, and the encoded facial expression information can occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced, data traffic is saved, the video call is smoother, and the impact of limited network bandwidth or metered traffic on the video call is reduced, making the method especially suitable for transmission over mobile networks and improving the user experience.
  • In addition, the second terminal does not need to reconstruct a face image model of the first terminal user; it only needs to display the corresponding facial expression on its preset face image model according to the facial expression information, which keeps the second terminal's processing simple.
  • In one embodiment, performing facial recognition on the video image to obtain facial expression information includes: performing facial recognition on the video image to obtain facial features, and extracting the facial expression information from the facial features.
  • The facial features are extracted from the video image and may be, but are not limited to, geometric information about facial parts such as the eyes, nose, mouth, and ears, for example the position of the eyebrows, the angle of the corners of the mouth, and the size of the eyes. It should be understood that the facial features can also be obtained by other methods.
  • The first terminal of this embodiment may thus obtain facial features through facial recognition of the video image. The facial expression information is then extracted from the facial features; for example, the first terminal may analyze the facial features to obtain the facial expression information of the first terminal user.
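Geometric features of the kind listed above can be computed from 2D landmark coordinates. The sketch below is an assumption-laden illustration: the landmark names, the coordinate convention (y grows downward, as in image coordinates), and the two derived features are invented for this example, not taken from the patent.

```python
def mouth_metrics(left_corner, right_corner, top_lip, bottom_lip):
    """Compute two simple geometric expression features from
    hypothetical 2D landmark points (x, y)."""
    opening = bottom_lip[1] - top_lip[1]            # vertical lip gap
    center_y = (top_lip[1] + bottom_lip[1]) / 2.0
    # Mouth corners sitting higher (smaller y) than the lip mid-line
    # give a positive curvature, i.e. an upward, smiling curve.
    curvature = center_y - (left_corner[1] + right_corner[1]) / 2.0
    return {"opening": opening, "curvature": curvature}

# A wide-open, upward-curved mouth yields a large opening and positive
# curvature, consistent with the "laughing" example later in the text.
m = mouth_metrics(left_corner=(0, 8), right_corner=(20, 8),
                  top_lip=(10, 9), bottom_lip=(10, 15))
```

In practice the landmarks themselves would come from a face-landmark detector; only the small derived numbers would feed into the expression information.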
  • In one embodiment, the facial expression information includes one or more of the following: whether the user is frowning, whether the mouth is open or closed, the curvature of the corners of the mouth, whether the eyes are open or closed, the size of the eyes, whether there are tears, and so on.
  • The facial expression information mainly reflects a person's emotional state. For example, by analyzing the position of the eyebrows, the angle of the corners of the mouth, the size of the eyes, and so on, the user's expression can be determined to be smiling, laughing, crying, depressed, excited, angry, and the like.
  • Various existing facial expression analysis techniques, for example machine-learning algorithms or other algorithms with similar functions, can be used by the first terminal of this embodiment to analyze the facial features and obtain the facial expression information.
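As a toy stand-in for the machine-learning analysis mentioned above, a rule-based classifier over the features already described might look like this. All thresholds and labels are invented for illustration; a real system would learn such a mapping from data.

```python
def classify_expression(mouth_open, mouth_curvature, tears=False):
    """Map geometric expression features to an emotion label.
    A hypothetical rule-based sketch, not the patent's algorithm."""
    if tears:
        return "crying"
    if mouth_curvature > 0.5:                 # corners curve upward
        return "laughing" if mouth_open else "smiling"
    if mouth_curvature < -0.5:                # corners curve downward
        return "depressed"
    return "neutral"
```

The output label is what would then be encoded into a few bytes and transmitted.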
  • the first terminal may encode the facial expression information of the first terminal user to form a digital expression.
  • A digital expression may, for example, be a short string of characters occupying only a few bits; for instance, the characters "D:" may be sent directly as the encoding of "laughter". The encoding scheme can, of course, be much richer.
  • the facial expression information is sent to the second terminal through the video call established by the server.
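The few-byte encoding described above can be sketched as a simple codebook. The byte values and labels below are hypothetical, chosen only to show how an expression label can be reduced to a single byte on the wire, in the spirit of the "D:" example.

```python
# Hypothetical single-byte expression codebook (invented values).
CODES = {"neutral": 0x00, "laughing": 0x01, "smiling": 0x02, "crying": 0x03}
NAMES = {code: name for name, code in CODES.items()}

def encode_expression(name):
    """First terminal: map an expression label to a 1-byte payload."""
    return bytes([CODES[name]])

def decode_expression(payload):
    """Second terminal: recover the expression label from the payload."""
    return NAMES[payload[0]]
```

A single byte per frame contrasts sharply with sending full facial feature data, which is the patent's central bandwidth argument.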
  • The preset face image model can take many forms. In one embodiment, the face image model preset at the second terminal includes a real face image model and a cartoon face image model; the real face image model may, for example, be a photo stored in the second terminal.
  • the second terminal user can select a favorite cartoon face image model according to the needs of the user.
  • In one embodiment, the virtual video call method further includes: providing the user of the second terminal with at least one cartoon face image model; and receiving, by the second terminal, the cartoon face image model selected by its user, and synthesizing and displaying the video image according to the facial expression information and the selected face image model.
  • That is, the second terminal receives the cartoon face image model selected by its user, synthesizes it with the facial expression information of the first terminal user to reproduce the facial image of the first terminal user, and displays the result in the second terminal's video call interface.
  • For example, if the facial expression information of the first terminal user indicates that the mouth is open, the corners of the mouth curve strongly upward, and the eyes are slightly narrowed, the first terminal user is laughing. If the second terminal user has selected the Superman cartoon face image model, the second terminal synthesizes the facial expression information of the first terminal user with the Superman cartoon image to reproduce the first terminal user's laughing expression.
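Under the assumption that a cartoon face image model is simply a lookup from expression labels to pre-drawn frames, the receiver-side synthesis in the example above could be sketched as follows. The model name and asset filenames are invented for illustration.

```python
# Hypothetical cartoon face image models: expression label -> frame.
CARTOON_MODELS = {
    "superman": {"laughing": "superman_laugh.png",
                 "smiling": "superman_smile.png",
                 "neutral": "superman_neutral.png"},
}

def synthesize(model_name, expression):
    """Return the frame to display for the selected model, falling back
    to the neutral face for expressions the model does not cover."""
    model = CARTOON_MODELS[model_name]
    return model.get(expression, model["neutral"])
```

A richer implementation might instead blend expression parameters into the model continuously (e.g. blendshape weights), but the lookup captures the data-flow the patent describes.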
  • Another embodiment of the present invention also proposes a virtual video calling method.
  • FIG. 2 is a flow chart of a virtual video call method in accordance with another embodiment of the present invention.
  • the virtual video calling method includes the following steps:
  • S201 Receive facial expression information of a video image sent by the first terminal that establishes a call with the second terminal.
  • Specifically, the first terminal sends a video call request to the second terminal through the server, or the second terminal sends a video call request to the first terminal through the server. If the second terminal accepts the first terminal's video call request, or the first terminal accepts the second terminal's video call request, the server establishes the video call between the first terminal and the second terminal.
  • In this embodiment, the first terminal may capture the video image of the first terminal user with its built-in camera or a peripheral camera, obtain the facial expression information by the method described in any of the foregoing embodiments, and send it to the second terminal.
  • After receiving the facial expression information, the second terminal may synthesize it with the preset face image model to reproduce the facial image of the first terminal user, and display the result in the second terminal's video call interface.
  • the preset face image model can be set by the user himself or by default.
  • The user of the second terminal may also use his or her own photo, or a photo of the first terminal user, as the face image model to reproduce the facial image of the first terminal user.
  • The virtual video calling method of this embodiment of the present invention uses facial recognition technology to extract facial expression information at the transmitting end (for example, the first terminal), while the receiving end (for example, the second terminal) synthesizes and restores the face image simply from the transmitted facial expression information and the preset face image model. Because the information transmitted between the transmitting end and the receiving end is limited to facial expression information, which does not need to describe a complete facial image, the amount of information is small, and the encoded facial expression information can occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced, data traffic is saved, the video call is smoother, and the impact of limited network bandwidth or metered traffic on the video call is reduced, making the method especially suitable for transmission over mobile networks and improving the user experience.
  • In addition, the second terminal does not need to reconstruct a face image model of the first terminal user; it only needs to display the corresponding facial expression on its preset face image model according to the facial expression information, which keeps the second terminal's processing simple.
  • FIG. 3 is a flow chart of a virtual video call method in accordance with yet another embodiment of the present invention.
  • the virtual video calling method includes the following steps:
  • S301 Receive facial expression information of a video image sent by the first terminal that establishes a call with the second terminal.
  • In this embodiment, the second terminal may provide the user with multiple real or cartoon face image models, for example several cartoon face image models, photos, or real face image models; the second terminal user can then select a preferred face image model according to his or her own needs.
  • For example, if the facial expression information of the first terminal user indicates that the mouth is open, the corners of the mouth curve strongly upward, and the eyes are slightly narrowed, the first terminal user is laughing; if the second terminal user has selected the Superman cartoon face image model, the second terminal synthesizes the facial expression information of the first terminal user with the Superman cartoon image to reproduce the first terminal user's laughing expression.
  • Thus, the user of the second terminal may select a real or cartoon face image model, and the video image is synthesized and displayed according to the selected model and the facial expression information, which adds fun and improves the user experience.
  • the second terminal may acquire a real face image model of the first terminal user to perform facial expression reproduction.
  • Specifically, the first terminal may analyze a video image captured by its camera, or a face image selected by the user, to obtain the real face image model, and then send it to the second terminal for storage.
  • Alternatively, the second terminal may acquire a face image of the first terminal user and analyze it to obtain the real face image model; that is, the real face image model may also be generated in the second terminal.
  • The second terminal may then synthesize the real face image model of the first terminal user with the facial expression information of the first terminal user to reproduce the facial image of the first terminal user in the second terminal's video call interface, making the reproduced facial image more realistic.
  • In addition, the real face image model may be formed only once and sent to the second terminal for storage; during subsequent data transmission, only facial expression information needs to be transmitted.
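The one-shot model transfer described above can be sketched as a receiver that caches the model and reuses it for every later expression packet. The class and method names are invented for illustration.

```python
class ExpressionReceiver:
    """Hypothetical second-terminal receiver: the real face image model
    arrives once and is cached; later packets carry only expressions."""

    def __init__(self):
        self.face_model = None
        self.displayed = []

    def on_model(self, model):
        """Called once, when the stored real face model arrives."""
        self.face_model = model

    def on_expression(self, expression):
        """Called per frame; reuses the cached model for synthesis."""
        self.displayed.append((self.face_model, expression))
```

Because `on_model` runs only once per call, the per-frame traffic stays limited to the tiny expression records.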
  • A selection button may also be provided in the second terminal, so that the second terminal user can choose whether to reproduce the facial image with the real facial image of the first terminal user or with a cartoon face image model. More specifically, the user of the second terminal may choose according to the network environment and terminal performance: for example, a mobile terminal may select a cartoon face image model so that only facial expression information is sent during the video call, while a personal computer may select the real face image model for added realism.
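The selection policy just described could be expressed as a small chooser. The argument values and labels below are invented placeholders, not a real API; the logic only mirrors the mobile-versus-PC example in the text.

```python
def choose_face_model(network, device):
    """Illustrative model-selection policy: constrained mobile links
    favor the cartoon model (expression-only traffic), while a PC on a
    fixed link can afford the stored real face model."""
    if network == "mobile" or device == "mobile_terminal":
        return "cartoon"
    return "real"
```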
  • The virtual video calling method of this embodiment of the present invention can reproduce the facial image of the first terminal user from the real face image model and the facial expression information of the first terminal user, making the reproduced facial image more realistic. The real face image model is transmitted once and used many times, so the receiving end does not need to reconstruct it in real time during the call, which simplifies the receiving end's processing and improves the user experience.
  • the present invention also proposes a terminal.
  • In one embodiment, a terminal includes: an acquisition module configured to collect a video image of a user; a recognition module configured to perform facial recognition on the video image to obtain facial expression information; and a sending module configured to send the facial expression information to a second terminal that establishes a call with the terminal, the facial expression information being used to cause the second terminal to synthesize and display the video image according to the facial expression information and a face image model preset in the second terminal.
  • FIG. 4 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • The terminal includes: an acquisition module 110, a recognition module 120, and a sending module 130.
  • The acquisition module 110 is configured to collect a video image of the user. More specifically, the acquisition module 110 may capture video through the terminal's built-in camera or a peripheral camera to collect the user's video image.
  • The recognition module 120 is configured to perform facial recognition on the video image to obtain facial expression information. More specifically, the recognition module 120 may perform facial recognition on the video image using various computer image-processing techniques, for example face recognition based on genetic algorithms, neural-network face recognition, and the like. The amount of data describing facial expression information is very small; the process of acquiring facial expressions is described in detail in the foregoing embodiments.
  • The sending module 130 is configured to send the facial expression information to a second terminal that establishes a call with the terminal; the facial expression information is used to cause the second terminal to synthesize and display the video image according to the facial expression information and a face image model preset in the second terminal.
  • Specifically, the terminal sends a video call request to the second terminal through the server, or the second terminal sends a video call request to the terminal through the server. If the second terminal accepts the terminal's video call request, or the terminal accepts the second terminal's video call request, the server establishes the video call between the terminal and the second terminal.
  • More specifically, the sending module 130 may encode the facial expression information to form a digital expression and send it to the second terminal through the video call established by the server.
  • the second terminal may perform synthesis according to the facial expression information and the preset face image model, to reproduce the facial image of the terminal's user and display it in the video call interface of the second terminal.
  • the preset face image model can be set by the user himself or by default.
  • the user of the second terminal may also use his or her own photo, or a photo of the first terminal's user, synthesized with the facial expression information, to reproduce the facial image of the first terminal's user.
  • the terminal of the embodiment of the present invention uses facial recognition technology to extract facial expression information, so that the second terminal that establishes a call with the terminal can synthesize and restore the facial image simply from the sent facial expression information and the preset face image model. Because the transmitted information is limited to the facial expression information, and because the facial expression information need not describe a complete facial image, the amount of information is small: after encoding, the facial expression information may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced and data traffic is saved, making the video call smoother and reducing the influence of limited network bandwidth or metered traffic on the video call; the method is especially suitable for transmission over mobile networks and enhances the user experience. In addition, there is no need to reconstruct the user's face image model in the second terminal; the second terminal only needs to display the corresponding facial expression on the preset face image model according to the facial expression information, so the second terminal is easy to adjust.
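As a rough, back-of-the-envelope illustration of this saving (all figures below are assumptions for illustration, not values from the specification):

```python
# Illustrative comparison of expression-only transmission vs full video.
EXPRESSION_BYTES = 4        # assumed size of one encoded expression ("a few bytes")
UPDATES_PER_SECOND = 15     # assumed refresh rate of the expression stream
VIDEO_KBPS = 500            # assumed bitrate of a typical compressed video stream

expression_bps = EXPRESSION_BYTES * 8 * UPDATES_PER_SECOND  # bits per second
video_bps = VIDEO_KBPS * 1000

ratio = video_bps / expression_bps
print(f"expression stream: {expression_bps} bit/s")
print(f"video stream:      {video_bps} bit/s")
print(f"reduction factor:  ~{ratio:.0f}x")
```

Even under conservative assumptions, the expression stream is three orders of magnitude smaller than the video stream it replaces.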
  • the identification module 120 is further configured to perform facial recognition on the video image to obtain facial features, and extract facial expression information in the facial features.
  • the facial features extracted by the identification module 120 from the video image may include, but are not limited to, geometric information about facial features such as the eyes, nose, mouth, and ears, for example the position of the eyebrows, the angle of the mouth corners, and the size of the eyes. It should be understood that the facial feature information can also be obtained by other methods, and future face recognition technologies may likewise perform face recognition on the video image to obtain facial feature information. Thereafter, the identification module 120 extracts facial expression information from the facial features; for example, it may analyze the facial feature information to obtain the facial expression information of the user.
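The geometric quantities mentioned above (eyebrow position, mouth-corner angle, eye size) could, for instance, be derived from facial landmark coordinates. The following sketch assumes a hypothetical landmark layout with invented coordinates; none of the names come from the specification:

```python
# Hypothetical landmark points (x, y) for one face; both the layout and
# the coordinates are invented for illustration.
landmarks = {
    "left_eye_top": (30, 40), "left_eye_bottom": (30, 44),
    "left_eye_center": (30, 42), "left_brow": (28, 32),
    "mouth_left": (25, 70), "mouth_right": (45, 70), "mouth_center": (35, 74),
}

def eye_openness(lm):
    """Vertical eye aperture in pixels (eye size)."""
    return lm["left_eye_bottom"][1] - lm["left_eye_top"][1]

def mouth_curvature(lm):
    """Positive when the mouth centre sits below the corners (smile-like)."""
    corner_y = (lm["mouth_left"][1] + lm["mouth_right"][1]) / 2
    return lm["mouth_center"][1] - corner_y

def brow_raise(lm):
    """Eyebrow-to-eye distance; a larger value suggests a raised brow."""
    return lm["left_eye_center"][1] - lm["left_brow"][1]

features = {
    "eye_openness": eye_openness(landmarks),
    "mouth_curvature": mouth_curvature(landmarks),
    "brow_raise": brow_raise(landmarks),
}
print(features)
```

In practice the landmarks would come from whatever face-recognition technique the terminal uses; only the resulting handful of scalar features needs to survive this step.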
  • the facial expression information includes one or more of the following: whether the user frowns, whether the mouth is open or closed, the curvature of the mouth corners, whether the eyes are open or closed, the size of the eyes, whether there are tears, and the like.
  • the facial expression information mainly reflects the emotional state of the person. For example, by analyzing the position of the eyebrows, the angle of the mouth corners, the size of the eyes, and so on, the user's expression can be determined to be smiling, laughing, crying, depressed, excited, angry, and the like.
  • various facial expression analysis techniques can be used, for example machine learning algorithms; algorithms with similar functions developed in the future can likewise be used to analyze the facial feature information and obtain facial expression information.
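As one hedged illustration of such an analysis step, a trivial hand-written rule set can stand in for the machine-learning algorithms the text mentions; all thresholds and feature names below are invented:

```python
def classify_expression(features):
    """Map geometric facial features to a coarse expression label.

    The thresholds are arbitrary illustrative values; a real system
    would learn them, e.g. with a machine-learning classifier.
    """
    if features.get("tears", False):
        return "crying"
    if features["mouth_open"] and features["mouth_curvature"] > 3:
        return "laughing"
    if features["mouth_curvature"] > 1:
        return "smiling"
    if features["brow_furrowed"]:
        return "angry"
    return "neutral"

sample = {"mouth_open": True, "mouth_curvature": 5,
          "brow_furrowed": False, "tears": False}
print(classify_expression(sample))  # laughing
```

The point of the design is that only the final label (or a handful of such features) is transmitted, never the image itself.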
  • the sending module 130 may encode the facial expression information to form a digital expression, which may be just a few characters occupying only a few bytes. For example, "laughter" may be encoded directly as the characters "D:". Of course, the encoding scheme can be much richer; this example is given only for ease of understanding. The facial expression information is then sent to the second terminal through the video call established by the server.
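A minimal sketch of such an encoding; only the "D:" code for laughter comes from the text, the other codes are invented placeholders:

```python
# Hypothetical two-character codes for expressions; only "D:" (laughter)
# is given in the specification, the rest are invented for illustration.
EXPRESSION_CODES = {
    "laughing": "D:",
    "smiling": ":)",
    "crying": ":'(",
    "neutral": ":|",
}

def encode_expression(label):
    """Encode an expression label into the few bytes sent on the wire."""
    return EXPRESSION_CODES[label].encode("utf-8")

payload = encode_expression("laughing")
print(payload, len(payload))  # b'D:' 2
```

A two-byte payload per expression update is consistent with the specification's claim that the encoded facial expression information occupies only a few bytes.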
  • the present invention also proposes another terminal.
  • FIG. 5 is a schematic structural diagram of a terminal according to another embodiment of the present invention.
  • the terminal includes: a receiving module 210 and a synthesizing module 220.
  • the receiving module 210 is configured to receive facial expression information of a video image sent by the first terminal that establishes a call with the terminal.
  • the synthesizing module 220 is configured to synthesize and display the video image according to the facial expression information and the face image model preset in the terminal.
  • the synthesizing module 220 may perform synthesis according to the facial expression information of the first terminal's user and the preset face image model, to reproduce the facial image of the first terminal's user and display it in the video call interface of the terminal.
  • the preset face image model may be set by the user or may be set by default.
  • users of the terminal may also use their own photo, or a photo of the first terminal's user, as the face image model for display, to reproduce the face image of the first terminal's user.
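The receiving-end synthesis described above might be sketched as follows; the class and method names are illustrative assumptions, and a real terminal would animate the model rather than merely record a label:

```python
# Hypothetical receiving-end flow: decode the expression payload and
# apply it to whichever preset face image model the user selected.
CODE_TO_LABEL = {"D:": "laughing", ":)": "smiling", ":|": "neutral"}

class FaceModel:
    def __init__(self, name):
        self.name = name
        self.current_expression = "neutral"

    def apply_expression(self, label):
        # A real terminal would deform/animate the model here; this
        # sketch just records the state to be drawn in the call UI.
        self.current_expression = label
        return f"{self.name} showing {label}"

def on_payload(model, payload: bytes):
    """Handle one incoming facial-expression payload."""
    label = CODE_TO_LABEL.get(payload.decode("utf-8"), "neutral")
    return model.apply_expression(label)

model = FaceModel("cartoon-superman")  # user-selected preset model
print(on_payload(model, b"D:"))        # cartoon-superman showing laughing
```

Because the model is preset at the receiving terminal, no model data ever crosses the network; only the expression payload does.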
  • the terminal of the embodiment of the present invention uses facial recognition technology: the first terminal that establishes a call with the terminal extracts the facial expression information, and the terminal synthesizes and restores the face image simply from the sent facial expression information and the preset face image model. Because the information transmitted between the sending end and the receiving end is limited to the facial expression information, and because the facial expression information need not describe a complete facial image, the amount of information is small: after encoding, the facial expression information may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced, data traffic is saved, the video call is smoother, and the influence of limited network bandwidth or metered traffic on the video call is reduced.
  • FIG. 6 is a schematic structural diagram of a terminal according to still another embodiment of the present invention.
  • the terminal further includes a selection module 230, as shown in FIG. 6.
  • the selection module 230 is configured to, after the receiving module 210 receives the facial expression information of the video image sent by the first terminal that establishes the call with the terminal, select a real or cartoon face image model, the selected real or cartoon face image model being used to synthesize a video image with the facial expression information for display.
  • the terminal may provide the user with multiple real or cartoon face image models, for example several cartoon face image models, or photos and realistic face image models; users can choose their favorite face image model according to their needs.
  • for example, if the facial expression information of the first terminal's user indicates a big laugh and the terminal's user selects a Superman face image model, the terminal synthesizes the facial expression information with the Superman cartoon image to reproduce a picture of the first terminal's user laughing heartily.
  • the user can thus select a real or cartoon face image model, and the video image is synthesized and displayed according to the selected face image model and the facial expression information, which adds interest and improves the user experience.
  • the present invention also proposes a terminal device.
  • a terminal device of an embodiment of the present invention includes: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations: collecting a video image of the user of the terminal device; performing facial recognition on the video image to obtain facial expression information; and sending the facial expression information to a second terminal that establishes a call with the terminal device, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
  • the present invention also proposes another terminal device.
  • a terminal device of an embodiment of the present invention includes: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations: receiving facial expression information of a video image sent by a first terminal that establishes a call with the terminal device; and synthesizing and displaying a video image according to the facial expression information and a face image model preset in the terminal device.
  • portions of the invention may be implemented in hardware, software, firmware or a combination thereof.
  • multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.

Abstract

Provided are a virtual video call method and terminal, the method comprising: capturing a video image of a first terminal user; conducting face recognition on the video image to obtain facial expression information; and sending the facial expression information to a second terminal establishing a call connection with the first terminal, the facial expression information being used to enable the second terminal to compose and display a video image according to the facial expression information and a facial image model preset in the second terminal. The method in an embodiment of the present invention uses face recognition technology to extract the facial expression information at a sending terminal (for example, the first terminal), and composes and restores the facial image at a receiving terminal (for example, the second terminal) according to the sent facial expression information and a preset facial image model. The extremely small amount of transmitted facial expression data results in a significant reduction in the amount of data transmitted during a video call, thus providing a smoother video call and reducing the impact of limited network bandwidth or limited traffic on the video call.

Description

Virtual video calling method and terminal
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201310714667.3, entitled "Virtual Video Calling Method and Terminal", filed on December 20, 2013 by Baidu Online Network Technology (Beijing) Co., Ltd.
Technical field
The present invention relates to the field of communications technologies, and in particular, to a virtual video calling method and terminal.
Background art
With the rapid growth of network bandwidth and the development and popularization of hardware devices, the market for video calls has entered a fast lane of development. At present, the main method of virtual video calling is to capture an image at the transmitting end, determine the face area in the image, extract facial feature information from the face area, and send the extracted facial feature information to the receiving end, where the facial feature information is used to reproduce the facial expression of the corresponding user.
The drawback at present is that, since every person's facial features are different, the amount of extracted facial feature data is still very large, and the above method must also reconstruct a face model of a specific subject (for example, the face model of the user at the transmitting end) from the facial feature information. It can therefore be seen that the amount of video data transmitted in the prior art is very large, consuming a large amount of data traffic and possibly making video calls choppy; the approach is unsuitable for mobile networks with limited bandwidth or for metered-traffic scenarios, which seriously hinders the popularization and promotion of video calls.
Summary of the invention
The present invention aims to solve at least one of the above technical problems.
To this end, a first object of the present invention is to propose a virtual video calling method. The method greatly reduces the amount of data transmitted during a video call and saves data traffic, making the video call smoother, reducing the impact of limited network bandwidth or metered traffic on the video call, and improving the user experience.
A second object of the present invention is to propose another virtual video calling method.
A third object of the present invention is to propose a terminal.
A fourth object of the present invention is to propose another terminal.
A fifth object of the present invention is to propose a terminal device.
A sixth object of the present invention is to propose another terminal device.
In order to achieve the above objects, a virtual video calling method according to an embodiment of the first aspect of the present invention includes: collecting a video image of a first terminal user; performing facial recognition on the video image to obtain facial expression information; and sending the facial expression information to a second terminal that establishes a call with the first terminal, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
In the virtual video calling method of this embodiment of the present invention, facial recognition technology is used to extract facial expression information at the transmitting end (for example, the first terminal), and the receiving end (for example, the second terminal) synthesizes and restores the face image simply from the sent facial expression information and a preset face image model. Because the information transmitted between the transmitting end and the receiving end is limited to the facial expression information, and because the facial expression information need not describe a complete face image, the amount of information is small: after encoding, the facial expression information may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced and data traffic is saved, making the video call smoother and reducing the influence of limited network bandwidth or metered traffic on the video call; the method is especially suitable for transmission over mobile networks and enhances the user experience. In addition, there is no need to reconstruct the face image model of the first terminal user in the second terminal; the second terminal only needs to display the corresponding facial expression on the preset face image model according to the facial expression information, so the second terminal is easy to adjust.
In order to achieve the above objects, a virtual video calling method according to an embodiment of the second aspect of the present invention includes: receiving facial expression information of a video image sent by a first terminal that establishes a call with a second terminal; and synthesizing and displaying a video image according to the facial expression information and a face image model preset in the second terminal.
In the virtual video calling method of this embodiment of the present invention, facial recognition technology is used to extract facial expression information at the transmitting end (for example, the first terminal), and the receiving end (for example, the second terminal) synthesizes and restores the face image simply from the sent facial expression information and a preset face image model. Because the information transmitted between the transmitting end and the receiving end is limited to the facial expression information, and because the facial expression information need not describe a complete face image, the amount of information is small: after encoding, the facial expression information may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced and data traffic is saved, making the video call smoother and reducing the influence of limited network bandwidth or metered traffic on the video call; the method is especially suitable for transmission over mobile networks and enhances the user experience. In addition, there is no need to reconstruct the face image model of the first terminal user in the second terminal; the second terminal only needs to display the corresponding facial expression on the preset face image model according to the facial expression information, so the second terminal is easy to adjust.
In order to achieve the above objects, a terminal according to an embodiment of the third aspect of the present invention includes: a collection module configured to collect a video image of a user; an identification module configured to perform facial recognition on the video image to obtain facial expression information; and a sending module configured to send the facial expression information to a second terminal that establishes a call with the terminal, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
The terminal of this embodiment of the present invention uses facial recognition technology to extract facial expression information, so that the second terminal that establishes a call with the terminal can synthesize and restore the face image simply from the sent facial expression information and a preset face image model. Because the transmitted information is limited to the facial expression information, and because the facial expression information need not describe a complete face image, the amount of information is small: after encoding, the facial expression information may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced and data traffic is saved, making the video call smoother and reducing the influence of limited network bandwidth or metered traffic on the video call; the method is especially suitable for transmission over mobile networks and enhances the user experience. In addition, there is no need to reconstruct the user's face image model in the second terminal; the second terminal only needs to display the corresponding facial expression on the preset face image model according to the facial expression information, so the second terminal is easy to adjust.
In order to achieve the above objects, a terminal according to an embodiment of the fourth aspect of the present invention includes: a receiving module configured to receive facial expression information of a video image sent by a first terminal that establishes a call with the terminal; and a synthesizing module configured to synthesize and display a video image according to the facial expression information and a face image model preset in the terminal.
The terminal of this embodiment of the present invention uses facial recognition technology: the first terminal that establishes a call with the terminal extracts the facial expression information, and the terminal synthesizes and restores the face image simply from the sent facial expression information and a preset face image model. Because the information transmitted between the transmitting end and the receiving end is limited to the facial expression information, and because the facial expression information need not describe a complete face image, the amount of information is small: after encoding, the facial expression information may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced and data traffic is saved, making the video call smoother and reducing the influence of limited network bandwidth or metered traffic on the video call; the method is especially suitable for transmission over mobile networks and enhances the user experience. In addition, there is no need to reconstruct a face image model; the terminal only needs to display the corresponding facial expression on the preset face image model according to the facial expression information, so the terminal is easy to adjust.
In order to achieve the above objects, a terminal device according to an embodiment of the fifth aspect of the present invention includes: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations: collecting a video image of the user of the terminal device; performing facial recognition on the video image to obtain facial expression information; and sending the facial expression information to a second terminal that establishes a call with the terminal device, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
In order to achieve the above objects, a terminal device according to an embodiment of the sixth aspect of the present invention includes: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations: receiving facial expression information of a video image sent by a first terminal that establishes a call with the terminal device; and synthesizing and displaying a video image according to the facial expression information and a face image model preset in the terminal device.
Additional aspects and advantages of the present invention will be set forth in part in the description which follows, and in part will become apparent from the description or may be learned by practice of the present invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and more readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart of a virtual video calling method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a virtual video calling method according to another embodiment of the present invention;
FIG. 3 is a flowchart of a virtual video calling method according to still another embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a terminal according to another embodiment of the present invention; and
FIG. 6 is a schematic structural diagram of a terminal according to still another embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, where the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are intended to explain the present invention, and should not be construed as limiting the present invention. On the contrary, the embodiments of the present invention cover all changes, modifications and equivalents falling within the spirit and scope of the appended claims.
In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It should also be noted that, unless otherwise explicitly specified and defined, the terms "connected" and "coupled" are to be understood broadly: a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium. Those of ordinary skill in the art can understand the specific meanings of the above terms in the present invention according to the specific circumstances. Furthermore, in the description of the present invention, "a plurality of" means two or more unless otherwise specified.
Any process or method description in the flowcharts, or otherwise described herein, may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be executed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order, depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention pertain.
In order to solve the problem that the amount of video data transmitted during a video call is too large, the present invention proposes a virtual video calling method and terminal. The virtual video calling method and terminal according to embodiments of the present invention are described below with reference to the accompanying drawings.
A virtual video calling method includes the following steps: collecting a video image of a first terminal user; performing facial recognition on the video image to obtain facial expression information; and sending the facial expression information to a second terminal that establishes a call with the first terminal, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
FIG. 1 is a flowchart of a virtual video call method according to an embodiment of the present invention.
As shown in FIG. 1, the virtual video call method includes the following steps.
S101: Collect a video image of the user of the first terminal.
Specifically, the first terminal may shoot with a built-in or peripheral camera to collect the video image of the user of the first terminal.
S102: Perform facial recognition on the video image to obtain facial expression information.
Specifically, the first terminal may perform facial recognition on the video image by means of various existing computer image processing techniques, such as face recognition based on genetic algorithms or on neural networks, to obtain the facial expression information. The data volume of the facial expression information is very small. The process of obtaining the facial expression information is described in detail in subsequent embodiments.
S103: Send the facial expression information to a second terminal that has established a call with the first terminal, where the facial expression information is used by the second terminal to synthesize a video image from the facial expression information and a face image model preset at the second terminal, and to display the synthesized image.
The first terminal sends a video call request to the second terminal through a server, or the second terminal sends a video call request to the first terminal through the server. If the second terminal accepts the video call request of the first terminal, or the first terminal accepts the video call request of the second terminal, the server establishes a video call between the first terminal and the second terminal.
Specifically, the first terminal may encode the facial expression information of its user into a digital representation, and send the facial expression information to the second terminal over the video call established by the server.
After the first terminal sends the facial expression information of its user to the second terminal, the second terminal may synthesize a facial image of the first terminal's user from the facial expression information and a preset face image model, and display it in the video call interface of the second terminal. The preset face image model may be set by the user, or set by the server as a default. In addition, the user of the second terminal may also use his or her own photo, or a photo of the first terminal's user, to synthesize with the facial expression information and reproduce the facial image of the first terminal's user.
In addition, a video can be regarded as a sequence of video images, one frame after another. The first terminal obtains facial expression information for each frame, and the second terminal likewise synthesizes an image from the facial expression information for each frame, thereby implementing the virtual video call. The synthesis process itself is prior art and is not described here.
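The per-frame pipeline described above can be reduced to the following minimal sketch. It is an illustration only: `capture_frame`, `recognize`, and `send` are hypothetical stand-ins for the camera, the face recognizer, and the network channel, none of which is specified by the present disclosure.

```python
def sender_loop(capture_frame, recognize, send, frames=3):
    """First-terminal pipeline sketch: per frame, capture an image,
    reduce it to facial expression information, and send only that."""
    for _ in range(frames):
        image = capture_frame()
        expression = recognize(image)   # a few bytes, not the image
        send(expression)

# Exercise the loop with placeholder callables.
sent = []
sender_loop(
    capture_frame=lambda: "raw-camera-frame",   # hypothetical camera
    recognize=lambda image: b"D:",              # hypothetical recognizer
    send=sent.append,                           # hypothetical channel
)
```

The point of the sketch is only that the network sees the tiny recognizer output, never the captured frame itself.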
In the virtual video call method of this embodiment of the present invention, facial recognition technology is used to extract facial expression information at the sending end (for example, the first terminal), and the receiving end (for example, the second terminal) performs simple synthesis and restoration of the face image from the transmitted facial expression information and a preset face image model. Because the information transmitted between the sending end and the receiving end is limited to the facial expression information, and because the facial expression information need not encode a complete face image, the amount of information is small; after encoding, the facial expression information may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced and data traffic is saved, making the video call smoother and reducing the impact of limited network bandwidth or limited data allowances on the call. The method is particularly suitable for transmission over mobile networks and improves the user experience. In addition, there is no need to rebuild the face image model of the first terminal's user at the second terminal; the second terminal only needs to display the corresponding facial expression on the preset face image model according to the facial expression information, so the second terminal is easy to adapt.
In an embodiment of this aspect, performing facial recognition on the video image to obtain facial expression information (that is, S102) includes: performing facial recognition on the video image to obtain facial features, and extracting the facial expression information from the facial features.
Specifically, facial features are first extracted from the video image. The facial features may include, but are not limited to, geometric information about facial components (such as the eyes, nose, mouth, and ears), for example the position of the eyebrows, the angle of the mouth, and the size of the eyes. It should be understood that the facial features may also be obtained by other methods; the first terminal of this embodiment may equally use any future face recognition technique to perform facial recognition on the video image and obtain the facial features. The facial expression information is then extracted from the facial features: the first terminal may analyze the facial features to obtain the facial expression information of its user.
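As an illustration of deriving expression information from geometric facial features, the sketch below assumes a landmark detector has already produced 2D pixel coordinates for a few face points. The landmark names and the thresholds are hypothetical, not part of this disclosure; note that in image coordinates the y axis grows downward, so "higher on the face" means a smaller y value.

```python
def extract_expression(landmarks):
    """Derive coarse expression attributes from 2D face landmarks.

    `landmarks` maps hypothetical point names to (x, y) pixel
    coordinates (y grows downward); any landmark detector could
    supply them.
    """
    mouth_open = landmarks["lip_bottom"][1] - landmarks["lip_top"][1]
    # Mouth corners sitting above the lip centre (smaller y) give a
    # positive curvature, i.e. a smile.
    mouth_curve = (
        (landmarks["lip_top"][1] + landmarks["lip_bottom"][1]) / 2
        - (landmarks["mouth_left"][1] + landmarks["mouth_right"][1]) / 2
    )
    eye_size = landmarks["eye_bottom"][1] - landmarks["eye_top"][1]
    return {
        "mouth_open": mouth_open > 4,   # pixel thresholds are illustrative
        "mouth_curve": mouth_curve,
        "eye_open": eye_size > 2,
    }

# Landmarks roughly corresponding to a broad, open-mouthed smile.
info = extract_expression({
    "lip_top": (50, 80), "lip_bottom": (50, 90),
    "mouth_left": (40, 78), "mouth_right": (60, 78),
    "eye_top": (45, 60), "eye_bottom": (45, 64),
})
```

A production recognizer would of course use many more points and calibrated, face-size-normalized measurements; the sketch only shows how a handful of geometric quantities can summarize an expression.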
In an embodiment of this aspect, the facial expression information includes one or more of the following: whether the brow is furrowed, whether the mouth is open or closed, the curvature of the corners of the mouth, whether the eyes are open or closed, the size of the eyes, whether there are tears, and so on.
In addition, the facial expression information mainly reflects a person's emotional state. For example, by analyzing the position of the eyebrows, the angle of the mouth, the size of the eyes, and so on, it can be determined whether the user's expression is a smile, laughter, crying, depression, excitement, anger, and the like. Likewise, various existing facial expression analysis techniques, such as machine learning algorithms, may be used for this analysis; furthermore, the first terminal of this embodiment may use any future algorithm with a similar function to analyze the facial features and obtain the facial expression information.
The first terminal may encode the facial expression information of its user into a digital representation. The encoding may be as simple as a few characters occupying only a few bytes; for example, "laughter" could be transmitted directly as the characters "D:". Of course, the encoding scheme can be richer; the example is given here only for ease of understanding. The first terminal sends the encoded facial expression information to the second terminal over the video call established by the server.
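A compact encoding of this kind might look like the following sketch. The "D:" code for laughter follows the example in the text; the remaining codes and the mapping itself are invented for illustration and are not part of the disclosure.

```python
# "D:" for laughter follows the example in the text; the other codes
# are invented placeholders.
CODES = {"laugh": "D:", "smile": ":)", "cry": "T_T", "neutral": "--"}
DECODE = {v: k for k, v in CODES.items()}

def encode(expression):
    """Turn an expression label into a few bytes for the wire."""
    return CODES[expression].encode("ascii")

def decode(payload):
    """Recover the expression label at the receiving terminal."""
    return DECODE[payload.decode("ascii")]

frame = encode("laugh")   # 2 bytes instead of a full video frame
```

A richer scheme could serialize the individual attributes (mouth curvature, eye size, and so on) instead of a single label, while still occupying only a handful of bytes per frame.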
It should be noted that the preset face image models can be quite diverse. In an embodiment of the present invention, the face image model preset at the second terminal includes a real face image model and a cartoon face image model. It may also be, for example, a photo stored in the second terminal.
To make the video call more personalized and more fun, the user of the second terminal may choose a preferred cartoon face image model. In an embodiment of the present invention, the virtual video call method further includes: the second terminal providing at least one cartoon face image model to its user; and the second terminal receiving the cartoon face image model selected by its user, and synthesizing and displaying an image from the facial expression information and the selected face image model. Specifically, after the user of the second terminal selects a preferred cartoon face image model for the first terminal's user, the second terminal receives the selected cartoon face image model and synthesizes it with the facial expression information of the first terminal's user to reproduce the facial image of the first terminal's user, displaying it in the video call interface of the second terminal. For example, suppose the facial expression information of the first terminal's user indicates that the mouth is open, the corners of the mouth are strongly curved, and the eyes are narrowed, that is, the first terminal's user is laughing, and the user of the second terminal has selected a Superman face image model. The second terminal then synthesizes the facial expression information of the first terminal's user with the Superman cartoon image to reproduce an image of the first terminal's user laughing.
An embodiment of the present invention further provides another virtual video call method.
FIG. 2 is a flowchart of a virtual video call method according to another embodiment of the present invention.
As shown in FIG. 2, the virtual video call method includes the following steps.
S201: Receive facial expression information of a video image sent by a first terminal that has established a call with the second terminal.
Specifically, the first terminal first sends a video call request to the second terminal through a server, or the second terminal sends a video call request to the first terminal through the server. If the second terminal accepts the video call request of the first terminal, or the first terminal accepts the video call request of the second terminal, the server establishes a video call between the first terminal and the second terminal.
The first terminal may shoot with a built-in or peripheral camera to collect a video image of its user, obtain the facial expression information according to the method of any of the above embodiments, and send it to the second terminal.
S202: Synthesize a video image from the facial expression information and a face image model preset at the second terminal, and display it.
Specifically, the second terminal may synthesize a facial image of the first terminal's user from that user's facial expression information and a preset face image model, and display it in the video call interface of the second terminal. The preset face image model may be set by the user, or set as a default. In addition, the user of the second terminal may also use his or her own photo, or a photo of the first terminal's user, as the face image model to reproduce the facial image of the first terminal's user.
In the virtual video call method of this embodiment of the present invention, facial recognition technology is used to extract facial expression information at the sending end (for example, the first terminal), and the receiving end (for example, the second terminal) performs simple synthesis and restoration of the face image from the transmitted facial expression information and a preset face image model. Because the information transmitted between the sending end and the receiving end is limited to the facial expression information, and because the facial expression information need not encode a complete face image, the amount of information is small; after encoding, it may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced and data traffic is saved, making the video call smoother and reducing the impact of limited network bandwidth or limited data allowances on the call. The method is particularly suitable for transmission over mobile networks and improves the user experience. In addition, there is no need to rebuild the face image model of the first terminal's user at the second terminal; the second terminal only needs to display the corresponding facial expression on the preset face image model according to the facial expression information, so the second terminal is easy to adapt.
FIG. 3 is a flowchart of a virtual video call method according to yet another embodiment of the present invention.
As shown in FIG. 3, the virtual video call method includes the following steps.
S301: Receive facial expression information of a video image sent by a first terminal that has established a call with the second terminal.
S302: Select a real or cartoon face image model, the selected real or cartoon face image model being used to synthesize a video image with the facial expression information for display.
Specifically, to make the video call more personalized and more fun, the second terminal may provide the user with multiple real or cartoon face image models, for example several cartoon face image models, photos, or real face image models, from which the user of the second terminal can choose a preferred model. For example, suppose the facial expression information of the first terminal's user indicates that the mouth is open, the corners of the mouth are strongly curved, and the eyes are narrowed, that is, the first terminal's user is laughing, and the user of the second terminal has selected a Superman face image model. The second terminal then synthesizes the facial expression information of the first terminal's user with the Superman cartoon image to reproduce an image of the first terminal's user laughing.
S303: Synthesize a video image from the selected real or cartoon face image model and the facial expression information, and display it.
In the virtual video call method of this embodiment of the present invention, the user of the second terminal can select a real or cartoon face image model, and a video image is synthesized and displayed from the selected model and the facial expression information, which adds fun and improves the user experience.
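The select-then-synthesize flow of S302 and S303 can be sketched as follows. The model names and the string-based "rendering" are placeholders: the actual synthesis is left to prior art by the disclosure, so only the control flow is illustrated.

```python
# Hypothetical locally stored models on the second terminal.
MODELS = {
    "real": "stored-real-face",
    "superman": "cartoon-superman",
}

def synthesize(model_name, expression):
    """Combine the locally stored model with received expression info.

    Rendering is reduced to a string so the flow is testable; a real
    implementation would redraw the model image with the expression
    applied.
    """
    return f"{MODELS[model_name]} showing {expression}"

# The second terminal's user picked the cartoon model, and the first
# terminal's user is laughing.
frame = synthesize("superman", "laugh")
```

Since the model lives entirely on the second terminal, switching between the real and cartoon models changes nothing in what the first terminal transmits.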
In an embodiment of the present invention, to make the reproduced facial image more realistic, the second terminal may obtain a real face image model of the first terminal's user for reproducing facial expressions. Specifically, the first terminal may analyze video images captured by its camera to obtain the real face image model, or the first terminal may analyze a face image chosen by the user, without shooting, to obtain the real face image model; the model is then sent to the second terminal for storage.
Alternatively, the second terminal may obtain a face image of the first terminal's user and analyze it to obtain the real face image model; that is, the real face image model may be generated at the second terminal. The second terminal may then synthesize the facial image of the first terminal's user from that user's real face image model and facial expression information, and reproduce it in the video call interface of the second terminal. The reproduced facial image is thereby made more realistic.
It should be understood that the real face image model may be created only once and sent to the second terminal for storage; in subsequent data transmission, only the facial expression information needs to be sent. In addition, a selection button may be provided in the second terminal, allowing its user to choose whether to reproduce the real facial image of the first terminal's user or to reproduce the facial image with a cartoon face image model. More specifically, the user of the second terminal may choose according to the network environment and terminal capability; for example, on a mobile terminal a cartoon face image model may be selected, with only the facial expression information transmitted during the video call, while on a personal computer a real face image model may be selected for greater realism.
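The transmit-once behaviour of the real face image model could be organized as in the sketch below. The two-callback interface is an assumption for illustration, not a wire format defined by the disclosure.

```python
class Receiver:
    """Second-terminal sketch: store the face model once, then render
    every subsequent frame from expression information alone."""

    def __init__(self):
        self.model = None
        self.rendered = []

    def on_model(self, model):
        self.model = model            # transmitted a single time per call

    def on_expression(self, expression):
        # After the model is stored, only a few bytes arrive per frame.
        self.rendered.append((self.model, expression))

rx = Receiver()
rx.on_model("first-user-face-model")
for expr in ("smile", "laugh"):
    rx.on_expression(expr)
```

Every rendered frame reuses the same stored model, which is why the model never needs to be rebuilt or retransmitted during the call.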
In the virtual video call method of this embodiment of the present invention, the facial image of the first terminal's user can be reproduced from that user's real face image model and facial expression information, making the reproduced facial image more realistic. Moreover, the real face image model can be transmitted once and used many times, so the receiving end does not need to rebuild the real face image model in real time during the call, which simplifies operation at the receiving end and improves the user experience.
To implement the above embodiments, the present invention further provides a terminal.
A terminal includes: a collection module configured to collect a video image of a user; a recognition module configured to perform facial recognition on the video image to obtain facial expression information; and a sending module configured to send the facial expression information to a second terminal that has established a call with the terminal, where the facial expression information is used by the second terminal to synthesize a video image from the facial expression information and a face image model preset at the second terminal, and to display it.
FIG. 4 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
As shown in FIG. 4, the terminal includes a collection module 110, a recognition module 120, and a sending module 130.
Specifically, the collection module 110 is configured to collect a video image of the user. More specifically, the collection module 110 may shoot with the terminal's built-in or peripheral camera to collect the video image of the user.
The recognition module 120 is configured to perform facial recognition on the video image to obtain facial expression information. More specifically, the recognition module 120 may perform facial recognition on the video image by means of various existing computer image processing techniques, such as face recognition based on genetic algorithms or on neural networks, to obtain the facial expression information. The data volume of the facial expression information is very small. The process of obtaining the facial expression information is described in detail in subsequent embodiments.
The sending module 130 is configured to send the facial expression information to a second terminal that has established a call with the terminal, where the facial expression information is used by the second terminal to synthesize a video image from the facial expression information and a face image model preset at the second terminal, and to display it.
The terminal sends a video call request to the second terminal through a server, or the second terminal sends a video call request to the terminal through the server. If the second terminal accepts the video call request of the terminal, or the terminal accepts the video call request of the second terminal, the server establishes a video call between the terminal and the second terminal.
More specifically, the sending module 130 may encode the facial expression information into a digital representation and send it to the second terminal over the video call established by the server.
After the facial expression information is sent to the second terminal, the second terminal may synthesize a facial image of the terminal's user from the facial expression information and a preset face image model, and display it in the video call interface of the second terminal. The preset face image model may be set by the user, or set by the server as a default. In addition, the user of the second terminal may also use his or her own photo, or a photo of the terminal's user, to synthesize with the facial expression information and reproduce the facial image of the terminal's user.
With the terminal of this embodiment of the present invention, facial recognition technology is used to extract the facial expression information, so that the second terminal that has established a call with the terminal performs simple synthesis and restoration of the face image from the transmitted facial expression information and a preset face image model. Because the transmitted information is limited to the facial expression information, and because the facial expression information need not encode a complete face image, the amount of information is small; after encoding, it may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced and data traffic is saved, making the video call smoother and reducing the impact of limited network bandwidth or limited data allowances on the call. The terminal is particularly suitable for transmission over mobile networks and improves the user experience. In addition, there is no need to rebuild the user's face image model at the second terminal; the second terminal only needs to display the corresponding facial expression on the preset face image model according to the facial expression information, so the second terminal is easy to adapt.
In an embodiment of the present invention, the recognition module 120 is further configured to perform facial recognition on the video image to obtain facial features, and to extract the facial expression information from the facial features.
Specifically, the recognition module 120 first extracts facial features from the video image. The facial features may include, but are not limited to, geometric information about facial components (such as the eyes, nose, mouth, and ears), for example the position of the eyebrows, the angle of the mouth, and the size of the eyes. It should be understood that the facial feature information may also be obtained by other methods; any future face recognition technique may equally be used to perform facial recognition on the video image and obtain the facial feature information. The recognition module 120 then extracts the facial expression information from the facial features: it may analyze the facial feature information to obtain the facial expression information of the user.
In an embodiment of this aspect, the facial expression information includes one or more of the following: whether the brow is furrowed, whether the mouth is open or closed, the curvature of the corners of the mouth, whether the eyes are open or closed, the size of the eyes, whether there are tears, and so on.
In addition, the facial expression information mainly reflects a person's emotional state. For example, by analyzing the position of the eyebrows, the angle of the mouth, the size of the eyes, and so on, it can be determined whether the user's expression is a smile, laughter, crying, depression, excitement, anger, and the like. Likewise, various existing facial expression analysis techniques, such as machine learning algorithms, may be used, and any future algorithm with a similar function may be used to analyze the facial feature information and obtain the facial expression information.
In addition, the sending module 130 may encode the facial expression information into a digital representation. The encoding may be as simple as a few characters occupying only a few bytes; for example, "laughter" could be transmitted directly as the characters "D:". Of course, the encoding scheme can be richer; the example is given here only for ease of understanding. The encoded facial expression information is sent to the second terminal over the video call established by the server.
To implement the above embodiments, the present invention further provides another terminal.
FIG. 5 is a schematic structural diagram of a terminal according to another embodiment of the present invention.
As shown in FIG. 5, the terminal includes a receiving module 210 and a synthesis module 220.
Specifically, the receiving module 210 is configured to receive facial expression information of a video image sent by a first terminal that has established a call with the terminal. The synthesis module 220 is configured to synthesize a video image from the facial expression information and a face image model preset in the terminal, and to display it.
More specifically, the synthesis module 220 may synthesize a facial image of the first terminal's user from that user's facial expression information and a preset face image model, and display it in the video call interface of the terminal. The preset face image model may be set by the user, or set as a default. In addition, the user of the terminal may also use his or her own photo, or a photo of the first terminal's user, as the face image model to reproduce the facial image of the first terminal's user.
With the terminal of this embodiment of the present invention, the face image is simply synthesized and restored from the facial expression information, extracted by facial recognition technology and sent by the first terminal that has established a call with the terminal, together with a preset face image model. Because the information transmitted between the sending end and the receiving end is limited to the facial expression information, and because the facial expression information need not encode a complete face image, the amount of information is small; after encoding, it may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced and data traffic is saved, making the video call smoother and reducing the impact of limited network bandwidth or limited data allowances on the call. The terminal is particularly suitable for transmission over mobile networks and improves the user experience. In addition, there is no need to rebuild the face image model; the terminal only needs to display the corresponding facial expression on the preset face image model according to the facial expression information, so the terminal is easy to adapt.
FIG. 6 is a schematic structural diagram of a terminal according to still another embodiment of the present invention.
As shown in FIG. 6, in addition to the structure shown in FIG. 5, the terminal further includes a selection module 230.
Specifically, the selection module 230 is configured to select a real or cartoon face image model after the receiving module 210 receives the facial expression information of the video image sent by the first terminal in a call with the second terminal; the selected real or cartoon face image model is used to synthesize a video image with the facial expression information for display.
More specifically, to make the video call more personalized and entertaining, the terminal may provide the user with multiple real or cartoon face image models, for example, several cartoon face image models, photos, or real face image models, and the user may choose a preferred face image model as needed. For example, if the facial expression information of the first terminal's user indicates laughing and the terminal's user has selected a Superman face image model, the terminal synthesizes the facial expression information with the Superman cartoon image to reproduce an image of the other user laughing.
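A minimal sketch of the selection module's behaviour described above; the model catalogue and the model names ("superman", "own-photo", etc.) are invented for illustration, not taken from the patent.

```python
# Hypothetical catalogue of real and cartoon face image models the terminal
# might offer, mirroring the "real or cartoon" choice in claim 5 / claim 10.
AVAILABLE_MODELS = {
    "real": ["own-photo", "caller-photo"],
    "cartoon": ["superman", "cat"],
}

def select_model(kind, name):
    """Return the chosen real or cartoon face model, validating the choice."""
    if name not in AVAILABLE_MODELS.get(kind, []):
        raise ValueError(f"unknown {kind} model: {name}")
    return {"kind": kind, "name": name}

choice = select_model("cartoon", "superman")
# the chosen model is then combined with the received expression information,
# e.g. a laughing expression rendered on the Superman cartoon image
```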
In this way, the user can select a real or cartoon face image model, and a video image is synthesized and displayed from the selected model and the facial expression information, which adds entertainment value and improves the user experience.
To achieve the above object, the present invention further provides a terminal device.
A terminal device according to an embodiment of the present invention includes: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations: collecting a video image of the terminal device's user; performing facial recognition on the video image to obtain facial expression information; and sending the facial expression information to a second terminal in a call with the terminal device, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
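The three sender-side operations listed above (capture, recognize, send) can be sketched as a pipeline. The capture and recognition steps are stubbed out below, since the patent deliberately leaves the concrete camera API and facial-recognition technique open; every function here is a hypothetical stand-in.

```python
# Stubbed sender-side pipeline: capture a frame, extract expression info,
# transmit only that info to the peer terminal.

def capture_frame():
    return "raw-frame"                      # stand-in for a camera frame

def recognize_expression(frame):
    # a real implementation would run face detection and feature extraction
    # here, then derive the expression fields from the facial features
    return {"mouth": "open", "eyes": "open"}

def send_to_peer(expression, peer):
    peer.append(expression)                 # stand-in for the network send

peer_inbox = []
send_to_peer(recognize_expression(capture_frame()), peer_inbox)
```

Note that the raw frame never leaves the sending device; only the small expression dictionary is transmitted, which is the core of the bandwidth saving described earlier.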
To achieve the above object, the present invention further provides another terminal device.
A terminal device according to an embodiment of the present invention includes: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations: receiving facial expression information of a video image sent by a first terminal in a call with the terminal device; and synthesizing and displaying a video image according to the facial expression information and a face image model preset in the terminal device.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any of the following technologies known in the art, or a combination thereof, may be used: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic use of these terms does not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations may be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the claims and their equivalents.

Claims (12)

  1. A virtual video call method, comprising:
    collecting a video image of a user of a first terminal;
    performing facial recognition on the video image to obtain facial expression information; and
    sending the facial expression information to a second terminal in a call with the first terminal, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
  2. The method according to claim 1, wherein performing facial recognition on the video image to obtain the facial expression information comprises:
    performing facial recognition on the video image to obtain facial features, and extracting the facial expression information from the facial features.
  3. The method according to claim 1 or 2, wherein the facial expression information comprises one or more of the following: whether the brow is furrowed, whether the mouth is open or closed, the curvature of the corners of the mouth, whether the eyes are open or closed, the size of the eyes, and whether there are tears.
  4. A virtual video call method, comprising:
    receiving facial expression information of a video image sent by a first terminal in a call with a second terminal; and
    synthesizing and displaying a video image according to the facial expression information and a face image model preset in the second terminal.
  5. The method according to claim 4, further comprising, after receiving the facial expression information of the video image sent by the first terminal in a call with the second terminal:
    selecting a real or cartoon face image model, the selected real or cartoon face image model being used to synthesize a video image with the facial expression information for display.
  6. A terminal, comprising:
    a collection module, configured to collect a video image of a user;
    a recognition module, configured to perform facial recognition on the video image to obtain facial expression information; and
    a sending module, configured to send the facial expression information to a second terminal in a call with the terminal, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
  7. The terminal according to claim 6, wherein the recognition module is further configured to perform facial recognition on the video image to obtain facial features, and to extract the facial expression information from the facial features.
  8. The terminal according to claim 6 or 7, wherein the facial expression information comprises one or more of the following: whether the brow is furrowed, whether the mouth is open or closed, the curvature of the corners of the mouth, whether the eyes are open or closed, the size of the eyes, and whether there are tears.
  9. A terminal, comprising:
    a receiving module, configured to receive facial expression information of a video image sent by a first terminal in a call with the terminal; and
    a synthesis module, configured to synthesize and display a video image according to the facial expression information and a face image model preset in the terminal.
  10. The terminal according to claim 9, further comprising:
    a selection module, configured to select a real or cartoon face image model after the receiving module receives the facial expression information of the video image sent by the first terminal in a call with the second terminal, the selected real or cartoon face image model being used to synthesize a video image with the facial expression information for display.
  11. A terminal device, comprising:
    one or more processors;
    a memory; and
    one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations:
    collecting a video image of a user of the terminal device;
    performing facial recognition on the video image to obtain facial expression information; and
    sending the facial expression information to a second terminal in a call with the terminal device, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
  12. A terminal device, comprising:
    one or more processors;
    a memory; and
    one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations:
    receiving facial expression information of a video image sent by a first terminal in a call with the terminal device; and
    synthesizing and displaying a video image according to the facial expression information and a face image model preset in the terminal device.