WO2015090147A1 - Virtual video call method and terminal - Google Patents


Info

Publication number
WO2015090147A1
Authority
WO
WIPO (PCT)
Prior art keywords
terminal
facial expression
expression information
facial
video image
Prior art date
Application number
PCT/CN2014/093187
Other languages
French (fr)
Chinese (zh)
Inventor
李刚 (Li Gang)
Original Assignee
百度在线网络技术(北京)有限公司 (Baidu Online Network Technology (Beijing) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baidu Online Network Technology (Beijing) Co., Ltd. (百度在线网络技术(北京)有限公司)
Priority to JP2016543309A priority Critical patent/JP2016537922A/en
Priority to KR1020157036602A priority patent/KR101768980B1/en
Publication of WO2015090147A1 publication Critical patent/WO2015090147A1/en

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • H04N7/157Conference systems defining a virtual conference space and using avatars or agents
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a virtual video calling method and terminal.
  • In the background art, the main virtual video call method is to capture an image at the transmitting end, determine the face region in the image, extract facial feature information from that region, and send the extracted facial feature information to the receiving end, where it is used to reproduce the facial expression of the corresponding user.
  • The drawback of this approach is that, because every person's facial features differ, the extracted facial feature information is still very large, and the receiving end must also reconstruct a specific target face model from it (for example, a model of the transmitting-end user's face). The amount of video data transmitted in the prior art is therefore very large: it consumes substantial data traffic, can make video calls stutter, and is unsuitable for mobile networks with limited bandwidth or metered traffic, seriously hindering the popularization and promotion of video calls.
  • the present invention aims to solve at least one of the above technical problems.
  • a first object of the present invention is to propose a virtual video calling method.
  • the method greatly reduces the amount of data transmitted during the video call, saves data traffic, thereby making the video call more smooth, reducing the impact of limited network bandwidth or limited traffic on the video call, and improving the user experience.
  • a second object of the present invention is to propose another virtual video calling method.
  • a third object of the present invention is to propose a terminal.
  • a fourth object of the present invention is to propose another terminal.
  • a fifth object of the present invention is to provide a terminal device.
  • a sixth object of the present invention is to propose another terminal device.
  • According to a first aspect, a virtual video calling method includes: collecting a video image of a user of a first terminal; performing facial recognition on the video image to obtain facial expression information; and transmitting the facial expression information to a second terminal that establishes a call with the first terminal, the facial expression information being used to cause the second terminal to synthesize a video image from the facial expression information and a face image model preset in the second terminal and display it.
  • The virtual video calling method of this embodiment of the present invention uses facial recognition technology to extract facial expression information at the transmitting end (for example, the first terminal), while the receiving end (for example, the second terminal) synthesizes and restores the face image simply from the transmitted facial expression information and the preset face image model. Because the information transmitted between the transmitting end and the receiving end is limited to facial expression information, which does not need to describe a complete facial image, the amount of information is small, and the encoded facial expression information can occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced, data traffic is saved, the video call is smoother, and the impact of limited network bandwidth or metered traffic on the video call is reduced, making the method especially suitable for transmission over mobile networks and improving the user experience.
  • In addition, the second terminal does not need to reconstruct a face image model of the first terminal user; it only needs to display the corresponding facial expression on its preset face image model according to the facial expression information, which keeps the second terminal's processing simple.
  • The virtual video calling method of the second aspect of the present invention includes: receiving facial expression information of a video image sent by a first terminal that establishes a call with the second terminal; and synthesizing and displaying a video image according to the facial expression information and a face image model preset in the second terminal.
  • The virtual video calling method of this embodiment of the present invention uses facial recognition technology to extract facial expression information at the transmitting end (for example, the first terminal), while the receiving end (for example, the second terminal) synthesizes and restores the face image simply from the transmitted facial expression information and the preset face image model. Because the information transmitted between the transmitting end and the receiving end is limited to facial expression information, which does not need to describe a complete facial image, the amount of information is small, and the encoded facial expression information can occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced, data traffic is saved, the video call is smoother, and the impact of limited network bandwidth or metered traffic on the video call is reduced, making the method especially suitable for transmission over mobile networks and improving the user experience.
  • In addition, the second terminal does not need to reconstruct a face image model of the first terminal user; it only needs to display the corresponding facial expression on its preset face image model according to the facial expression information, which keeps the second terminal's processing simple.
  • The terminal of the third aspect of the present invention includes: an acquisition module configured to collect a video image of a user; a recognition module configured to perform facial recognition on the video image to obtain facial expression information; and a sending module configured to send the facial expression information to a second terminal that establishes a call with the terminal, the facial expression information being used to cause the second terminal to synthesize a video image from the facial expression information and a face image model preset in the second terminal and display it.
  • The terminal of this embodiment of the present invention extracts facial expression information using facial recognition technology, so that the second terminal that establishes a call with it can synthesize and restore the face image simply from the transmitted facial expression information and the preset face image model. Because the transmitted information is limited to facial expression information, which does not need to describe a complete facial image, the amount of information is small, and the encoded facial expression information can occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced, data traffic is saved, the video call is smoother, and the impact of limited network bandwidth or metered traffic on the video call is reduced, making the terminal especially suitable for transmission in mobile networks and improving the user experience.
  • In addition, the second terminal only needs to display the corresponding facial expression on its preset face image model according to the facial expression information, which keeps the second terminal's processing simple.
  • The terminal of the fourth aspect of the present invention includes: a receiving module configured to receive facial expression information of a video image sent by a first terminal that establishes a call with the terminal; and a synthesizing module configured to synthesize and display a video image according to the facial expression information and a face image model preset in the terminal.
  • With the terminal of this embodiment of the present invention, the first terminal that establishes a call with it extracts facial expression information using facial recognition technology, and the terminal synthesizes and restores the face image simply from the received facial expression information and its preset face image model. Because the information transmitted between the sending and receiving ends is limited to facial expression information, which does not need to describe a complete facial image, the amount of information is small, and the encoded facial expression information can occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced, data traffic is saved, the video call is smoother, and the impact of limited network bandwidth or metered traffic on the video call is reduced.
  • According to a fifth aspect, a terminal device includes: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations: collecting a video image of the user of the terminal device; performing facial recognition on the video image to obtain facial expression information; and transmitting the facial expression information to a second terminal that establishes a call with the terminal device, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
  • The terminal device of the sixth aspect of the present invention includes: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations: receiving facial expression information of a video image sent by a first terminal that establishes a call with the terminal device; and synthesizing and displaying a video image according to the facial expression information and a face image model preset in the terminal device.
  • FIG. 1 is a flow chart of a virtual video call method according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of a virtual video call method according to another embodiment of the present invention.
  • FIG. 3 is a flowchart of a virtual video call method according to still another embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a terminal according to another embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a terminal according to still another embodiment of the present invention.
  • the present invention provides a virtual video calling method and terminal.
  • a virtual video call method and terminal according to an embodiment of the present invention are described below with reference to the accompanying drawings.
  • In one embodiment, a virtual video calling method comprises the steps of: collecting a video image of a first terminal user; performing facial recognition on the video image to obtain facial expression information; and sending the facial expression information to a second terminal that establishes a call with the first terminal, the facial expression information being used to enable the second terminal to synthesize the facial expression information with a face image model preset in the second terminal into a video image and display it.
  • FIG. 1 is a flow chart of a virtual video call method in accordance with one embodiment of the present invention.
  • the virtual video calling method includes the following steps:
  • S101: Collect a video image of the first terminal user. Specifically, the first terminal may capture video using its built-in camera or a peripheral camera to collect the video image of the first terminal user.
  • S102 Perform facial recognition on the video image to obtain facial expression information.
  • Specifically, the first terminal may perform facial recognition on the video image using various computer image-processing techniques to obtain facial expression information, for example face recognition based on genetic algorithms, neural-network face recognition, and the like.
  • the amount of data on facial expressions is very small. The process of acquiring facial expressions will be described in detail in the subsequent embodiments.
  • S103 Send the facial expression information to the second terminal that establishes a call with the first terminal, where the facial expression information is used to cause the second terminal to synthesize and display the video image according to the facial expression information and the facial image model preset in the second terminal.
  • Specifically, the first terminal sends a video call request to the second terminal through the server, or the second terminal sends a video call request to the first terminal through the server. If the second terminal accepts the first terminal's video call request, or the first terminal accepts the second terminal's video call request, the server establishes the video call between the first terminal and the second terminal.
  • The first terminal may encode the facial expression information of the first terminal user into a digital expression and send it to the second terminal through the video call established by the server.
  • After receiving the facial expression information, the second terminal may synthesize it with the preset face image model to reproduce the facial image of the first terminal user, and display the result in the second terminal's video call interface.
  • the preset face image model can be set by the user himself or by default.
  • The user of the second terminal may also use his or her own photo, or a photo of the first terminal user, to synthesize with the facial expression information and reproduce the facial image of the first terminal user.
  • A video can be regarded as a sequence of frame images: facial expression information is acquired for each frame image at the first terminal, and at the second terminal the facial expression information is synthesized into each frame image, thereby realizing the virtual video call. The synthesis process itself is prior art and is not described here.
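The frame-by-frame loop described above can be sketched as follows. This is a hypothetical illustration only: `recognize_expression` and `synthesize_frame` are invented stand-ins for the facial recognition and synthesis steps the patent treats as known techniques, and the dict-based "frames" are placeholders for real images.

```python
def recognize_expression(frame):
    """Sender side (first terminal): reduce a captured frame to a tiny
    expression record -- the only data that crosses the network."""
    return {"mouth_open": frame.get("mouth_open", False),
            "smile": frame.get("smile", 0.0)}

def synthesize_frame(expression, face_model):
    """Receiver side (second terminal): apply the expression to the
    preset face image model to produce a displayable frame."""
    return {"model": face_model, **expression}

def virtual_call(frames, face_model="cartoon"):
    """Extract an expression per frame, then synthesize each received
    expression onto the preset model, as in the virtual video call."""
    sent = [recognize_expression(f) for f in frames]
    shown = [synthesize_frame(e, face_model) for e in sent]
    return sent, shown
```

Note that only the small `sent` records would travel over the network; the full frames never leave the first terminal.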
  • The virtual video calling method of this embodiment of the present invention uses facial recognition technology to extract facial expression information at the transmitting end (for example, the first terminal), while the receiving end (for example, the second terminal) synthesizes and restores the face image simply from the transmitted facial expression information and the preset face image model. Because the information transmitted between the transmitting end and the receiving end is limited to facial expression information, which does not need to describe a complete facial image, the amount of information is small, and the encoded facial expression information can occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced, data traffic is saved, the video call is smoother, and the impact of limited network bandwidth or metered traffic on the video call is reduced, making the method especially suitable for transmission over mobile networks and improving the user experience.
  • In addition, the second terminal does not need to reconstruct a face image model of the first terminal user; it only needs to display the corresponding facial expression on its preset face image model according to the facial expression information, which keeps the second terminal's processing simple.
  • In one embodiment, performing facial recognition on the video image to obtain facial expression information includes: performing facial recognition on the video image to obtain facial features, and extracting the facial expression information from the facial features.
  • The facial features are extracted from the video image and may be, but are not limited to, geometric information about facial parts such as the eyes, nose, mouth, and ears, for example the position of the eyebrows, the angle of the corners of the mouth, and the size of the eyes. It should be understood that the facial features can also be obtained by other methods.
  • The first terminal of this embodiment may thus obtain facial features through facial recognition of the video image. The facial expression information is then extracted from the facial features; for example, the first terminal may analyze the facial features to obtain the facial expression information of the first terminal user.
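Geometric features of the kind listed above can be computed from 2D landmark coordinates. The sketch below is an assumption-laden illustration: the landmark names, the coordinate convention (y grows downward, as in image coordinates), and the two derived features are invented for this example, not taken from the patent.

```python
def mouth_metrics(left_corner, right_corner, top_lip, bottom_lip):
    """Compute two simple geometric expression features from
    hypothetical 2D landmark points (x, y)."""
    opening = bottom_lip[1] - top_lip[1]            # vertical lip gap
    center_y = (top_lip[1] + bottom_lip[1]) / 2.0
    # Mouth corners sitting higher (smaller y) than the lip mid-line
    # give a positive curvature, i.e. an upward, smiling curve.
    curvature = center_y - (left_corner[1] + right_corner[1]) / 2.0
    return {"opening": opening, "curvature": curvature}

# A wide-open, upward-curved mouth yields a large opening and positive
# curvature, consistent with the "laughing" example later in the text.
m = mouth_metrics(left_corner=(0, 8), right_corner=(20, 8),
                  top_lip=(10, 9), bottom_lip=(10, 15))
```

In practice the landmarks themselves would come from a face-landmark detector; only the small derived numbers would feed into the expression information.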
  • In one embodiment, the facial expression information includes one or more of the following: whether the user is frowning, whether the mouth is open or closed, the curvature of the corners of the mouth, whether the eyes are open or closed, the size of the eyes, whether there are tears, and so on.
  • The facial expression information mainly reflects a person's emotional state. For example, by analyzing the position of the eyebrows, the angle of the corners of the mouth, the size of the eyes, and so on, the user's expression can be determined to be smiling, laughing, crying, depressed, excited, angry, and the like.
  • Various existing facial expression analysis techniques, for example machine-learning algorithms or other algorithms with similar functions, can be used by the first terminal of this embodiment to analyze the facial features and obtain the facial expression information.
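As a toy stand-in for the machine-learning analysis mentioned above, a rule-based classifier over the features already described might look like this. All thresholds and labels are invented for illustration; a real system would learn such a mapping from data.

```python
def classify_expression(mouth_open, mouth_curvature, tears=False):
    """Map geometric expression features to an emotion label.
    A hypothetical rule-based sketch, not the patent's algorithm."""
    if tears:
        return "crying"
    if mouth_curvature > 0.5:                 # corners curve upward
        return "laughing" if mouth_open else "smiling"
    if mouth_curvature < -0.5:                # corners curve downward
        return "depressed"
    return "neutral"
```

The output label is what would then be encoded into a few bytes and transmitted.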
  • the first terminal may encode the facial expression information of the first terminal user to form a digital expression.
  • A digital expression may, for example, be a short string of characters occupying only a few bits; for instance, the characters "D:" may be sent directly as the encoding of "laughter". The encoding scheme can, of course, be much richer.
  • the facial expression information is sent to the second terminal through the video call established by the server.
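The few-byte encoding described above can be sketched as a simple codebook. The byte values and labels below are hypothetical, chosen only to show how an expression label can be reduced to a single byte on the wire, in the spirit of the "D:" example.

```python
# Hypothetical single-byte expression codebook (invented values).
CODES = {"neutral": 0x00, "laughing": 0x01, "smiling": 0x02, "crying": 0x03}
NAMES = {code: name for name, code in CODES.items()}

def encode_expression(name):
    """First terminal: map an expression label to a 1-byte payload."""
    return bytes([CODES[name]])

def decode_expression(payload):
    """Second terminal: recover the expression label from the payload."""
    return NAMES[payload[0]]
```

A single byte per frame contrasts sharply with sending full facial feature data, which is the patent's central bandwidth argument.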
  • The preset face image model can take many forms. In one embodiment, the face image model preset at the second terminal includes a real face image model and a cartoon face image model; the real face image model may, for example, be a photo stored in the second terminal.
  • the second terminal user can select a favorite cartoon face image model according to the needs of the user.
  • In one embodiment, the virtual video call method further includes: providing the user of the second terminal with at least one cartoon face image model; and receiving, by the second terminal, the cartoon face image model selected by its user, and synthesizing and displaying the video image according to the facial expression information and the selected face image model.
  • That is, the second terminal receives the cartoon face image model selected by its user, synthesizes it with the facial expression information of the first terminal user to reproduce the facial image of the first terminal user, and displays the result in the second terminal's video call interface.
  • For example, if the facial expression information of the first terminal user indicates that the mouth is open, the corners of the mouth curve strongly upward, and the eyes are slightly narrowed, the first terminal user is laughing. If the second terminal user has selected the Superman cartoon face image model, the second terminal synthesizes the facial expression information of the first terminal user with the Superman cartoon image to reproduce the first terminal user's laughing expression.
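Under the assumption that a cartoon face image model is simply a lookup from expression labels to pre-drawn frames, the receiver-side synthesis in the example above could be sketched as follows. The model name and asset filenames are invented for illustration.

```python
# Hypothetical cartoon face image models: expression label -> frame.
CARTOON_MODELS = {
    "superman": {"laughing": "superman_laugh.png",
                 "smiling": "superman_smile.png",
                 "neutral": "superman_neutral.png"},
}

def synthesize(model_name, expression):
    """Return the frame to display for the selected model, falling back
    to the neutral face for expressions the model does not cover."""
    model = CARTOON_MODELS[model_name]
    return model.get(expression, model["neutral"])
```

A richer implementation might instead blend expression parameters into the model continuously (e.g. blendshape weights), but the lookup captures the data-flow the patent describes.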
  • Another embodiment of the present invention also proposes a virtual video calling method.
  • FIG. 2 is a flow chart of a virtual video call method in accordance with another embodiment of the present invention.
  • the virtual video calling method includes the following steps:
  • S201 Receive facial expression information of a video image sent by the first terminal that establishes a call with the second terminal.
  • Specifically, the first terminal sends a video call request to the second terminal through the server, or the second terminal sends a video call request to the first terminal through the server. If the second terminal accepts the first terminal's video call request, or the first terminal accepts the second terminal's video call request, the server establishes the video call between the first terminal and the second terminal.
  • In this embodiment, the first terminal may capture the video image of the first terminal user with its built-in camera or a peripheral camera, obtain the facial expression information by the method described in any of the foregoing embodiments, and send it to the second terminal.
  • After receiving the facial expression information, the second terminal may synthesize it with the preset face image model to reproduce the facial image of the first terminal user, and display the result in the second terminal's video call interface.
  • the preset face image model can be set by the user himself or by default.
  • The user of the second terminal may also use his or her own photo, or a photo of the first terminal user, as the face image model to reproduce the facial image of the first terminal user.
  • The virtual video calling method of this embodiment of the present invention uses facial recognition technology to extract facial expression information at the transmitting end (for example, the first terminal), while the receiving end (for example, the second terminal) synthesizes and restores the face image simply from the transmitted facial expression information and the preset face image model. Because the information transmitted between the transmitting end and the receiving end is limited to facial expression information, which does not need to describe a complete facial image, the amount of information is small, and the encoded facial expression information can occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced, data traffic is saved, the video call is smoother, and the impact of limited network bandwidth or metered traffic on the video call is reduced, making the method especially suitable for transmission over mobile networks and improving the user experience.
  • In addition, the second terminal does not need to reconstruct a face image model of the first terminal user; it only needs to display the corresponding facial expression on its preset face image model according to the facial expression information, which keeps the second terminal's processing simple.
  • FIG. 3 is a flow chart of a virtual video call method in accordance with yet another embodiment of the present invention.
  • the virtual video calling method includes the following steps:
  • S301 Receive facial expression information of a video image sent by the first terminal that establishes a call with the second terminal.
  • In this embodiment, the second terminal may provide the user with multiple real or cartoon face image models, for example several cartoon face image models, photos, or real face image models; the second terminal user can then select a preferred face image model according to his or her own needs.
  • For example, if the facial expression information of the first terminal user indicates that the mouth is open, the corners of the mouth curve strongly upward, and the eyes are slightly narrowed, the first terminal user is laughing; if the second terminal user has selected the Superman cartoon face image model, the second terminal synthesizes the facial expression information of the first terminal user with the Superman cartoon image to reproduce the first terminal user's laughing expression.
  • Thus, the user of the second terminal may select a real or cartoon face image model, and the video image is synthesized and displayed according to the selected model and the facial expression information, which adds fun and improves the user experience.
  • the second terminal may acquire a real face image model of the first terminal user to perform facial expression reproduction.
  • Specifically, the first terminal may analyze a video image captured by its camera, or a face image selected by the user, to obtain the real face image model, and then send it to the second terminal for storage.
  • Alternatively, the second terminal may acquire a face image of the first terminal user and analyze it to obtain the real face image model; that is, the real face image model may also be generated in the second terminal.
  • The second terminal may then synthesize the real face image model of the first terminal user with the facial expression information of the first terminal user to reproduce the facial image of the first terminal user in the second terminal's video call interface, making the reproduced facial image more realistic.
  • In addition, the real face image model may be formed only once and sent to the second terminal for storage; during subsequent data transmission, only facial expression information needs to be transmitted.
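The one-shot model transfer described above can be sketched as a receiver that caches the model and reuses it for every later expression packet. The class and method names are invented for illustration.

```python
class ExpressionReceiver:
    """Hypothetical second-terminal receiver: the real face image model
    arrives once and is cached; later packets carry only expressions."""

    def __init__(self):
        self.face_model = None
        self.displayed = []

    def on_model(self, model):
        """Called once, when the stored real face model arrives."""
        self.face_model = model

    def on_expression(self, expression):
        """Called per frame; reuses the cached model for synthesis."""
        self.displayed.append((self.face_model, expression))
```

Because `on_model` runs only once per call, the per-frame traffic stays limited to the tiny expression records.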
  • A selection button may also be provided in the second terminal, so that the second terminal user can choose whether to reproduce the facial image with the real facial image of the first terminal user or with a cartoon face image model. More specifically, the user of the second terminal may choose according to the network environment and terminal performance: for example, a mobile terminal may select a cartoon face image model so that only facial expression information is sent during the video call, while a personal computer may select the real face image model for added realism.
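The selection policy just described could be expressed as a small chooser. The argument values and labels below are invented placeholders, not a real API; the logic only mirrors the mobile-versus-PC example in the text.

```python
def choose_face_model(network, device):
    """Illustrative model-selection policy: constrained mobile links
    favor the cartoon model (expression-only traffic), while a PC on a
    fixed link can afford the stored real face model."""
    if network == "mobile" or device == "mobile_terminal":
        return "cartoon"
    return "real"
```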
  • The virtual video calling method of this embodiment of the present invention can reproduce the facial image of the first terminal user from the real face image model and the facial expression information of the first terminal user, making the reproduced facial image more realistic. The real face image model is transmitted once and used many times, so the receiving end does not need to reconstruct it in real time during the call, which simplifies the receiving end's processing and improves the user experience.
  • the present invention also proposes a terminal.
  • In one embodiment, a terminal includes: an acquisition module configured to collect a video image of a user; a recognition module configured to perform facial recognition on the video image to obtain facial expression information; and a sending module configured to send the facial expression information to a second terminal that establishes a call with the terminal, the facial expression information being used to cause the second terminal to synthesize and display the video image according to the facial expression information and a face image model preset in the second terminal.
  • FIG. 4 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • The terminal includes: an acquisition module 110, a recognition module 120, and a sending module 130.
  • The acquisition module 110 is configured to collect a video image of the user. More specifically, the acquisition module 110 may capture video through the terminal's built-in camera or a peripheral camera to collect the user's video image.
  • The recognition module 120 is configured to perform facial recognition on the video image to obtain facial expression information. More specifically, the recognition module 120 may perform facial recognition on the video image using various computer image-processing techniques, for example face recognition based on genetic algorithms, neural-network face recognition, and the like. The amount of data describing facial expression information is very small; the process of acquiring facial expressions is described in detail in the foregoing embodiments.
  • The sending module 130 is configured to send the facial expression information to a second terminal that establishes a call with the terminal; the facial expression information is used to cause the second terminal to synthesize and display the video image according to the facial expression information and a face image model preset in the second terminal.
  • Specifically, the terminal sends a video call request to the second terminal through the server, or the second terminal sends a video call request to the terminal through the server. If the second terminal accepts the terminal's video call request, or the terminal accepts the second terminal's video call request, the server establishes the video call between the terminal and the second terminal.
  • More specifically, the sending module 130 may encode the facial expression information to form a digital expression and send it to the second terminal through the video call established by the server.
  • the second terminal may perform synthesis according to the facial expression information and the preset face image model, to reproduce the facial image of the terminal's user and display it in the video call interface of the second terminal.
  • the preset face image model can be set by the user himself or by default.
  • the user of the second terminal may also use his or her own photo, or a photo of the first terminal's user, synthesized with the facial expression information, to reproduce the facial image of the first terminal's user.
  • the terminal of the embodiment of the present invention uses facial recognition technology to extract facial expression information, so that the second terminal that establishes a call with the terminal can synthesize and restore the facial image simply from the sent facial expression information and the preset face image model. Because the transmitted information is limited to the facial expression information, and because the facial expression information need not describe a complete facial image, the amount of information is small: after encoding, the facial expression information may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced and data traffic is saved, making the video call smoother and reducing the influence of limited network bandwidth or metered traffic on the video call; the method is especially suitable for transmission over mobile networks and enhances the user experience. In addition, there is no need to reconstruct the user's face image model in the second terminal; the second terminal only needs to display the corresponding facial expression on the preset face image model according to the facial expression information, so the second terminal is easy to adjust.
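As a rough, back-of-the-envelope illustration of this saving (all figures below are assumptions for illustration, not values from the specification):

```python
# Illustrative comparison of expression-only transmission vs full video.
EXPRESSION_BYTES = 4        # assumed size of one encoded expression ("a few bytes")
UPDATES_PER_SECOND = 15     # assumed refresh rate of the expression stream
VIDEO_KBPS = 500            # assumed bitrate of a typical compressed video stream

expression_bps = EXPRESSION_BYTES * 8 * UPDATES_PER_SECOND  # bits per second
video_bps = VIDEO_KBPS * 1000

ratio = video_bps / expression_bps
print(f"expression stream: {expression_bps} bit/s")
print(f"video stream:      {video_bps} bit/s")
print(f"reduction factor:  ~{ratio:.0f}x")
```

Even under conservative assumptions, the expression stream is three orders of magnitude smaller than the video stream it replaces.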
  • the identification module 120 is further configured to perform facial recognition on the video image to obtain facial features, and extract facial expression information in the facial features.
  • the facial features extracted by the identification module 120 from the video image may include, but are not limited to, geometric information about facial features such as the eyes, nose, mouth, and ears, for example the position of the eyebrows, the angle of the mouth corners, and the size of the eyes. It should be understood that the facial feature information can also be obtained by other methods, and future face recognition technologies may likewise perform face recognition on the video image to obtain facial feature information. Thereafter, the identification module 120 extracts facial expression information from the facial features; for example, it may analyze the facial feature information to obtain the facial expression information of the user.
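The geometric quantities mentioned above (eyebrow position, mouth-corner angle, eye size) could, for instance, be derived from facial landmark coordinates. The following sketch assumes a hypothetical landmark layout with invented coordinates; none of the names come from the specification:

```python
# Hypothetical landmark points (x, y) for one face; both the layout and
# the coordinates are invented for illustration.
landmarks = {
    "left_eye_top": (30, 40), "left_eye_bottom": (30, 44),
    "left_eye_center": (30, 42), "left_brow": (28, 32),
    "mouth_left": (25, 70), "mouth_right": (45, 70), "mouth_center": (35, 74),
}

def eye_openness(lm):
    """Vertical eye aperture in pixels (eye size)."""
    return lm["left_eye_bottom"][1] - lm["left_eye_top"][1]

def mouth_curvature(lm):
    """Positive when the mouth centre sits below the corners (smile-like)."""
    corner_y = (lm["mouth_left"][1] + lm["mouth_right"][1]) / 2
    return lm["mouth_center"][1] - corner_y

def brow_raise(lm):
    """Eyebrow-to-eye distance; a larger value suggests a raised brow."""
    return lm["left_eye_center"][1] - lm["left_brow"][1]

features = {
    "eye_openness": eye_openness(landmarks),
    "mouth_curvature": mouth_curvature(landmarks),
    "brow_raise": brow_raise(landmarks),
}
print(features)
```

In practice the landmarks would come from whatever face-recognition technique the terminal uses; only the resulting handful of scalar features needs to survive this step.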
  • the facial expression information includes one or more of the following: whether the user frowns, whether the mouth is open or closed, the curvature of the mouth corners, whether the eyes are open or closed, the size of the eyes, whether there are tears, and the like.
  • the facial expression information mainly reflects the emotional state of the person. For example, by analyzing the position of the eyebrows, the angle of the mouth corners, the size of the eyes, and so on, the user's expression can be determined to be smiling, laughing, crying, depressed, excited, angry, and the like.
  • various facial expression analysis techniques can be used, for example machine learning algorithms; algorithms with similar functions developed in the future can likewise be used to analyze the facial feature information and obtain facial expression information.
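As one hedged illustration of such an analysis step, a trivial hand-written rule set can stand in for the machine-learning algorithms the text mentions; all thresholds and feature names below are invented:

```python
def classify_expression(features):
    """Map geometric facial features to a coarse expression label.

    The thresholds are arbitrary illustrative values; a real system
    would learn them, e.g. with a machine-learning classifier.
    """
    if features.get("tears", False):
        return "crying"
    if features["mouth_open"] and features["mouth_curvature"] > 3:
        return "laughing"
    if features["mouth_curvature"] > 1:
        return "smiling"
    if features["brow_furrowed"]:
        return "angry"
    return "neutral"

sample = {"mouth_open": True, "mouth_curvature": 5,
          "brow_furrowed": False, "tears": False}
print(classify_expression(sample))  # laughing
```

The point of the design is that only the final label (or a handful of such features) is transmitted, never the image itself.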
  • the sending module 130 may encode the facial expression information to form a digital expression, which may be just a few characters occupying only a few bytes. For example, "laughter" may be encoded directly as the characters "D:". Of course, the encoding scheme can be much richer; this example is given only for ease of understanding. The facial expression information is then sent to the second terminal through the video call established by the server.
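A minimal sketch of such an encoding; only the "D:" code for laughter comes from the text, the other codes are invented placeholders:

```python
# Hypothetical two-character codes for expressions; only "D:" (laughter)
# is given in the specification, the rest are invented for illustration.
EXPRESSION_CODES = {
    "laughing": "D:",
    "smiling": ":)",
    "crying": ":'(",
    "neutral": ":|",
}

def encode_expression(label):
    """Encode an expression label into the few bytes sent on the wire."""
    return EXPRESSION_CODES[label].encode("utf-8")

payload = encode_expression("laughing")
print(payload, len(payload))  # b'D:' 2
```

A two-byte payload per expression update is consistent with the specification's claim that the encoded facial expression information occupies only a few bytes.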
  • the present invention also proposes another terminal.
  • FIG. 5 is a schematic structural diagram of a terminal according to another embodiment of the present invention.
  • the terminal includes: a receiving module 210 and a synthesizing module 220.
  • the receiving module 210 is configured to receive facial expression information of a video image sent by the first terminal that establishes a call with the terminal.
  • the synthesizing module 220 is configured to synthesize and display the video image according to the facial expression information and the face image model preset in the terminal.
  • the synthesizing module 220 may perform synthesis according to the facial expression information of the first terminal's user and the preset face image model, to reproduce the facial image of the first terminal's user and display it in the video call interface of the terminal.
  • the preset face image model may be set by the user or may be set by default.
  • users of the terminal may also use their own photo, or a photo of the first terminal's user, as the face image model for display, to reproduce the face image of the first terminal's user.
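The receiving-end synthesis described above might be sketched as follows; the class and method names are illustrative assumptions, and a real terminal would animate the model rather than merely record a label:

```python
# Hypothetical receiving-end flow: decode the expression payload and
# apply it to whichever preset face image model the user selected.
CODE_TO_LABEL = {"D:": "laughing", ":)": "smiling", ":|": "neutral"}

class FaceModel:
    def __init__(self, name):
        self.name = name
        self.current_expression = "neutral"

    def apply_expression(self, label):
        # A real terminal would deform/animate the model here; this
        # sketch just records the state to be drawn in the call UI.
        self.current_expression = label
        return f"{self.name} showing {label}"

def on_payload(model, payload: bytes):
    """Handle one incoming facial-expression payload."""
    label = CODE_TO_LABEL.get(payload.decode("utf-8"), "neutral")
    return model.apply_expression(label)

model = FaceModel("cartoon-superman")  # user-selected preset model
print(on_payload(model, b"D:"))        # cartoon-superman showing laughing
```

Because the model is preset at the receiving terminal, no model data ever crosses the network; only the expression payload does.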
  • the terminal of the embodiment of the present invention uses facial recognition technology: the first terminal that establishes a call with the terminal extracts the facial expression information, and the terminal synthesizes and restores the face image simply from the sent facial expression information and the preset face image model. Because the information transmitted between the sending end and the receiving end is limited to the facial expression information, and because the facial expression information need not describe a complete facial image, the amount of information is small: after encoding, the facial expression information may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced, data traffic is saved, the video call is smoother, and the influence of limited network bandwidth or metered traffic on the video call is reduced.
  • FIG. 6 is a schematic structural diagram of a terminal according to still another embodiment of the present invention.
  • the terminal further includes a selection module 230, as shown in FIG. 6.
  • the selection module 230 is configured to, after the receiving module 210 receives the facial expression information of the video image sent by the first terminal that establishes the call with the terminal, select a real or cartoon face image model, the selected real or cartoon face image model being used to synthesize a video image with the facial expression information for display.
  • the terminal may provide the user with multiple real or cartoon face image models, for example several cartoon face image models, or photos and realistic face image models; users can choose their favorite face image model according to their needs.
  • for example, if the facial expression information of the first terminal's user indicates a big laugh and the terminal's user selects a Superman face image model, the terminal synthesizes the facial expression information with the Superman cartoon image to reproduce a picture of the first terminal's user laughing heartily.
  • the user can thus select a real or cartoon face image model, and the video image is synthesized and displayed according to the selected face image model and the facial expression information, which adds interest and improves the user experience.
  • the present invention also proposes a terminal device.
  • a terminal device of an embodiment of the present invention includes: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations: collecting a video image of the user of the terminal device; performing facial recognition on the video image to obtain facial expression information; and sending the facial expression information to a second terminal that establishes a call with the terminal device, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
  • the present invention also proposes another terminal device.
  • a terminal device of an embodiment of the present invention includes: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations: receiving facial expression information of a video image sent by a first terminal that establishes a call with the terminal device; and synthesizing and displaying a video image according to the facial expression information and a face image model preset in the terminal device.
  • portions of the invention may be implemented in hardware, software, firmware or a combination thereof.
  • multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.

Abstract

Provided are a virtual video call method and terminal, the method comprising: capturing a video image of a first terminal user; conducting face recognition on the video image to obtain facial expression information; and sending the facial expression information to a second terminal establishing a call connection with the first terminal, the facial expression information being used to enable the second terminal to compose and display a video image according to the facial expression information and a facial image model preset in the second terminal. The method in an embodiment of the present invention uses face recognition technology to extract the facial expression information at a sending terminal (for example, the first terminal), and composes and restores the facial image at a receiving terminal (for example, the second terminal) according to the sent facial expression information and a preset facial image model. The extremely small amount of transmitted facial expression data results in a significant reduction in the amount of data transmitted during a video call, thus providing a smoother video call and reducing the impact of limited network bandwidth or limited traffic on the video call.

Description

Virtual video calling method and terminal
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201310714667.3, entitled "Virtual Video Calling Method and Terminal", filed on December 20, 2013 by Baidu Online Network Technology (Beijing) Co., Ltd.
Technical field
The present invention relates to the field of communications technologies, and in particular, to a virtual video calling method and terminal.
Background art
With the rapid growth of network bandwidth and the development and popularization of hardware devices, the market for video calls has entered a fast lane of development. At present, the main method of virtual video calling is to capture an image at the transmitting end, determine the face area in the image, extract facial feature information from the face area, and send the extracted facial feature information to the receiving end, where the facial feature information is used to reproduce the facial expression of the corresponding user.
The drawback at present is that, since every person's facial features are different, the amount of extracted facial feature data is still very large, and the above method must also reconstruct a face model of a specific subject (for example, the face model of the user at the transmitting end) from the facial feature information. It can therefore be seen that the amount of video data transmitted in the prior art is very large, consuming a large amount of data traffic and possibly making video calls choppy; the approach is unsuitable for mobile networks with limited bandwidth or for metered-traffic scenarios, which seriously hinders the popularization and promotion of video calls.
Summary of the invention
The present invention aims to solve at least one of the above technical problems.
To this end, a first object of the present invention is to propose a virtual video calling method. The method greatly reduces the amount of data transmitted during a video call and saves data traffic, making the video call smoother, reducing the impact of limited network bandwidth or metered traffic on the video call, and improving the user experience.
A second object of the present invention is to propose another virtual video calling method.
A third object of the present invention is to propose a terminal.
A fourth object of the present invention is to propose another terminal.
A fifth object of the present invention is to propose a terminal device.
A sixth object of the present invention is to propose another terminal device.
In order to achieve the above objects, a virtual video calling method according to an embodiment of the first aspect of the present invention includes: collecting a video image of a first terminal user; performing facial recognition on the video image to obtain facial expression information; and sending the facial expression information to a second terminal that establishes a call with the first terminal, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
In the virtual video calling method of this embodiment of the present invention, facial recognition technology is used to extract facial expression information at the transmitting end (for example, the first terminal), and the receiving end (for example, the second terminal) synthesizes and restores the face image simply from the sent facial expression information and a preset face image model. Because the information transmitted between the transmitting end and the receiving end is limited to the facial expression information, and because the facial expression information need not describe a complete face image, the amount of information is small: after encoding, the facial expression information may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced and data traffic is saved, making the video call smoother and reducing the influence of limited network bandwidth or metered traffic on the video call; the method is especially suitable for transmission over mobile networks and enhances the user experience. In addition, there is no need to reconstruct the face image model of the first terminal user in the second terminal; the second terminal only needs to display the corresponding facial expression on the preset face image model according to the facial expression information, so the second terminal is easy to adjust.
In order to achieve the above objects, a virtual video calling method according to an embodiment of the second aspect of the present invention includes: receiving facial expression information of a video image sent by a first terminal that establishes a call with a second terminal; and synthesizing and displaying a video image according to the facial expression information and a face image model preset in the second terminal.
In the virtual video calling method of this embodiment of the present invention, facial recognition technology is used to extract facial expression information at the transmitting end (for example, the first terminal), and the receiving end (for example, the second terminal) synthesizes and restores the face image simply from the sent facial expression information and a preset face image model. Because the information transmitted between the transmitting end and the receiving end is limited to the facial expression information, and because the facial expression information need not describe a complete face image, the amount of information is small: after encoding, the facial expression information may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced and data traffic is saved, making the video call smoother and reducing the influence of limited network bandwidth or metered traffic on the video call; the method is especially suitable for transmission over mobile networks and enhances the user experience. In addition, there is no need to reconstruct the face image model of the first terminal user in the second terminal; the second terminal only needs to display the corresponding facial expression on the preset face image model according to the facial expression information, so the second terminal is easy to adjust.
In order to achieve the above objects, a terminal according to an embodiment of the third aspect of the present invention includes: a collection module configured to collect a video image of a user; an identification module configured to perform facial recognition on the video image to obtain facial expression information; and a sending module configured to send the facial expression information to a second terminal that establishes a call with the terminal, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
The terminal of this embodiment of the present invention uses facial recognition technology to extract facial expression information, so that the second terminal that establishes a call with the terminal can synthesize and restore the face image simply from the sent facial expression information and a preset face image model. Because the transmitted information is limited to the facial expression information, and because the facial expression information need not describe a complete face image, the amount of information is small: after encoding, the facial expression information may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced and data traffic is saved, making the video call smoother and reducing the influence of limited network bandwidth or metered traffic on the video call; the method is especially suitable for transmission over mobile networks and enhances the user experience. In addition, there is no need to reconstruct the user's face image model in the second terminal; the second terminal only needs to display the corresponding facial expression on the preset face image model according to the facial expression information, so the second terminal is easy to adjust.
In order to achieve the above objects, a terminal according to an embodiment of the fourth aspect of the present invention includes: a receiving module configured to receive facial expression information of a video image sent by a first terminal that establishes a call with the terminal; and a synthesizing module configured to synthesize and display a video image according to the facial expression information and a face image model preset in the terminal.
The terminal of this embodiment of the present invention uses facial recognition technology: the first terminal that establishes a call with the terminal extracts the facial expression information, and the terminal synthesizes and restores the face image simply from the sent facial expression information and a preset face image model. Because the information transmitted between the transmitting end and the receiving end is limited to the facial expression information, and because the facial expression information need not describe a complete face image, the amount of information is small: after encoding, the facial expression information may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced and data traffic is saved, making the video call smoother and reducing the influence of limited network bandwidth or metered traffic on the video call; the method is especially suitable for transmission over mobile networks and enhances the user experience. In addition, there is no need to reconstruct a face image model; the terminal only needs to display the corresponding facial expression on the preset face image model according to the facial expression information, so the terminal is easy to adjust.
In order to achieve the above objects, a terminal device according to an embodiment of the fifth aspect of the present invention includes: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations: collecting a video image of the user of the terminal device; performing facial recognition on the video image to obtain facial expression information; and sending the facial expression information to a second terminal that establishes a call with the terminal device, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
In order to achieve the above objects, a terminal device according to an embodiment of the sixth aspect of the present invention includes: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations: receiving facial expression information of a video image sent by a first terminal that establishes a call with the terminal device; and synthesizing and displaying a video image according to the facial expression information and a face image model preset in the terminal device.
Additional aspects and advantages of the present invention will be set forth in part in the description which follows, and in part will become apparent from the description or may be learned by practice of the present invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and more readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart of a virtual video calling method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a virtual video calling method according to another embodiment of the present invention;
FIG. 3 is a flowchart of a virtual video calling method according to still another embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a terminal according to another embodiment of the present invention; and
FIG. 6 is a schematic structural diagram of a terminal according to still another embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, where the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are exemplary and are intended to explain the present invention, and should not be construed as limiting the present invention. On the contrary, the embodiments of the present invention cover all changes, modifications and equivalents falling within the spirit and scope of the appended claims.
In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It should also be noted that, unless otherwise explicitly specified and defined, the terms "connected" and "coupled" are to be understood broadly: a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium. Those of ordinary skill in the art can understand the specific meanings of the above terms in the present invention according to the specific circumstances. Furthermore, in the description of the present invention, "a plurality of" means two or more unless otherwise specified.
Any process or method description in the flowcharts, or otherwise described herein, may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be executed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order, depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention pertain.
In order to solve the problem that the amount of video data transmitted during a video call is too large, the present invention proposes a virtual video calling method and terminal. The virtual video calling method and terminal according to embodiments of the present invention are described below with reference to the accompanying drawings.
A virtual video calling method includes the following steps: collecting a video image of a first terminal user; performing facial recognition on the video image to obtain facial expression information; and sending the facial expression information to a second terminal that establishes a call with the first terminal, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
FIG. 1 is a flowchart of a virtual video call method according to an embodiment of the present invention.
As shown in FIG. 1, the virtual video call method includes the following steps.
S101: Collect a video image of the user of the first terminal.
Specifically, the first terminal may shoot with a built-in or peripheral camera to collect the video image of the user of the first terminal.
S102: Perform facial recognition on the video image to obtain facial expression information.
Specifically, the first terminal may perform facial recognition on the video image by means of various existing computer image processing techniques, such as face recognition based on genetic algorithms or on neural networks, to obtain the facial expression information. The data volume of the facial expression information is very small. The process of obtaining the facial expression information is described in detail in subsequent embodiments.
S103: Send the facial expression information to a second terminal that has established a call with the first terminal, where the facial expression information is used by the second terminal to synthesize a video image from the facial expression information and a face image model preset at the second terminal, and to display the synthesized image.
The first terminal sends a video call request to the second terminal through a server, or the second terminal sends a video call request to the first terminal through the server. If the second terminal accepts the video call request of the first terminal, or the first terminal accepts the video call request of the second terminal, the server establishes a video call between the first terminal and the second terminal.
Specifically, the first terminal may encode the facial expression information of its user into a digital representation, and send the facial expression information to the second terminal over the video call established by the server.
After the first terminal sends the facial expression information of its user to the second terminal, the second terminal may synthesize a facial image of the first terminal's user from the facial expression information and a preset face image model, and display it in the video call interface of the second terminal. The preset face image model may be set by the user, or set by the server as a default. In addition, the user of the second terminal may also use his or her own photo, or a photo of the first terminal's user, to synthesize with the facial expression information and reproduce the facial image of the first terminal's user.
In addition, a video can be regarded as a sequence of video images, one frame after another. The first terminal obtains facial expression information for each frame, and the second terminal likewise synthesizes an image from the facial expression information for each frame, thereby implementing the virtual video call. The synthesis process itself is prior art and is not described here.
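The per-frame pipeline described above can be reduced to the following minimal sketch. It is an illustration only: `capture_frame`, `recognize`, and `send` are hypothetical stand-ins for the camera, the face recognizer, and the network channel, none of which is specified by the present disclosure.

```python
def sender_loop(capture_frame, recognize, send, frames=3):
    """First-terminal pipeline sketch: per frame, capture an image,
    reduce it to facial expression information, and send only that."""
    for _ in range(frames):
        image = capture_frame()
        expression = recognize(image)   # a few bytes, not the image
        send(expression)

# Exercise the loop with placeholder callables.
sent = []
sender_loop(
    capture_frame=lambda: "raw-camera-frame",   # hypothetical camera
    recognize=lambda image: b"D:",              # hypothetical recognizer
    send=sent.append,                           # hypothetical channel
)
```

The point of the sketch is only that the network sees the tiny recognizer output, never the captured frame itself.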
In the virtual video call method of this embodiment of the present invention, facial recognition technology is used to extract facial expression information at the sending end (for example, the first terminal), and the receiving end (for example, the second terminal) performs simple synthesis and restoration of the face image from the transmitted facial expression information and a preset face image model. Because the information transmitted between the sending end and the receiving end is limited to the facial expression information, and because the facial expression information need not encode a complete face image, the amount of information is small; after encoding, the facial expression information may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced and data traffic is saved, making the video call smoother and reducing the impact of limited network bandwidth or limited data allowances on the call. The method is particularly suitable for transmission over mobile networks and improves the user experience. In addition, there is no need to rebuild the face image model of the first terminal's user at the second terminal; the second terminal only needs to display the corresponding facial expression on the preset face image model according to the facial expression information, so the second terminal is easy to adapt.
In an embodiment of this aspect, performing facial recognition on the video image to obtain facial expression information (that is, S102) includes: performing facial recognition on the video image to obtain facial features, and extracting the facial expression information from the facial features.
Specifically, facial features are first extracted from the video image. The facial features may include, but are not limited to, geometric information about facial components (such as the eyes, nose, mouth, and ears), for example the position of the eyebrows, the angle of the mouth, and the size of the eyes. It should be understood that the facial features may also be obtained by other methods; the first terminal of this embodiment may equally use any future face recognition technique to perform facial recognition on the video image and obtain the facial features. The facial expression information is then extracted from the facial features: the first terminal may analyze the facial features to obtain the facial expression information of its user.
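As an illustration of deriving expression information from geometric facial features, the sketch below assumes a landmark detector has already produced 2D pixel coordinates for a few face points. The landmark names and the thresholds are hypothetical, not part of this disclosure; note that in image coordinates the y axis grows downward, so "higher on the face" means a smaller y value.

```python
def extract_expression(landmarks):
    """Derive coarse expression attributes from 2D face landmarks.

    `landmarks` maps hypothetical point names to (x, y) pixel
    coordinates (y grows downward); any landmark detector could
    supply them.
    """
    mouth_open = landmarks["lip_bottom"][1] - landmarks["lip_top"][1]
    # Mouth corners sitting above the lip centre (smaller y) give a
    # positive curvature, i.e. a smile.
    mouth_curve = (
        (landmarks["lip_top"][1] + landmarks["lip_bottom"][1]) / 2
        - (landmarks["mouth_left"][1] + landmarks["mouth_right"][1]) / 2
    )
    eye_size = landmarks["eye_bottom"][1] - landmarks["eye_top"][1]
    return {
        "mouth_open": mouth_open > 4,   # pixel thresholds are illustrative
        "mouth_curve": mouth_curve,
        "eye_open": eye_size > 2,
    }

# Landmarks roughly corresponding to a broad, open-mouthed smile.
info = extract_expression({
    "lip_top": (50, 80), "lip_bottom": (50, 90),
    "mouth_left": (40, 78), "mouth_right": (60, 78),
    "eye_top": (45, 60), "eye_bottom": (45, 64),
})
```

A production recognizer would of course use many more points and calibrated, face-size-normalized measurements; the sketch only shows how a handful of geometric quantities can summarize an expression.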
In an embodiment of this aspect, the facial expression information includes one or more of the following: whether the brow is furrowed, whether the mouth is open or closed, the curvature of the corners of the mouth, whether the eyes are open or closed, the size of the eyes, whether there are tears, and so on.
In addition, the facial expression information mainly reflects a person's emotional state. For example, by analyzing the position of the eyebrows, the angle of the mouth, the size of the eyes, and so on, it can be determined whether the user's expression is a smile, laughter, crying, depression, excitement, anger, and the like. Likewise, various existing facial expression analysis techniques, such as machine learning algorithms, may be used for this analysis; furthermore, the first terminal of this embodiment may use any future algorithm with a similar function to analyze the facial features and obtain the facial expression information.
The first terminal may encode the facial expression information of its user into a digital representation. The encoding may be as simple as a few characters occupying only a few bytes; for example, "laughter" could be transmitted directly as the characters "D:". Of course, the encoding scheme can be richer; the example is given here only for ease of understanding. The first terminal sends the encoded facial expression information to the second terminal over the video call established by the server.
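A compact encoding of this kind might look like the following sketch. The "D:" code for laughter follows the example in the text; the remaining codes and the mapping itself are invented for illustration and are not part of the disclosure.

```python
# "D:" for laughter follows the example in the text; the other codes
# are invented placeholders.
CODES = {"laugh": "D:", "smile": ":)", "cry": "T_T", "neutral": "--"}
DECODE = {v: k for k, v in CODES.items()}

def encode(expression):
    """Turn an expression label into a few bytes for the wire."""
    return CODES[expression].encode("ascii")

def decode(payload):
    """Recover the expression label at the receiving terminal."""
    return DECODE[payload.decode("ascii")]

frame = encode("laugh")   # 2 bytes instead of a full video frame
```

A richer scheme could serialize the individual attributes (mouth curvature, eye size, and so on) instead of a single label, while still occupying only a handful of bytes per frame.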
It should be noted that the preset face image models can be quite diverse. In an embodiment of the present invention, the face image model preset at the second terminal includes a real face image model and a cartoon face image model. It may also be, for example, a photo stored in the second terminal.
To make the video call more personalized and more fun, the user of the second terminal may choose a preferred cartoon face image model. In an embodiment of the present invention, the virtual video call method further includes: the second terminal providing at least one cartoon face image model to its user; and the second terminal receiving the cartoon face image model selected by its user, and synthesizing and displaying an image from the facial expression information and the selected face image model. Specifically, after the user of the second terminal selects a preferred cartoon face image model for the first terminal's user, the second terminal receives the selected cartoon face image model and synthesizes it with the facial expression information of the first terminal's user to reproduce the facial image of the first terminal's user, displaying it in the video call interface of the second terminal. For example, suppose the facial expression information of the first terminal's user indicates that the mouth is open, the corners of the mouth are strongly curved, and the eyes are narrowed, that is, the first terminal's user is laughing, and the user of the second terminal has selected a Superman face image model. The second terminal then synthesizes the facial expression information of the first terminal's user with the Superman cartoon image to reproduce an image of the first terminal's user laughing.
An embodiment of the present invention further provides another virtual video call method.
FIG. 2 is a flowchart of a virtual video call method according to another embodiment of the present invention.
As shown in FIG. 2, the virtual video call method includes the following steps.
S201: Receive facial expression information of a video image sent by a first terminal that has established a call with the second terminal.
Specifically, the first terminal first sends a video call request to the second terminal through a server, or the second terminal sends a video call request to the first terminal through the server. If the second terminal accepts the video call request of the first terminal, or the first terminal accepts the video call request of the second terminal, the server establishes a video call between the first terminal and the second terminal.
The first terminal may shoot with a built-in or peripheral camera to collect a video image of its user, obtain the facial expression information according to the method of any of the above embodiments, and send it to the second terminal.
S202: Synthesize a video image from the facial expression information and a face image model preset at the second terminal, and display it.
Specifically, the second terminal may synthesize a facial image of the first terminal's user from that user's facial expression information and a preset face image model, and display it in the video call interface of the second terminal. The preset face image model may be set by the user, or set as a default. In addition, the user of the second terminal may also use his or her own photo, or a photo of the first terminal's user, as the face image model to reproduce the facial image of the first terminal's user.
In the virtual video call method of this embodiment of the present invention, facial recognition technology is used to extract facial expression information at the sending end (for example, the first terminal), and the receiving end (for example, the second terminal) performs simple synthesis and restoration of the face image from the transmitted facial expression information and a preset face image model. Because the information transmitted between the sending end and the receiving end is limited to the facial expression information, and because the facial expression information need not encode a complete face image, the amount of information is small; after encoding, it may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced and data traffic is saved, making the video call smoother and reducing the impact of limited network bandwidth or limited data allowances on the call. The method is particularly suitable for transmission over mobile networks and improves the user experience. In addition, there is no need to rebuild the face image model of the first terminal's user at the second terminal; the second terminal only needs to display the corresponding facial expression on the preset face image model according to the facial expression information, so the second terminal is easy to adapt.
FIG. 3 is a flowchart of a virtual video call method according to yet another embodiment of the present invention.
As shown in FIG. 3, the virtual video call method includes the following steps.
S301: Receive facial expression information of a video image sent by a first terminal that has established a call with the second terminal.
S302: Select a real or cartoon face image model, the selected real or cartoon face image model being used to synthesize a video image with the facial expression information for display.
Specifically, to make the video call more personalized and more fun, the second terminal may provide the user with multiple real or cartoon face image models, for example several cartoon face image models, photos, or real face image models, from which the user of the second terminal can choose a preferred model. For example, suppose the facial expression information of the first terminal's user indicates that the mouth is open, the corners of the mouth are strongly curved, and the eyes are narrowed, that is, the first terminal's user is laughing, and the user of the second terminal has selected a Superman face image model. The second terminal then synthesizes the facial expression information of the first terminal's user with the Superman cartoon image to reproduce an image of the first terminal's user laughing.
S303: Synthesize a video image from the selected real or cartoon face image model and the facial expression information, and display it.
In the virtual video call method of this embodiment of the present invention, the user of the second terminal can select a real or cartoon face image model, and a video image is synthesized and displayed from the selected model and the facial expression information, which adds fun and improves the user experience.
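The select-then-synthesize flow of S302 and S303 can be sketched as follows. The model names and the string-based "rendering" are placeholders: the actual synthesis is left to prior art by the disclosure, so only the control flow is illustrated.

```python
# Hypothetical locally stored models on the second terminal.
MODELS = {
    "real": "stored-real-face",
    "superman": "cartoon-superman",
}

def synthesize(model_name, expression):
    """Combine the locally stored model with received expression info.

    Rendering is reduced to a string so the flow is testable; a real
    implementation would redraw the model image with the expression
    applied.
    """
    return f"{MODELS[model_name]} showing {expression}"

# The second terminal's user picked the cartoon model, and the first
# terminal's user is laughing.
frame = synthesize("superman", "laugh")
```

Since the model lives entirely on the second terminal, switching between the real and cartoon models changes nothing in what the first terminal transmits.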
In an embodiment of the present invention, to make the reproduced facial image more realistic, the second terminal may obtain a real face image model of the first terminal's user for reproducing facial expressions. Specifically, the first terminal may analyze video images captured by its camera to obtain the real face image model, or the first terminal may analyze a face image chosen by the user, without shooting, to obtain the real face image model; the model is then sent to the second terminal for storage.
Alternatively, the second terminal may obtain a face image of the first terminal's user and analyze it to obtain the real face image model; that is, the real face image model may be generated at the second terminal. The second terminal may then synthesize the facial image of the first terminal's user from that user's real face image model and facial expression information, and reproduce it in the video call interface of the second terminal. The reproduced facial image is thereby made more realistic.
It should be understood that the real face image model may be created only once and sent to the second terminal for storage; in subsequent data transmission, only the facial expression information needs to be sent. In addition, a selection button may be provided in the second terminal, allowing its user to choose whether to reproduce the real facial image of the first terminal's user or to reproduce the facial image with a cartoon face image model. More specifically, the user of the second terminal may choose according to the network environment and terminal capability; for example, on a mobile terminal a cartoon face image model may be selected, with only the facial expression information transmitted during the video call, while on a personal computer a real face image model may be selected for greater realism.
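The transmit-once behaviour of the real face image model could be organized as in the sketch below. The two-callback interface is an assumption for illustration, not a wire format defined by the disclosure.

```python
class Receiver:
    """Second-terminal sketch: store the face model once, then render
    every subsequent frame from expression information alone."""

    def __init__(self):
        self.model = None
        self.rendered = []

    def on_model(self, model):
        self.model = model            # transmitted a single time per call

    def on_expression(self, expression):
        # After the model is stored, only a few bytes arrive per frame.
        self.rendered.append((self.model, expression))

rx = Receiver()
rx.on_model("first-user-face-model")
for expr in ("smile", "laugh"):
    rx.on_expression(expr)
```

Every rendered frame reuses the same stored model, which is why the model never needs to be rebuilt or retransmitted during the call.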
In the virtual video call method of this embodiment of the present invention, the facial image of the first terminal's user can be reproduced from that user's real face image model and facial expression information, making the reproduced facial image more realistic. Moreover, the real face image model can be transmitted once and used many times, so the receiving end does not need to rebuild the real face image model in real time during the call, which simplifies operation at the receiving end and improves the user experience.
To implement the above embodiments, the present invention further provides a terminal.
A terminal includes: a collection module configured to collect a video image of a user; a recognition module configured to perform facial recognition on the video image to obtain facial expression information; and a sending module configured to send the facial expression information to a second terminal that has established a call with the terminal, where the facial expression information is used by the second terminal to synthesize a video image from the facial expression information and a face image model preset at the second terminal, and to display it.
FIG. 4 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
As shown in FIG. 4, the terminal includes a collection module 110, a recognition module 120, and a sending module 130.
Specifically, the collection module 110 is configured to collect a video image of the user. More specifically, the collection module 110 may shoot with the terminal's built-in or peripheral camera to collect the video image of the user.
The recognition module 120 is configured to perform facial recognition on the video image to obtain facial expression information. More specifically, the recognition module 120 may perform facial recognition on the video image by means of various existing computer image processing techniques, such as face recognition based on genetic algorithms or on neural networks, to obtain the facial expression information. The data volume of the facial expression information is very small. The process of obtaining the facial expression information is described in detail in subsequent embodiments.
The sending module 130 is configured to send the facial expression information to a second terminal that has established a call with the terminal, where the facial expression information is used by the second terminal to synthesize a video image from the facial expression information and a face image model preset at the second terminal, and to display it.
The terminal sends a video call request to the second terminal through a server, or the second terminal sends a video call request to the terminal through the server. If the second terminal accepts the video call request of the terminal, or the terminal accepts the video call request of the second terminal, the server establishes a video call between the terminal and the second terminal.
More specifically, the sending module 130 may encode the facial expression information into a digital representation and send it to the second terminal over the video call established by the server.
After the facial expression information is sent to the second terminal, the second terminal may synthesize a facial image of the terminal's user from the facial expression information and a preset face image model, and display it in the video call interface of the second terminal. The preset face image model may be set by the user, or set by the server as a default. In addition, the user of the second terminal may also use his or her own photo, or a photo of the terminal's user, to synthesize with the facial expression information and reproduce the facial image of the terminal's user.
With the terminal of this embodiment of the present invention, facial recognition technology is used to extract the facial expression information, so that the second terminal that has established a call with the terminal performs simple synthesis and restoration of the face image from the transmitted facial expression information and a preset face image model. Because the transmitted information is limited to the facial expression information, and because the facial expression information need not encode a complete face image, the amount of information is small; after encoding, it may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced and data traffic is saved, making the video call smoother and reducing the impact of limited network bandwidth or limited data allowances on the call. The terminal is particularly suitable for transmission over mobile networks and improves the user experience. In addition, there is no need to rebuild the user's face image model at the second terminal; the second terminal only needs to display the corresponding facial expression on the preset face image model according to the facial expression information, so the second terminal is easy to adapt.
In an embodiment of the present invention, the recognition module 120 is further configured to perform facial recognition on the video image to obtain facial features, and to extract the facial expression information from the facial features.
Specifically, the recognition module 120 first extracts facial features from the video image. The facial features may include, but are not limited to, geometric information about facial components (such as the eyes, nose, mouth, and ears), for example the position of the eyebrows, the angle of the mouth, and the size of the eyes. It should be understood that the facial feature information may also be obtained by other methods; any future face recognition technique may equally be used to perform facial recognition on the video image and obtain the facial feature information. The recognition module 120 then extracts the facial expression information from the facial features: it may analyze the facial feature information to obtain the facial expression information of the user.
In an embodiment of this aspect, the facial expression information includes one or more of the following: whether the brow is furrowed, whether the mouth is open or closed, the curvature of the corners of the mouth, whether the eyes are open or closed, the size of the eyes, whether there are tears, and so on.
In addition, the facial expression information mainly reflects a person's emotional state. For example, by analyzing the position of the eyebrows, the angle of the mouth, the size of the eyes, and so on, it can be determined whether the user's expression is a smile, laughter, crying, depression, excitement, anger, and the like. Likewise, various existing facial expression analysis techniques, such as machine learning algorithms, may be used, and any future algorithm with a similar function may be used to analyze the facial feature information and obtain the facial expression information.
In addition, the sending module 130 may encode the facial expression information into a digital representation. The encoding may be as simple as a few characters occupying only a few bytes; for example, "laughter" could be transmitted directly as the characters "D:". Of course, the encoding scheme can be richer; the example is given here only for ease of understanding. The encoded facial expression information is sent to the second terminal over the video call established by the server.
To implement the above embodiments, the present invention further provides another terminal.
FIG. 5 is a schematic structural diagram of a terminal according to another embodiment of the present invention.
As shown in FIG. 5, the terminal includes a receiving module 210 and a synthesis module 220.
Specifically, the receiving module 210 is configured to receive facial expression information of a video image sent by a first terminal that has established a call with the terminal. The synthesis module 220 is configured to synthesize a video image from the facial expression information and a face image model preset in the terminal, and to display it.
More specifically, the synthesis module 220 may synthesize a facial image of the first terminal's user from that user's facial expression information and a preset face image model, and display it in the video call interface of the terminal. The preset face image model may be set by the user, or set as a default. In addition, the user of the terminal may also use his or her own photo, or a photo of the first terminal's user, as the face image model to reproduce the facial image of the first terminal's user.
With the terminal of this embodiment of the present invention, the face image is simply synthesized and restored from the facial expression information, extracted by facial recognition technology and sent by the first terminal that has established a call with the terminal, together with a preset face image model. Because the information transmitted between the sending end and the receiving end is limited to the facial expression information, and because the facial expression information need not encode a complete face image, the amount of information is small; after encoding, it may occupy only a few bytes. Compared with the information transmitted in the background art, the amount of data transmitted during the video call is therefore greatly reduced and data traffic is saved, making the video call smoother and reducing the impact of limited network bandwidth or limited data allowances on the call. The terminal is particularly suitable for transmission over mobile networks and improves the user experience. In addition, there is no need to rebuild the face image model; the terminal only needs to display the corresponding facial expression on the preset face image model according to the facial expression information, so the terminal is easy to adapt.
FIG. 6 is a schematic structural diagram of a terminal according to still another embodiment of the present invention.
As shown in FIG. 6, in addition to the structure shown in FIG. 5, the terminal further includes a selection module 230.
Specifically, the selection module 230 is configured to select a real or cartoon face image model after the receiving module 210 receives the facial expression information of the video image sent by the first terminal in a call with the second terminal; the selected real or cartoon face image model is used to synthesize a video image with the facial expression information for display.
More specifically, to make the video call more personalized and entertaining, the terminal may provide the user with multiple real or cartoon face image models, for example, several cartoon face image models, photos, or real face image models, and the user may choose a preferred face image model as needed. For example, if the facial expression information of the first terminal's user indicates laughing and the terminal's user has selected a Superman face image model, the terminal synthesizes the facial expression information with the Superman cartoon image to reproduce an image of the other user laughing.
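A minimal sketch of the selection module's behaviour described above; the model catalogue and the model names ("superman", "own-photo", etc.) are invented for illustration, not taken from the patent.

```python
# Hypothetical catalogue of real and cartoon face image models the terminal
# might offer, mirroring the "real or cartoon" choice in claim 5 / claim 10.
AVAILABLE_MODELS = {
    "real": ["own-photo", "caller-photo"],
    "cartoon": ["superman", "cat"],
}

def select_model(kind, name):
    """Return the chosen real or cartoon face model, validating the choice."""
    if name not in AVAILABLE_MODELS.get(kind, []):
        raise ValueError(f"unknown {kind} model: {name}")
    return {"kind": kind, "name": name}

choice = select_model("cartoon", "superman")
# the chosen model is then combined with the received expression information,
# e.g. a laughing expression rendered on the Superman cartoon image
```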
In this way, the user can select a real or cartoon face image model, and a video image is synthesized and displayed from the selected model and the facial expression information, which adds entertainment value and improves the user experience.
To achieve the above object, the present invention further provides a terminal device.
A terminal device according to an embodiment of the present invention includes: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations: collecting a video image of the terminal device's user; performing facial recognition on the video image to obtain facial expression information; and sending the facial expression information to a second terminal in a call with the terminal device, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
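The three sender-side operations listed above (capture, recognize, send) can be sketched as a pipeline. The capture and recognition steps are stubbed out below, since the patent deliberately leaves the concrete camera API and facial-recognition technique open; every function here is a hypothetical stand-in.

```python
# Stubbed sender-side pipeline: capture a frame, extract expression info,
# transmit only that info to the peer terminal.

def capture_frame():
    return "raw-frame"                      # stand-in for a camera frame

def recognize_expression(frame):
    # a real implementation would run face detection and feature extraction
    # here, then derive the expression fields from the facial features
    return {"mouth": "open", "eyes": "open"}

def send_to_peer(expression, peer):
    peer.append(expression)                 # stand-in for the network send

peer_inbox = []
send_to_peer(recognize_expression(capture_frame()), peer_inbox)
```

Note that the raw frame never leaves the sending device; only the small expression dictionary is transmitted, which is the core of the bandwidth saving described earlier.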
To achieve the above object, the present invention further provides another terminal device.
A terminal device according to an embodiment of the present invention includes: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations: receiving facial expression information of a video image sent by a first terminal in a call with the terminal device; and synthesizing and displaying a video image according to the facial expression information and a face image model preset in the terminal device.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any of the following technologies known in the art, or a combination thereof, may be used: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic use of these terms does not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations may be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the claims and their equivalents.

Claims (12)

  1. A virtual video call method, comprising:
    collecting a video image of a user of a first terminal;
    performing facial recognition on the video image to obtain facial expression information; and
    sending the facial expression information to a second terminal in a call with the first terminal, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
  2. The method according to claim 1, wherein performing facial recognition on the video image to obtain the facial expression information comprises:
    performing facial recognition on the video image to obtain facial features, and extracting the facial expression information from the facial features.
  3. The method according to claim 1 or 2, wherein the facial expression information comprises one or more of the following: whether the brow is furrowed, whether the mouth is open or closed, the curvature of the corners of the mouth, whether the eyes are open or closed, the size of the eyes, and whether there are tears.
  4. A virtual video call method, comprising:
    receiving facial expression information of a video image sent by a first terminal in a call with a second terminal; and
    synthesizing and displaying a video image according to the facial expression information and a face image model preset in the second terminal.
  5. The method according to claim 4, further comprising, after receiving the facial expression information of the video image sent by the first terminal in a call with the second terminal:
    selecting a real or cartoon face image model, the selected real or cartoon face image model being used to synthesize a video image with the facial expression information for display.
  6. A terminal, comprising:
    a collection module, configured to collect a video image of a user;
    a recognition module, configured to perform facial recognition on the video image to obtain facial expression information; and
    a sending module, configured to send the facial expression information to a second terminal in a call with the terminal, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
  7. The terminal according to claim 6, wherein the recognition module is further configured to perform facial recognition on the video image to obtain facial features, and to extract the facial expression information from the facial features.
  8. The terminal according to claim 6 or 7, wherein the facial expression information comprises one or more of the following: whether the brow is furrowed, whether the mouth is open or closed, the curvature of the corners of the mouth, whether the eyes are open or closed, the size of the eyes, and whether there are tears.
  9. A terminal, comprising:
    a receiving module, configured to receive facial expression information of a video image sent by a first terminal in a call with the terminal; and
    a synthesis module, configured to synthesize and display a video image according to the facial expression information and a face image model preset in the terminal.
  10. The terminal according to claim 9, further comprising:
    a selection module, configured to select a real or cartoon face image model after the receiving module receives the facial expression information of the video image sent by the first terminal in a call with the second terminal, the selected real or cartoon face image model being used to synthesize a video image with the facial expression information for display.
  11. A terminal device, comprising:
    one or more processors;
    a memory; and
    one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations:
    collecting a video image of a user of the terminal device;
    performing facial recognition on the video image to obtain facial expression information; and
    sending the facial expression information to a second terminal in a call with the terminal device, the facial expression information being used to cause the second terminal to synthesize and display a video image according to the facial expression information and a face image model preset in the second terminal.
  12. A terminal device, comprising:
    one or more processors;
    a memory; and
    one or more programs stored in the memory which, when executed by the one or more processors, perform the following operations:
    receiving facial expression information of a video image sent by a first terminal in a call with the terminal device; and
    synthesizing and displaying a video image according to the facial expression information and a face image model preset in the terminal device.