WO2018121699A1 - Video communication method, device, and terminal - Google Patents

Video communication method, device, and terminal

Info

Publication number
WO2018121699A1
WO2018121699A1 (PCT/CN2017/119602)
Authority
WO
WIPO (PCT)
Prior art keywords
character image
three-dimensional character
image
virtual
three-dimensional
Application number
PCT/CN2017/119602
Other languages
English (en)
French (fr)
Inventor
于洋
李子军
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2018121699A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working
    • H04N 7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H04N 7/147 Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working

Definitions

  • The present disclosure relates to the field of visual communication, for example, to a video communication method, device, and terminal.
  • Visual communication adds video pictures to the voice communication mode, making the communication process more vivid and concrete, increasing the amount of information conveyed, and satisfying people's sensory needs.
  • Visual communication will be one of the mainstream communication modes in the future.
  • The following embodiments provide a video communication method, device, and terminal, which can solve the problems of monotonous communication effects, poor overall relevance, and low interactivity in visual communication technology.
  • A video communication method includes: acquiring a first current video picture of a local user and receiving a second current video picture of a peer user; performing face recognition on the first current video picture and the second current video picture respectively, and constructing a first three-dimensional character image of the local user and a second three-dimensional character image of the peer user based on the face recognition result; and
  • fusing the first three-dimensional character image and the second three-dimensional character image into a pre-constructed first virtual stereoscopic scene to obtain a second virtual stereoscopic scene with the fused character images, so that the second virtual stereoscopic scene is presented locally.
  • In an embodiment, the method further includes: sending the first current video picture to the peer end.
  • In an embodiment, the face recognition result includes: the recognized first face image of the local user and the recognized second face image of the peer user.
  • Constructing the first three-dimensional character image of the local user and the second three-dimensional character image of the peer user based on the face recognition result includes: performing edge detection on the whole-person image of the local user in the first current video picture to obtain a first edge detection result of the local user, and generating the first three-dimensional character image according to the first face image and the first edge detection result; and performing edge detection on the whole-person image of the peer user in the second current video picture to obtain a second edge detection result of the peer user, and generating the second three-dimensional character image according to the second face image and the second edge detection result.
  • In an embodiment, before generating the first three-dimensional character image, the method further includes: determining a first size of the first three-dimensional character image according to the first face image and a first size mapping relationship from the first face image to the first three-dimensional character image; and
  • before generating the second three-dimensional character image, the method further includes: determining a second size of the second three-dimensional character image according to the second face image and a second size mapping relationship from the second face image to the second three-dimensional character image.
  • In an embodiment, fusing the first three-dimensional character image and the second three-dimensional character image into the pre-constructed first virtual stereoscopic scene includes: acquiring local shooting angle data and peer shooting angle data, where the local shooting angle data indicates the camera shooting angle corresponding to the first current video picture and the peer shooting angle data indicates the camera shooting angle corresponding to the second current video picture; determining, according to the local shooting angle data and the peer shooting angle data, a relative orientation relationship between the first three-dimensional character image and the second three-dimensional character image in the first virtual stereoscopic scene; and fusing the first three-dimensional character image and the second three-dimensional character image into the first virtual stereoscopic scene based on the relative orientation relationship.
  • In an embodiment, before fusing the first three-dimensional character image and the second three-dimensional character image into the first virtual stereoscopic scene, the method further includes: setting a first location area of the first three-dimensional character image in the first virtual stereoscopic scene and a second location area of the second three-dimensional character image in the first virtual stereoscopic scene.
  • Fusing the first three-dimensional character image and the second three-dimensional character image into the first virtual stereoscopic scene based on the relative orientation relationship includes: arranging the first three-dimensional character image and the second three-dimensional character image simultaneously in the first virtual stereoscopic scene based on the relative orientation relationship; determining positions of one or more virtual stereoscopic elements in the first virtual stereoscopic scene according to the first location area and the second location area; and generating the one or more virtual stereoscopic elements at the corresponding positions in the first virtual stereoscopic scene.
  • A video communication device includes an acquisition module, an identification module, a construction module, and a fusion module, where:
  • the acquisition module is configured to acquire a first current video picture of the local user and receive a second current video picture of the peer user;
  • the identification module is configured to perform face recognition on the first current video picture and the second current video picture respectively to obtain a face recognition result;
  • the construction module is configured to construct a first three-dimensional character image of the local user and a second three-dimensional character image of the peer user based on the face recognition result; and
  • the fusion module is configured to fuse the first three-dimensional character image and the second three-dimensional character image into a pre-constructed first virtual stereoscopic scene to obtain a second virtual stereoscopic scene with the fused character images, so that the second virtual stereoscopic scene is presented locally.
  • In an embodiment, the device further includes:
  • a sending module, configured to send the first current video picture to the peer end.
  • In an embodiment, the face recognition result includes: the recognized first face image of the local user and the recognized second face image of the peer user.
  • The construction module is configured to perform edge detection on the whole-person image of the local user in the first current video picture to obtain a first edge detection result of the local user, and generate the first three-dimensional character image according to the first face image and the first edge detection result; and to perform edge detection on the whole-person image of the peer user in the second current video picture to obtain a second edge detection result of the peer user, and generate the second three-dimensional character image according to the second face image and the second edge detection result.
  • In an embodiment, the construction module is further configured to: before generating the first three-dimensional character image, determine a first size of the first three-dimensional character image according to the first face image and a first size mapping relationship from the first face image to the first three-dimensional character image; and, before generating the second three-dimensional character image, determine a second size of the second three-dimensional character image according to the second face image and a second size mapping relationship from the second face image to the second three-dimensional character image.
  • In an embodiment, the fusion module is configured to acquire local shooting angle data and peer shooting angle data; determine, according to the local shooting angle data and the peer shooting angle data, a relative orientation relationship between the first three-dimensional character image and the second three-dimensional character image in the first virtual stereoscopic scene; and fuse the first three-dimensional character image and the second three-dimensional character image into the first virtual stereoscopic scene based on the relative orientation relationship, where the local shooting angle data indicates the camera shooting angle corresponding to the first current video picture and the peer shooting angle data indicates the camera shooting angle corresponding to the second current video picture.
  • In an embodiment, the fusion module is further configured to set, before the first three-dimensional character image and the second three-dimensional character image are fused into the first virtual stereoscopic scene, a first location area of the first three-dimensional character image in the first virtual stereoscopic scene and a second location area of the second three-dimensional character image in the first virtual stereoscopic scene; and the fusion module is configured to arrange the first three-dimensional character image and the second three-dimensional character image simultaneously in the first virtual stereoscopic scene based on the relative orientation relationship, determine positions of one or more virtual stereoscopic elements in the first virtual stereoscopic scene according to the first location area and the second location area, and generate the one or more virtual stereoscopic elements at the corresponding positions in the first virtual stereoscopic scene.
  • A terminal includes the device of any of the above.
  • A computer-readable storage medium stores computer-executable instructions for performing any of the above video communication methods.
  • A terminal includes any of the above video communication devices.
  • A terminal includes:
  • at least one processor; and
  • a memory communicatively connected with the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the at least one processor to perform the method described above.
  • FIG. 1a is a flowchart of a video communication method according to an embodiment;
  • FIG. 1b is a flowchart of the method in step 103 of FIG. 1a according to an embodiment;
  • FIG. 1c is a flowchart of a video communication method according to another embodiment;
  • FIG. 2a is a schematic structural diagram of a video communication device according to an embodiment;
  • FIG. 2b is a schematic structural diagram of a video communication device according to another embodiment;
  • FIG. 3 is a schematic structural diagram of a video communication device according to another embodiment; and
  • FIG. 4 is a schematic diagram of the hardware structure of a terminal according to an embodiment.
  • Visual communication in the related art can only unilaterally capture the video picture of one communicating party and transmit it to the other party; at the local end, the locally captured picture and the transmitted peer picture can only be displayed separately. Because the two pictures have different sources and different content and are relatively independent, the communication effect is monotonous, the overall relevance is poor, and the interactivity is low; no vivid atmosphere of exchange can be formed, which degrades the user experience.
  • An embodiment provides a video communication method, device, and terminal, which can implement visual communication between a local user and a peer user.
  • The local end and the peer end can be the two parties of the video communication, and both the local user and the peer user can implement visual communication using a terminal with communication functions.
  • The terminal can be a mobile terminal or a fixed terminal.
  • On the terminals used by the local user and the peer user, a camera can be provided to capture the user's image in real time.
  • FIG. 1a is a flowchart of the video communication method according to this embodiment. As shown in FIG. 1a, the process includes the following steps.
  • In step 101, the current video picture of the local user is captured, the captured video picture of the local user is sent to the peer end, and the current video picture of the peer user is received.
  • Both the local end and the peer end can use a camera to capture the user's video picture. After the peer end captures the current video picture of the peer user, it can send that picture to the local end in real time, and the local end can receive it. In an embodiment, after capturing the current video picture of the peer user, the peer end can video-encode it and send the encoded video data to the local end; after receiving the video data, the local end decodes it to obtain the current video picture of the peer user, as sketched below.
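As an illustration of this capture/encode/send loop, the following is a minimal sketch. The patent does not name a codec or transport, so JPEG frames over a length-prefixed TCP stream, as well as the helper name `stream_local_video`, are assumptions made purely for illustration.

```python
# Hypothetical sketch of the sender side: capture the local user's video
# picture, encode each frame, and send it to the peer in real time.
import socket
import struct

import cv2

def stream_local_video(peer_host: str, peer_port: int) -> None:
    cam = cv2.VideoCapture(0)                       # local camera
    sock = socket.create_connection((peer_host, peer_port))
    try:
        while True:
            ok, frame = cam.read()
            if not ok:
                break
            # Video-encode the current picture before sending (JPEG assumed).
            ok, buf = cv2.imencode(".jpg", frame)
            if not ok:
                continue
            data = buf.tobytes()
            # Length-prefix each frame so the peer can split the byte stream.
            sock.sendall(struct.pack(">I", len(data)) + data)
    finally:
        cam.release()
        sock.close()
```

On the receiving side, the peer would read the 4-byte length, then the payload, and decode it with `cv2.imdecode` to recover the current video picture.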
  • At the local end, the current video picture of the local user can be obtained and the current video picture of the peer user can be received, and both pictures can then be processed.
  • In step 102, face recognition is performed on the current video picture of the local user and on the current video picture of the peer user, and the three-dimensional character images of the local user and of the peer user are constructed based on the face recognition results.
  • In an embodiment, during face recognition, the position of the face in the corresponding picture can also be located; a face detection program can be used to extract face images from the current video pictures of the local user and the peer user synchronously. The face recognition process is explained by way of example below.
  • The face recognition process can include the following steps.
  • Using statistical methods, images of many "human faces" and "non-human faces" are acquired in advance, a sample library is built, and a classifier that distinguishes "human face" from "non-human face" is trained.
  • The image to be detected is scaled by a certain ratio, all regions of the scaled image are examined with the classifier, and each examined region is judged to be either a face region or a non-face region.
  • Based on the judgment result, the position and size of the face are determined. A minimal detection sketch is given below.
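The detection stage described above can be sketched with a pre-trained cascade classifier. OpenCV's bundled Haar cascade stands in here for the classifier trained from the sample library, and the function name `detect_faces` is illustrative.

```python
# Sketch of the detection stage: a face/non-face classifier is slid over
# the image at multiple scales, and the position and size of each detected
# face are returned.
import cv2

def detect_faces(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    # scaleFactor implements the "scale the image by a certain ratio" step;
    # each hit is (x, y, w, h): the face position and size.
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```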
  • The face recognition result may include: the face image of the local user and the face image of the peer user.
  • In an embodiment, in the current video picture of the local user, edge detection may be performed on the whole-person image of the local user to obtain the local user's edge detection result, and the local user's three-dimensional character image is generated according to the local user's face image and edge detection result. Likewise, in the current video picture of the peer user, edge detection is performed on the whole-person image of the peer user to obtain the peer user's edge detection result, and the peer user's three-dimensional character image is generated according to the peer user's face image and edge detection result.
  • In an embodiment, when generating the local user's three-dimensional character image, the whole-person image area of the local user may be determined in the local user's current video picture. After this area is determined, the regions of the local user's current video picture outside the whole-person image area may be made transparent to facilitate later fusion.
  • Similarly, when generating the peer user's three-dimensional character image, the whole-person image area of the peer user may be determined in the peer user's current video picture, and the regions outside it made transparent to facilitate later fusion. A sketch of this cut-out step follows.
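A rough sketch of the whole-person cut-out, under the assumption that a Canny edge map plus largest-contour filling approximates the person's outline; the patent does not mandate a particular edge detector, and `cut_out_person` is an invented helper name.

```python
# Sketch: detect the whole-person outline, then make everything outside
# the person area transparent for later fusion.
import cv2
import numpy as np

def cut_out_person(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    # Close small gaps so the person's outline forms a fillable contour.
    edges = cv2.morphologyEx(edges, cv2.MORPH_CLOSE,
                             np.ones((9, 9), np.uint8))
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    mask = np.zeros(gray.shape, np.uint8)
    if contours:
        person = max(contours, key=cv2.contourArea)   # whole-person region
        cv2.drawContours(mask, [person], -1, 255, thickness=-1)
    rgba = cv2.cvtColor(frame, cv2.COLOR_BGR2BGRA)
    rgba[:, :, 3] = mask   # alpha 0 outside the person area -> transparent
    return rgba
```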
  • In an embodiment, after the whole-person image areas are determined, at least one of the local user's current video picture and the peer user's current video picture may also be scaled so that the two pictures have a unified size.
  • In an embodiment, after the local user's whole-person image area is generated, the local user's three-dimensional character image may be generated according to the local user's face image and a preset three-dimensional character template; the peer user's three-dimensional character image may likewise be generated according to the peer user's face image and a preset three-dimensional character template.
  • In an embodiment, before generating the local user's three-dimensional character image, the size of that character image may also be determined according to the local user's face image and the size mapping relationship from face image to three-dimensional character image. This mapping can be a size conversion between the size of the face image and the face of the three-dimensional character image in the virtual stereoscopic scene.
  • In an embodiment, before generating the peer user's three-dimensional character image, its size may likewise be determined according to the peer user's face image and the size mapping relationship from face image to three-dimensional character image. In short, the size mapping relationship converts a face image into the size of a three-dimensional character image in a specific scene, as sketched below.
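A minimal sketch of such a size mapping, assuming a simple linear conversion. The constants `FACE_TO_HEAD_SCALE` and `HEAD_TO_BODY_RATIO` are invented for illustration; the patent only requires that some mapping from face size to character size exist.

```python
# Sketch: convert a detected face height in pixels into the overall
# character height in scene units via an assumed linear mapping.
FACE_TO_HEAD_SCALE = 0.002   # pixels -> scene units (assumed constant)
HEAD_TO_BODY_RATIO = 7.5     # character height in head heights (assumed)

def character_size(face_h_px: float) -> float:
    head_h = face_h_px * FACE_TO_HEAD_SCALE   # head size in the 3D scene
    return head_h * HEAD_TO_BODY_RATIO        # overall character height
```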
  • In an embodiment, when generating the three-dimensional character image of the local user or the peer user, after the initial three-dimensional character image is generated, augmented reality technology may be used to decorate the initial three-dimensional character image according to a preset character decoration scheme, yielding the final three-dimensional character image of the local user or the peer user.
  • This setup can be performed before step 101, and a character decoration template embodying the decoration scheme can be configured by the user.
  • In step 103, the three-dimensional character image of the local user and the three-dimensional character image of the peer user are fused into the pre-constructed virtual stereoscopic scene to obtain a virtual stereoscopic scene with the fused character images.
  • Multiple virtual stereoscopic scenes may be provided; for example, a scene may be a conference room scene, a living room scene, or a park scene.
  • Each virtual stereoscopic scene may be composed of multiple virtual stereoscopic elements.
  • For example, a conference table and chairs are set as virtual stereoscopic elements in a conference room scene, and a sofa, a television, and a coffee table are virtual stereoscopic elements in a living room scene.
  • After multiple virtual stereoscopic scenes are set, the user may select one of them as the pre-constructed virtual stereoscopic scene; one possible representation is sketched below.
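One possible in-memory representation of such scenes, sketched here with illustrative element lists; the classes `SceneElement` and `VirtualScene` are assumptions, not part of the patent.

```python
# Sketch: a named virtual stereoscopic scene composed of virtual
# stereoscopic elements, each with a type and a scene-coordinate position.
from dataclasses import dataclass, field

@dataclass
class SceneElement:
    kind: str         # e.g. "sofa", "tv", "coffee_table"
    position: tuple   # (x, y, z) in scene coordinates

@dataclass
class VirtualScene:
    name: str
    elements: list = field(default_factory=list)

SCENES = {
    "living_room": VirtualScene("living_room", [
        SceneElement("sofa", (0.0, 0.0, 2.0)),
        SceneElement("tv", (0.0, 0.5, -2.0)),
        SceneElement("coffee_table", (0.0, 0.0, 0.5)),
    ]),
    "meeting_room": VirtualScene("meeting_room", [
        SceneElement("conference_table", (0.0, 0.0, 0.0)),
    ]),
}
```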
  • In an embodiment, augmented reality technology may be used to fuse the local user's three-dimensional character image and the peer user's three-dimensional character image into the virtual stereoscopic scene.
  • FIG. 1b is a flowchart of the method in step 103 of FIG. 1a according to an embodiment. As shown in FIG. 1b, fusing the local user's and the peer user's three-dimensional character images into the pre-constructed virtual stereoscopic scene may include step 1031, step 1032, and step 1033.
  • In step 1031, local shooting angle data and peer shooting angle data are acquired; the local shooting angle data indicates the camera shooting angle corresponding to the local user's current video picture, and the peer shooting angle data indicates the camera shooting angle corresponding to the peer user's current video picture.
  • In an embodiment, the local user may input the local shooting angle data into the corresponding terminal in advance, and the peer user may input the peer shooting angle data into the corresponding terminal.
  • In an embodiment, the cameras that capture the local user and the peer user can rotate under the control of an external signal; in that case, each camera can obtain its own shooting angle.
  • In step 1032, the relative orientation relationship between the local user's three-dimensional character image and the peer user's three-dimensional character image in the virtual stereoscopic scene is determined according to the local shooting angle data and the peer shooting angle data.
  • An angle-to-position mapping can be performed on the local and peer shooting angle data to determine this relative orientation relationship.
  • For example, the camera that captures the local user's current video picture may be denoted the local camera, and the camera that captures the peer user's current video picture the peer camera.
  • When the local camera faces straight ahead and the peer camera faces straight ahead, the local user's three-dimensional character in the virtual stereoscopic scene may be directly in front of or directly behind the peer user's three-dimensional character.
  • In an embodiment, when the local camera faces straight ahead and the peer camera faces its own front-right, the peer user's three-dimensional character may be to the front-right of the local user's three-dimensional character, and the angle by which the peer character deviates from directly in front of the local character may equal the angle by which the peer camera's orientation deviates from the peer camera's straight-ahead direction.
  • In an embodiment, when the local camera faces straight ahead and the peer camera faces its own front-left, the peer user's three-dimensional character may be to the front-left of the local user's three-dimensional character, and the deviation angles may likewise be equal.
  • In an embodiment, when the local camera faces its own front-right and the peer camera faces straight ahead, the local user's three-dimensional character may be to the front-right of the peer user's three-dimensional character, and the angle by which the local character deviates from directly in front of the peer character may equal the angle by which the local camera's orientation deviates from the local camera's straight-ahead direction.
  • In an embodiment, when the local camera faces its own front-left and the peer camera faces straight ahead, the local user's three-dimensional character may be to the front-left of the peer user's three-dimensional character, and the deviation angles may likewise be equal. These cases are captured by the placement sketch below.
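A placement sketch covering the enumerated cases, assuming angles in degrees with positive values meaning front-right, the local character at the origin, and +z as the straight-ahead direction; all of these conventions, and the function name, are assumptions.

```python
# Sketch: map the two camera deviation angles to character placements.
import math

def place_characters(local_cam_deg: float, peer_cam_deg: float,
                     distance: float = 2.0):
    # A camera angle of 0 means the camera faces straight ahead. Per the
    # cases above, the angle by which one character deviates from directly
    # in front of the other equals that side's camera deviation.
    def offset(deg: float):
        rad = math.radians(deg)
        return (distance * math.sin(rad), distance * math.cos(rad))

    local_pos = (0.0, 0.0)            # local character at the origin
    peer_pos = offset(peer_cam_deg)   # peer character front-left/front-right
    # Seen from the peer character, the local character deviates by the
    # local camera's angle; used when composing the peer user's view.
    local_seen_from_peer = offset(local_cam_deg)
    return local_pos, peer_pos, local_seen_from_peer
```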
  • In step 1033, based on the determined relative orientation relationship, augmented reality technology is used to fuse the local user's three-dimensional character image and the peer user's three-dimensional character image into the pre-constructed virtual stereoscopic scene.
  • In an embodiment, the location area of the local user's three-dimensional character in the pre-constructed virtual stereoscopic scene, and the location area of the peer user's three-dimensional character in that scene, may also be preset. A location area indicates a rough region of the corresponding three-dimensional character in the virtual stereoscopic scene, not its precise position.
  • In an embodiment, step 1033 includes: based on the determined relative orientation relationship, arranging the local user's and the peer user's three-dimensional character images simultaneously in the pre-established virtual stereoscopic scene; determining the positions of one or more virtual stereoscopic elements in the pre-constructed virtual stereoscopic scene according to the two characters' location areas; and generating the one or more virtual stereoscopic elements at the corresponding positions in the pre-constructed virtual stereoscopic scene.
  • The scene in which the two three-dimensional character images are initially arranged is not yet the fully constructed virtual stereoscopic scene: based on the determined relative orientation relationship, the two characters may first be placed simultaneously in a virtual stereoscopic scene that contains no other virtual stereoscopic elements. The positions of one or more virtual stereoscopic elements in the pre-constructed scene are then determined according to the first location area of the local user's character and the second location area of the peer user's character in this initial scene.
  • For example, when the pre-constructed virtual stereoscopic scene is a living room scene, and both the first location area of the local user's three-dimensional character and the second location area of the peer user's three-dimensional character fall in the sofa area, the virtual stereoscopic element corresponding to the sofa can be constructed, as in the sketch below.
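Continuing the scene sketch above, element generation from the occupied location areas might look like the following; the rule table, area names, and coordinates are illustrative assumptions.

```python
# Sketch of step 1033's element generation: the location areas occupied by
# the two characters decide which virtual stereoscopic elements to create
# and where. Each rule maps (scene, occupied areas) to elements given as
# (kind, (x, y, z) position in scene coordinates).
ELEMENT_RULES = {
    ("living_room", frozenset({"sofa_area"})): [("sofa", (0.0, 0.0, 2.0))],
    ("meeting_room", frozenset({"table_area"})): [
        ("conference_table", (0.0, 0.0, 0.0))],
}

def generate_elements(scene: str, local_area: str, peer_area: str):
    # Both characters in the living room's sofa area, for example,
    # triggers creation of the sofa element spanning that area.
    return ELEMENT_RULES.get((scene, frozenset({local_area, peer_area})), [])
```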
  • The angle at which the local user's three-dimensional character is presented to the viewer in the virtual stereoscopic scene, and the angle at which the peer user's three-dimensional character is presented, may also be preset. Based on the determined relative orientation relationship and these preset presentation angles, the local user's and the peer user's three-dimensional character images are arranged simultaneously in the virtual stereoscopic scene.
  • After these steps are completed, in the virtual stereoscopic scene with the fused character images, both the local user's and the peer user's three-dimensional characters can be presented at their preset angles. For example, the face of the local user's three-dimensional character may be set to be presented facing away from the viewer, or set to be presented facing the viewer; in the resulting fused scene, both characters are presented at the preset angles.
  • In step 104, the virtual stereoscopic scene with the fused character images is presented locally.
  • The fused virtual stereoscopic scene can be presented by a local display.
  • In step 101, when the local user's current video picture is captured, local audio information may also be collected; the local audio information may include the local user's voice.
  • The collected local audio information can be sent to the peer end, and the peer end can likewise collect its own audio information and send it to the local end. A microphone can be used to collect the audio information.
  • When the fused virtual stereoscopic scene is presented locally, the local audio information and the peer audio information can also be played synchronously.
  • In an embodiment, before step 101, initial data may be set. The initial data may include one or more of the following: the initial virtual stereoscopic scene; the initial location area of the local user's three-dimensional character in the virtual stereoscopic scene; the initial location area of the peer user's three-dimensional character in the virtual stereoscopic scene; the initial angle at which the local user's three-dimensional character is presented to the viewer; the initial angle at which the peer user's three-dimensional character is presented to the viewer; the local user's initial character decoration scheme; and the peer user's initial character decoration scheme.
  • The user can change any item of the initial data in real time, thereby changing the fusion effect between the character images and the virtual stereoscopic scene; one possible grouping of these settings is sketched below.
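These user-adjustable settings could be grouped as a single configuration object, sketched below with assumed field names and defaults.

```python
# Sketch: the user-adjustable initial data listed above; changing any
# field at run time re-drives the fusion. All field names are assumptions.
from dataclasses import dataclass

@dataclass
class FusionConfig:
    scene: str = "living_room"         # initial virtual stereoscopic scene
    local_area: str = "sofa_area"      # local character's location area
    peer_area: str = "sofa_area"       # peer character's location area
    local_view_deg: float = 180.0      # angle local character faces viewer
    peer_view_deg: float = 0.0         # angle peer character faces viewer
    local_decoration: str = "default"  # local character decoration scheme
    peer_decoration: str = "default"   # peer character decoration scheme
```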
  • With the above steps, the local user's and the peer user's three-dimensional character images can be fused with the virtual stereoscopic scene, and the fused live scene can be presented to both users.
  • The pictures of both communicating parties can be extracted and merged into a customized, personalized virtual scene, simulating an atmosphere in which the users communicate at the same time and in the same place, as if face to face. The scene and character decorations can also be customized, enriching the content and fun of the communication and improving the users' sensory experience.
  • The technical approach in the above embodiments avoids the mutual independence of pictures, the poor relevance, the low interactivity, and the lack of vivid character detail found in related visual communication schemes.
  • FIG. 1c is a flowchart of a video communication method according to an embodiment. On the basis of the foregoing embodiment, as shown in FIG. 1c, the method includes the following steps.
  • In step 110, the first current video picture of the local user is obtained, and the second current video picture of the peer user is received.
  • In step 120, face recognition is performed on the first current video picture and the second current video picture respectively, and the first three-dimensional character image of the local user and the second three-dimensional character image of the peer user are constructed based on the face recognition result.
  • In step 130, the first three-dimensional character image and the second three-dimensional character image are fused into the pre-constructed first virtual stereoscopic scene to obtain a second virtual stereoscopic scene with the fused character images, so that the second virtual stereoscopic scene is presented locally.
  • In an embodiment, the video communication method further includes: sending the first current video picture to the peer end.
  • In an embodiment, the face recognition result includes: the recognized first face image of the local user and the recognized second face image of the peer user.
  • Constructing the first three-dimensional character image of the local user and the second three-dimensional character image of the peer user based on the face recognition result includes: performing edge detection on the whole-person image of the local user in the first current video picture to obtain a first edge detection result, and generating the first three-dimensional character image according to the first face image and the first edge detection result; and performing edge detection on the whole-person image of the peer user in the second current video picture to obtain a second edge detection result, and generating the second three-dimensional character image according to the second face image and the second edge detection result.
  • In an embodiment, before generating the first three-dimensional character image, the method further includes: determining the first size of the first three-dimensional character image according to the first face image and the size mapping relationship from face image to three-dimensional character image; and
  • before generating the second three-dimensional character image, the method further includes: determining the second size of the second three-dimensional character image according to the second face image and the size mapping relationship from face image to three-dimensional character image.
  • In an embodiment, fusing the first three-dimensional character image and the second three-dimensional character image into the pre-constructed first virtual stereoscopic scene includes: acquiring local shooting angle data and peer shooting angle data, where the local shooting angle data indicates the camera shooting angle corresponding to the first current video picture and the peer shooting angle data indicates the camera shooting angle corresponding to the second current video picture; determining, according to the two sets of shooting angle data, the relative orientation relationship between the first and second three-dimensional character images in the first virtual stereoscopic scene; and fusing the two character images into the first virtual stereoscopic scene based on the relative orientation relationship.
  • In an embodiment, before fusing the first and second three-dimensional character images into the first virtual stereoscopic scene, the method further includes: setting a first location area of the first three-dimensional character image in the first virtual stereoscopic scene and a second location area of the second three-dimensional character image in the first virtual stereoscopic scene.
  • Fusing the first and second three-dimensional character images into the first virtual stereoscopic scene based on the relative orientation relationship includes: arranging the two character images simultaneously in the first virtual stereoscopic scene based on the relative orientation relationship; determining positions of one or more virtual stereoscopic elements in the first virtual stereoscopic scene according to the first location area and the second location area; and generating the one or more virtual stereoscopic elements at the corresponding positions in the first virtual stereoscopic scene.
  • For the video communication method provided above, an embodiment also proposes a video communication device. FIG. 2a is a schematic structural diagram of the video communication device according to this embodiment.
  • The video communication device includes: a first acquisition module 201, a first identification module 202, a first construction module 203, a first fusion module 204, and a presentation module 205.
  • The first acquisition module 201 is configured to capture the current video picture of the local user, send the captured video picture of the local user to the peer end, and receive the current video picture of the peer user.
  • The first identification module 202 is configured to perform face recognition on the current video picture of the local user and the current video picture of the peer user respectively, to obtain a face recognition result.
  • The first construction module 203 is configured to construct the three-dimensional character image of the local user and the three-dimensional character image of the peer user based on the face recognition result.
  • The first fusion module 204 is configured to fuse the local user's three-dimensional character image and the peer user's three-dimensional character image into the pre-constructed virtual stereoscopic scene to obtain a virtual stereoscopic scene with the fused character images.
  • The presentation module 205 is configured to present the virtual stereoscopic scene with the fused character images locally.
  • In an embodiment, the face recognition result includes: the recognized face image of the local user and the recognized face image of the peer user.
  • The first construction module 203 is configured to perform edge detection on the whole-person image of the local user in the local user's current video picture to obtain the local user's edge detection result, and generate the local user's three-dimensional character image according to the local user's face image and edge detection result; and to perform edge detection on the whole-person image of the peer user in the peer user's current video picture to obtain the peer user's edge detection result, and generate the peer user's three-dimensional character image according to the peer user's face image and edge detection result.
  • The first construction module 203 may further be configured to determine, before generating the local user's three-dimensional character image, the size of that character image according to the local user's face image and the size mapping relationship from face image to three-dimensional character image; and, before generating the peer user's three-dimensional character image, to determine its size according to the peer user's face image and the same size mapping relationship.
  • The first fusion module 204 is configured to acquire local shooting angle data and peer shooting angle data; determine, according to the two sets of shooting angle data, the relative orientation relationship between the local user's and the peer user's three-dimensional character images in the virtual stereoscopic scene; and fuse the two character images into the pre-constructed virtual stereoscopic scene based on the relative orientation relationship. The local shooting angle data indicates the camera shooting angle corresponding to the local user's current video picture, and the peer shooting angle data indicates the camera shooting angle corresponding to the peer user's current video picture.
  • The first fusion module 204 is further configured to set, before the two three-dimensional character images are fused into the pre-constructed virtual stereoscopic scene, the location area of the local user's three-dimensional character and the location area of the peer user's three-dimensional character in the pre-constructed virtual stereoscopic scene; and to arrange the two character images simultaneously in the virtual stereoscopic scene based on the relative orientation relationship, determine the positions of one or more virtual stereoscopic elements in the pre-constructed virtual stereoscopic scene according to the two location areas, and generate the one or more virtual stereoscopic elements at the corresponding positions in the virtual stereoscopic scene.
  • In an embodiment, the first acquisition module 201 may acquire the current video picture of the local user or the video picture of the peer user from the camera in the terminal. The first acquisition module 201, the first identification module 202, the first construction module 203, and the first fusion module 204 can each be implemented by a central processing unit (CPU), a microprocessor unit (MPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA) located in the terminal.
  • The presentation module 205 can be implemented by a display or the like located in the terminal.
  • FIG. 2b is a schematic structural diagram of a video communication device according to another embodiment.
  • The video communication device includes: a second acquisition module 210, a second identification module 220, a second construction module 230, and a second fusion module 240.
  • The second acquisition module 210 is configured to acquire the current video picture of the local user and receive the current video picture of the peer user.
  • The second identification module 220 is configured to perform face recognition on the current video picture of the local user and the current video picture of the peer user respectively, to obtain a face recognition result.
  • The second construction module 230 is configured to construct the three-dimensional character image of the local user and the three-dimensional character image of the peer user based on the face recognition result.
  • The second fusion module 240 is configured to fuse the local user's and the peer user's three-dimensional character images into the pre-constructed virtual stereoscopic scene to obtain a virtual stereoscopic scene with the fused character images, so that the fused virtual stereoscopic scene is presented locally.
  • The second acquisition module 210, the second identification module 220, the second construction module 230, and the second fusion module 240 can all be implemented by a CPU, an MPU, a DSP, or an FPGA located in the terminal.
  • In an embodiment, the video communication device further includes a sending module 250, configured to send the first current video picture to the peer end.
  • In an embodiment, the face recognition result includes: the recognized first face image of the local user and the recognized second face image of the peer user.
  • The second construction module 230 is configured to perform edge detection on the whole-person image of the local user in the first current video picture to obtain a first edge detection result, and generate the first three-dimensional character image according to the first face image and the first edge detection result; and to do likewise for the peer user, generating the second three-dimensional character image from the second face image and the second edge detection result.
  • The second construction module 230 is further configured to determine, before generating the first three-dimensional character image, the first size of the first three-dimensional character image according to the first face image and the first size mapping relationship from the first face image to the first three-dimensional character image; and, before generating the second three-dimensional character image, to determine the second size of the second three-dimensional character image according to the second face image and the second size mapping relationship.
  • The second fusion module 240 is configured to acquire local shooting angle data and peer shooting angle data; determine, according to the two sets of shooting angle data, the relative orientation relationship between the first and second three-dimensional character images in the first virtual stereoscopic scene; and fuse the two character images into the first virtual stereoscopic scene based on the relative orientation relationship. The local shooting angle data indicates the camera shooting angle corresponding to the first current video picture, and the peer shooting angle data indicates the camera shooting angle corresponding to the second current video picture.
  • The second fusion module 240 is further configured to set, before the first and second three-dimensional character images are fused into the first virtual stereoscopic scene, the first location area of the first three-dimensional character image and the second location area of the second three-dimensional character image in the first virtual stereoscopic scene; and to arrange the two character images simultaneously in the first virtual stereoscopic scene based on the relative orientation relationship, determine the positions of one or more virtual stereoscopic elements in the first virtual stereoscopic scene according to the first and second location areas, and generate the one or more virtual stereoscopic elements at the corresponding positions.
  • FIG. 3 is a schematic structural diagram of a video communication device according to another embodiment.
  • The video communication device includes: a communication module 301 (also called a communication circuit), a data processing module 302, an audio/video collection module 303 (also called an audio/video acquisition circuit), a main control module 304 (also called a main controller), and an output module 305 (also called an output circuit). The main control module 304 can be connected to the communication module 301, the data processing module 302, the audio/video collection module 303, and the output module 305 respectively; the data processing module 302 can be connected to the communication module 301, the audio/video collection module 303, and the output module 305 respectively; and the communication module 301 can be connected to the audio/video collection module 303.
  • The main control module 304 can be configured to take charge of overall service flow control and resource allocation, and can be implemented by a high-performance microcontroller.
  • The data processing module 302 can be configured to receive control information sent by the main control module and perform data processing according to it, and can also be configured to receive information from the audio/video collection module and the communication module.
  • The data processing module 302 can perform face recognition and detection on the collected and received information, extract the character images, and fuse the three-dimensional character images with the virtual stereoscopic scene using augmented reality technology. The data processing module can be implemented by a high-performance processor.
  • The audio/video collection module 303 can be configured to collect the video pictures and local audio information of the local user and send them to the data processing module 302 and the communication module 301.
  • The audio/video collection module 303 can be implemented with at least one camera and at least one microphone. In an embodiment, it can use multiple cameras to provide video information from different angular orientations.
  • The communication module 301 can be configured to receive control information sent by the main control module 304, decode the information received from the peer end according to that control information, and then send it to the local data processing module 302.
  • The communication module 301 can also be configured to encode the information from the audio/video collection module 303 according to the received control information and send the encoded information to the peer end's communication module 301.
  • The data processing module 302 can also be configured to output the virtual stereoscopic scene with the three-dimensional character images, the local audio information, and the peer audio information synchronously to the output module 305.
  • The output module 305 can be configured to receive control information sent by the main control module 304 and, according to it, synchronously present the virtual stereoscopic scene with the three-dimensional character images, the local user's voice information, and the peer user's voice information to the user.
  • The output module 305 can be implemented by a display and a speaker.
  • The above embodiments may be provided as a method, a system, or a computer program product, and may therefore be implemented in hardware, software, or a combination of software and hardware.
  • The above embodiments may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code. The program is stored in a storage medium and includes one or more instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods in the above embodiments.
  • The storage medium may be a non-transitory storage medium, including: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • An embodiment further provides a terminal. The terminal includes:
  • at least one processor 40, exemplified by one processor 40 in FIG. 4; and a memory 41; and may further include an image acquisition device 42, a display 43, a communication interface 44, and a bus 45.
  • The processor 40, the memory 41, the image acquisition device 42, the display 43, and the communication interface 44 can communicate with one another through the bus 45.
  • The image acquisition device 42 is configured to capture the current video picture of the local user.
  • The display 43 is configured to display the virtual stereoscopic scene with the fused character images.
  • The communication interface 44 can be used for information transfer.
  • The processor 40 may invoke logic instructions in the memory 41 to perform the method of the embodiment of FIG. 1c.
  • The logic instructions in the memory 41 may be implemented as software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium.
  • The memory 41, as a computer-readable storage medium, can be used to store software programs, computer-executable programs, and program instructions or modules corresponding to the method of the embodiment of FIG. 1c. The processor 40 executes the software programs, instructions, or modules stored in the memory 41 to perform functional applications and data processing, i.e., to implement the method of the embodiment of FIG. 1c.
  • The memory 41 may include a program storage area and a data storage area: the program storage area may store an operating system and the application required for at least one function, and the data storage area may store data created according to the use of the terminal device, and the like. Further, the memory 41 may include high-speed random access memory and may also include nonvolatile memory.
  • The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device then provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • The video communication method, device, and terminal described above can solve the problems of monotonous communication effects, poor overall relevance, and low interactivity in visual communication technology.

Abstract

A video communication method includes: acquiring a first current video picture of a local user and receiving a second current video picture of a peer user; performing face recognition on the first current video picture and the second current video picture respectively, and constructing a first three-dimensional character image of the local user and a second three-dimensional character image of the peer user based on the face recognition result; and fusing the first three-dimensional character image and the second three-dimensional character image into a pre-constructed first virtual stereoscopic scene to obtain a second virtual stereoscopic scene with the fused character images, so that the second virtual stereoscopic scene is presented locally.

Description

Video communication method, device, and terminal
Technical Field
The present disclosure relates to the field of visual communication, for example, to a video communication method, device, and terminal.
Background
With the development of communication technology in the Internet era, people can carry out instant communication over networks more conveniently and quickly. Visual communication adds video pictures to the voice communication mode, making the communication process more vivid and concrete, increasing the amount of information conveyed, and satisfying people's sensory needs. Visual communication will be one of the mainstream communication modes in the future.
Summary
The following embodiments provide a video communication method, device, and terminal, which can solve the problems of monotonous communication effects, poor overall relevance, and low interactivity in visual communication technology.
A video communication method includes:
acquiring a first current video picture of a local user and receiving a second current video picture of a peer user;
performing face recognition on the first current video picture and the second current video picture respectively, and constructing a first three-dimensional character image of the local user and a second three-dimensional character image of the peer user based on the face recognition result; and
fusing the first three-dimensional character image and the second three-dimensional character image into a pre-constructed first virtual stereoscopic scene to obtain a second virtual stereoscopic scene with the fused character images, so that the second virtual stereoscopic scene is presented locally.
In an embodiment, the method further includes: sending the first current video picture to the peer end.
In an embodiment, the face recognition result includes: the recognized first face image of the local user and the recognized second face image of the peer user.
Constructing the first three-dimensional character image of the local user and the second three-dimensional character image of the peer user based on the face recognition result includes:
in the first current video picture, performing edge detection on the whole-person image of the local user to obtain a first edge detection result of the local user, and generating the first three-dimensional character image according to the first face image and the first edge detection result; and
in the second current video picture, performing edge detection on the whole-person image of the peer user to obtain a second edge detection result of the peer user, and generating the second three-dimensional character image according to the second face image and the second edge detection result.
In an embodiment, before generating the first three-dimensional character image, the method further includes: determining a first size of the first three-dimensional character image according to the first face image and a first size mapping relationship from the first face image to the first three-dimensional character image; and
before generating the second three-dimensional character image, the method further includes: determining a second size of the second three-dimensional character image according to the second face image and a second size mapping relationship from the second face image to the second three-dimensional character image.
In an embodiment, fusing the first three-dimensional character image and the second three-dimensional character image into the pre-constructed first virtual stereoscopic scene includes:
acquiring local shooting angle data and peer shooting angle data, where the local shooting angle data indicates the camera shooting angle corresponding to the first current video picture and the peer shooting angle data indicates the camera shooting angle corresponding to the second current video picture;
determining, according to the local shooting angle data and the peer shooting angle data, a relative orientation relationship between the first three-dimensional character image and the second three-dimensional character image in the first virtual stereoscopic scene; and
fusing the first three-dimensional character image and the second three-dimensional character image into the first virtual stereoscopic scene based on the relative orientation relationship.
In an embodiment, before fusing the first three-dimensional character image and the second three-dimensional character image into the first virtual stereoscopic scene, the method further includes: setting a first location area of the first three-dimensional character image in the first virtual stereoscopic scene and a second location area of the second three-dimensional character image in the first virtual stereoscopic scene, where
fusing the first three-dimensional character image and the second three-dimensional character image into the first virtual stereoscopic scene based on the relative orientation relationship includes: arranging the first three-dimensional character image and the second three-dimensional character image simultaneously in the first virtual stereoscopic scene based on the relative orientation relationship; determining positions of one or more virtual stereoscopic elements in the first virtual stereoscopic scene according to the first location area and the second location area; and generating the one or more virtual stereoscopic elements at the corresponding positions in the first virtual stereoscopic scene.
A video communication device includes: an acquisition module, an identification module, a construction module, and a fusion module, where:
the acquisition module is configured to acquire a first current video picture of the local user and receive a second current video picture of the peer user;
the identification module is configured to perform face recognition on the first current video picture and the second current video picture respectively to obtain a face recognition result;
the construction module is configured to construct a first three-dimensional character image of the local user and a second three-dimensional character image of the peer user based on the face recognition result; and
the fusion module is configured to fuse the first three-dimensional character image and the second three-dimensional character image into a pre-constructed first virtual stereoscopic scene to obtain a second virtual stereoscopic scene with the fused character images, so that the second virtual stereoscopic scene is presented locally.
In an embodiment, the device further includes:
a sending module, configured to send the first current video picture to the peer end.
In an embodiment, the face recognition result includes: the recognized first face image of the local user and the recognized second face image of the peer user.
The construction module is configured to: in the first current video picture, perform edge detection on the whole-person image of the local user to obtain a first edge detection result of the local user, and generate the first three-dimensional character image according to the first face image and the first edge detection result; and
in the second current video picture, perform edge detection on the whole-person image of the peer user to obtain a second edge detection result of the peer user, and generate the second three-dimensional character image according to the second face image and the second edge detection result.
In an embodiment, the construction module is further configured to: before generating the first three-dimensional character image, determine the first size of the first three-dimensional character image according to the first face image and the first size mapping relationship from the first face image to the first three-dimensional character image; and,
before generating the second three-dimensional character image, determine the second size of the second three-dimensional character image according to the second face image and the second size mapping relationship from the second face image to the second three-dimensional character image.
In an embodiment, the fusion module is configured to acquire local shooting angle data and peer shooting angle data; determine, according to the local shooting angle data and the peer shooting angle data, the relative orientation relationship between the first three-dimensional character image and the second three-dimensional character image in the first virtual stereoscopic scene; and fuse the first three-dimensional character image and the second three-dimensional character image into the first virtual stereoscopic scene based on the relative orientation relationship, where the local shooting angle data indicates the camera shooting angle corresponding to the first current video picture and the peer shooting angle data indicates the camera shooting angle corresponding to the second current video picture.
In an embodiment, the fusion module is further configured to set, before the first three-dimensional character image and the second three-dimensional character image are fused into the first virtual stereoscopic scene, the first location area of the first three-dimensional character image in the first virtual stereoscopic scene and the second location area of the second three-dimensional character image in the first virtual stereoscopic scene; and the fusion module is configured to arrange the first three-dimensional character image and the second three-dimensional character image simultaneously in the first virtual stereoscopic scene based on the relative orientation relationship, determine the positions of one or more virtual stereoscopic elements in the first virtual stereoscopic scene according to the first location area and the second location area, and generate the one or more virtual stereoscopic elements at the corresponding positions in the first virtual stereoscopic scene.
A terminal includes the device of any of the above.
A computer-readable storage medium stores computer-executable instructions for performing any of the above video communication methods.
A terminal includes any of the above video communication devices.
A terminal includes:
at least one processor; and
a memory communicatively connected with the at least one processor, where
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the at least one processor to perform the method described above.
Brief Description of the Drawings
FIG. 1a is a flowchart of a video communication method according to an embodiment;
FIG. 1b is a flowchart of the method in step 103 of FIG. 1a according to an embodiment;
FIG. 1c is a flowchart of a video communication method according to another embodiment;
FIG. 2a is a schematic structural diagram of a video communication device according to an embodiment;
FIG. 2b is a schematic structural diagram of a video communication device according to another embodiment;
FIG. 3 is a schematic structural diagram of a video communication device according to another embodiment; and
FIG. 4 is a schematic diagram of the hardware structure of a terminal according to an embodiment.
Detailed Description
Visual communication in the related art can only unilaterally capture the video picture of one communicating party and transmit it to the other party; at the local end, the locally captured picture and the transmitted peer picture can only be displayed separately. Because the two pictures have different sources and different content and are relatively independent, the communication effect is monotonous, the overall relevance is poor, and the interactivity is low; no vivid atmosphere of exchange can be formed, and the user experience suffers. An embodiment provides a video communication method, device, and terminal that can implement visual communication between a local user and a peer user. The local end and the peer end can be the two parties of the video communication, and both the local user and the peer user can implement visual communication using a terminal with communication functions. The terminal can be a mobile terminal or a fixed terminal. On the terminals used by the local user and the peer user, a camera can be provided to capture the user's image in real time.
Based on the local user's terminal, the peer user's terminal, and the cameras described above, the following embodiments are proposed.
An embodiment provides a video communication method. FIG. 1a is a flowchart of the video communication method according to this embodiment. As shown in FIG. 1a, the process includes the following steps.
In step 101, the current video picture of the local user is captured, the captured video picture of the local user is sent to the peer end, and the current video picture of the peer user is received.
Both the local end and the peer end can use a camera to capture the user's video picture. After the peer end captures the current video picture of the peer user, it can send that picture to the local end in real time, and the local end can receive it. In an embodiment, after capturing the current video picture of the peer user, the peer end can video-encode it and send the encoded video data to the local end; after receiving the video data, the local end decodes it to obtain the current video picture of the peer user.
At the local end, the current video picture of the local user can be obtained and the current video picture of the peer user can be received, and both pictures can then be processed.
In step 102, face recognition is performed on the current video picture of the local user and on the current video picture of the peer user respectively, and the three-dimensional character images of the local user and of the peer user are constructed based on the face recognition results.
In an embodiment, during face recognition, the position of the face in the corresponding picture can also be located; a face detection program can be used to extract face images from the current video pictures of the local user and the peer user synchronously. The face recognition process is explained by way of example below.
The face recognition process can include:
Using statistical methods, images of many "human faces" and "non-human faces" are acquired in advance, a sample library is built, and a classifier that distinguishes "human face" from "non-human face" is trained.
The image to be detected is scaled by a certain ratio, all regions of the scaled image are examined with the classifier, and each examined region is judged to be either a face region or a non-face region.
Based on the judgment result, the position and size of the face are determined.
人脸识别结果可以包括:本地用户的人脸图像和对端用户的人脸图像。
一实施例中,在本地用户的当前视频画面中,可以对所述本地用户的整体人物图像进行边缘检测,得到所述本地用户的边缘检测结果。根据所述本地用户的人脸图像和所述本地用户的边缘检测结果,生成所述本地用户的三维人物形象;以及在所述对端用户的当前视频画面中,对所述对端用户的整体人物图像进行边缘检测,得到所述对端用户的边缘检测结果,根据所述对端用户的人脸图像和所述对端用户的边缘检测结果,生成所述对端用户的三维人物形象。
In an embodiment, when generating the three-dimensional character image of the local user, an overall character region of the local user may be determined in the current video picture of the local user. After this region is determined, the part of the picture outside the overall character region of the local user may be made transparent, to facilitate later fusion.
In an embodiment, when generating the three-dimensional character image of the peer user, an overall character region of the peer user may likewise be determined in the current video picture of the peer user, and the part of the picture outside that region may be made transparent, to facilitate later fusion.
In an embodiment, after the overall character regions of the local user and the peer user are determined, at least one of the current video picture of the local user and the current video picture of the peer user may be scaled, so that the two pictures have a unified size.
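A rough sketch of the edge-detection, background-transparency, and size-unification steps is given below, assuming OpenCV and taking the largest closed contour as the overall character region; a production system would use a stronger person-segmentation method, so this only illustrates the data flow:

    import cv2
    import numpy as np

    def person_with_transparent_background(frame):
        """Edge-detect the picture, take the largest contour as the overall
        character region, and make everything outside it transparent."""
        edges = cv2.Canny(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), 80, 160)
        contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        mask = np.zeros(frame.shape[:2], np.uint8)
        if contours:
            largest = max(contours, key=cv2.contourArea)  # assume the person dominates
            cv2.drawContours(mask, [largest], -1, 255, cv2.FILLED)
        bgra = cv2.cvtColor(frame, cv2.COLOR_BGR2BGRA)
        bgra[:, :, 3] = mask          # alpha = 0 (transparent) outside the region
        return bgra

    def unify_sizes(local_frame, peer_frame):
        """Scale the peer picture to the local picture's size so that the two
        current video pictures share a unified size."""
        h, w = local_frame.shape[:2]
        return local_frame, cv2.resize(peer_frame, (w, h))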
In an embodiment, after the overall character region of the local user is generated, the three-dimensional character image of the local user may be generated according to the face image of the local user and a preset three-dimensional character template; after the overall character region of the peer user is generated, the three-dimensional character image of the peer user may be generated according to the face image of the peer user and a preset three-dimensional character template.
In an embodiment, before generating the three-dimensional character image of the local user, the size of the local user's three-dimensional character image may be determined according to the face image of the local user and a size mapping relationship from face images to three-dimensional character images. The size mapping relationship may be a size conversion relationship between the size of a face image and the size of the face of a three-dimensional character image in the virtual stereoscopic scene.
In an embodiment, before generating the three-dimensional character image of the peer user, the size of the peer user's three-dimensional character image may likewise be determined according to the face image of the peer user and the size mapping relationship; that is, the mapping converts the size of a face image into the size of the corresponding three-dimensional character image in the given scene.
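The mapping itself can be as simple as a proportional rule; a sketch under that assumption follows. The template proportions and reference face height are illustrative numbers only, since the patent does not fix a concrete mapping:

    # Hypothetical character template proportions, in scene units.
    TEMPLATE_FACE_HEIGHT = 0.24
    TEMPLATE_BODY_HEIGHT = 1.75

    def character_size_from_face(face_px_height, reference_px_height=120.0):
        """Convert the detected face height in pixels into the overall size of
        the three-dimensional character image in scene units, preserving the
        template's face-to-body proportion."""
        scale = face_px_height / reference_px_height
        return TEMPLATE_BODY_HEIGHT * scale, TEMPLATE_FACE_HEIGHT * scale

    # Usage: a 150 px face maps to a character 1.25x the template size.
    body_h, face_h = character_size_from_face(150)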
In an embodiment, when generating the three-dimensional character image of the local user or the peer user, after an initial three-dimensional character image is generated, augmented reality technology may be used to decorate the initial three-dimensional character image according to a preset character decoration mode, to obtain the final three-dimensional character image. This step may be performed before step 101, and a character decoration template embodying the decoration mode may be set by the user.
In step 103, the three-dimensional character image of the local user and the three-dimensional character image of the peer user are fused into a pre-constructed virtual stereoscopic scene to obtain a virtual stereoscopic scene fused with the character images.
Multiple virtual stereoscopic scenes may be provided, such as a meeting room scene, a living room scene, or a park scene. Each virtual stereoscopic scene may be composed of multiple virtual stereoscopic elements; for example, a conference table and chairs may be set as virtual stereoscopic elements in the meeting room scene, or a sofa, a television, and a tea table may be set as virtual stereoscopic elements in the living room scene. After the multiple scenes are provided, the user may select one of them as the pre-constructed virtual stereoscopic scene.
In an embodiment, augmented reality technology may be used to fuse the three-dimensional character image of the local user and the three-dimensional character image of the peer user into the virtual stereoscopic scene.
On the basis of the above embodiment, FIG. 1b is a flowchart of step 103 in FIG. 1a provided by an embodiment. As shown in FIG. 1b, fusing the three-dimensional character images of the local user and the peer user into the pre-constructed virtual stereoscopic scene may include steps 1031, 1032, and 1033.
In step 1031, local shooting angle data and peer shooting angle data are acquired, where the local shooting angle data indicates the camera shooting angle corresponding to the current video picture of the local user, and the peer shooting angle data indicates the camera shooting angle corresponding to the current video picture of the peer user.
In an embodiment, the local user may input the local shooting angle data into the corresponding terminal in advance, and the peer user may input the peer shooting angle data into the corresponding terminal. In an embodiment, the cameras arranged to shoot the local user and the peer user can rotate under the control of external signals, in which case a camera can obtain its own shooting angle.
In step 1032, a relative orientation relationship between the three-dimensional character image of the local user and the three-dimensional character image of the peer user in the virtual stereoscopic scene is determined according to the local shooting angle data and the peer shooting angle data.
An angle-to-position mapping may be performed according to the local shooting angle data and the peer shooting angle data, thereby determining the relative orientation relationship between the two three-dimensional character images in the virtual stereoscopic scene.
For example, the camera capturing the current video picture of the local user may be denoted as the local camera, and the camera capturing the current video picture of the peer user as the peer camera.
When the local camera faces straight ahead and the peer camera faces straight ahead, the three-dimensional character image of the local user may be located directly in front of, or directly behind, the three-dimensional character image of the peer user in the virtual stereoscopic scene.
In an embodiment, when the local camera faces straight ahead and the peer camera faces its own front-right, the peer user's three-dimensional character image may be located to the front-right of the local user's character image in the virtual stereoscopic scene, and the angle by which the peer user's character image deviates from directly in front of the local user's character image may equal the angle by which the peer camera's orientation deviates from its own straight-ahead direction.
In an embodiment, when the local camera faces straight ahead and the peer camera faces its own front-left, the peer user's three-dimensional character image may be located to the front-left of the local user's character image in the virtual stereoscopic scene, and the angle by which the peer user's character image deviates from directly in front of the local user's character image may equal the angle by which the peer camera's orientation deviates from its own straight-ahead direction.
In an embodiment, when the local camera faces its own front-right and the peer camera faces straight ahead, the local user's three-dimensional character image may be located to the front-right of the peer user's character image in the virtual stereoscopic scene, and the angle by which the local user's character image deviates from directly in front of the peer user's character image may equal the angle by which the local camera's orientation deviates from its own straight-ahead direction.
In an embodiment, when the local camera faces its own front-left and the peer camera faces straight ahead, the local user's three-dimensional character image may be located to the front-left of the peer user's character image in the virtual stereoscopic scene, and the angle by which the local user's character image deviates from directly in front of the peer user's character image may equal the angle by which the local camera's orientation deviates from its own straight-ahead direction.
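The worked examples above amount to turning the two camera yaw angles into a bearing between the two character images; one way to sketch that mapping is shown below. The sign convention (positive yaw meaning the camera turned toward its own right) and the subtraction rule are assumptions consistent with, but not mandated by, the examples:

    import math

    def relative_bearing(local_yaw_deg, peer_yaw_deg):
        """Bearing of the peer character as seen from the local character:
        0 with both cameras straight ahead, shifted by exactly the angle by
        which either camera deviates from its straight-ahead direction."""
        return peer_yaw_deg - local_yaw_deg

    def place_peer_character(distance, local_yaw_deg, peer_yaw_deg):
        """Position of the peer character in the local character's frame,
        at a chosen separation distance in scene units."""
        bearing = math.radians(relative_bearing(local_yaw_deg, peer_yaw_deg))
        return (distance * math.sin(bearing),   # x: positive to the right
                distance * math.cos(bearing))   # z: positive straight ahead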
In step 1033, based on the determined relative orientation relationship, augmented reality technology is used to fuse the three-dimensional character images of the local user and the peer user into the pre-constructed virtual stereoscopic scene.
In an embodiment, a position region of the local user's three-dimensional character image in the pre-constructed virtual stereoscopic scene, and a position region of the peer user's three-dimensional character image in that scene, may also be preset. A position region indicates a rough area of the corresponding three-dimensional character image in the virtual stereoscopic scene, not its precise position.
In an embodiment, step 1033 includes: arranging, based on the determined relative orientation relationship, the three-dimensional character images of the local user and the peer user simultaneously in the pre-established virtual stereoscopic scene; determining the positions of one or more virtual stereoscopic elements in the pre-constructed scene according to the position regions of the two character images in that scene; and generating the one or more virtual stereoscopic elements at the corresponding positions in the pre-constructed scene according to those positions.
The scene in which the two character images are initially arranged need not be the fully constructed virtual stereoscopic scene: based on the determined relative orientation relationship, the two character images may first be arranged simultaneously in a scene containing no other virtual stereoscopic elements. The positions of the one or more virtual stereoscopic elements in the pre-constructed scene may then be determined according to the first position region of the local user's character image and the second position region of the peer user's character image in this initial scene.
For example, when the pre-constructed virtual stereoscopic scene is a living room scene and the first position region of the local user's character image and the second position region of the peer user's character image are both the sofa region, the virtual stereoscopic element corresponding to the sofa may be constructed.
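The region-to-element step can be sketched as a simple lookup from position regions to the virtual stereoscopic elements they imply. The region names and element catalogue below are illustrative, not taken from the patent:

    # Hypothetical mapping from position regions of a living room scene to the
    # virtual stereoscopic element to generate in that region.
    ELEMENT_FOR_REGION = {
        "sofa_area": "sofa",
        "table_area": "tea_table",
    }

    def elements_to_generate(first_region, second_region):
        """Derive the virtual stereoscopic elements and their positions from
        the two character images' position regions."""
        regions = {first_region, second_region}
        return [(ELEMENT_FOR_REGION[r], r)
                for r in sorted(regions) if r in ELEMENT_FOR_REGION]

    # Both characters in the sofa area of the living room scene: one shared sofa.
    print(elements_to_generate("sofa_area", "sofa_area"))  # [('sofa', 'sofa_area')]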
The angle at which the local user's three-dimensional character image is presented to the user in the virtual stereoscopic scene, and the angle at which the peer user's character image is presented, may also be preset; the two character images may then be arranged simultaneously in the scene based on the determined relative orientation relationship and these two presentation angles.
After the above steps are completed, in the virtual stereoscopic scene fused with the character images, both three-dimensional character images may be presented at the preset angles. For example, the face of the local user's character image may be set to be presented facing away from the viewer, while the face of the peer user's character image may be set to be presented facing the viewer; in the resulting fused scene, both character images are then presented at the preset angles.
When fusing the two three-dimensional character images into the pre-constructed virtual stereoscopic scene, the character images and the scene may be fused into one complete spatial picture.
In step 104, the virtual stereoscopic scene fused with the character images is presented locally.
The fused virtual stereoscopic scene may be presented by a local display.
In step 101, while capturing the current video picture of the local user, local audio information may also be captured, where the local audio information may include the local user's voice. The captured local audio information may be sent to the peer end. The peer end may also capture its own audio information and send it to the local end. A microphone may be used to capture the audio information.
When the fused virtual stereoscopic scene is presented locally, the local audio information and the peer audio information may also be played synchronously.
In an embodiment, before step 101, initial data may be set. The initial data may include one or more of the following: an initial virtual stereoscopic scene, an initial position region of the local user's three-dimensional character image in the scene, an initial position region of the peer user's three-dimensional character image in the scene, an initial presentation angle of the local user's character image in the scene, an initial presentation angle of the peer user's character image in the scene, an initial character decoration mode of the local user, and an initial character decoration mode of the peer user.
After the initial data is set, the user may change any item of it in real time during video communication, thereby changing the fusion effect of the character images and the virtual stereoscopic scene.
By applying the video communication method in the above embodiments, the three-dimensional character images of the local user and the peer user can be fused with a virtual stereoscopic scene, and the fused live scene can be presented to the user, producing a scene in which the two parties appear to communicate in the same place. The pictures of the two communicating parties can be extracted and fused into a customized, personalized virtual scene, simulating for the user an atmosphere of communicating at the same time and in the same place, as if face to face. The scene and the character decorations can also be personalized, which enriches the content and interest of the communication and improves the user's sensory experience.
The technical solutions in the above embodiments can avoid the drawbacks of visual communication schemes in which the two parties' pictures are independent, poorly related, weakly interactive, and insufficiently vivid and concrete.
FIG. 1c is a flowchart of a video communication method provided by an embodiment. On the basis of the above embodiments, as shown in FIG. 1c, the method includes the following steps.
In step 110, a first current video picture of the local user is acquired, and a second current video picture of the peer user is received.
In step 120, face recognition is performed on the first current video picture and the second current video picture respectively, and a first three-dimensional character image of the local user and a second three-dimensional character image of the peer user are constructed based on the face recognition result.
In step 130, the first three-dimensional character image and the second three-dimensional character image are fused into a pre-constructed first virtual stereoscopic scene to obtain a second virtual stereoscopic scene fused with the character images, so that the second virtual stereoscopic scene is presented locally.
In an embodiment, the video communication further includes: sending the first current video picture to the peer end.
In an embodiment, the face recognition result includes a recognized first face image of the local user and a recognized second face image of the peer user;
the constructing the first three-dimensional character image of the local user and the second three-dimensional character image of the peer user based on the face recognition result includes:
performing edge detection on the overall character image of the local user in the first current video picture to obtain a first edge detection result of the local user, and generating the first three-dimensional character image according to the first face image and the first edge detection result; and
performing edge detection on the overall character image of the peer user in the second current video picture to obtain a second edge detection result of the peer user, and generating the second three-dimensional character image according to the second face image and the second edge detection result.
In an embodiment, before the generating the first three-dimensional character image, the method further includes: determining a first size of the first three-dimensional character image according to the first face image and a size mapping relationship from face images to three-dimensional character images; and
before the generating the second three-dimensional character image, the method further includes: determining a second size of the second three-dimensional character image according to the second face image and the size mapping relationship from face images to three-dimensional character images.
In an embodiment, the fusing the first three-dimensional character image and the second three-dimensional character image into the pre-constructed first virtual stereoscopic scene includes:
acquiring local shooting angle data and peer shooting angle data, where the local shooting angle data indicates the camera shooting angle corresponding to the first current video picture, and the peer shooting angle data indicates the camera shooting angle corresponding to the second current video picture;
determining, according to the local shooting angle data and the peer shooting angle data, a relative orientation relationship between the first three-dimensional character image and the second three-dimensional character image in the first virtual stereoscopic scene; and
fusing, based on the relative orientation relationship, the first three-dimensional character image and the second three-dimensional character image into the first virtual stereoscopic scene.
In an embodiment, before the fusing the first three-dimensional character image and the second three-dimensional character image into the first virtual stereoscopic scene, the method further includes: setting a first position region of the first three-dimensional character image in the first virtual stereoscopic scene and a second position region of the second three-dimensional character image in the first virtual stereoscopic scene; where
the fusing, based on the relative orientation relationship, the first three-dimensional character image and the second three-dimensional character image into the first virtual stereoscopic scene includes: arranging, based on the relative orientation relationship, the first three-dimensional character image and the second three-dimensional character image simultaneously in the first virtual stereoscopic scene; determining positions of one or more virtual stereoscopic elements in the first virtual stereoscopic scene according to the first position region and the second position region; and generating the one or more virtual stereoscopic elements at the corresponding positions in the first virtual stereoscopic scene according to the positions of the one or more virtual stereoscopic elements.
For the video communication method provided by the present application, a video communication device is also proposed.
FIG. 2a is a schematic structural diagram of the video communication device of this embodiment. As shown in FIG. 2a, the video communication device includes a first acquisition module 201, a first recognition module 202, a first construction module 203, a first fusion module 204, and a presentation module 205.
The first acquisition module 201 is configured to capture the current video picture of the local user, send the captured video picture of the local user to the peer end, and receive the current video picture of the peer user.
The first recognition module 202 is configured to perform face recognition on the current video picture of the local user and the current video picture of the peer user respectively, to obtain face recognition results.
The first construction module 203 is configured to construct the three-dimensional character images of the local user and the peer user based on the face recognition results.
The first fusion module 204 is configured to fuse the three-dimensional character images of the local user and the peer user into the pre-constructed virtual stereoscopic scene, to obtain a virtual stereoscopic scene fused with the character images.
The presentation module 205 is configured to present the fused virtual stereoscopic scene locally.
In an embodiment, the face recognition results include the recognized face image of the local user and the recognized face image of the peer user.
The first construction module 203 is configured to perform edge detection on the overall character image of the local user in the current video picture of the local user, to obtain the edge detection result of the local user; generate the three-dimensional character image of the local user according to the face image and the edge detection result of the local user; perform edge detection on the overall character image of the peer user in the current video picture of the peer user, to obtain the edge detection result of the peer user; and generate the three-dimensional character image of the peer user according to the face image and the edge detection result of the peer user.
In an embodiment, the first construction module 203 may be further configured to: before generating the three-dimensional character image of the local user, determine the size of the local user's three-dimensional character image according to the face image of the local user and the size mapping relationship from face images to three-dimensional character images; and, before generating the three-dimensional character image of the peer user, determine the size of the peer user's three-dimensional character image according to the face image of the peer user and the same size mapping relationship.
In an embodiment, the first fusion module 204 is configured to acquire the local shooting angle data and the peer shooting angle data; determine, according to the local shooting angle data and the peer shooting angle data, the relative orientation relationship between the three-dimensional character images of the local user and the peer user in the virtual stereoscopic scene; and fuse, based on the relative orientation relationship, the two three-dimensional character images into the pre-constructed virtual stereoscopic scene; where the local shooting angle data indicates the camera shooting angle corresponding to the current video picture of the local user, and the peer shooting angle data indicates the camera shooting angle corresponding to the current video picture of the peer user.
In an embodiment, the first fusion module 204 is further configured to: before fusing the two three-dimensional character images into the pre-constructed virtual stereoscopic scene, set the position region of the local user's three-dimensional character image and the position region of the peer user's three-dimensional character image in the pre-constructed virtual stereoscopic scene.
In an embodiment, the first fusion module 204 is configured to arrange, based on the relative orientation relationship, the two three-dimensional character images simultaneously in the virtual stereoscopic scene; determine the positions of one or more virtual stereoscopic elements in the pre-constructed virtual stereoscopic scene according to the position regions of the two character images in that scene; and generate the one or more virtual stereoscopic elements in the virtual stereoscopic scene according to their positions.
The first acquisition module 201 may obtain the current video picture of the local user or the video picture of the peer user from a camera in the terminal. The first acquisition module 201, the first recognition module 202, the first construction module 203, and the first fusion module 204 may each be implemented by a central processing unit (CPU), a micro processor unit (MPU), a digital signal processor (DSP), or a field programmable gate array (FPGA) located in the terminal, and the presentation module 205 may be implemented by a display or the like located in the terminal.
An embodiment provides a video communication device. FIG. 2b is a schematic structural diagram of the video communication device of this embodiment. On the basis of the above embodiments, as shown in FIG. 2b, the video communication device includes a second acquisition module 210, a second recognition module 220, a second construction module 230, and a second fusion module 240.
The second acquisition module 210 is configured to acquire the current video picture of the local user and receive the current video picture of the peer user.
The second recognition module 220 is configured to perform face recognition on the current video picture of the local user and the current video picture of the peer user respectively, to obtain face recognition results.
The second construction module 230 is configured to construct the three-dimensional character images of the local user and the peer user based on the face recognition results.
The second fusion module 240 is configured to fuse the three-dimensional character images of the local user and the peer user into the pre-constructed virtual stereoscopic scene, to obtain a virtual stereoscopic scene fused with the character images, so that the fused virtual stereoscopic scene is presented locally.
The second acquisition module 210, the second recognition module 220, the second construction module 230, and the second fusion module 240 may each be implemented by a CPU, an MPU, a DSP, or an FPGA located in the terminal.
In an embodiment, the video communication device further includes a sending module 250, configured to send the first current video picture to the peer end.
In an embodiment, the face recognition result includes a recognized first face image of the local user and a recognized second face image of the peer user;
the construction module is configured to perform edge detection on the overall character image of the local user in the first current video picture to obtain a first edge detection result of the local user, and generate the first three-dimensional character image according to the first face image and the first edge detection result; and
perform edge detection on the overall character image of the peer user in the second current video picture to obtain a second edge detection result of the peer user, and generate the second three-dimensional character image according to the second face image and the second edge detection result.
In an embodiment, the construction module is further configured to: before generating the first three-dimensional character image, determine a first size of the first three-dimensional character image according to the first face image and a first size mapping relationship from the first face image to the first three-dimensional character image; and,
before generating the second three-dimensional character image, determine a second size of the second three-dimensional character image according to the second face image and a second size mapping relationship from the second face image to the second three-dimensional character image.
In an embodiment, the fusion module is configured to acquire local shooting angle data and peer shooting angle data; determine, according to the local shooting angle data and the peer shooting angle data, the relative orientation relationship between the first three-dimensional character image and the second three-dimensional character image in the first virtual stereoscopic scene; and fuse, based on the relative orientation relationship, the first three-dimensional character image and the second three-dimensional character image into the first virtual stereoscopic scene; where the local shooting angle data indicates the camera shooting angle corresponding to the first current video picture, and the peer shooting angle data indicates the camera shooting angle corresponding to the second current video picture.
In an embodiment, the fusion module is further configured to: before fusing the first three-dimensional character image and the second three-dimensional character image into the first virtual stereoscopic scene, set a first position region of the first three-dimensional character image in the first virtual stereoscopic scene and a second position region of the second three-dimensional character image in the first virtual stereoscopic scene; and the fusion module is configured to arrange, based on the relative orientation relationship, the first three-dimensional character image and the second three-dimensional character image simultaneously in the first virtual stereoscopic scene; determine the positions of one or more virtual stereoscopic elements in the first virtual stereoscopic scene according to the first position region and the second position region; and generate the one or more virtual stereoscopic elements at the corresponding positions in the first virtual stereoscopic scene according to those positions.
Based on the video communication method provided by the above embodiments, an embodiment proposes a video communication device. FIG. 3 is a schematic structural diagram of the video communication device provided by this embodiment. As shown in FIG. 3, the video communication device may include a communication module 301 (also called a communication circuit), a data processing module 302, an audio/video capture module 303 (also called an audio/video capture circuit), a main control module 304 (also called a master controller), and an output module 305 (also called an output circuit). The main control module 304 may be connected to the communication module 301, the data processing module 302, the audio/video capture module 303, and the output module 305 respectively; the data processing module 302 may be connected to the communication module 301, the audio/video capture module 303, and the output module 305 respectively; and the communication module 301 may be connected to the audio/video capture module 303.
The main control module 304 may be configured to take charge of overall service flow control and resource allocation, and may be implemented by a high-performance microcontroller.
The data processing module 302 may be configured to receive control information sent by the main control module and perform data processing according to the control information, and may also be configured to receive information from the audio/video capture module and the communication module. Based on the captured and received information, the data processing module 302 may perform face recognition and detection, extract character images, and fuse the three-dimensional character images with the virtual stereoscopic scene by means of augmented reality technology. The data processing module may be implemented by a high-performance processor.
The audio/video capture module 303 may be configured to capture the video picture and the local audio information of the local user and send them to the data processing module 302 and the communication module 301. The audio/video capture module 303 may be implemented by at least one camera and at least one microphone. In an embodiment, the audio/video capture module 303 may use multiple cameras to provide video information from different angles and orientations.
The communication module 301 may be configured to receive control information sent by the main control module 304 and, according to the received control information, decode the information received from the peer end and send it to the local data processing module 302. The communication module 301 may also be configured to encode, according to the received control information, the information from the audio/video capture module 303 and send the encoded information to the communication module 301 of the peer end.
The data processing module 302 may also be configured to output the virtual stereoscopic scene fused with the three-dimensional character images, the local audio information, and the peer audio information synchronously to the output module 305.
The output module 305 may be configured to receive control information sent by the main control module 304 and, according to the received control information, present the virtual stereoscopic scene fused with the three-dimensional character images, the voice information of the local user, and the voice information of the peer user synchronously to the user. The output module 305 may be implemented by a display and a speaker.
The above embodiments may be provided as a method, a system, or a computer program product. Therefore, the above embodiments may be implemented in hardware, in software, or in a combination of software and hardware.
The above embodiments may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) containing computer-usable program code. The computer software product is stored in a storage medium and includes one or more instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods in the above embodiments. The aforementioned storage medium may be a non-transitory storage medium, including any of various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, or may be a transitory storage medium.
An embodiment provides a schematic diagram of the hardware structure of a terminal. Referring to FIG. 4, the terminal includes:
at least one processor 40 (one processor 40 is taken as an example in FIG. 4) and a memory 41, and may further include an image capture device 42, a display 43, a communications interface 44, and a bus 45. The processor 40, the memory 41, the image capture device 42, the display 43, and the communications interface 44 may communicate with one another through the bus 45.
The image capture device 42 is configured to capture the current video picture of the local user. The display 43 is configured to display the virtual stereoscopic scene fused with the character images. The communications interface 44 may be used for information transmission. The processor 40 may invoke logic instructions in the memory 41 to perform the method in the embodiment of FIG. 1c.
In addition, when the logic instructions in the memory 41 are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium.
As a computer-readable storage medium, the memory 41 may be used to store software programs and computer-executable programs, such as the program instructions or modules corresponding to the method in the embodiment of FIG. 1c. The processor 40 executes functional applications and data processing by running the software programs, instructions, or modules stored in the memory 41, that is, implements the method in the embodiment of FIG. 1c.
The memory 41 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application required by at least one function, and the data storage area may store data created according to the use of the terminal device, and so on. In addition, the memory 41 may include a high-speed random access memory and may further include a non-volatile memory.
The present application is described with reference to at least one of the flowcharts and block diagrams of the method, device (system), and computer program product according to the above embodiments. It should be understood that each flow in the flowcharts, each block in the block diagrams, or each flow in the flowcharts and each block in the block diagrams, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in at least one of: one or more flows of the flowcharts, and one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in at least one of: one or more flows of the flowcharts, and one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in at least one of: one or more flows of the flowcharts, and one or more blocks of the block diagrams.
Industrial Applicability
The video communication method, device, and terminal can solve the problems of a monotonous communication effect, poor overall relevance, and low interactivity in visual communication technology.

Claims (14)

  1. A video communication method, comprising:
    acquiring a first current video picture of a local user, and receiving a second current video picture of a peer user;
    performing face recognition on the first current video picture and the second current video picture respectively, and constructing a first three-dimensional character image of the local user and a second three-dimensional character image of the peer user based on a face recognition result; and
    fusing the first three-dimensional character image and the second three-dimensional character image into a pre-constructed first virtual stereoscopic scene to obtain a second virtual stereoscopic scene fused with the character images, so that the second virtual stereoscopic scene is presented locally.
  2. The method according to claim 1, further comprising: sending the first current video picture to the peer end.
  3. The method according to claim 1 or 2, wherein the face recognition result comprises a recognized first face image of the local user and a recognized second face image of the peer user;
    the constructing a first three-dimensional character image of the local user and a second three-dimensional character image of the peer user based on the face recognition result comprises:
    performing edge detection on an overall character image of the local user in the first current video picture to obtain a first edge detection result of the local user, and generating the first three-dimensional character image according to the first face image and the first edge detection result; and
    performing edge detection on an overall character image of the peer user in the second current video picture to obtain a second edge detection result of the peer user, and generating the second three-dimensional character image according to the second face image and the second edge detection result.
  4. The method according to claim 3, wherein before the generating the first three-dimensional character image, the method further comprises: determining a first size of the first three-dimensional character image according to the first face image and a size mapping relationship from face images to three-dimensional character images; and
    before the generating the second three-dimensional character image, the method further comprises: determining a second size of the second three-dimensional character image according to the second face image and the size mapping relationship from face images to three-dimensional character images.
  5. The method according to claim 1 or 2, wherein the fusing the first three-dimensional character image and the second three-dimensional character image into a pre-constructed first virtual stereoscopic scene comprises:
    acquiring local shooting angle data and peer shooting angle data, wherein the local shooting angle data indicates a camera shooting angle corresponding to the first current video picture, and the peer shooting angle data indicates a camera shooting angle corresponding to the second current video picture;
    determining, according to the local shooting angle data and the peer shooting angle data, a relative orientation relationship between the first three-dimensional character image and the second three-dimensional character image in the first virtual stereoscopic scene; and
    fusing, based on the relative orientation relationship, the first three-dimensional character image and the second three-dimensional character image into the first virtual stereoscopic scene.
  6. The method according to claim 5, wherein before the fusing the first three-dimensional character image and the second three-dimensional character image into the first virtual stereoscopic scene, the method further comprises: setting a first position region of the first three-dimensional character image in the first virtual stereoscopic scene, and a second position region of the second three-dimensional character image in the first virtual stereoscopic scene; and wherein
    the fusing, based on the relative orientation relationship, the first three-dimensional character image and the second three-dimensional character image into the first virtual stereoscopic scene comprises: arranging, based on the relative orientation relationship, the first three-dimensional character image and the second three-dimensional character image simultaneously in the first virtual stereoscopic scene; determining positions of one or more virtual stereoscopic elements in the first virtual stereoscopic scene according to the first position region and the second position region; and generating the one or more virtual stereoscopic elements at the corresponding positions in the first virtual stereoscopic scene according to the positions of the one or more virtual stereoscopic elements.
  7. A video communication device, comprising an acquisition module, a recognition module, a construction module, and a fusion module; wherein
    the acquisition module is configured to acquire a first current video picture of a local user and receive a second current video picture of a peer user;
    the recognition module is configured to perform face recognition on the first current video picture and the second current video picture respectively to obtain a face recognition result;
    the construction module is configured to construct a first three-dimensional character image of the local user and a second three-dimensional character image of the peer user based on the face recognition result; and
    the fusion module is configured to fuse the first three-dimensional character image and the second three-dimensional character image into a pre-constructed first virtual stereoscopic scene to obtain a second virtual stereoscopic scene fused with the character images, so that the second virtual stereoscopic scene is presented locally.
  8. The device according to claim 7, further comprising:
    a sending module configured to send the first current video picture to the peer end.
  9. The device according to claim 7 or 8, wherein the face recognition result comprises a recognized first face image of the local user and a recognized second face image of the peer user;
    the construction module is configured to perform edge detection on an overall character image of the local user in the first current video picture to obtain a first edge detection result of the local user, and generate the first three-dimensional character image according to the first face image and the first edge detection result; and
    perform edge detection on an overall character image of the peer user in the second current video picture to obtain a second edge detection result of the peer user, and generate the second three-dimensional character image according to the second face image and the second edge detection result.
  10. The device according to claim 9, wherein the construction module is further configured to: before generating the first three-dimensional character image, determine a first size of the first three-dimensional character image according to the first face image and a first size mapping relationship from the first face image to the first three-dimensional character image; and,
    before generating the second three-dimensional character image, determine a second size of the second three-dimensional character image according to the second face image and a second size mapping relationship from the second face image to the second three-dimensional character image.
  11. The device according to claim 7 or 8, wherein the fusion module is configured to acquire local shooting angle data and peer shooting angle data; determine, according to the local shooting angle data and the peer shooting angle data, a relative orientation relationship between the first three-dimensional character image and the second three-dimensional character image in the first virtual stereoscopic scene; and fuse, based on the relative orientation relationship, the first three-dimensional character image and the second three-dimensional character image into the first virtual stereoscopic scene; wherein the local shooting angle data indicates a camera shooting angle corresponding to the first current video picture, and the peer shooting angle data indicates a camera shooting angle corresponding to the second current video picture.
  12. The device according to claim 11, wherein the fusion module is further configured to: before fusing the first three-dimensional character image and the second three-dimensional character image into the first virtual stereoscopic scene, set a first position region of the first three-dimensional character image in the first virtual stereoscopic scene, and a second position region of the second three-dimensional character image in the first virtual stereoscopic scene; and the fusion module is configured to arrange, based on the relative orientation relationship, the first three-dimensional character image and the second three-dimensional character image simultaneously in the first virtual stereoscopic scene; determine positions of one or more virtual stereoscopic elements in the first virtual stereoscopic scene according to the first position region and the second position region; and generate the one or more virtual stereoscopic elements at the corresponding positions in the first virtual stereoscopic scene according to the positions of the one or more virtual stereoscopic elements.
  13. A terminal, comprising the device according to any one of claims 7 to 12.
  14. A computer-readable storage medium, storing computer-executable instructions, wherein the computer-executable instructions are used to execute the video communication method according to any one of claims 1 to 6.
PCT/CN2017/119602 2016-12-29 2017-12-28 Video communication method, device and terminal WO2018121699A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611245934.7A CN108259806A (zh) 2016-12-29 2016-12-29 Video communication method, device and terminal
CN201611245934.7 2016-12-29

Publications (1)

Publication Number Publication Date
WO2018121699A1 true WO2018121699A1 (zh) 2018-07-05

Family

ID=62707922

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/119602 WO2018121699A1 (zh) Video communication method, device and terminal

Country Status (2)

Country Link
CN (1) CN108259806A (zh)
WO (1) WO2018121699A1 (zh)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109525483A (zh) * 2018-11-14 2019-03-26 惠州Tcl移动通信有限公司 Mobile terminal, method for generating interactive animation thereof, and computer-readable storage medium
CN112492231B (zh) * 2020-11-02 2023-03-21 重庆创通联智物联网有限公司 Remote interaction method and apparatus, electronic device, and computer-readable storage medium
CN115396390A (zh) * 2021-05-25 2022-11-25 Oppo广东移动通信有限公司 Video-chat-based interaction method, system and apparatus, and electronic device


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8976224B2 (en) * 2012-10-10 2015-03-10 Microsoft Technology Licensing, Llc Controlled three-dimensional communication endpoint
CN105578145A (zh) * 2015-12-30 2016-05-11 天津德勤和创科技发展有限公司 Method for real-time intelligent fusion of a three-dimensional virtual scene and video surveillance

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101127737A (zh) * 2007-09-25 2008-02-20 腾讯科技(深圳)有限公司 User interface implementation method, user terminal and instant messaging system
CN101635705A (zh) * 2008-07-23 2010-01-27 上海赛我网络技术有限公司 Interaction method based on a three-dimensional virtual map and characters, and system implementing the method
CN103617029A (zh) * 2013-11-20 2014-03-05 中网一号电子商务有限公司 3D instant messaging system
CN104935860A (zh) * 2014-03-18 2015-09-23 北京三星通信技术研究有限公司 Video call implementation method and apparatus

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111047647A (zh) * 2019-11-28 2020-04-21 咪咕视讯科技有限公司 Positioning method, electronic device and computer-readable storage medium
CN111047647B (zh) * 2019-11-28 2024-04-09 咪咕视讯科技有限公司 Positioning method, electronic device and computer-readable storage medium
CN112445995A (zh) * 2020-11-30 2021-03-05 北京邮电大学 Scene fusion display method and apparatus under WebGL
CN112445995B (zh) * 2020-11-30 2024-02-13 北京邮电大学 Scene fusion display method and apparatus under WebGL
CN114880535A (zh) * 2022-06-09 2022-08-09 昕新讯飞科技(北京)有限公司 User portrait generation method based on communication big data
CN114880535B (zh) * 2022-06-09 2023-04-21 武汉十月科技有限责任公司 User portrait generation method based on communication big data

Also Published As

Publication number Publication date
CN108259806A (zh) 2018-07-06


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 17889367; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 17889367; Country of ref document: EP; Kind code of ref document: A1)