WO2018153267A1 - Method and network device for group video session - Google Patents

Method and network device for group video session

Info

Publication number
WO2018153267A1
WO2018153267A1 (PCT/CN2018/075749)
Authority
WO
WIPO (PCT)
Prior art keywords
user
data
virtual
dimensional
video data
Application number
PCT/CN2018/075749
Other languages
English (en)
French (fr)
Inventor
李凯
Original Assignee
腾讯科技(深圳)有限公司
Priority claimed from CN201710104442.4A (CN108513089B)
Priority claimed from CN201710104439.2A (CN108513088B)
Priority claimed from CN201710104669.9A (CN108513090B)
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Publication of WO2018153267A1
Priority to US16/435,733 (US10609334B2)


Classifications

    • H04N 13/161: Encoding, multiplexing or demultiplexing different image signal components (stereoscopic or multi-view video systems)
    • G06F 3/013: Eye tracking input arrangements (interaction with the human body, e.g. user immersion in virtual reality)
    • G06T 13/40: 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 7/20: Image analysis; analysis of motion
    • G06T 7/70: Image analysis; determining position or orientation of objects or cameras
    • G06V 40/161: Human faces; detection, localisation, normalisation
    • G06V 40/168: Human faces; feature extraction, face representation
    • G06V 40/174: Facial expression recognition
    • G06V 40/176: Facial expression recognition; dynamic expression
    • H04N 13/194: Transmission of stereoscopic or multi-view image signals
    • H04N 7/147: Videophone communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • H04N 7/15: Conference systems
    • H04N 7/157: Conference systems defining a virtual conference space and using avatars or agents
    • G06T 2207/10016: Image acquisition modality: video, image sequence
    • G06T 2207/30201: Subject of image: human being, person, face
    • H04N 13/204: Image signal generators using stereoscopic image cameras

Definitions

  • the present invention relates to the field of VR (Virtual Reality) technology, and in particular, to a method and a network device for a group video session.
  • VR technology is a technology for creating and experiencing virtual worlds. It can simulate a realistic environment and intelligently perceive a user's behavior, giving the user an immersive feeling. Therefore, the application of VR technology to social scenarios has received extensive attention, and group video session methods based on VR technology have emerged.
  • In such a method, the server may create a virtual environment for multiple virtual users who use VR devices, and superimpose the virtual character selected by each virtual user onto the virtual environment to express the virtual user's image in the virtual environment.
  • Moreover, the server can send video in which the virtual users' audio and images are overlaid to each virtual user, bringing a visual and auditory experience and making the virtual users seem to talk to other virtual users in the virtual world.
  • However, in this method virtual users can only hold group video sessions with other virtual users.
  • Today, VR devices are not yet widely used, and there are many communication barriers between ordinary users who do not use VR devices and virtual users, so group video sessions are highly restricted and lack flexibility.
  • the embodiments of the present invention provide a group video session method and a network device, so that different types of users can perform group video sessions without restriction, and the flexibility of the group video session is improved.
  • the technical solution is as follows:
  • a method of group video session comprising:
  • determining, for each user in the group video session, a user type of the user according to the device information of the user, where the user type includes an ordinary user and a virtual user, the ordinary user is used to indicate that the user adopts a two-dimensional display mode when participating in the group video session, and the virtual user is used to indicate that the user adopts a virtual reality display mode when participating in the group video session;
  • processing video data of the group video session according to the video display mode indicated by the user type of the user, to obtain target video data of the user, where the video display mode of the target video data matches the video display mode indicated by the user type of the user;
  • during the progress of the group video session, sending the target video data to the user equipment of the user, so that the user performs the group video session.
  • a method of group video session comprising:
  • receiving the target video data of the group video session sent by the server, where the video display mode of the target video data matches the video display mode indicated by the user type of the terminal user, the user type of the terminal user is an ordinary user, and the ordinary user is used to indicate that the terminal user adopts a two-dimensional display mode when participating in the group video session;
  • displaying the target video data, so that an ordinary user in the group video session is displayed in the form of a two-dimensional character, and a virtual user in the group video session is displayed in the form of a two-dimensional virtual character.
  • a method of group video session comprising:
  • receiving the target video data of the group video session sent by the server, where the video display mode of the target video data matches the video display mode indicated by the user type of the VR device user, the user type of the VR device user is a virtual user, and the virtual user is used to indicate that the VR device user adopts a virtual reality display mode when participating in the group video session;
  • displaying the target video data, so that an ordinary user in the group video session is displayed in the virtual environment in the form of a two-dimensional character or a three-dimensional character, and a virtual user in the group video session is displayed in the virtual environment in the form of a three-dimensional virtual character.
  • an apparatus for a group video session comprising:
  • a determining module configured to determine, for each user in the group video session, a user type of the user according to device information of the user, where the user type includes an ordinary user and a virtual user, the ordinary user is used to indicate that the user adopts a two-dimensional display mode when participating in the group video session, and the virtual user is used to indicate that the user adopts a virtual reality display mode when participating in the group video session;
  • a processing module configured to process video data of the group video session according to the video display mode indicated by the user type of the user, to obtain target video data of the user, where the video display mode of the target video data matches the video display mode indicated by the user type of the user;
  • a sending module configured to send target video data to the user equipment of the user during the progress of the group video session, so that the user performs a group video session.
  • an apparatus for a group video session comprising:
  • a receiving module configured to receive, by the server, target video data of a group video session, where a video display mode of the target video data matches a video display mode indicated by a user type of the terminal user, where the user type of the terminal user is a normal user
  • the normal user is configured to indicate that the terminal user adopts a two-dimensional display mode when participating in the group video session;
  • a display module configured to display the target video data, so that a common user in the group video session is displayed in the form of a two-dimensional character, and the virtual user in the group video session is displayed in the form of a two-dimensional virtual character.
  • an apparatus for a group video session comprising:
  • a receiving module configured to receive, by the server, target video data of a group video session, where a video display mode of the target video data matches a video display mode indicated by a user type of the VR device user, where the user type of the VR device user is a virtual user, where the virtual user is used to indicate that the VR device user adopts a virtual reality display mode when participating in the group video session;
  • a display module configured to display the target video data, so that a common user in the group video session is displayed in a virtual environment in the form of a two-dimensional character or a three-dimensional character, and the virtual user in the group video session is in the virtual The environment is displayed in the form of a three-dimensional virtual character.
  • a network device comprising a memory and a processor, the memory for storing instructions, the processor being configured to execute the instructions to perform the steps of the group video session method described below:
  • determining, for each user in the group video session, a user type of the user according to the device information of the user, where the user type includes an ordinary user and a virtual user, the ordinary user is used to indicate that the user adopts a two-dimensional display mode when participating in the group video session, and the virtual user is used to indicate that the user adopts a virtual reality display mode when participating in the group video session;
  • processing video data of the group video session according to the video display mode indicated by the user type of the user, to obtain target video data of the user, where the video display mode of the target video data matches the video display mode indicated by the user type of the user;
  • during the progress of the group video session, sending the target video data to the user equipment of the user, so that the user performs the group video session.
  • a terminal comprising a memory and a processor, the memory for storing instructions, the processor being configured to execute the instructions to perform the steps of the group video session method described below:
  • receiving the target video data of the group video session sent by the server, where the video display mode of the target video data matches the video display mode indicated by the user type of the terminal user, the user type of the terminal user is an ordinary user, and the ordinary user is used to indicate that the terminal user adopts a two-dimensional display mode when participating in the group video session;
  • displaying the target video data, so that an ordinary user in the group video session is displayed in the form of a two-dimensional character, and a virtual user in the group video session is displayed in the form of a two-dimensional virtual character.
  • a virtual reality VR device comprising a memory and a processor, the memory for storing instructions, the processor being configured to execute the instructions to perform the steps of the group video session method described below:
  • receiving the target video data of the group video session sent by the server, where the video display mode of the target video data matches the video display mode indicated by the user type of the VR device user, the user type of the VR device user is a virtual user, and the virtual user is used to indicate that the VR device user adopts a virtual reality display mode when participating in the group video session;
  • displaying the target video data, so that an ordinary user in the group video session is displayed in the virtual environment in the form of a two-dimensional character or a three-dimensional character, and a virtual user in the group video session is displayed in the virtual environment in the form of a three-dimensional virtual character.
  • a group video session system comprising:
  • a network device configured to: create a group video session; for each user in the group video session, determine a user type of the user according to device information of the user, where the user type includes an ordinary user and a virtual user, the ordinary user is used to indicate that the user adopts a two-dimensional display mode when participating in the group video session, and the virtual user is used to indicate that the user adopts a virtual reality display mode when participating in the group video session;
  • process video data of the group video session according to the video display mode indicated by the user type of the user, to obtain target video data of the user, where the video display mode of the target video data matches the video display mode indicated by the user type of the user; and, during the progress of the group video session, send the target video data to the user equipment of the user, so that the user performs the group video session;
  • a terminal configured to receive the target video data of the group video session, where the video display mode of the target video data matches the video display mode indicated by the user type of the terminal user, the user type of the terminal user is an ordinary user, and the ordinary user is used to indicate that the terminal user adopts a two-dimensional display mode when participating in the group video session;
  • and to display the target video data, so that an ordinary user in the group video session is displayed in the form of a two-dimensional character, and a virtual user in the group video session is displayed in the form of a two-dimensional virtual character;
  • a virtual reality VR device configured to receive, from the network device, the target video data of the group video session, where the video display mode of the target video data matches the video display mode indicated by the user type of the VR device user, the user type of the VR device user is a virtual user, and the virtual user is used to indicate that the VR device user adopts a virtual reality display mode when participating in the group video session;
  • and to display the target video data, so that an ordinary user in the group video session is displayed in the virtual environment in the form of a two-dimensional character or a three-dimensional character, and a virtual user in the group video session is displayed in the virtual environment in the form of a three-dimensional virtual character.
  • a method of group video session comprising:
  • an apparatus for a group video session comprising:
  • a virtual character acquisition module configured to acquire a virtual character of the first user in the group video session, where the virtual character of the first user is obtained at least according to the head feature data of the first user and the limb model corresponding to the first user;
  • a video data obtaining module configured to acquire, during the group video session, video data of the first user based on the virtual character of the first user and behavior characteristic data of the first user, where the action of the virtual character of the first user in the video data matches the actual action of the first user;
  • a sending module configured to send the video data of the first user to a terminal where the second user participating in the group video session is located, to implement the group video session.
  • a virtual reality VR device comprising a memory and a processor, the memory for storing instructions, the processor being configured to execute the instructions to perform the steps of the group video session method described below:
  • a network device comprising a memory and a processor, the memory for storing instructions, the processor being configured to execute the instructions to perform the steps of the group video session method described below:
  • a method of group video session comprising:
  • an apparatus for a group video session comprising:
  • An interaction model obtaining module configured to acquire a three-dimensional interaction model of the object to be displayed during the group video session
  • a processing module configured to process, according to a perspective of each user of the multiple users in the group video session, the three-dimensional interaction model of the target object, to obtain video data of the user, where the video data of the user includes model data obtained by performing a perspective transformation on the three-dimensional interaction model of the target object;
  • a sending module configured to separately send video data of the multiple users to the terminal where the multiple users are located.
  • a network device comprising a memory and a processor, the memory for storing instructions, the processor being configured to execute the instructions to perform the steps of the group video session method described below:
  • In the embodiments of the present invention, the user type of each user in the group video session is determined, and the video data of the group video session is processed according to the user type, so that when the user type is a virtual user, target video data matching the virtual reality display mode indicated by the virtual user can be obtained, and when the user type is an ordinary user, target video data matching the two-dimensional display mode indicated by the ordinary user can be obtained. The video data is thus displayed in a display mode suitable for each type of user, so that different types of users can perform group video sessions without restriction, improving the flexibility of group video sessions.
  • FIG. 1 is a schematic diagram of an implementation environment of a group video session according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a method for a group video session according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a user display position according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a group video session scenario according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a display scenario according to an embodiment of the present invention.
  • FIG. 6 is a flowchart of a virtual user performing a group video session according to an embodiment of the present invention.
  • FIG. 7 is a block diagram of a device for a group video session according to an embodiment of the present invention.
  • FIG. 8 is a block diagram of a device for a group video session according to an embodiment of the present invention.
  • FIG. 9 is a block diagram of a device for a group video session according to an embodiment of the present invention.
  • FIG. 10 is a flowchart of a method for a group video session according to an embodiment of the present invention.
  • FIG. 11 is a flowchart of acquiring a virtual character according to an embodiment of the present invention.
  • FIG. 12 is a flowchart of acquiring head orientation data according to an embodiment of the present invention.
  • FIG. 13 is a flowchart of acquiring video data according to an embodiment of the present invention.
  • FIG. 14 is a flowchart of a group video session according to an embodiment of the present invention.
  • FIG. 15 is a flowchart of displaying video data according to an embodiment of the present invention.
  • FIG. 16 is a block diagram of a device for a group video session according to an embodiment of the present invention.
  • FIG. 17 is a flowchart of a method for group video session according to an embodiment of the present invention.
  • FIG. 18 is a schematic diagram of a three-dimensional interaction model according to an embodiment of the present invention.
  • FIG. 19 is a flowchart of adjusting a three-dimensional interaction model according to an embodiment of the present invention.
  • FIG. 21 is a block diagram of a device for a group video session according to an embodiment of the present invention.
  • FIG. 22 is a block diagram of a device for a group video session according to an embodiment of the present invention.
  • FIG. 23 is a block diagram of a device for a group video session according to an embodiment of the present invention.
  • FIG. 24 is a block diagram of a device for a group video session according to an embodiment of the present invention.
  • FIG. 25 is a block diagram of a device for a group video session according to an embodiment of the present invention.
  • FIG. 26 is a block diagram of a device for a group video session according to an embodiment of the present invention.
  • FIG. 27 is a structural block diagram of a terminal 2700 according to an exemplary embodiment of the present invention.
  • FIG. 28 is a block diagram of a network device according to an embodiment of the present invention.
  • FIG. 1 is a schematic diagram of an implementation environment of a group video session according to an embodiment of the present invention.
  • the implementation environment includes:
  • at least one terminal 101 (for example, a mobile terminal or a tablet), at least one VR device 102, and a server 103, which may correspond to the entities involved in the group video session processes in the following embodiments;
  • the server 103 is configured to create a group video session for different types of users, receive and process the video data sent by the terminal 101 and the VR device 102, and send the processed video data to the terminal 101 or the VR device 102, so that group video sessions can be performed between different types of users.
  • the terminal 101 is configured to transmit the video data captured by the camera to the server 103 in real time, and receive and display the video data processed by the server 103.
  • the VR device 102 is configured to send the behavior characteristic data of the user collected by the sensing device to the server 103, and receive and display the video data processed by the server 103.
  • the server 103 may also be configured to obtain a virtual character of the user using the terminal 101 or the VR device 102, and to obtain video data based on the virtual character of the user and behavior characteristic data;
  • the terminal 101 is configured to receive and display the video data transmitted by the server 103;
  • the VR device 102 can also be used to acquire the virtual character of the user of the VR device 102, and to obtain video data based on the virtual character and behavior characteristic data of the user.
  • the server 103 can also be configured with at least one database, such as a facial feature model database, a limb model database, a virtual character database, a user profile database, and a user relationship chain database;
  • the facial feature model database is used to store cartoonized facial feature models;
  • the limb model database is used to store cartoonized limb models, and may also store other models to be loaded;
  • the virtual character database is used to store user identifiers and the corresponding virtual characters of the users;
  • the user profile database is used to store at least user attributes such as the user's age data, gender data, and occupation data;
  • the user relationship chain database is used to store the user relationship chain data of the user, where the user relationship chain data is at least used to indicate the users that have a friend relationship or a group relationship with the user.
  • a facial feature model, a limb model, or a virtual character may be acquired from at least one database configured by the server 103.
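  • As an illustration only, the databases configured by the server 103 might be organized as sketched below; all identifiers and fields are hypothetical assumptions, not part of the embodiment:

```python
# Hypothetical sketch of the databases configured by the server 103.
# All names, keys and fields are illustrative assumptions.

facial_feature_models = {
    "cartoon_face_001": {"eyes": "...", "nose": "...", "mouth": "..."},
}

limb_models = {
    "cartoon_body_001": {"arms": "...", "legs": "..."},
}

virtual_characters = {
    # user identifier -> virtual character of that user
    "user_42": {"head_model": "cartoon_face_001", "limb_model": "cartoon_body_001"},
}

user_profiles = {
    # at least age, gender and occupation data
    "user_42": {"age": 30, "gender": "female", "occupation": "teacher"},
}

user_relationship_chains = {
    # users that have a friend or group relationship with the key user
    "user_42": {"friends": ["user_7", "user_13"], "groups": ["group_1"]},
}
```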
  • the virtual characters (including the head model and the limb model) involved in the embodiments of the present invention may be in a three-dimensional form.
  • FIG. 2 is a flowchart of a method for a group video session according to an embodiment of the present invention. Referring to FIG. 2, the method is applied to an interaction process between a server and a terminal and a VR device.
  • the server creates a group video session.
  • a group video session is a video session made by multiple (two or more) users based on the server.
  • the multiple users may be multiple users on the social platform corresponding to the server, and the multiple users may be a group relationship or a friend relationship.
  • When the server receives a group video session request, a group video session can be created; the manner of initiating the group video session request is not limited in this embodiment of the present invention.
  • For example, a user may initiate a group video session request for all users in an established group.
  • In this case, the group video session request may carry the group identifier of the group, so that the server can obtain, according to the group identifier, the user identifier of each user in the group.
  • For another example, the user may also initiate a group video session request after selecting some users from an established group or from the user relationship chain.
  • In this case, the group video session request may carry the user identifiers of the initiating user and the selected users.
  • After the server obtains the user identifiers, the users corresponding to the user identifiers can be added to the group video session, thereby creating the group video session.
  • the server determines the user type of the user according to the device information of the user.
  • the device information may be the device model of the user device used by the user to log in to the server.
  • the device model may be in the form of a mobile phone brand or a mobile phone model, so that the server can determine the device type of the user device according to the corresponding relationship between the device model and the device type.
  • the device type can be a PC (Personal Computer) terminal, a mobile terminal, or a VR device.
  • the server may obtain the device information in multiple manners. For example, when the user device sends a login request to the server, the login request may carry the user identifier and the device information, so that the server can extract and correspondingly store the user identifier and the device information when receiving the login request; alternatively, the server sends a device information acquisition request to the user equipment, so that the user equipment sends the device information to the server.
  • For users using different user equipment, the server needs to process video data in different ways to obtain video data that matches the video display mode supported by the user equipment; in order to determine how to process video data for a certain user, the server first needs to determine the user type of the user.
  • The user type of the user includes an ordinary user and a virtual user.
  • The ordinary user is used to indicate that the user adopts the two-dimensional display mode when participating in the group video session; if the user is an ordinary user, the user is a user who logs in to the server using a non-VR device rather than a VR device.
  • the virtual user is used to indicate that the user uses the virtual reality display mode when participating in the group video session. If the user is a virtual user, the user is the user who logs in to the server using the VR device.
  • the server may query the user type corresponding to the device information of the user according to a pre-configured correspondence among device information, device type, and user type. For an example of the correspondence, see Table 1:
  • the user can also set the device information by himself.
  • the device information setting page is provided on the VR device, and the VR device user can set the current device information to “WW N7” or leave the default setting “UU N7”.
  • the server can obtain the device information set by the VR device user, thereby determining the type of user that the VR device user tends to experience.
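  • A minimal sketch of this lookup is given below; the login message format and the concrete correspondence values are assumptions in the spirit of Table 1 (only the device information strings "WW N7" and "UU N7" come from the example above):

```python
# Hypothetical correspondence among device information, device type and user type.
DEVICE_INFO_TO_TYPE = {
    "WW N7": "VR device",        # device information set on the VR device
    "UU N7": "mobile terminal",  # assumed default setting
    "XX PC": "PC terminal",      # made-up example
}
DEVICE_TYPE_TO_USER_TYPE = {
    "VR device": "virtual user",
    "mobile terminal": "ordinary user",
    "PC terminal": "ordinary user",
}

def determine_user_type(login_request: dict) -> str:
    """The login request is assumed to carry the user identifier and device information."""
    device_info = login_request["device_info"]
    device_type = DEVICE_INFO_TO_TYPE.get(device_info, "mobile terminal")  # assumed fallback
    return DEVICE_TYPE_TO_USER_TYPE[device_type]

print(determine_user_type({"user_id": "user_42", "device_info": "WW N7"}))  # virtual user
```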
  • the server processes the video data of the group video session according to the video display mode indicated by the user type of the user, to obtain the target video data of the user.
  • the video display mode of the target video data matches the video display mode indicated by the user type of the user.
  • If the user type of the user is an ordinary user, the server determines that the user adopts the two-dimensional display mode when participating in the group video session, and adopts the video data processing manner corresponding to the two-dimensional display mode for the user.
  • If the user type of the user is a virtual user, the server determines that the user adopts the virtual reality display mode when participating in the group video session, and adopts the video data processing manner corresponding to the virtual reality display mode for the user.
  • the specific processing procedure is not limited in the embodiment of the present invention. The following describes the video data processing methods corresponding to each type of user:
  • The processing procedure when the user type is an ordinary user is described in steps 203A-203C:
  • the server converts the three-dimensional virtual character corresponding to the virtual user in the group video session into a two-dimensional virtual character.
  • the three-dimensional virtual character is used to express the character image of the virtual user in the three-dimensional image data, so that the user can be displayed as a three-dimensional virtual character in the group video session.
  • the server can obtain a three-dimensional virtual character in a variety of ways. For example, before the virtual user confirms entering the group video session, the virtual user is provided with a plurality of three-dimensional virtual characters, and the three-dimensional virtual characters selected by the virtual user are used as the three-dimensional virtual characters corresponding to the virtual user. For another example, the server acquires the user attribute of the virtual user, and uses the three-dimensional virtual character that matches the user attribute as the three-dimensional virtual character corresponding to the virtual user.
  • The user attribute includes information such as age, gender, and occupation; for example, the virtual user's user attributes indicate a 30-year-old female teacher.
  • the server can select a three-dimensional virtual character of the female teacher image as a three-dimensional virtual character corresponding to the virtual user.
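  • A small sketch of such attribute matching is given below; the tag-overlap scoring scheme and the candidate identifiers are assumptions, only the "female teacher" attributes come from the example above:

```python
# Hypothetical attribute matching: pick the stored virtual character whose tags
# overlap most with the user attributes (the scoring scheme is an assumption).
def match_virtual_character(user_attributes: dict, candidates: list[dict]) -> dict:
    def score(candidate: dict) -> int:
        tags = candidate.get("tags", {})
        return sum(1 for key, value in user_attributes.items() if tags.get(key) == value)
    return max(candidates, key=score)

user_attributes = {"gender": "female", "occupation": "teacher"}
candidates = [
    {"id": "vc_teacher_f", "tags": {"gender": "female", "occupation": "teacher"}},
    {"id": "vc_engineer_m", "tags": {"gender": "male", "occupation": "engineer"}},
]
print(match_virtual_character(user_attributes, candidates)["id"])  # vc_teacher_f
```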
  • the server may convert the three-dimensional virtual character into a two-dimensional virtual character based on the acquired three-dimensional virtual character.
  • the two-dimensional virtual character may be static or dynamic, which is not limited in the embodiment of the present invention.
  • During the conversion, the two-dimensional image data of a certain angle of view can be directly extracted from the three-dimensional image data corresponding to the three-dimensional virtual character, and the two-dimensional image data of that viewing angle is used as the two-dimensional virtual character; in order to present the virtual user as comprehensively as possible, the perspective can be a frontal perspective.
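  • A minimal sketch of extracting a frontal-view two-dimensional image is given below, assuming the three-dimensional virtual character is available as a set of colored 3D points; a real engine would rasterize a textured mesh instead, so this is an illustrative assumption only:

```python
import numpy as np

def frontal_view_2d(points_xyz: np.ndarray, colors: np.ndarray, size: int = 256) -> np.ndarray:
    """Orthographically project 3D points (x, y, z) to a 2D frontal-view image.

    Assumes the character faces along +z, so the frontal view keeps (x, y) and
    drops z; nearer points (smaller z) overwrite farther ones.
    """
    image = np.zeros((size, size, 3), dtype=np.uint8)
    depth = np.full((size, size), np.inf)
    # normalize x, y into pixel coordinates
    xy = points_xyz[:, :2]
    xy = (xy - xy.min(axis=0)) / (np.ptp(xy, axis=0) + 1e-9) * (size - 1)
    for (x, y), z, color in zip(xy.astype(int), points_xyz[:, 2], colors):
        if z < depth[y, x]:
            depth[y, x] = z
            image[y, x] = color
    return image
```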
  • In addition, the server may acquire the three-dimensional virtual character and the behavior characteristic data of the virtual user collected by the VR device, where the behavior characteristic data includes expression feature data or limb feature data of the virtual user. Further, the server can determine the behavior characteristics of the three-dimensional virtual character according to the behavior characteristic data, generate a three-dimensional virtual character conforming to those behavior characteristics so that the behavior of the three-dimensional virtual character is synchronized with the behavior of the virtual user, and then convert the three-dimensional virtual character into a two-dimensional virtual character. For the specific process, reference may be made to the process shown in FIG. 10 below, and details are not described herein.
  • the server synthesizes the two-dimensional virtual character, the two-dimensional background selected by the virtual user, and the audio data corresponding to the virtual user, to obtain the first two-dimensional video data.
  • the server may also add a two-dimensional background to the two-dimensional virtual character.
  • the two-dimensional background refers to the background of a two-dimensional virtual character, such as a two-dimensional conference background and a two-dimensional beach background.
  • the server may provide multiple two-dimensional backgrounds before entering the group video session for the virtual user, or obtain a two-dimensional background selected by the virtual user. In fact, the server can also obtain the two-dimensional background by other means, for example, randomly acquiring the two-dimensional background corresponding to the virtual user.
  • the server may use the two-dimensional image data mapped from the virtual environment corresponding to the group video session as the two-dimensional background, or the server may acquire two-dimensional image data having the same label as the label of the virtual environment and use it as the two-dimensional background.
  • the label of the virtual environment is “forest”, and the server can use the two-dimensional image data labeled “forest” as a two-dimensional background.
  • the two-dimensional background can be static or dynamic.
  • During the synthesis, the server can determine the display position and composite size of the two-dimensional virtual character on the two-dimensional background, adjust the original display size of the two-dimensional virtual character to obtain a two-dimensional virtual character conforming to the composite size, and synthesize the two-dimensional virtual character to the corresponding display position on the two-dimensional background, with the layer of the two-dimensional virtual character above the layer of the two-dimensional background, thereby obtaining the image data currently corresponding to the virtual user.
  • Alternatively, the server may determine a display area corresponding to the display position and the composite size on the two-dimensional background, remove the pixel points in the display area, and embed the image data corresponding to the two-dimensional virtual character into the display area, thereby using the embedded two-dimensional image data as the image data currently corresponding to the virtual user.
  • The user equipment can send the recorded audio data to the server in real time, so when the server receives the audio data corresponding to the virtual user, the current image data can be synthesized with the audio data to obtain the first two-dimensional video data, to express the current words and deeds of the virtual user.
  • If the server does not currently receive audio data corresponding to the virtual user, the current image data may be directly used as the first two-dimensional video data.
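  • A minimal sketch of this compositing step is given below, assuming both the two-dimensional virtual character and the two-dimensional background are available as RGBA images; Pillow is used purely for illustration, the embodiment does not prescribe a library:

```python
from PIL import Image

def compose_first_2d_frame(background: Image.Image,
                           character: Image.Image,
                           display_position: tuple[int, int],
                           composite_size: tuple[int, int]) -> Image.Image:
    """Resize the 2D virtual character to the composite size and paste it onto the
    2D background at the display position, keeping the character layer on top."""
    frame = background.convert("RGBA").copy()
    resized = character.convert("RGBA").resize(composite_size)
    frame.paste(resized, display_position, mask=resized)  # alpha mask keeps transparency
    return frame

# The resulting frame would then be paired with the virtual user's audio data,
# if any has been received, to form the first two-dimensional video data.
```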
  • the server synthesizes the at least one first two-dimensional video data and the at least one second two-dimensional video data to obtain target video data of the user.
  • the second two-dimensional video data refers to two-dimensional video data of a general user in a group video session.
  • During the synthesis, the server determines the display position and composite size of the current two-dimensional video data of each user in the group video session, and synthesizes the current video data of each user into the two-dimensional video data of the virtual environment according to the determined display position and composite size, with the layer of each user's two-dimensional video data above the layer of the virtual environment; the synthesized two-dimensional video data is used as the target video data of the user.
  • It should be noted that steps 203B and 203C may also correspond to one synthesis process, in which the server omits the step of separately synthesizing the first two-dimensional video data and directly synthesizes the two-dimensional virtual character, the two-dimensional background, the audio data corresponding to the virtual user, and the second two-dimensional video data to obtain the target video data.
  • The processing procedure when the user type is a virtual user is described in steps 203D-203H:
  • the server determines a virtual environment corresponding to the group video session.
  • the virtual environment refers to the three-dimensional background of the virtual user in the group video session, such as a three-dimensional image such as a round table virtual environment, a beach virtual environment, and a board game virtual environment.
  • the specific manner of determining the virtual environment is not limited in the embodiment of the present invention.
  • the server can use the following three methods of determination:
  • Determination mode 1: the server determines the virtual environment corresponding to the virtual environment option triggered by the user as the virtual environment corresponding to the user in the group video session.
  • the server can provide a variety of virtual environments, and the user can freely select the virtual environment in the group video session.
  • the server may provide at least one virtual environment option and a corresponding virtual environment thumbnail on the VR device (or the terminal bound to the VR device), and each virtual environment option corresponds to one virtual environment.
  • the VR device detects the triggering operation of the virtual environment option, the VR device may send the virtual environment identifier corresponding to the virtual environment option to the server, and when the server obtains the virtual environment identifier, the virtual environment corresponding to the virtual environment identifier may be Determine the virtual environment for the user in the group video session.
  • the second determining mode determines the capacity of the virtual environment corresponding to the group video session according to the number of users in the group video session, and determines the virtual environment that meets the capacity as the virtual environment corresponding to the group video session.
  • In this determination mode, the server can acquire the number of users in the group video session, thereby determining the capacity that the virtual environment should have; the capacity is used to indicate the number of users that the virtual environment can accommodate, for example, the capacity of a round table virtual environment corresponds to the number of seats in that virtual environment.
  • Further, the server may select, from the plurality of stored virtual environments, the virtual environment whose capacity is closest to the determined capacity. For example, if the number of users is 12 and the server stores three round table virtual environments with 5, 10, and 15 seats respectively, the server can select the round table virtual environment whose number of seats best matches the 12 users.
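  • A small sketch of this capacity-based selection follows; the tie-breaking rule of preferring the smallest environment that can still hold everyone is an assumption, since the embodiment only states that the closest capacity is chosen:

```python
def choose_by_capacity(user_count: int, environment_capacities: dict[str, int]) -> str:
    """Pick the virtual environment whose capacity best matches the user count.

    Assumption: prefer the smallest capacity that still fits all users; if none
    fits, fall back to the largest environment.
    """
    fitting = {name: cap for name, cap in environment_capacities.items() if cap >= user_count}
    if fitting:
        return min(fitting, key=fitting.get)
    return max(environment_capacities, key=environment_capacities.get)

print(choose_by_capacity(12, {"round_table_5": 5, "round_table_10": 10, "round_table_15": 15}))
# round_table_15
```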
  • The third determination mode analyzes the virtual environment selected by each user in the group video session, obtains the number of times each virtual environment is selected, and determines the virtual environment selected the most times as the virtual environment corresponding to the group video session.
  • In this determination mode, the server comprehensively analyzes the virtual environments selected by each user and obtains the virtual environment that most users prefer. For example, there are 5 users in the group video session, and each user selects virtual environments as shown in Table 2; from Table 2 the server can determine that virtual environment 1 is selected the most times (4 times), and determine virtual environment 1 as the virtual environment corresponding to the users in the group video session.
  • Table 2 (user: selected virtual environment): A: virtual environment 1, virtual environment 2; B: virtual environment 3; C: virtual environment 1; D: virtual environment 1, virtual environment 3; E: virtual environment 1.
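  • A sketch of the third determination mode using the selections in Table 2 above; tallying with collections.Counter is just one possible implementation:

```python
from collections import Counter

selections = {
    "A": ["virtual environment 1", "virtual environment 2"],
    "B": ["virtual environment 3"],
    "C": ["virtual environment 1"],
    "D": ["virtual environment 1", "virtual environment 3"],
    "E": ["virtual environment 1"],
}

votes = Counter(env for chosen in selections.values() for env in chosen)
most_selected, count = votes.most_common(1)[0]
print(most_selected, count)  # virtual environment 1, selected 4 times
```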
  • It should be noted that, once the server determines the virtual environment for a certain user, the virtual environment corresponding to that user may be directly determined as the virtual environment corresponding to each virtual user in the group video session.
  • any two or three of the above three determination manners may also be combined.
  • the embodiment of the present invention does not limit the combination manner.
  • For example, the first determination mode is combined with the third determination mode: if the server receives the virtual environment identifier triggered by the user, the virtual environment corresponding to that virtual environment identifier is used; otherwise, the server adopts the third determination mode.
  • the virtual environment is a three-dimensional background, and the server determines a display position of each user in the group video session in the virtual environment.
  • the server needs to determine the display position of each user in the virtual environment, and the display position refers to the composite location of the video data of the ordinary user or the virtual user.
  • The embodiment of the present invention does not limit the manner in which the display position is determined. For the user himself, the user's perspective can be set to a front view, so that the orientation of the corresponding three-dimensional virtual character is consistent with the orientation of the front view; therefore, the user himself may or may not be displayed in the group video session. If displayed, referring to FIG. 3, the user may correspond to the display position indicated by the arrow in FIG. 3.
  • the server can determine the display position by the following five determination methods (determination mode 1 - determination mode 5).
  • Determination mode 1: this mode takes into account the social tendency of each user in an actual session and determines the display position of each user according to intimacy. The social data is not limited to data such as the number of chats, the duration of being friends, and the number of comments and likes, and the method for analyzing the intimacy is not limited in the embodiment of the present invention. For example, if the intimacy is represented by C, the number of chats is represented by chat with a weight of 0.4, the duration of being friends is represented by time with a weight of 0.3, and the number of comments and likes is represented by comment with a weight of 0.3, the intimacy can be expressed as: C = 0.4 × chat + 0.3 × time + 0.3 × comment.
  • Based on the calculated intimacy between the user and users 1 to 4, the server can determine the position closest to the user as the display position of user 3, and sequentially arrange the display positions of user 4, user 1, and user 2 according to their intimacy.
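  • A minimal sketch of determination mode 1 using the weights stated above; the social data values below are made up purely for illustration:

```python
def intimacy(social: dict) -> float:
    # C = 0.4 * chat + 0.3 * time + 0.3 * comment, using the weights stated above.
    return 0.4 * social["chat"] + 0.3 * social["time"] + 0.3 * social["comment"]

# Made-up social data between "the user" and users 1-4.
social_data = {
    "user_1": {"chat": 30, "time": 60, "comment": 8},
    "user_2": {"chat": 10, "time": 40, "comment": 2},
    "user_3": {"chat": 90, "time": 80, "comment": 25},
    "user_4": {"chat": 60, "time": 70, "comment": 15},
}

# Seats are assumed to be ordered from nearest to farthest from the user, so the
# most intimate user gets the nearest display position.
ordered = sorted(social_data, key=lambda u: intimacy(social_data[u]), reverse=True)
print(ordered)  # ['user_3', 'user_4', 'user_1', 'user_2']
```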
  • Determination mode 2: obtain the user identities of the other users, determine the position opposite the user as the display position of the user with the highest user identity among the other users, and randomly determine the display positions of the remaining users.
  • the server can determine the display location based on the identity of the user.
  • the user identity is used to indicate the importance of the user in the video session of the group.
  • the embodiment of the present invention does not limit the standard for measuring user identity. For example, if user A is the originating user of the group video session among other users, it is indicated that user A is likely to lead the current group video session, so user A is determined to be the user with the highest identity. For another example, if the user B is an administrator in the group corresponding to the group video session, the user B may also be determined as the user with the highest identity.
  • Determination mode 3: according to the chronological order in which the other users join the group video session, arrange the display positions of the other users starting from either side of the user.
  • the display position can be determined directly according to the time when the user joins the group video session.
  • Generally, each user confirms whether to join the group video session. Therefore, when the user equipment detects a certain user's confirmation operation for joining the group video session, it may send a confirmation join message to the server; when the server receives a confirmation join message, the user corresponding to that message may be arranged at the closest remaining display position to the user, so that the display positions are assigned in the order in which the confirmation join messages are received.
  • Determination mode 4: according to the position selected by each user in the virtual environment, determine the selected position as the display position of that user in the virtual environment.
  • In order to make the process of determining the display position more flexible, the server also supports the user in selecting the display position. In this determination mode, the server may provide a virtual environment template to each user before the start of the group video session, and each user selects a display position on the virtual environment template. Of course, in order to avoid conflicts between users when selecting display positions, the server should indicate the currently selected display positions in real time; for example, when a certain display position has been selected, the server may add a non-selectable mark to that display position, so that each user selects a display position from the selectable display positions.
  • Determination mode 5: determine the position opposite the user as the display position of the ordinary user, and randomly determine the display positions of the remaining users among the other users.
  • That is, the server can determine the position opposite the user as the display position of the ordinary user, and randomly determine the display positions of the remaining users.
  • each user should correspond to a display area. Therefore, when a certain user A selects a display position, the server determines the display area corresponding to the user A. Moreover, in order to more evenly display the individual users in the virtual environment, the server may pre-define the display area in the virtual environment, for example, for a round table virtual environment, each seat corresponds to a display area.
  • any two or more of the above five determining manners may also be combined.
  • For example, determination mode 4 and determination mode 5 are combined: the server first determines the position opposite the user as the display position of the ordinary user, and then provides each virtual user with a virtual environment template on which the display position already determined for the ordinary user carries a non-selectable mark, so that each virtual user can select a display position from the remaining selectable display positions.
  • the server synthesizes the specified video data of the ordinary user to the display position corresponding to the normal user.
  • the specified video data refers to video data conforming to the virtual reality display mode obtained based on the received video data of the ordinary user.
  • The first ordinary user refers to an ordinary user who uses a binocular camera, and the second ordinary user refers to an ordinary user who uses a monocular camera. The video data of these two kinds of ordinary users are different, so the manner in which the server obtains the specified video data is also different; the two cases are explained below as case 1 and case 2:
  • Case 1 If the ordinary user includes the first ordinary user, convert the two-way two-dimensional video data of the first ordinary user into the first three-dimensional video data, and use the first three-dimensional video data as the designated video data, or, if the ordinary user includes the first A normal user uses two-way two-dimensional video data of the first ordinary user as the designated video data.
  • In order to display the first ordinary user in the form of a three-dimensional character in the virtual environment, the server can obtain the specified video data in either of two ways:
  • In the first way, the two-way two-dimensional video data is converted into the first three-dimensional video data. Since the two streams of two-dimensional video data respectively correspond to the actual scene of the ordinary user captured from two angles of view, a pixel of one of the two-dimensional video streams is used as a reference, and the corresponding pixel in the other two-dimensional video stream is determined; the two pixels correspond to the same position in the actual scene, so the disparity of the two pixels can be determined. By processing each pixel of the two-way two-dimensional video data in this manner, a disparity map can be obtained, and three-dimensional image data of the actual scene can be constructed according to the disparity map.
  • In the second way, the two-way two-dimensional video data is directly used as the specified video data, and when the specified video data is sent to the VR device, a designated display instruction is also sent, which is used to instruct the VR device to render the two streams of two-dimensional video data in the left-eye and right-eye screens respectively. Since two-dimensional video data of different viewing angles are rendered in the left-eye and right-eye screens, parallax is formed during display, achieving a three-dimensional display effect.
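  • A minimal sketch of the first way using OpenCV's block-matching stereo correspondence; the parameter values, file names, and the identity placeholder for the reprojection matrix are assumptions, and the embodiment does not prescribe a particular algorithm:

```python
import cv2
import numpy as np

# Two-way two-dimensional video: one frame from each of the two camera views
# (placeholder file names).
left = cv2.imread("left_view.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right_view.png", cv2.IMREAD_GRAYSCALE)

# For each pixel in the left image, find the corresponding pixel in the right
# image; the horizontal offset between them is the disparity.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point to float

# Q is the 4x4 reprojection matrix obtained from stereo calibration (assumed known);
# it turns the disparity map into 3D points of the actual scene.
Q = np.eye(4, dtype=np.float32)  # placeholder, a real Q would come from cv2.stereoRectify
points_3d = cv2.reprojectImageTo3D(disparity, Q)
```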
  • the manner of determining the user type of the ordinary user is not limited in the embodiment of the present invention. For example, if the server receives two-way two-dimensional video data of an ordinary user at the same time, it can be determined that the user type of the ordinary user is the first normal user, otherwise, the normal user can be determined to be the second ordinary user.
  • the server may synthesize the specified video data to the display position corresponding to the normal user.
  • During the synthesis, the server may adjust the display size corresponding to the specified video data to a preset composite size; the composite size may be determined by the ratio of the virtual environment to a real person, and each virtual environment can correspond to one composite size.
  • It should be noted that since the specified video data has only one view (for the second ordinary user) or two views (for the first ordinary user), the specified video data only occupies a two-dimensional spatial location in the virtual environment. Moreover, the display position of each ordinary user is different; in order to provide a better display effect for the user, the server may add a border to the layer edge of the specified video data during the synthesis, so that the specified video data appears to be rendered on a "virtual screen" in the virtual environment. Of course, if the display positions of two or more pieces of specified video data are adjacent, the server may also add a border around the edges of these pieces of specified video data at the time of synthesis, so that two or more ordinary users can be displayed in one "virtual screen".
  • Referring to FIG. 4, an embodiment of the present invention provides a schematic diagram of a group video session scenario: as shown in FIG. 4(a), one ordinary user is displayed in a "virtual screen", and as shown in FIG. 4(b), two ordinary users are displayed in one "virtual screen".
  • the server synthesizes the virtual user's three-dimensional virtual character and audio data to a display position corresponding to the virtual user.
  • The server may acquire the three-dimensional virtual character of the virtual user (the acquisition process is the same as step 203A), adjust the three-dimensional virtual character to the composite size, synthesize the adjusted three-dimensional virtual character to the display position corresponding to the virtual user, and synthesize the resulting three-dimensional image data with the acquired audio data of the virtual user to obtain the audio and video data of the virtual user.
  • the server uses the synthesized video data as the target video data of the user.
  • the server can finally obtain the target video data, which includes the virtual characters corresponding to each virtual user in the group video session and the video data of each ordinary user.
  • the server sends the target video data to the user equipment of the user, so that the user performs a group video session.
  • Referring to FIG. 5, an embodiment of the present invention provides a schematic diagram of a display scenario.
  • The user who logs in to the server using a terminal is referred to as the terminal user, and the user who logs in to the server using a VR device is referred to as the VR device user.
  • Referring to FIG. 6, an embodiment of the present invention provides a flowchart of a virtual user performing a group video session.
  • During the group video session, the virtual user may invite other users outside the group video session to enter the group video session, remove a user from the group video session, send a private chat request to another user, or accept a private chat request from another user.
• When the terminal receives the target video data of the group video session, the terminal displays the target video data, so that the ordinary users in the group video session are displayed in the form of two-dimensional characters and the virtual users in the group video session are displayed in the form of two-dimensional virtual characters.
• The user type of the terminal user is an ordinary user; therefore, the terminal user adopts the two-dimensional display mode when participating in the group video session. When the terminal receives the target video data, it can render the target video data on the screen, thereby displaying the users of the group video session in various areas of the screen.
• When the VR device receives the target video data of the group video session, the VR device displays the target video data, so that the ordinary users in the group video session are displayed in the virtual environment as two-dimensional characters or three-dimensional characters, and the virtual users in the group video session are displayed in the virtual environment as three-dimensional virtual characters.
  • the user type of the VR device user is a virtual user. Therefore, the VR device user adopts the virtual reality display mode when participating in the group video session.
• When the VR device receives the target video data, it may render the target video data on the left-eye and right-eye screens of the VR device, so that the VR device displays the two-dimensional character or three-dimensional character of each ordinary user at the display position corresponding to that ordinary user, and displays the three-dimensional virtual character of each virtual user at the display position corresponding to that virtual user.
• If any user in the group video session is detected to be speaking, a speaking prompt is displayed at the display position corresponding to that user. The presentation form of the speaking prompt is not limited in the embodiment of the present invention; for example, it may be a text prompt such as "speaking", an arrow icon, or a flashing icon.
  • the embodiment of the present invention does not limit the manner in which the user is detected to speak. For example, when the VR device detects the audio data of the user from the current target video data, it is determined that the user is speaking, and further determines the display position corresponding to the user, and displays the speaking prompt on the display position thereof.
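• For example, a minimal energy-based detection of which users are currently speaking might look like the following sketch (the threshold and track format are assumptions):

```python
import numpy as np

def speaking_users(audio_tracks: dict, energy_threshold: float = 1e-3) -> list:
    """Return the users whose current audio frame carries speech energy,
    so a 'speaking' prompt can be drawn at their display positions."""
    active = []
    for user_id, samples in audio_tracks.items():
        frame = np.asarray(samples, dtype=float)
        if frame.size and float(np.mean(frame ** 2)) > energy_threshold:
            active.append(user_id)
    return active

# toy usage: user_a is speaking, user_b is silent
tracks = {"user_a": 0.1 * np.sin(np.linspace(0, 60, 960)), "user_b": np.zeros(960)}
print(speaking_users(tracks))  # ['user_a']
```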
• The embodiment of the present invention determines the user type of each user in the group video session and processes the video data of the group video session according to the user type, so that when the user type is a virtual user, target video data matching the virtual reality display mode indicated by the virtual user can be obtained, and when the user type is an ordinary user, target video data matching the two-dimensional display mode indicated by the ordinary user can be obtained. Video data is thus displayed in a reasonable display mode for different types of users, so that different types of users can perform group video sessions without restriction, improving the flexibility of group video sessions.
• In addition, the three-dimensional virtual character corresponding to a virtual user in the group video session is converted into a two-dimensional virtual character, and the two-dimensional virtual character is combined with the two-dimensional background and the audio data to obtain the two-dimensional video data of the virtual user, so that the two-dimensional video data of the virtual user matches the two-dimensional display mode corresponding to the user, thereby providing a specific manner of processing the video data of the virtual user in the group video session.
• In addition, the display position of each user in the virtual environment can be determined, and the two-dimensional video data of the ordinary users and the three-dimensional virtual characters of the virtual users are respectively synthesized to the corresponding display positions, so that the synthesized video data matches the virtual reality display mode corresponding to the user, thereby providing a specific way of processing the video data of the virtual user in the group video session.
• In addition, the two channels of two-dimensional video data of the first ordinary user are processed into first three-dimensional video data, or the two channels of two-dimensional video data are directly acquired as the specified video data and the VR device is notified of the display manner; the two-dimensional video data of the second ordinary user is used as the specified video data.
• In addition, at least three specific methods for determining the virtual environment corresponding to the group video session are provided: the user can select the virtual environment, the virtual environment whose capacity matches the number of users in the group video session can be selected, or the virtual environment selected by each user can be analyzed and the one selected the most times chosen, making the way of determining the virtual environment more diverse.
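• A compact sketch of the three selection strategies mentioned above, assuming a hypothetical catalog of environments and capacities:

```python
from collections import Counter

def choose_virtual_environment(option=None, user_count=None,
                               per_user_choices=None, catalog=None):
    """Pick a virtual environment using the three strategies described above:
    an explicitly triggered option, a capacity match, or a majority vote."""
    catalog = catalog or {"meeting_room": 8, "beach": 20, "auditorium": 100}
    if option in catalog:                      # 1) the user triggered an environment option
        return option
    if user_count is not None:                 # 2) smallest environment whose capacity fits
        fitting = sorted((cap, name) for name, cap in catalog.items() if cap >= user_count)
        if fitting:
            return fitting[0][1]
    if per_user_choices:                       # 3) environment selected the most times
        return Counter(per_user_choices).most_common(1)[0][0]
    return next(iter(catalog))                 # fallback when nothing else applies

print(choose_virtual_environment(user_count=12))                                 # beach
print(choose_virtual_environment(per_user_choices=["beach", "beach", "meeting_room"]))
```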
• In addition, at least five methods are provided for determining the display position of each user in the virtual environment: according to the intimacy between users, the user identity, or the time at which a user joined the group video session, the server intelligently selects a seat for each user; alternatively, and in a more user-friendly manner, the user selects the display position; or, in order to display as much of the ordinary user's full figure as possible, the display position of the ordinary user is set opposite the user so that the ordinary user is viewed from the front.
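• The intimacy-based and join-time-based seat arrangement could be sketched as below (identity-based and user-chosen placement are omitted; the names and seat labels are hypothetical):

```python
def assign_display_positions(user, others, seats, intimacy=None, joined_at=None):
    """Arrange the other users' display positions starting from the seats nearest
    `user`, ordered either by intimacy (descending) or by join time (ascending)."""
    ranked = list(others)
    if intimacy is not None:                       # closer friends sit nearer
        ranked.sort(key=lambda u: intimacy.get(u, 0.0), reverse=True)
    elif joined_at is not None:                    # earlier joiners sit nearer
        ranked.sort(key=lambda u: joined_at.get(u, float("inf")))
    return dict(zip(ranked, seats))                # seats are listed outward from `user`

seats = ["left_1", "right_1", "left_2", "right_2", "opposite"]
print(assign_display_positions("me", ["a", "b", "c"], seats,
                               intimacy={"a": 0.9, "b": 0.2, "c": 0.5}))
# {'a': 'left_1', 'c': 'right_1', 'b': 'left_2'}
```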
  • FIG. 7 is a block diagram of a device for a group video session according to an embodiment of the present invention. Referring to FIG. 7, the device specifically includes:
• The determining module 702 is configured to determine, for each user in the group video session, the user type of the user according to the device information of the user, where the user type includes an ordinary user and a virtual user, the ordinary user is used to indicate that the user adopts the two-dimensional display mode when participating in the group video session, and the virtual user is used to indicate that the user adopts the virtual reality display mode when participating in the group video session;
• the processing module 703 is configured to process the video data of the group video session according to the video display mode indicated by the user type of the user, to obtain the target video data of the user, where the video display mode of the target video data matches the video display mode indicated by the user type of the user;
  • the sending module 704 is configured to send target video data to the user equipment of the user during the progress of the group video session, so that the user performs a group video session.
• The embodiment of the present invention determines the user type of each user in the group video session and processes the video data of the group video session according to the user type, so that when the user type is a virtual user, target video data matching the virtual reality display mode indicated by the virtual user can be obtained, and when the user type is an ordinary user, target video data matching the two-dimensional display mode indicated by the ordinary user can be obtained. Video data is thus displayed in a reasonable display mode for different types of users, so that different types of users can perform group video sessions without restriction, improving the flexibility of group video sessions.
• the processing module 703 is configured to: if the user type of the user is an ordinary user, convert the three-dimensional virtual character corresponding to each virtual user in the group video session into a two-dimensional virtual character; combine the two-dimensional virtual character with the two-dimensional background selected by the user and the audio data corresponding to the virtual user to obtain first two-dimensional video data; and synthesize at least one piece of first two-dimensional video data and at least one piece of second two-dimensional video data to obtain the target video data of the user, where the second two-dimensional video data refers to the two-dimensional video data of an ordinary user in the group video session.
• the processing module 703 is configured to: if the user type of the user is a virtual user, determine the virtual environment corresponding to the group video session, use the virtual environment as the three-dimensional background, determine the display position of each user in the group video session in the virtual environment, synthesize the specified video data of each ordinary user to the display position corresponding to the ordinary user, synthesize the three-dimensional virtual character and audio data of each virtual user to the display position corresponding to the virtual user, and use the synthesized video data as the target video data of the user.
• the processing module 703 is further configured to: if the ordinary users include a first ordinary user, convert the two channels of two-dimensional video data of the first ordinary user into first three-dimensional video data and use the first three-dimensional video data as the specified video data, or use the two channels of two-dimensional video data of the first ordinary user directly as the specified video data, where the first ordinary user refers to an ordinary user who uses a binocular camera; if the ordinary users include a second ordinary user, use the two-dimensional video data of the second ordinary user as the specified video data, where the second ordinary user refers to an ordinary user who uses a monocular camera.
  • the processing module 703 is configured to: determine a virtual environment corresponding to the user-triggered virtual environment option as a virtual environment corresponding to the user in the group video session; or
  • the processing module 703 is configured to determine, according to the number of users in the group video session, the capacity of the virtual environment corresponding to the group video session, and determine the virtual environment that meets the capacity as the virtual environment corresponding to the group video session; or
• the processing module 703 is configured to: analyze the virtual environment selected by each user in the group video session, obtain the number of times each virtual environment has been selected, and determine the virtual environment selected the most times as the virtual environment corresponding to the group video session.
• the processing module 703 is configured to: analyze the intimacy between the user and other users according to the social data between the user and the other users in the group video session, and arrange the display positions of the other users, starting from either side of the user, in descending order of intimacy; or,
  • the processing module 703 is configured to: acquire the user identity of the other user, determine the opposite location of the user as the display location of the user with the highest user identity among the other users, and randomly determine the display position of the remaining users among the other users; or
  • the processing module 703 is configured to: arrange the display positions of other users from any side of the user according to the chronological order in which the other users join the group video session; or
  • the processing module 703 is configured to determine, according to a location selected by the user in the virtual environment, a location selected by the user as a display location of the user in the virtual environment; or
  • the processing module 703 is configured to: determine the opposite position of the user as the display position of the ordinary user, and randomly determine the display position of the remaining users among the other users.
  • FIG. 8 is a block diagram of a device for a group video session according to an embodiment of the present invention. Referring to FIG. 8, the device specifically includes:
• the receiving module 801 is configured to receive the target video data of the group video session sent by the server, where the video display mode of the target video data matches the video display mode indicated by the user type of the terminal user, the user type of the terminal user is an ordinary user, and the ordinary user is used to indicate that the terminal user adopts the two-dimensional display mode when participating in the group video session;
  • the display module 802 is configured to display the target video data, so that the ordinary users in the group video session are displayed in the form of two-dimensional characters, and the virtual users in the group video session are displayed in the form of two-dimensional virtual characters.
• In the embodiment of the present invention, target video data is received; because the target video data has been processed by the server according to the user type, the target video data matches the two-dimensional display mode indicated by the ordinary user. Video data is thus displayed for the terminal user in a reasonable display mode, enabling unrestricted group video sessions between different types of users and improving the flexibility of group video sessions.
  • FIG. 9 is a block diagram of a device for a group video session according to an embodiment of the present invention.
  • the device specifically includes:
• the receiving module 901 is configured to receive the target video data of the group video session sent by the server, where the video display mode of the target video data matches the video display mode indicated by the user type of the VR device user, and the user type of the VR device user is a virtual user;
  • the virtual user is used to indicate that the VR device user adopts a virtual reality display mode when participating in the group video session;
• the display module 902 is configured to display the target video data, so that the ordinary users in the group video session are displayed in the virtual environment in the form of two-dimensional characters or three-dimensional characters, and the virtual users in the group video session are displayed in the virtual environment in the form of three-dimensional virtual characters.
• In the embodiment of the present invention, target video data is received; because the target video data has been processed by the server according to the user type, the target video data matches the virtual reality display mode indicated by the virtual user. Video data is thus displayed for the VR device user in a reasonable display mode, enabling unrestricted group video sessions between different types of users and improving the flexibility of group video sessions.
  • the display module 902 is configured to: display a two-dimensional character or a three-dimensional character of a common user on a display position corresponding to a common user; and display a three-dimensional virtual character of the virtual user on a display position corresponding to the virtual user .
  • the display module 902 is further configured to display, according to the target video data, a speaking prompt on the display position corresponding to the user if any user in the group video session is detected to be speaking.
• It should be noted that the device for the group video session provided by the foregoing embodiment is illustrated only by the division of the foregoing functional modules. In practical applications, the functions may be allocated to different functional modules as required, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
  • the apparatus for the group video session provided by the foregoing embodiment is the same as the method embodiment of the group video session. For details, refer to the method embodiment, and details are not described herein again.
• In addition, each virtual user's actual image has its own characteristics, while the virtual characters provided on a VR device are limited and may differ considerably from the virtual user's real image, so the effect of expressing the virtual user with such a virtual character is poor and the visual effect of the group video session is poor.
• Therefore, in order to improve the visual effect of the group video session, this embodiment also provides a session method that more closely matches the user's actual image and actual actions. The process may be performed when the server processes the video data of the group video session to obtain the target video data of a user, or when video data is generated on the VR device, or when video data is synthesized; this embodiment does not limit this.
  • FIG. 10 is a flowchart of a method for a group video session according to an embodiment of the present invention.
  • the method may be applied to a server or a VR device, and the server is used as an execution entity.
  • the method specifically includes:
  • the server acquires a virtual character of the first user in the group video session.
  • a group video session refers to a video session made by multiple (two or more) users based on a server.
  • the multiple users may be multiple users on the social platform corresponding to the server, and the multiple users may be a group relationship or a friend relationship.
  • the user in the group video session may be a virtual user using a VR device, or a traditional user using a traditional terminal (eg, a desktop computer, a mobile phone).
• The first user can be any user in the group video session.
  • the avatar of the first user is obtained according to at least the head feature data of the first user and the limb model corresponding to the first user.
  • the embodiment of the present invention does not limit the timing of acquiring a virtual character. For example, when a server creates a group video session for a plurality of users, a virtual character of each of the plurality of users is acquired. For another example, in a process of a group video session, the first user accepts an invitation of a user in the group video session, so that the server determines that the first user joins the group video session, and acquires the virtuality of the first user. character.
  • the server may create a virtual character for the first user in real time according to the first user's head feature data and the corresponding limb model, thereby acquiring the virtual character.
• The avatar database configured on the server may also pre-store the virtual character of the first user. Therefore, the server may also query, according to the user identifier of the first user, whether a virtual character corresponding to the user identifier exists in the avatar database; if so, the virtual character of the first user can be obtained directly, and if not, a virtual character can be created for the first user in real time.
  • the pre-stored virtual characters in the avatar database are also created by the server, that is, the process of acquiring the avatar includes the creation process. The process of obtaining the virtual character based on the creation process may be performed by the following steps 1001A-1001D:
  • the server acquires head feature data of the first user.
• The head feature data is used to describe the actual head image of the first user, and may be used to indicate at least one of the hair region, hair tone, face region, face tone, facial feature positions, and facial feature forms of the first user. The facial feature form includes at least the shape and tone of the facial features.
  • the manner in which the head feature data is obtained is not limited in the embodiment of the present invention. E.g:
  • the server acquires the head image data of the first user, analyzes the tone distribution of the head image data, and obtains the head feature data.
  • the source of the head image data may be various, such as head image data (big head shot) in the cloud album of the first user, or head image data currently captured by the camera of the first user.
  • the server can also acquire a plurality of first user's head images to more comprehensively analyze the head image data.
• In addition, the server can also provide a shooting prompt for prompting the user to shoot at different shooting angles, so that the server can acquire head image data from different shooting angles, making the obtained head model match the actual image of the first user more closely.
  • the server can obtain the head feature data based on the above characteristics:
• The server may compare the color value of each pixel in the head image data with a plurality of configured skin tones; if the color values of consecutive pixel points exceeding a first ratio all match a certain skin tone, that skin tone can be determined as the face tone, and the image region formed by those matching consecutive pixel points is determined as the face region.
  • the server may determine successive pixel points adjacent to the face area as the hair area and extract the color values of the consecutive pixel points as the hair tone.
  • the server can determine the hollow regions in the determined face regions as the mouth, eyes, and eyebrow positions, respectively. Among them, the position of the eyebrows is at the top, followed by the eyes, and the mouth is at the bottom. Moreover, since the ear protrudes outward from the face, the server can determine the edge pixel points on both sides of the face region, analyze the tangent slope of the edge pixel point, and if the rate of change of the tangent slope from the pixel point A to the pixel point B If the preset change rate is satisfied, the area where the pixel point A to the pixel point B is located can be determined as the ear position.
• For the nose position, the server can find consecutive pixel points in the face region whose brightness is higher than a first brightness and whose neighbouring consecutive pixel points on both sides and below are darker than a second brightness, and determine the area where these consecutive pixel points are located as the nose position.
• Further, the server can determine the facial feature forms according to the shapes formed by the edge pixel points at the facial feature positions, and determine the tones of the pixel points at the facial feature positions as the facial feature tones, thereby obtaining the facial feature forms.
• In addition, for the nose position, the server can record the ratio between the brightness of the pixel points higher than the first brightness and that of the pixel points lower than the second brightness; the higher this brightness ratio, the more three-dimensional the first user's nose.
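• A simplified sketch of the tone-distribution analysis for the face region and face tone described above (the skin-tone values, distance threshold, and first ratio are assumptions, and contiguity checking is omitted):

```python
import numpy as np

# hypothetical configured skin tones (RGB)
SKIN_TONES = np.array([[224, 187, 158], [198, 134, 94], [141, 85, 36]], float)

def face_tone_and_mask(image: np.ndarray, first_ratio: float = 0.2,
                       max_dist: float = 60.0):
    """Match every pixel against the configured skin tones; if the pixels that
    match one tone exceed the first ratio, that tone is taken as the face tone
    and the matching pixels approximate the face region."""
    pixels = image.reshape(-1, 3).astype(float)
    dists = np.linalg.norm(pixels[:, None, :] - SKIN_TONES[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)                 # closest configured tone per pixel
    matched = dists.min(axis=1) < max_dist         # pixel is "skin-like" at all
    for tone_idx in range(len(SKIN_TONES)):
        hit = matched & (nearest == tone_idx)
        if hit.mean() > first_ratio:               # enough matching pixels
            mask = hit.reshape(image.shape[:2])    # approximate face region
            return SKIN_TONES[tone_idx], mask
    return None, None

img = np.full((64, 64, 3), 200, np.uint8)          # toy "head image"
tone, mask = face_tone_and_mask(img)
```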
• It should be noted that the above manner of obtaining the head feature data is only exemplary. The embodiment of the present invention may also acquire the head feature data in any other manner, for example, by a recognition algorithm based on face templates or by a recognition algorithm based on a neural network.
• Further, the server may also correct the determined head feature data. For example, the server determines the facial feature ratio according to the facial feature positions in the head feature data and compares it with a configured normal facial feature ratio; if it does not conform to the normal facial feature ratio, the server can adaptively adjust the positions of some facial features so that the facial feature ratio conforms to the normal one. The normal facial feature ratio is used to indicate a range of normal facial feature ratios, so in the comparison process, a facial feature ratio falling within this range is regarded as conforming to the normal facial feature ratio.
• It should be noted that the server may also analyze only the necessary head feature data; the necessary head feature data is used to briefly describe the actual head image of the first user, and may, for example, be used to indicate the face tone, the facial feature positions, and the facial feature forms.
  • the server generates a head model that matches the head feature data according to the head feature data.
  • the step may be specifically: according to the face region and a hair region, the head contour model is determined, the head contour model includes a facial contour model and a hair contour model; the facial contour model and the hair contour model are filled according to the facial color tone and the hair color tone; and a facial feature model matching the facial features is obtained; According to the facial features, the facial features are synthesized into the facial contour model to generate a head model that matches the head feature data.
• In the above step, the server may determine the face contour and the hair contour according to the shapes formed by the edge pixel points of the face region and the hair region respectively, thereby generating a facial contour model and a hair contour model and so determining the head contour model. Further, the server can fill the facial contour model with the face tone (for example, milky white) to obtain a facial model, and fill the hair contour model with the hair tone (for example, brown) to obtain a hair model. Further, the server can compare the facial feature forms, such as the mouth form and the eye form, with the cartoonized facial feature models in the facial feature database, obtain the facial feature models with the highest similarity to the facial feature forms, synthesize the obtained facial feature models onto the filled facial contour model according to the facial feature positions, and construct the three-dimensional head model according to the curvature of the facial contour model and the hair contour model, so that the generated head model matches the actual head image of the first user.
• In addition, the server can also generate cartoonized facial feature models based on the facial feature forms. For example, for the mouth form, the mouth contour is filled with the mouth tone, and the pixel points at both ends of the mouth contour are darkened, thereby generating a mouth model with a "two-lobed" effect.
• For the eye form, the eye tones include at least two tones, namely the eyeball tone and the eye-white tone, and the eye-white tone is generally white. Therefore, the server can fill the eye contour with the white tone among the eye tones, and fill a spherical contour within the eye contour with the other tone; the spherical contour is tangent to the eye contour.
• In addition, the server may further process the head model. For example, the server adds texture to the hair model, or obtains the age data of the first user and adds, on the facial model, texture matching the age of the first user. For another example, the server acquires the gender data of the first user; if the first user is female, the eyelashes on the eye model can be lengthened and the brightness of the mouth model enhanced. For another example, the server acquires the occupation data of the first user; if the first user is a student, a glasses model may be added to the facial model.
  • the server determines, according to the user attribute of the first user, a limb model corresponding to the first user.
• The user attribute includes, but is not limited to, the user's gender, age, and occupation.
  • the user fills in the user attribute when registering the account on the social platform, so that the server can obtain the user attribute and store the user attribute corresponding to the user identifier.
• In this step, the server can obtain, according to the user identifier of the first user, the user attribute corresponding to the user identifier, and then select, from the limb model database, the limb model matching the user attribute according to the user attribute.
  • the server also provides a dress model.
  • the limb model may include dressing, or the server may separately provide a dress model, which may be stored in the limb model database or in the dress model database configured by the server. If the server provides the dress model separately, the dress model and the corresponding dressing options can be provided to the first user so that the first user can select the corresponding dress model through the dressing option.
  • the server may also acquire the image data of the first user, determine the clothing worn by the first user in the image data, match the dress model corresponding to the clothing worn by the first user, and provide the dress model to the first user. .
  • the server may determine the dress model of the first user according to the user attribute, and the specific process is similar to the process of determining the limb model described below.
• Based on the above, the server can determine the limb model according to at least the following three user attributes:
• The limb model database can provide a variety of limb models specific to male and female body characteristics. Each limb model corresponds to a gender label, so that the server may determine, according to the gender label, a limb model matching the gender data of the first user; for example, the limb model with a male label may be wearing trousers, and the limb model with a female label may be wearing a dress.
• The limb model database can also provide limb models in clothing styles suited to the age group to which the user belongs, and each limb model corresponds to an age group label. For example, the age group label corresponding to a cartoon-character limb model is under 18, so that the server can determine, according to the age group label, the limb model matching the age data of the first user.
• In addition, each limb model can correspond to an occupation label; for example, the occupation label corresponding to a suit limb model is white-collar, and the occupation label corresponding to a school uniform limb model is student, so that the server can determine, according to the occupation label, the limb model matching the occupation data of the first user.
  • each limb model may correspond to at least two types of labels at the same time, or one label corresponding to each limb model has two meanings at the same time, for example, the label is a female teacher label.
  • the server may be configured to determine the limb model corresponding to the first user according to at least two user attributes.
• For example, if the first user is a female doctor, the server may search the limb model database for a limb model whose gender label is female and whose occupation label is doctor, or for a limb model whose label is "female doctor", and determine the found limb model as the limb model corresponding to the first user.
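• Label-based selection of a limb model could be sketched as follows, assuming a hypothetical limb model database in which a missing label means the model does not restrict that attribute:

```python
# hypothetical limb model database with gender / occupation labels
LIMB_MODELS = [
    {"name": "suit",           "gender": "male",   "occupation": "white_collar"},
    {"name": "dress",          "gender": "female"},
    {"name": "school_uniform", "occupation": "student"},
    {"name": "white_coat",     "gender": "female", "occupation": "doctor"},
]

def pick_limb_model(gender=None, occupation=None, age_group=None):
    """Pick the limb model whose labels conflict with none of the user
    attributes and match as many of them as possible."""
    wanted = {"gender": gender, "occupation": occupation, "age_group": age_group}
    best, best_score = "default", -1
    for model in LIMB_MODELS:
        score = 0
        for key, value in wanted.items():
            label = model.get(key)
            if value is None or label is None:
                continue                     # attribute unknown or unrestricted
            if label != value:
                score = -1                   # conflicting label: reject this model
                break
            score += 1                       # one more matching label
        if score > best_score:
            best, best_score = model["name"], score
    return best

print(pick_limb_model(gender="female", occupation="doctor"))  # white_coat
```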
• It should be noted that, when determining the limb model, the group type corresponding to the group video session, the virtual environment of the group video session, and the current actual temperature may also be referred to.
  • the group type refers to a group type of a group to which multiple users belong in the group video session.
• For example, the server determines the group type of the group to which the multiple users in the group video session belong, and determines the limb model matching the group type as the limb model of the first user.
• Each limb model can correspond to a group type label; for example, the suit limb model can correspond to a company group label. Therefore, when the group type is a company group, the server can find the suit limb model corresponding to the company group label and determine the suit limb model as the limb model of the first user.
  • the server determines the virtual environment type corresponding to the group video session, and determines the limb model that matches the virtual environment type as the limb model of the first user. For example, if the type of the virtual environment is a beach, the server may determine the beachwear limb model as the limb model corresponding to the first user.
  • the server obtains the current actual temperature, and determines the limb model that matches the current actual temperature as the limb model of the first user. For example, if the current actual temperature is 35 degrees, the server may determine the summer limb model as the limb model corresponding to the first user.
  • the server when the server determines the first user's limb model, it can also provide adjustment options for the first user.
• The adjustment options, and the manner of providing them, are not specifically limited in the embodiment of the present invention.
• For example, the initial limb model and the adjustment options are provided to the first user. The adjustment options include a height adjustment option, a body shape adjustment option, and a dressing adjustment option; the first user can trigger the height adjustment option to adjust the height, trigger the body shape adjustment option to adjust the body size, and trigger the dressing adjustment option to change the dressing.
• It should be noted that step 1001C is an optional step of the embodiment of the present invention. Since the head model is generally sufficient to represent the actual image of the first user, in order to simplify the process and save the computing resources of the server, the server may also randomly select, according to the gender data of the first user, a limb model matching the gender data from the limb model database.
• In addition, the embodiment of the present invention does not limit the order in which the head model and the limb model are determined; the server may determine the limb model first, or determine the head model and the limb model at the same time.
  • the server synthesizes the head model and the limb model to obtain the virtual character of the first user.
• For example, the server acquires the image data of the user's head, performs face and hair processing to obtain the face tone and the facial features, generates the head model accordingly, determines the limb model from the limb model database, and then synthesizes the head model on top of the limb model to obtain a complete virtual character.
  • an embodiment of the present invention provides a flowchart for acquiring a virtual character.
• It should be noted that the server can also take the proportion between the head model and the limb model into account during the synthesis. For example, the server determines the composite sizes of the head model and the limb model according to the height data of the first user and configured head-to-body ratio data of a normal person, adjusts the head model and the limb model to the determined composite sizes, and then performs the synthesis, so that the obtained virtual character better conforms to the actual image of the first user. In fact, in order to make the virtual character more appealing, the server can also synthesize a "Q version" virtual character, which refers to a virtual character whose head-to-body proportion does not match that of a normal person; the head-to-body ratio data can then be more exaggerated, for example 1:1. The server can determine the composite sizes of the head model and the limb model according to the configured "Q version" head-to-body ratio data, adjust the head model and the limb model to the determined composite sizes, and then synthesize them, thereby obtaining the "Q version" virtual character.
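• A minimal sketch of adjusting the head model to a configured head-to-body ratio, for example the exaggerated "Q version" ratio (the mesh data and ratio values are illustrative assumptions):

```python
import numpy as np

def scale_to_ratio(head: np.ndarray, body: np.ndarray, head_body_ratio: float):
    """Scale the head mesh so that head height : body height equals the
    configured ratio (e.g. a small ratio for realistic, 1:1 for the 'Q version')."""
    head_h = head[:, 1].ptp()                  # current head height (y extent)
    body_h = body[:, 1].ptp()                  # current body height
    factor = head_body_ratio * body_h / head_h
    return head * factor                       # uniformly scaled head mesh

head = np.random.rand(50, 3)
body = np.random.rand(200, 3) * np.array([1, 6, 1])
q_head = scale_to_ratio(head, body, head_body_ratio=1.0)   # "Q version" head
```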
  • the server acquires video data of the first user based on the virtual character of the first user and the behavior characteristic data of the first user in the process of the group video session.
  • the behavior feature data is used to indicate the actual action of the first user, and at least includes any one of the expression feature data, the mouth shape feature data, the head orientation feature data, and the eye direction feature data.
• In step 1001, the server acquires a static virtual character. In this step, based on the virtual character of the first user and the behavior feature data of the first user, the server acquires the video data of the first user, in which the action of the first user's virtual character matches the actual action of the first user.
  • the manner in which the video data is obtained is not limited in the embodiment of the present invention. For example, based on the foregoing at least four behavior characteristic data, the embodiment of the present invention provides at least four ways of acquiring video data:
• Acquisition mode 1: the behavior feature data includes expression feature data. When the server detects that the expression feature data of the first user is specified expression feature data, the server acquires the limb feature data corresponding to the specified expression feature data, maps the specified expression feature data in real time to the head model of the first user's virtual character, and maps the limb feature data in real time to the limb model of the first user's virtual character, to obtain the video data of the first user.
  • the server may jointly map the specified expression feature data and the limb feature data to the virtual character.
• In the acquisition process, the server can acquire, in real time, the image data captured by the camera of the first user, and mark and track the pixel points of the face region and the facial features in the image data, or the key pixel points of the face region and the facial features, thereby capturing the expression feature data of the first user. The key pixel points are used to basically describe the face form and the facial feature forms.
• Further, the server may compare the pixel point distribution of the captured expression feature data with the pixel point distribution of each piece of specified expression feature data, where the specified expression feature data refers to expression feature data configured on the server, and each piece of specified expression feature data corresponds to one piece of limb feature data. If the similarity between the two reaches a preset threshold, the expression feature data is detected as the specified expression feature data.
• Taking the specified expression feature data being mouth-wide-open feature data and the corresponding limb feature data being a matching hand gesture as an example, the server can establish three-dimensional coordinates for the mouth model and, in these coordinates, adjust the pixel point distribution of the mouth model according to the pixel point distribution indicated by the mouth-wide-open feature data, thereby mapping the mouth-wide-open feature data to the mouth model in the head model; similarly, the server can adjust the pixel point distribution of the arm model according to the pixel point distribution indicated by the hand gesture feature data, thereby mapping that feature data to the arm model in the limb model, so that the virtual character becomes animated and the video data of the first user is obtained.
• For another example, when crying expression feature data is detected, the corresponding hand gesture feature data (for example, a hand wiping the eyes) may also be acquired; the crying expression feature data is mapped to the eye model in the head model, and the pixel point distribution of the arm model is adjusted according to the pixel point distribution indicated by the hand gesture feature data, thereby mapping that feature data to the arm model in the limb model.
• The server may gradually adjust the pixel point distributions corresponding to the mouth model and the arm model over consecutive multi-frame video data, thereby obtaining multi-frame video data capable of reflecting the change of the virtual character's motion.
• In this acquisition mode, when the expression feature data of the user's actual figure is detected to match configured expression feature data, the limb feature data matching the specified expression feature data is acquired, and the specified expression and the matching limb action are applied to the user's virtual character, thereby obtaining the video data. Because a user wearing a VR device cannot easily express emotions directly through body movements, this acquisition process not only enables the virtual character to simulate the user's actual expression, but also predicts the user's emotion from the expression features and highlights it through the limb features, so that the user's figure is simulated by a combination of expression and body movement, making the virtual character more expressive and more authentic.
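• A simplified sketch of this acquisition mode, assuming hypothetical expression templates, a cosine-similarity threshold, and configured expression-to-limb pairs:

```python
import numpy as np

# hypothetical configured pairs: specified expression -> matching limb action
EXPRESSION_TO_LIMB = {"mouth_wide_open": "hand_over_mouth",
                      "crying": "hand_wiping_eyes"}
TEMPLATES = {"mouth_wide_open": np.array([0.9, 0.1, 0.8]),
             "crying": np.array([0.1, 0.9, 0.2])}

def detect_and_map(expression_vec: np.ndarray, threshold: float = 0.9):
    """Compare the tracked expression features with each configured template;
    when the similarity reaches the threshold, return the specified expression
    and the limb feature data to apply to the avatar's limb model."""
    for name, template in TEMPLATES.items():
        sim = float(np.dot(expression_vec, template) /
                    (np.linalg.norm(expression_vec) * np.linalg.norm(template)))
        if sim >= threshold:
            return name, EXPRESSION_TO_LIMB[name]
    return None, None

print(detect_and_map(np.array([0.85, 0.15, 0.75])))  # ('mouth_wide_open', 'hand_over_mouth')
```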
• Acquisition mode 2: the behavior feature data includes mouth type feature data. The server maps the mouth type feature data of the first user in real time to the head model of the first user's virtual character to obtain the video data of the first user.
• In order to synchronize the video data of the first user with the mouth action when the first user speaks, when the server receives the audio data of the first user, it acquires the configured mouth type feature data, which is used to indicate that the mouth is continuously opening and closing. The server can map the mouth type feature data in real time to the mouth model in the head model, and synthesize the audio data with the mapped virtual character to obtain the video data of the first user. When the process of receiving the audio data ends, the server cancels the mapping of the mouth model and restores the mouth model to the default state, which refers to the state in which the mouth model remains closed.
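• The synchronization of the mouth model with received audio could be sketched as an energy-driven opening value per frame (the frame length and normalization are assumptions):

```python
import numpy as np

def mouth_openness_track(audio: np.ndarray, frame_len: int = 160) -> np.ndarray:
    """Derive a per-frame mouth opening value from the incoming audio so the
    mouth model keeps opening and closing while the user speaks, and returns
    to the closed default state (0.0) when the audio is silent or ends."""
    n_frames = len(audio) // frame_len
    frames = audio[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sqrt((frames.astype(float) ** 2).mean(axis=1))   # per-frame RMS
    if energy.max() == 0:
        return np.zeros(n_frames)                                # silence: mouth closed
    return np.clip(energy / energy.max(), 0.0, 1.0)              # 0 = closed, 1 = wide open

speech = np.sin(np.linspace(0, 200, 1600)) * np.hanning(1600)    # toy audio burst
print(mouth_openness_track(speech).round(2))
```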
• Acquisition mode 3: the behavior feature data includes head orientation feature data. The server acquires the head orientation feature data of the first user collected by the sensor of the first user, and maps the head orientation feature data of the first user in real time to the head model of the first user's virtual character to obtain the video data of the first user.
• In the acquisition process, the server can acquire, in real time, the head orientation data collected by the sensor of the first user (for example, the nine-axis sensor on the VR device), and the head orientation data is at least used to indicate the pitch angle or the left-right rotation angle of the first user's head. Further, the server may rotate the head model relative to the limb model of the virtual character according to the pitch angle or the left-right rotation angle indicated by the head orientation data, thereby mapping the head orientation feature data to the head model in real time.
  • the server may also combine the image data captured by the camera of the first user.
  • an embodiment of the present invention provides a flowchart for acquiring head orientation data.
• For example, the server may acquire the image data captured by the camera and, according to the pixel point changes of the face region in the image data, determine that the head is in a deflected state when the pixel points of the face region shift collectively to one side; the direction opposite to the shift is determined as the head deflection direction (for the case of self-photographing), and the deflection angle is determined from the amount of the pixel point shift, thereby obtaining the head orientation feature data.
• When the head orientation feature data is obtained in both of the above ways, the server may determine the data error between the two pieces of head orientation feature data; if the data error is greater than a tolerance error, the process of acquiring the head orientation feature data may be re-executed, and if the data error is smaller than the tolerance error, the head orientation feature data may be obtained by data fusion, for example by taking the average of the two pieces of head orientation feature data as the correct head orientation feature data.
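• A minimal sketch of the fusion of the sensor-derived and camera-derived head orientation feature data, with the tolerance error as an assumed value:

```python
import numpy as np

def fuse_head_orientation(sensor_deg: np.ndarray, camera_deg: np.ndarray,
                          tolerance_deg: float = 15.0):
    """Fuse the head orientation (pitch, yaw) estimated from the nine-axis
    sensor with the one estimated from the camera image: if they disagree by
    more than the tolerance, signal that acquisition should be redone,
    otherwise return their average as the fused orientation."""
    error = np.abs(sensor_deg - camera_deg).max()
    if error > tolerance_deg:
        return None                       # caller re-runs the acquisition step
    return (sensor_deg + camera_deg) / 2  # simple data fusion: the mean value

print(fuse_head_orientation(np.array([5.0, 30.0]), np.array([7.0, 26.0])))  # [ 6. 28.]
print(fuse_head_orientation(np.array([5.0, 30.0]), np.array([5.0, 80.0])))  # None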
• Acquisition mode 4: the behavior feature data includes eye orientation feature data. The server acquires the eye image data of the first user captured by the camera of the first user, acquires the eye orientation feature data of the first user according to the eye image data of the first user, and maps the eye orientation feature data of the first user in real time to the head model of the first user's virtual character to obtain the video data of the first user.
• The server may also obtain the eye orientation feature data, which is used to indicate the position of the first user's eyeball relative to the eye, and may thus be used to indicate the direction in which the first user's eyes are gazing.
  • the server can lock the eyeball region in the eye image data, and track the position of the eyeball region relative to the eye in real time, thereby acquiring the eye direction feature data. Further, the server may adjust the eyeball position in the eye model according to the eye orientation characteristic data, and generate video data, thereby mapping the eye direction feature data to the eye model in the head model.
• This acquisition mode acquires the user's eye orientation feature data from the captured eye image data and maps it in real time to the head model of the first user's virtual character. This not only makes the virtual character express the user's real figure more delicately and match the user's real figure more closely, but also, based on the expression of each user's eye details, can enhance the eye contact between users in the group video session and improve the efficiency of the group video session.
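• Eyeball localization from the eye image data could be sketched as follows (a brightness threshold stands in for the embodiment's eyeball locking, which is not specified here):

```python
import numpy as np

def eye_direction(eye_image: np.ndarray):
    """Locate the eyeball (darkest pixels) in an eye image and express its
    position relative to the eye centre as a normalised gaze offset."""
    threshold = eye_image.min() + 0.2 * (eye_image.max() - eye_image.min())
    ys, xs = np.nonzero(eye_image <= threshold)        # eyeball region pixels
    if len(xs) == 0:
        return 0.0, 0.0                                 # no eyeball found: look straight
    h, w = eye_image.shape
    dx = (xs.mean() - w / 2) / (w / 2)                  # -1 = far left, +1 = far right
    dy = (ys.mean() - h / 2) / (h / 2)                  # -1 = up, +1 = down
    return float(dx), float(dy)

eye = np.full((40, 80), 220, np.uint8)
eye[15:25, 55:65] = 30                                  # dark eyeball, looking right
print(eye_direction(eye))                               # roughly (0.5, 0.0)
```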
  • the video data obtained in step 1002 can be used as the initial video data of the first user.
  • the server may further process the initial video data.
• Referring to FIG. 13, an embodiment of the present invention provides a flowchart for acquiring video data: the server acquires the perspective data of the second user, and processes the initial video data according to the perspective indicated by the perspective data of the second user, to obtain the video data of the first user that matches that perspective.
  • the server may obtain the view data corresponding to the head orientation feature data of the second user according to the head orientation feature data collected by the sensor of the second user.
  • the server may determine, according to the head orientation data, that the orientation of the rotated head model is the perspective of the second user, thereby acquiring the perspective data of the second user.
  • the server acquires the eye orientation characteristic data of the second user according to the eye image data captured by the camera of the second user, and obtains the perspective data of the second user according to the eye orientation characteristic data of the second user.
• In this manner, the server may determine, based on the eyeball direction indicated by the eye orientation feature data, the direction in which the eyeball position of the head model faces as the perspective of the second user, thereby acquiring the perspective data.
  • the server may determine, according to the view data of the second user, a field of view of the view angle indicated by the view data in the initial video data, thereby extracting video data in the view range as the video data of the first user.
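• Extracting the part of the initial video data that falls within the second user's field of view could be sketched as a view-direction test over scene points (the field-of-view angle is an assumed value):

```python
import numpy as np

def in_view(points: np.ndarray, view_dir: np.ndarray, fov_deg: float = 90.0):
    """Return a mask of scene points that fall inside the second user's field
    of view, given the viewing direction derived from the perspective data."""
    v = view_dir / np.linalg.norm(view_dir)
    p = points / np.linalg.norm(points, axis=1, keepdims=True)
    cos_angle = p @ v                                    # angle to the view axis
    return cos_angle >= np.cos(np.radians(fov_deg / 2))  # keep points within half-FOV

scene = np.array([[0.0, 0.0, -1.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(in_view(scene, view_dir=np.array([0.0, 0.0, -1.0])))  # [ True False False]
```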
• An embodiment of the present invention provides a flowchart of a group video session, in which the server can acquire the virtual character and track the face and facial features of the first user in real time, thereby obtaining video data in real time and sending the video data to the terminal where the second user is located in real time.
  • the server sends the video data of the first user to the terminal where the second user participating in the group video session is located to implement the group video session.
• For each user in the group video session, the server may obtain that user's video data according to steps 1001 and 1002. Therefore, in order to display the virtual characters of all users synchronously, the server may synthesize the video data of each user in the group video session and send the synthesized video data to the terminal where the second user is located.
  • the terminal where the second user is located receives the video data, the video data can be displayed in real time, and the video data matches the perspective of the second user, thereby implementing a group video session.
  • an embodiment of the present invention provides a flowchart for displaying video data.
• In the flowchart, the server acquires the initial video data, processes the initial video data according to the perspective data of the second user, and sends the processed video data to the terminal where the second user is located, so that that terminal can display the video data in real time according to the perspective of the second user.
• The video data may also be sent to the server first, and then sent by the server to the terminal where the second user is located.
• The embodiment of the present invention obtains the virtual character of the first user in the group video session, the virtual character being obtained according to the head feature data of the first user and the corresponding limb model, so that the virtual character can match the actual image of the first user; video data of the first user is obtained based on the virtual character and the behavior feature data, so that the action of the first user's virtual character can simulate the actual action of the first user in real time, thereby expressing the actual image of the first user more vividly and enhancing the visual effect of the group video session.
  • a specific manner of acquiring a virtual character is provided, and a head model matching the head feature data is generated according to the head feature data, and the limb model corresponding to the first user is determined according to the user attribute of the first user, and the The head model and the limb model obtain virtual characters, which refine the acquisition process of each part of the virtual character, so that the virtual character has more detailed features, thereby more detailed expression of the actual image of the first user.
  • the limb model is derived from the user attributes, making the virtual character closer to the user's actual image.
• In addition, the head feature data of the first user is determined by analyzing the tone distribution of the first user's head image; the head feature data can be used to indicate the hair region, hair tone, face region, face tone, facial feature positions, and facial feature forms of the first user, that is, multiple features of the first user's actual head image, and can therefore describe the actual head image of the first user in a more detailed and comprehensive manner.
• In addition, a specific process of generating a head model matching the head feature data is provided: the facial contour model and the hair contour model are determined according to the face region and the hair region and filled according to the face tone and the hair tone, and the facial feature models matching the facial feature forms are synthesized into the facial contour model according to the facial feature positions. This refines the process of generating the head model, and the generation of each part of the head model matches the actual head image of the first user, thereby improving the degree of matching between the virtual character and the actual image of the first user.
• In addition, at least three ways of determining the limb model of the first user are provided: a limb model matching a user attribute of the first user, such as gender, age, or occupation, is determined, and the three determination methods can also be combined, which not only makes the limb model conform better to the actual image of the first user, but also makes the way of determining the limb model more diverse.
• In addition, the specific manner of acquiring the video data of the first user when the behavior feature data includes expression feature data is described: when the expression feature data is detected as the specified expression feature data, the limb feature data corresponding to the specified expression feature data may be acquired, the specified expression feature data is mapped to the face, and the limb feature data is mapped to the limb model, so that the expression of the first user's virtual character is more vivid.
• In addition, the specific manners of acquiring the video data of the first user when the behavior feature data includes mouth type feature data, head orientation feature data, or eye orientation feature data are described, so that the virtual character can express the actual image of the first user more vividly and the ways of obtaining the video data of the first user are more diverse.
• In addition, a manner of processing the initial video data according to the perspective indicated by the perspective data of the second user is provided, thereby obtaining the video data of the first user that matches the perspective of the second user, so that the perspective in which the first user's virtual character is displayed for the second user conforms better to the actual visual effect.
  • the view data of the second user is obtained according to the head orientation feature data collected by the sensor of the second user or according to the eye image data captured by the camera of the second user. Not only can the perspective of the second user be acquired in real time, but also the manner in which the perspective data is acquired is diversified.
  • FIG. 16 is a block diagram of a device for a group video session according to an embodiment of the present invention. Referring to FIG. 16, the device specifically includes:
  • the virtual character acquisition module 1601 is configured to acquire a virtual character of the first user in the group video session, where the virtual character of the first user is obtained according to at least the head feature data of the first user and the limb model corresponding to the first user;
  • the video data obtaining module 1602 is configured to acquire, according to the virtual character of the first user and the behavior characteristic data of the first user, the video data of the first user in the process of the group video session, and the virtual character of the first user in the video data The action matches the actual action of the first user;
  • the sending module 1603 is configured to send the video data of the first user to the terminal where the second user participating in the group video session is located to implement the group video session.
• The embodiment of the present invention obtains the virtual character of the first user in the group video session, the virtual character being obtained according to the head feature data of the first user and the corresponding limb model, so that the virtual character can match the actual image of the first user; video data of the first user is obtained based on the virtual character and the behavior feature data, so that the action of the first user's virtual character can simulate the actual action of the first user in real time, thereby expressing the actual image of the first user more vividly and enhancing the visual effect of the group video session.
  • the avatar acquiring module 1601 is configured to: acquire head feature data of the first user; generate a head model that matches the head feature data according to the head feature data; and determine, according to the user attribute of the first user A limb model corresponding to a user; synthesizing the head model and the limb model to obtain a virtual character of the first user.
  • the avatar acquiring module 1601 is configured to: acquire head image data of the first user; analyze the tone distribution of the head image data to obtain head feature data, where the head feature data is used to indicate the first user Hair area, hair tone, face area, face tones, facial features, and facial features.
• the avatar acquisition module 1601 is configured to: determine a head contour model according to the face region and the hair region, the head contour model including a facial contour model and a hair contour model; fill the facial contour model and the hair contour model according to the face tone and the hair tone; acquire the facial feature models matching the facial feature forms; and synthesize the facial feature models into the facial contour model according to the facial feature positions, to generate a head model that matches the head feature data.
  • the avatar acquiring module 1601 is configured to: determine, according to the gender data of the first user, a limb model that matches the gender data of the first user; and/or, the avatar acquiring module 1601 is configured to: according to the first user The age data determines a limb model that matches the age data of the first user; and/or the avatar acquisition module 1601 is configured to: determine a limb model that matches the occupation data of the first user based on the occupation data of the first user.
  • the behavior feature data includes the expression feature data
• the video data acquisition module 1602 is configured to: when detecting that the expression feature data of the first user is the specified expression feature data, acquire the limb feature data corresponding to the specified expression feature data; map the specified expression feature data in real time to the head model of the first user's virtual character, and map the limb feature data in real time to the limb model of the first user's virtual character, to obtain the video data of the first user.
  • the behavior characteristic data includes the mouth type feature data
  • the video data obtaining module 1602 is configured to: map the mouth type feature data of the first user to the head model of the virtual character of the first user in real time, to obtain the video of the first user. data.
  • the behavior feature data includes the head orientation feature data
• the video data acquisition module 1602 is configured to: acquire the head orientation feature data of the first user collected by the sensor of the first user; and map the head orientation feature data of the first user to the head model of the virtual character of the first user in real time, to obtain the video data of the first user.
  • the behavior feature data includes the eye orientation feature data
• the video data acquisition module 1602 is configured to: acquire the eye image data of the first user captured by the camera of the first user; obtain the eye orientation feature data of the first user according to the eye image data of the first user; and map the eye orientation feature data of the first user to the head model of the virtual character of the first user in real time, to obtain the video data of the first user.
  • the video data obtaining module 1602 is configured to: acquire initial video data of the first user based on the virtual character of the first user and behavior characteristic data of the first user; acquire perspective data of the second user; The initial video data is processed by the perspective indicated by the perspective data to obtain video data of the first user that matches the viewing angle.
• the video data acquiring module 1602 is configured to: obtain, according to the head orientation feature data collected by the sensor of the second user, the perspective data corresponding to the head orientation feature data of the second user; or, the video data acquiring module 1602 is configured to: obtain the eyeball orientation feature data of the second user according to the eye image data captured by the camera of the second user, and obtain the perspective data of the second user according to the eyeball orientation feature data of the second user.
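As a rough illustration of turning head orientation readings into perspective data, the snippet below converts yaw/pitch angles into a gaze direction vector; the angle convention and units are assumptions, not details taken from the patent.

```python
import math

def view_direction(yaw_deg: float, pitch_deg: float):
    """Convert head-orientation angles (yaw, pitch) into a unit gaze vector.

    A minimal sketch of deriving per-user perspective data from sensor
    readings; the y-up, z-forward convention here is an assumption.
    """
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    x = math.cos(pitch) * math.sin(yaw)
    y = math.sin(pitch)
    z = math.cos(pitch) * math.cos(yaw)
    return (x, y, z)
```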
  • FIG. 17 is a flowchart of a method for group video session according to an embodiment of the present invention. Referring to FIG. 17, the method is applied to a server, and specifically includes:
  • the server acquires a three-dimensional interaction model of the object to be displayed.
  • the group video session refers to a video session made by multiple (two or more) users based on the server.
  • the multiple users may be multiple users on the social platform corresponding to the server, and the multiple users may be a group relationship or a friend relationship.
  • a target is a physical object that a user wants to display in a group video session.
• the three-dimensional interaction model refers to a three-dimensional model that is generated according to the target object and that, under the control of any user in the group video session, is displayed in the video data of the multiple users.
• FIG. 18 is a schematic diagram of a three-dimensional interaction model provided by an embodiment of the present invention. Referring to FIG. 18, the three-dimensional interaction model may be a three-dimensional geometric model, a three-dimensional automobile model, or a three-dimensional chart model.
  • the server can acquire the three-dimensional interaction model in various ways.
  • the server can acquire a three-dimensional object model uploaded by the fifth user.
  • the three-dimensional interaction model may be a model obtained by a fifth user through CAD (Computer Aided Design), such as a three-dimensional automobile model.
  • the server acquires a two-dimensional form uploaded by the sixth user, and processes the two-dimensional form to obtain a three-dimensional form model.
• for example, the server can directly generate a three-dimensional table model corresponding to a two-dimensional table created in an EXCEL spreadsheet.
• for example, the server can establish a three-dimensional coordinate system (x, y, z), use different plane regions on the (x, y) plane to represent different values of the "class" parameter, and determine the value of the "number of people" parameter corresponding to each "class" value as the z coordinate of that "class" value, thereby generating a three-dimensional table model in the form of a histogram.
  • the server can also generate other forms of three-dimensional table models, such as pie charts and bar charts.
  • the server can also set the hue of the three-dimensional form model, for example, different parameters correspond to different hue.
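The histogram-style table model described above can be pictured with a small sketch: each "class" value gets its own region on the (x, y) plane and the "number of people" value becomes the bar height on the z axis. The table contents and layout parameters below are hypothetical and only illustrate the coordinate mapping, not the patent's modeling code.

```python
# Hypothetical 2D "class -> number of people" table.
table = {"Class 1": 42, "Class 2": 38, "Class 3": 45}

def table_to_bars(table, bar_width=1.0, gap=0.5):
    """Turn a 2D table into histogram bars laid out in (x, y, z) space."""
    bars = []
    for i, (label, count) in enumerate(table.items()):
        x0 = i * (bar_width + gap)          # plane region reserved for this class
        bars.append({
            "label": label,
            "footprint": ((x0, 0.0), (x0 + bar_width, bar_width)),  # region on (x, y)
            "height": float(count),          # z coordinate = parameter value
        })
    return bars

for bar in table_to_bars(table):
    print(bar)
```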
  • the server can perform three-dimensional modeling on the target based on at least one two-dimensional image data corresponding to the target object uploaded by the user, for example, using a SFS (Shape From Shading) algorithm to obtain a three-dimensional interaction model.
  • the fifth user or the sixth user may be any user in the group video session. Further, the fifth user or the sixth user may also be a user with upload permission.
  • the embodiment of the present invention does not limit the user who has the upload permission.
• for example, the user with upload permission is the originator of the group video session, or a VIP (Very Important Person) user.
• the server processes the three-dimensional interaction model of the target object according to the perspective of each of the multiple users in the group video session to obtain the video data of each user, where the video data of a user includes model data obtained by performing perspective transformation on the three-dimensional interaction model of the target object.
• during implementation, the server may obtain the perspective data of each user in the group video session, determine the perspective of the user according to the perspective data of the user and the display position of the virtual character of the user, then extract the image data of the three-dimensional interaction model corresponding to that perspective, synthesize the extracted image data with the session environment data, and perform stereo encoding on the synthesized image data, so as to obtain the user's video data frame by frame.
  • the method for stereo coding is not limited in the embodiment of the present invention.
• for example, the server encodes the synthesized image data into video data of two fields, namely an odd field formed by the odd-numbered lines and an even field formed by the even-numbered lines, so that when the VR device receives the video data, the video data of the two fields can be alternately displayed on the left-eye and right-eye screens, thereby producing binocular parallax for the user and achieving a three-dimensional display effect.
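A simplified sketch of the field-splitting idea: the synthesized frame is divided into a field of odd lines and a field of even lines, which a VR device could then show alternately to the left and right eyes. Real stereo encoding would also interpolate the missing lines and compress the result; those steps are omitted here.

```python
import numpy as np

def split_into_fields(frame: np.ndarray):
    """Split a synthesized frame into an odd-line field and an even-line field."""
    odd_field = frame[0::2]    # rows 1, 3, 5, ... of the image (0-indexed even rows)
    even_field = frame[1::2]   # rows 2, 4, 6, ...
    return odd_field, even_field

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # dummy synthesized image
left_eye_field, right_eye_field = split_into_fields(frame)  # shown alternately per eye
```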
• the session environment data includes, but is not limited to, the virtual environment corresponding to the group video session, the virtual characters corresponding to the multiple users, the audio data of each user, and the like.
• for example, the server may obtain, according to the head orientation feature data collected by the user's sensor, the perspective data corresponding to the head orientation feature data of the user.
  • the server acquires the user's eye direction feature data according to the eye image data captured by the user's camera, and determines the user's angle of view data according to the eyeball position indicated by the eye direction feature data.
  • the server can determine the display position of the three-dimensional interaction model in different ways before obtaining the video data.
  • a default display location is configured on the server, and the default display location may be the opposite location of a virtual character corresponding to multiple users.
  • the server determines the position of the user who uploads the three-dimensional interaction model as the display position, so that the user can demonstrate the three-dimensional interaction model.
• during the group video session, when the server receives an operation instruction for the three-dimensional interaction model, the server may adjust the three-dimensional interaction model according to the operation mode corresponding to the operation instruction, and perform, based on the adjusted three-dimensional interaction model, the steps of processing according to the perspective of each of the multiple users in the group video session and sending.
  • the operation instruction is used to indicate that the three-dimensional interaction model is adjusted according to a corresponding operation mode.
  • the manner of obtaining the operation instruction is not limited in the embodiment of the present invention.
  • the server can take at least two of the following acquisition methods:
• Acquisition mode 1: the server acquires the gesture feature data of the first user; when the gesture feature data matches any operation mode of the three-dimensional interaction model, it is determined that the operation instruction corresponding to the operation mode is received.
  • the gesture feature data is used to represent the gesture of the first user, and the manner of acquiring the gesture feature data may be various, such as a camera or a gesture sensor.
• for example, the server may acquire the gesture feature data collected by a gesture sensor and determine the gesture of the first user according to the gesture feature data; when the gesture matches a preset gesture (e.g., pointing to the left), the operation mode corresponding to the preset gesture is determined as the operation mode matching the gesture, and the operation instruction corresponding to the operation mode is generated and acquired.
  • the specific operation mode is not limited in the embodiment of the present invention. For example, referring to Table 4, the embodiment of the present invention provides a correspondence between a preset gesture and an operation mode:
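Since the concrete contents of Table 4 are not reproduced here, the mapping below is purely hypothetical; it only illustrates how a recognized gesture might be matched against the operation modes of the three-dimensional interaction model to produce an operation instruction.

```python
# Hypothetical gesture -> operation-mode table (stand-in for Table 4).
GESTURE_TO_OPERATION = {
    "point_left":  {"op": "rotate", "direction": "left"},
    "point_right": {"op": "rotate", "direction": "right"},
    "pinch_in":    {"op": "zoom", "mode": "out"},
    "pinch_out":   {"op": "zoom", "mode": "in"},
}

def gesture_to_instruction(gesture: str):
    """Return an operation instruction if the gesture matches a known operation mode."""
    mode = GESTURE_TO_OPERATION.get(gesture)
    return {"type": "operation_instruction", **mode} if mode else None

print(gesture_to_instruction("point_left"))
```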
• Acquisition mode 2: the server obtains the operation information of the second user on an external device; when the operation information matches any operation mode of the three-dimensional interaction model, it is determined that the operation instruction corresponding to the operation mode is received, the external device being bound to the terminal of the second user.
  • the external device can be a mouse or a keyboard.
• when the server obtains the operation information of the second user on the external device, it may determine whether there is an operation mode corresponding to the operation information, and if yes, generate and acquire an operation instruction corresponding to the operation mode.
• Referring to Table 5, the embodiment of the present invention provides a correspondence between operation information and an operation mode:
  • the first user and the second user may be any user in the group video session, or may be a user who has the operation authority for the three-dimensional interaction model, which is not limited by the embodiment of the present invention.
  • the user may be prompted to operate the three-dimensional interaction model and how to perform the operation.
• the embodiment of the present invention does not limit the timing of the prompt. For example, the prompt is given when it is determined that the user needs to operate the three-dimensional interaction model: when the server detects that the gaze duration of the seventh user on the three-dimensional interaction model is greater than a preset duration, operation prompt information is sent to the terminal where the seventh user is located, the operation prompt information being used to prompt the seventh user to operate the three-dimensional interaction model.
  • the description of the seventh user is the same as the description of the first user.
• during implementation, the server can monitor the eye gaze direction of the seventh user in real time. Once it is detected that the seventh user's eye gaze direction is aligned with the three-dimensional interaction model, timing starts; when the timed duration (i.e., the gaze duration) is greater than the preset duration, it indicates that the seventh user is likely to need to operate the three-dimensional interaction model, so the operation prompt information is sent to the terminal where the seventh user is located.
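A minimal sketch of the gaze-duration check described above; `is_gazing_at_model` and `send_prompt` are hypothetical callbacks standing in for the server's eye-tracking input and its channel to the seventh user's terminal, and the threshold value is an assumption.

```python
import time

PRESET_DURATION = 3.0   # seconds; the actual preset duration is not specified in the text

def monitor_gaze(is_gazing_at_model, send_prompt, poll_interval=0.1):
    """Send an operation prompt once the user keeps gazing at the model
    longer than the preset duration."""
    gaze_start = None
    prompted = False
    while True:
        if is_gazing_at_model():
            gaze_start = gaze_start or time.time()
            if not prompted and time.time() - gaze_start > PRESET_DURATION:
                send_prompt("The 3D model can be operated with the mouse.")
                prompted = True
        else:
            gaze_start, prompted = None, False
        time.sleep(poll_interval)
```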
  • the specific content included in the operation prompt information is not limited in the embodiment of the present invention.
• for example, when operation by mouse is supported on the server side, the operation prompt information may include the text prompt message "the car model can be operated by the mouse" and the specific methods of operating through the mouse, for example, "click the left mouse button to zoom in on the car model" and "click the right mouse button to zoom out the car model".
  • the server can obtain an operation instruction, and adjust the three-dimensional interaction model according to the operation mode corresponding to the operation instruction.
  • the specific adjustment process is not limited in the embodiment of the present invention.
• taking the cases where the operation instruction is a rotation operation instruction, a zoom operation instruction, or a shift operation instruction as examples, the corresponding adjustment processes may specifically be:
  • Adjustment process 1 When the operation instruction is a rotation operation instruction, the server acquires a rotation angle and a rotation direction corresponding to the rotation operation instruction, and rotates the three-dimensional interaction model according to the rotation angle and the rotation direction.
  • the server can extract the rotation angle and the rotation direction carried in the rotation operation instruction, and rotate the three-dimensional interaction model based on the three-dimensional interaction model seen by the two parameters and the current user perspective.
  • the rotation angle and the rotation direction are determined when the rotation operation command is generated.
  • the specific manner of determining is not limited in the embodiment of the present invention.
• Adjustment process 2: When the operation instruction is a zoom operation instruction, the server acquires the reduction ratio or enlargement ratio corresponding to the zoom operation instruction, and reduces or enlarges the three-dimensional interaction model according to the reduction ratio or enlargement ratio.
  • the server may extract the reduction ratio or the magnification ratio carried in the zoom operation instruction, and scale the three-dimensional interaction model based on the zoom ratio and the three-dimensional interaction model seen by the current user perspective.
  • the scaling ratio can be determined when generating a scaling operation instruction.
  • the specific manner of determining is not limited in the embodiment of the present invention.
  • each operation may correspond to a default zoom ratio, for example, one click of the left mouse button corresponds to 10% of the enlarged three-dimensional interaction model.
  • Adjustment process 3 When the operation instruction is a shift operation instruction, the server acquires a shift direction and a shift distance corresponding to the shift operation instruction, and performs a shift operation on the three-dimensional interaction model according to the shift direction and the shift distance.
  • the server may extract the shift direction and the shift distance carried in the shift operation instruction, and shift the three-dimensional interaction model based on the two parameters and the three-dimensional interaction model seen by the current user perspective.
  • the shift direction and the shift distance may be determined when a shift operation instruction is generated.
  • the specific manner of determining is not limited in the embodiment of the present invention.
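The three adjustment processes can be illustrated with elementary vertex transforms; the sketch below applies a rotation about the z axis, a uniform scale, and a translation to an N x 3 vertex array. The axis choice and the toy model are assumptions, not the patent's rendering pipeline.

```python
import numpy as np

def rotate_z(vertices: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate Nx3 vertices around the z axis by the given rotation angle."""
    a = np.radians(angle_deg)
    rz = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
    return vertices @ rz.T

def scale(vertices: np.ndarray, ratio: float) -> np.ndarray:
    """Enlarge (ratio > 1) or reduce (ratio < 1) the model uniformly."""
    return vertices * ratio

def shift(vertices: np.ndarray, offset) -> np.ndarray:
    """Shift the model by the given (dx, dy, dz) distance."""
    return vertices + np.asarray(offset)

model = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
model = shift(scale(rotate_z(model, 30.0), 1.1), (0.0, 0.5, 0.0))
```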
  • the server may receive at least two operation instructions at the same time. At this time, the server may perform at least two adjustment processes in series, or may perform at least two adjustment processes in parallel. For example, when the server receives the rotation operation instruction and the shift operation instruction at the same time, in order to more clearly demonstrate the change process of the three-dimensional interaction model, the server may first rotate the three-dimensional interaction model and then perform the shift; or, in order to make the adjustment process The user's operation process is connected, and the server can rotate and shift the three-dimensional interaction model at the same time.
• during the adjustment, the server may generate video data frame by frame in real time according to the adjustment process, that is, according to the current perspective of the user, the server synthesizes and encodes the currently adjusted three-dimensional interaction model with the session environment data to obtain the current frame of video data, thereby demonstrating the dynamic adjustment process of the three-dimensional interaction model for the user.
• the above adjustment process may be that the server separately provides a service for each user, that is, the three-dimensional interaction model is processed according to the operation instruction triggered by each user, and the video data of that user is obtained; alternatively, for a three-dimensional interaction model that requires operation authority to be operated, the server may process the three-dimensional interaction model according to the operation instruction triggered by the user with the operation authority, and obtain the video data of each user according to the perspective of each user.
• referring to FIG. 19, an embodiment of the present invention provides a flowchart for adjusting a three-dimensional interaction model: the server acquires the three-dimensional interaction model, monitors the user's eye gaze direction, acquires the operation information, and then adjusts the three-dimensional interaction model according to the operation mode corresponding to the operation information.
• in the group video session, when the server receives a speaking request of the third user, specified video data may be generated; the specified video data is used to demonstrate the process of the virtual microphone being transferred from the virtual host to the virtual character of the third user; based on the specified video data, the steps of processing according to the perspective of each of the multiple users in the group video session and sending are performed.
  • the third user may be any user in the group video session.
  • the embodiment of the present invention does not limit the triggering manner of the request for speaking. For example, when the server receives the audio data of the third user, it is triggered automatically, or when the specified operation information of the third user is detected, the specified operation information may be a double-click of the left mouse button.
  • the virtual host can be a virtual character obtained by the server from the virtual character database, or a virtual character of a user in the group video session.
  • the embodiment of the present invention does not limit the manner in which the server obtains the virtual host. For example, the server obtains a virtual host that matches the group attribute according to the group attribute of the group corresponding to the group video session.
• for example, the server randomly designates the virtual character of one user as the virtual host; or, at the beginning of the group video session, the server sends voting information for electing the virtual host to the VR devices, the voting information including at least the user information of the multiple users, and the VR device displays a voting interface according to the voting information. When receiving a voting instruction triggered by user A on user information b, the server may determine that user A votes for user B corresponding to user information b; further, the server may use the virtual character of the user with the most votes as the virtual host.
• when the server receives the speaking request of the third user, the moving path of the virtual microphone may be determined according to the display position C of the third user in the virtual environment and the current display position D of the virtual microphone; the moving path may be the path from D to C (or the server may further determine the path from D to E to C as the moving path according to the display position E of the virtual host). Further, the server can generate the specified video data frame by frame according to the moving path of the virtual microphone, so as to dynamically represent the transfer process of the virtual microphone, and can then process and transmit the specified video data according to each user's perspective.
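A minimal sketch of generating per-frame microphone positions along the moving path from D to C (optionally via the host position E); the frame count and the linear interpolation are assumptions made for illustration only.

```python
import numpy as np

def microphone_path(start, end, frame_count=30, waypoint=None):
    """Interpolate one virtual-microphone position per video frame along the path."""
    points = [np.asarray(start, dtype=float)]
    if waypoint is not None:
        points.append(np.asarray(waypoint, dtype=float))
    points.append(np.asarray(end, dtype=float))

    positions = []
    segments = len(points) - 1
    per_segment = frame_count // segments
    for i in range(segments):
        for t in np.linspace(0.0, 1.0, per_segment, endpoint=False):
            positions.append((1 - t) * points[i] + t * points[i + 1])
    positions.append(points[-1])
    return positions

# Path from the microphone's current position D, via the host position E, to the speaker C.
frames = microphone_path(start=(0, 0, 0), end=(2, 0, 1), waypoint=(1, 1, 0))
```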
• in addition, the server may determine the lifting path of the arm model of the third user's virtual character, so that the generated at least one frame of specified video data corresponds to the process of the arm being lifted and holding the virtual microphone.
• in addition, the server may synthesize designated audio data of the virtual host into the specified video data; the designated audio data is used to indicate that the third user is about to speak, and may include, for example, a piece of speech such as "now the third user will speak".
  • the server may adjust the volume V2 of the audio data of the fourth user to be less than V1 according to the volume V1 of the audio data of the third user.
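One possible way to realize the volume relationship V2 < V1 is to estimate the speaking user's level and scale the other user's audio below it, as in this hypothetical sketch (sample-rate handling and gain smoothing are omitted).

```python
import numpy as np

def duck_other_speakers(third_user_audio: np.ndarray, fourth_user_audio: np.ndarray,
                        ratio: float = 0.3) -> np.ndarray:
    """Lower the fourth user's volume V2 below the third user's volume V1.

    V1 and V2 are estimated as RMS levels; `ratio` controls how far below V1
    the other audio is pushed and is an assumed parameter.
    """
    v1 = np.sqrt(np.mean(third_user_audio.astype(np.float64) ** 2)) or 1.0
    v2 = np.sqrt(np.mean(fourth_user_audio.astype(np.float64) ** 2)) or 1.0
    gain = min(1.0, ratio * v1 / v2)        # ensure V2 ends up below V1
    return (fourth_user_audio * gain).astype(fourth_user_audio.dtype)
```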
• the above two manners of highlighting the user's speaking process may also be combined, that is, when the server receives the speaking request of the third user, the specified video data may be generated, the specified video data being used to display the process of the virtual microphone being transferred from the virtual host to the virtual character of the third user, and the volume of the audio data of the fourth user in the specified video data being lowered.
• the server may also receive a speaking request of the fourth user while the third user is speaking.
• the manner in which the server processes the speaking request of the fourth user is not limited in the embodiment of the present invention.
• for example, the server temporarily stores the speaking request of the fourth user until the end of the audio data of the third user is detected, and then continues to process the speaking request of the fourth user in the manner of processing the speaking request of the third user, in the order in which the requests were received.
• moreover, the server may send speaking prompt information to the terminal where the fourth user is located; the speaking prompt information is used to indicate when the fourth user can speak, and may include, for example, a text message such as "you are next".
  • the interaction mode of the group video session is extended.
• in the group video session, when the server receives a multimedia file play request, the multimedia file corresponding to the multimedia play request may be synthesized into the video data of the multiple users.
  • the multimedia file is an audio file, a video file, or a text file.
  • the multimedia file play request may directly carry the multimedia file, or may carry the file identifier of the multimedia file, so that the server obtains the multimedia file corresponding to the file identifier from the multimedia database or the network.
  • the method for synthesizing a multimedia file is not limited in the embodiment of the present invention.
• for example, when the multimedia file is an audio file, the server may synthesize the audio file into the video data as background audio; when the multimedia file is a video file, the server may synthesize the video file into the video data of each user according to the perspective of each user, for example, by embedding the video file into the virtual environment in the manner of a "screen play".
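As a rough illustration of mixing an audio file into the session as background audio, the sketch below adds the background signal at reduced gain; it assumes mono float arrays at the same sample rate, which is a simplification rather than the patent's synthesis procedure.

```python
import numpy as np

def mix_background_audio(session_audio: np.ndarray, background: np.ndarray,
                         bg_gain: float = 0.4) -> np.ndarray:
    """Mix a shared background audio file into the session audio track."""
    n = min(len(session_audio), len(background))
    mixed = session_audio[:n] + bg_gain * background[:n]
    return np.clip(mixed, -1.0, 1.0)   # keep samples in the valid range
```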
  • an embodiment of the present invention provides an interaction flowchart.
• in the interaction process, the server may authorize user 1 to operate the three-dimensional interaction model and authorize user 2 to play a multimedia file. Therefore, the server may adjust the three-dimensional interaction model based on the operation information of user 1, thereby providing the service of operating the three-dimensional interaction model, and may also synthesize the multimedia file into the video data based on the multimedia file play request of user 2, thereby providing the service of sharing the multimedia file.
• the server sends the video data of the multiple users to the terminals where the multiple users are located, respectively.
• when a terminal receives the video data, the video data can be displayed. Since the video data is processed according to each user's perspective, each user can see the three-dimensional interaction model from his or her own perspective in the video data.
  • the server can directly send the video data to the VR device where the user is located.
• moreover, for a user using a traditional terminal, the server can extract two-dimensional video data of a certain angle when processing the three-dimensional interaction model and send the two-dimensional video data to the traditional terminal where the user is located, so that multiple users can communicate freely without being restricted by the type of device.
• in the embodiment of the present invention, the three-dimensional interaction model of the target object to be displayed is acquired, the three-dimensional interaction model is processed according to the perspective of each user in the group video session to obtain video data containing model data obtained by performing perspective transformation on the three-dimensional interaction model, and the video data is sent to the terminals where the multiple users are located, so that the multiple users can experience the same three-dimensional interaction model in the group video session and communicate through the three-dimensional interaction model, thereby improving the efficiency of the video session based on the extended communication mode.
• in addition, when an operation instruction for the three-dimensional interaction model is received, the three-dimensional interaction model may be adjusted according to the operation mode corresponding to the operation instruction, thereby providing a service for the user to operate the three-dimensional interaction model; video data may then be sent to the multiple users based on the adjusted three-dimensional interaction model, so that the multiple users can interact based on the same three-dimensional interaction model, further improving the efficiency of the video session.
• in addition, at least two manners of obtaining an operation instruction are provided: it is determined that the operation instruction corresponding to an operation mode is received when the gesture feature data of the first user matches any operation mode of the three-dimensional interaction model, or it is determined that the operation instruction corresponding to an operation mode is received when the operation information of the second user on the external device matches any operation mode of the three-dimensional interaction model. In this way, the operation instruction may be triggered according to the user's gesture or according to the user's operation information, thereby providing multiple ways of acquiring operation instructions and stronger operability.
  • At least three processes for adjusting the three-dimensional interaction model according to the operation instruction are provided, for example, rotating the three-dimensional interaction model according to the rotation operation instruction, reducing or enlarging the three-dimensional interaction model according to the zoom operation instruction, and moving the three-dimensional interaction model according to the shift operation instruction Bits, thus providing a variety of adjustment methods, increase the interactive strength of video sessions, and further improve the efficiency of video sessions.
• in addition, at least two manners of processing the speaking request are provided, such as generating specified video data used to display the process of the virtual microphone being transferred from the virtual host to the virtual character of the third user, or lowering the volume of the audio data of the fourth user.
• in addition, at least two ways of obtaining the three-dimensional interaction model are provided, such as acquiring the three-dimensional object model uploaded by the fifth user, or acquiring the two-dimensional table uploaded by the sixth user and processing it to obtain a three-dimensional table model, thereby providing diversified three-dimensional interaction models.
• in addition, when a multimedia file play request is received, the multimedia file can be synthesized into the video data of the multiple users, so that the multiple users can share the multimedia file, which further extends the communication mode in the video session.
• in addition, when it is detected that the seventh user is likely to need to operate the three-dimensional interaction model, operation prompt information can be sent to the terminal where the seventh user is located, so as to prompt the seventh user in time that the three-dimensional interaction model can be operated.
  • FIG. 21 is a block diagram of a device for a group video session according to an embodiment of the present invention. Referring to FIG. 21, the device specifically includes:
  • the interaction model acquisition module 2101 is configured to acquire a three-dimensional interaction model of the object to be displayed during the group video session;
• the processing module 2102 is configured to process the three-dimensional interaction model of the target object according to the perspective of each of the multiple users in the group video session to obtain the video data of each user, where the video data of a user includes model data obtained by performing perspective transformation on the three-dimensional interaction model of the target object;
  • the sending module 2103 is configured to separately send video data of multiple users to terminals of multiple users.
• in the embodiment of the present invention, the three-dimensional interaction model of the target object to be displayed is acquired, the three-dimensional interaction model is processed according to the perspective of each user in the group video session to obtain video data containing model data obtained by performing perspective transformation on the three-dimensional interaction model, and the video data is sent to the terminals where the multiple users are located, so that the multiple users can experience the same three-dimensional interaction model in the group video session and communicate through the three-dimensional interaction model, thereby improving the efficiency of the video session based on the extended communication mode.
  • the device further includes: an adjustment module 2104;
  • the adjusting module 2104 is configured to: when receiving an operation instruction to the three-dimensional interaction model, adjust the three-dimensional interaction model according to an operation mode corresponding to the operation instruction;
  • the processing module 2102 is configured to perform, according to the adjusted three-dimensional interaction model, a step of processing according to a perspective of each of a plurality of users in the group video session;
• the sending module 2103 is configured to send the video data that is processed by the processing module according to the perspective of each of the multiple users in the group video session.
  • the device further includes:
  • the gesture acquiring module 2105 is configured to acquire the gesture feature data of the first user, and when the gesture feature data matches any operation mode of the three-dimensional interaction model, determine that an operation instruction corresponding to the operation mode is received; or
  • the operation information obtaining module 2106 is configured to obtain operation information of the second user external device, and when the operation information matches any operation mode of the three-dimensional interaction model, determine that the operation instruction corresponding to the operation mode is received, and the external device and the second user Bind at the terminal.
  • the adjustment module 2104 is configured to: when the operation instruction is a rotation operation instruction, acquire a rotation angle and a rotation direction corresponding to the rotation operation instruction, rotate the three-dimensional interaction model according to the rotation angle and the rotation direction; and/or The adjustment module is configured to: when the operation instruction is a zoom operation instruction, obtain a reduction ratio or an enlargement ratio corresponding to the zoom operation instruction, and reduce or enlarge the three-dimensional interaction model according to the reduction ratio and the enlargement ratio; and/or, the adjustment module is configured to: When the operation instruction is a shift operation instruction, the shift direction and the shift distance corresponding to the shift operation instruction are acquired, and the three-dimensional interaction model is shifted according to the shift direction and the shift distance.
  • the device further includes:
• the generating module 2107 is configured to: when receiving the speaking request of the third user, generate specified video data, the specified video data being used to display the process of the virtual microphone being transferred from the virtual host to the virtual character of the third user;
  • the processing module 2102 is configured to perform, according to the specified video data, a step of processing according to a perspective of each of the plurality of users in the group video session;
• the sending module 2103 is configured to send the specified video data that is processed by the processing module according to the perspective of each of the multiple users in the group video session.
  • the device further includes:
• the lowering module 2108 is configured to: when receiving the speaking request of the third user, reduce the volume of the audio data of the fourth user, where the fourth user is a user other than the third user in the group video session;
  • the processing module 2102 is configured to perform, according to the adjusted audio data, a step of processing according to a perspective of each of a plurality of users in the group video session;
• the sending module 2103 is configured to send the video data that is processed by the processing module according to the perspective of each of the multiple users in the group video session.
• the interaction model acquisition module 2101 is configured to: acquire a three-dimensional object model uploaded by the fifth user; or, the interaction model acquisition module is configured to: acquire a two-dimensional table uploaded by the sixth user, and process the two-dimensional table to obtain a three-dimensional table model.
• the device further includes: a synthesizing module 2109, configured to, when receiving a multimedia file play request, synthesize the multimedia file corresponding to the multimedia play request into the video data of the multiple users.
• the sending module 2103 is further configured to: when detecting that the gaze duration of the seventh user on the three-dimensional interaction model is greater than the preset duration, send operation prompt information to the terminal where the seventh user is located, the operation prompt information being used to prompt the seventh user that the three-dimensional interaction model can be operated.
• it should be noted that the device for the group video session provided by the foregoing embodiment is only illustrated by the division of the foregoing functional modules. In actual applications, the above functions may be assigned to different functional modules as required, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
• in addition, the apparatus for the group video session provided by the foregoing embodiment belongs to the same concept as the method embodiment of the group video session; for the specific implementation process, refer to the method embodiment, and details are not described herein again.
  • FIG. 27 is a block diagram showing the structure of a terminal 2700 according to an exemplary embodiment of the present invention.
• the terminal 2700 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, or a desktop computer.
  • Terminal 2700 may also be referred to as a user device, a portable terminal, a laptop terminal, a desktop terminal, and the like.
  • the terminal 2700 includes a processor 2701 and a memory 2702.
  • the processor 2701 can include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
• the processor 2701 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
  • the processor 2701 may also include a main processor and a coprocessor.
• the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state.
• in some embodiments, the processor 2701 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display.
  • the processor 2701 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
• Memory 2702 can include one or more computer readable storage media, which can be non-transitory. Memory 2702 can also include high speed random access memory, as well as non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer readable storage medium in memory 2702 is used to store at least one instruction, the at least one instruction being executed by processor 2701 to implement the group video session method provided by the method embodiments of the present application.
  • the terminal 2700 also optionally includes a peripheral device interface 2703 and at least one peripheral device.
  • the processor 2701, the memory 2702, and the peripheral device interface 2703 may be connected by a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 2703 via a bus, signal line or circuit board.
  • the peripheral device includes at least one of a radio frequency circuit 2704, a touch display screen 2705, a camera 2706, an audio circuit 2707, a positioning component 2708, and a power source 2709.
  • Peripheral device interface 2703 can be used to connect at least one peripheral device associated with an I/O (Input/Output) to processor 2701 and memory 2702.
• in some embodiments, the processor 2701, the memory 2702, and the peripheral device interface 2703 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 2701, the memory 2702, and the peripheral device interface 2703 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the RF circuit 2704 is configured to receive and transmit an RF (Radio Frequency) signal, also referred to as an electromagnetic signal.
  • the radio frequency circuit 2704 communicates with the communication network and other communication devices via electromagnetic signals.
  • the radio frequency circuit 2704 converts the electrical signal into an electromagnetic signal for transmission, or converts the received electromagnetic signal into an electrical signal.
  • the radio frequency circuit 2704 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like.
  • Radio frequency circuitry 2704 can communicate with other terminals via at least one wireless communication protocol.
  • the wireless communication protocol includes, but is not limited to, a metropolitan area network, various generations of mobile communication networks (2G, 3G, 4G, and 5G), a wireless local area network, and/or a WiFi (Wireless Fidelity) network.
  • the radio frequency circuit 2704 may further include an NFC (Near Field Communication) related circuit, which is not limited in this application.
  • the display 2705 is used to display a UI (User Interface).
  • the UI can include graphics, text, icons, video, and any combination thereof.
• in some embodiments, the display 2705 also has the ability to capture touch signals on or above the surface of the display 2705.
  • the touch signal can be input to the processor 2701 for processing as a control signal.
  • display 2705 can also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards.
• in some embodiments, there may be one display screen 2705, which is disposed on the front panel of the terminal 2700; in other embodiments, there may be at least two display screens 2705, which are respectively disposed on different surfaces of the terminal 2700 or adopt a folded design; in still other embodiments, the display screen 2705 may be a flexible display screen disposed on a curved surface or a folded surface of the terminal 2700. The display screen 2705 may even be set to a non-rectangular irregular shape, that is, a special-shaped screen.
  • the display screen 2705 can be made of a material such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
  • Camera component 2706 is used to capture images or video.
  • camera assembly 2706 includes a front camera and a rear camera.
  • the front camera is placed on the front panel of the terminal, and the rear camera is placed on the back of the terminal.
• in some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to realize the background blur function through fusion of the main camera and the depth-of-field camera, panoramic shooting and VR (Virtual Reality) shooting through fusion of the main camera and the wide-angle camera, or other fused shooting functions.
  • camera assembly 2706 can also include a flash.
• the flash can be a single color temperature flash or a dual color temperature flash.
• a dual color temperature flash is a combination of a warm-light flash and a cold-light flash, which can be used for light compensation at different color temperatures.
  • the audio circuit 2707 can include a microphone and a speaker.
  • the microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals for input to the processor 2701 for processing, or to the radio frequency circuit 2704 for voice communication.
  • the microphones may be multiple, and are respectively disposed at different parts of the terminal 2700.
  • the microphone can also be an array microphone or an omnidirectional acquisition microphone.
  • the speaker is then used to convert electrical signals from the processor 2701 or the RF circuit 2704 into sound waves.
  • the speaker can be a conventional film speaker or a piezoelectric ceramic speaker.
  • the audio circuit 2707 can also include a headphone jack.
  • the positioning component 2708 is configured to locate the current geographic location of the terminal 2700 to implement navigation or LBS (Location Based Service).
• the positioning component 2708 can be a positioning component based on the US GPS (Global Positioning System), the Chinese BeiDou system, the Russian GLONASS system, or the EU Galileo system.
  • Power source 2709 is used to power various components in terminal 2700.
  • the power source 2709 can be an alternating current, a direct current, a disposable battery, or a rechargeable battery.
  • the rechargeable battery can support wired charging or wireless charging.
  • the rechargeable battery can also be used to support fast charging technology.
  • terminal 2700 also includes one or more sensors 2710.
  • the one or more sensors 2710 include, but are not limited to, an acceleration sensor 2711, a gyro sensor 2712, a pressure sensor 2713, a fingerprint sensor 2714, an optical sensor 2715, and a proximity sensor 2716.
  • the acceleration sensor 2711 can detect the magnitude of the acceleration on the three coordinate axes of the coordinate system established by the terminal 2700.
  • the acceleration sensor 2711 can be used to detect the component of the gravitational acceleration on three coordinate axes.
  • the processor 2701 can control the touch display 2705 to display the user interface in a landscape view or a portrait view according to the gravity acceleration signal acquired by the acceleration sensor 2711.
  • the acceleration sensor 2711 can also be used for the acquisition of game or user motion data.
  • the gyro sensor 2712 can detect the body direction and the rotation angle of the terminal 2700, and the gyro sensor 2712 can cooperate with the acceleration sensor 2711 to collect the 3D motion of the user to the terminal 2700. Based on the data collected by the gyro sensor 2712, the processor 2701 can implement functions such as motion sensing (such as changing the UI according to the user's tilting operation), image stabilization at the time of shooting, game control, and inertial navigation.
  • the pressure sensor 2713 can be disposed on a side border of the terminal 2700 and/or a lower layer of the touch display screen 2705.
• when the pressure sensor 2713 is disposed on the side frame of the terminal 2700, the user's holding signal on the terminal 2700 can be detected, and the processor 2701 performs left-right hand recognition or shortcut operations according to the holding signal collected by the pressure sensor 2713.
• when the pressure sensor 2713 is disposed at the lower layer of the touch display screen 2705, the processor 2701 controls the operability controls on the UI interface according to the user's pressure operation on the touch display screen 2705.
  • the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
  • the fingerprint sensor 2714 is used to collect the fingerprint of the user.
  • the processor 2701 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 2714, or the fingerprint sensor 2714 identifies the identity of the user according to the collected fingerprint. Upon identifying that the identity of the user is a trusted identity, the processor 2701 authorizes the user to perform related sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying and changing settings, and the like.
• the fingerprint sensor 2714 can be disposed on the front, back, or side of the terminal 2700. When a physical button or a manufacturer logo is provided on the terminal 2700, the fingerprint sensor 2714 can be integrated with the physical button or the manufacturer logo.
  • Optical sensor 2715 is used to collect ambient light intensity.
  • the processor 2701 can control the display brightness of the touch display 2705 based on the ambient light intensity acquired by the optical sensor 2715. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 2705 is raised; when the ambient light intensity is low, the display brightness of the touch display screen 2705 is lowered.
  • the processor 2701 can also dynamically adjust the shooting parameters of the camera assembly 2706 based on the ambient light intensity acquired by the optical sensor 2715.
• Proximity sensor 2716, also referred to as a distance sensor, is typically disposed on the front panel of terminal 2700. Proximity sensor 2716 is used to capture the distance between the user and the front of terminal 2700. In one embodiment, when the proximity sensor 2716 detects that the distance between the user and the front of the terminal 2700 gradually decreases, the processor 2701 controls the touch display 2705 to switch from the bright-screen state to the off-screen state; when the proximity sensor 2716 detects that the distance between the user and the front of the terminal 2700 gradually increases, the processor 2701 controls the touch display 2705 to switch from the off-screen state to the bright-screen state.
• a person skilled in the art can understand that the structure shown in FIG. 27 does not constitute a limitation on terminal 2700, which may include more or fewer components than illustrated, combine certain components, or adopt different component arrangements.
  • FIG. 28 is a schematic structural diagram of a network device according to an embodiment of the present invention.
• the network device 2800 may vary greatly due to different configurations or performance, and may include one or more processors (central processing units, CPU) 2801 and one or more memories 2802, where the memory 2802 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 2801 to implement the methods provided by the foregoing method embodiments.
  • the network device may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output.
  • the network device may also include other components for implementing the functions of the device, and details are not described herein.
• in an exemplary embodiment, a computer readable storage medium is further provided, such as a memory including instructions, where the instructions may be executed by a processor in a terminal to perform the group video session method in the foregoing embodiments.
  • the computer readable storage medium can be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, and an optical data storage device.
• a person skilled in the art may understand that all or part of the steps of implementing the above embodiments may be completed by hardware, or may be completed by a program instructing related hardware, and the program may be stored in a computer readable storage medium.
  • the storage medium mentioned may be a read only memory, a magnetic disk or an optical disk or the like.


Abstract

The present invention relates to a method and network device for a group video session, and relates to the field of network technologies. By determining the user type of each user in a group video session and processing the video data of the group video session according to the user type, target video data matching the virtual reality display mode indicated by a virtual user can be obtained when the user type is a virtual user, and target video data matching the two-dimensional display mode indicated by an ordinary user can be obtained when the user type is an ordinary user. Video data is thus displayed for different types of users in a reasonable display mode, so that different types of users can conduct a group video session without restriction, improving the flexibility of group video sessions.

Description

Method and network device for group video session
This application claims priority to Chinese patent applications No. 2017101044392, No. 2017101044424 and No. 2017101046699, each entitled "Method and apparatus for group video session", filed with the State Intellectual Property Office of China on February 24, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of VR (Virtual Reality) technologies, and in particular to a method and network device for a group video session.
Background
VR technology is a technology for creating and experiencing a virtual world. It can simulate a realistic environment and intelligently sense a user's behavior, so that the user feels immersed in the scene. Therefore, the application of VR technology to social interaction has attracted wide attention, and methods for conducting group video sessions based on VR technology have emerged.
At present, during a group video session, a server can create a virtual environment for multiple virtual users who use VR devices, and superimpose the virtual character selected by a virtual user on the virtual environment to express the virtual user's image in the virtual environment. The server can then send the video in which the virtual user's audio and image are superimposed to the virtual users, bringing them a visual and auditory experience as if they were chatting with other virtual users in a virtual world.
In the process of implementing the present invention, the inventor found that the prior art has at least the following problems:
Virtual users can only conduct group video sessions with other virtual users. Today, when VR devices are not yet widespread, there is a large communication barrier between the many ordinary users who do not use VR devices and virtual users, resulting in strong restrictions and poor flexibility in group video sessions.
Summary of the Invention
Embodiments of the present invention provide a method and network device for a group video session, so that different types of users can conduct a group video session without restriction, improving the flexibility of group video sessions. The technical solutions are as follows:
In one aspect, a method for a group video session is provided, the method including:
creating a group video session;
for each user in the group video session, determining a user type of the user according to device information of the user, the user type including an ordinary user and a virtual user, the ordinary user being used to indicate that the user adopts a two-dimensional display mode when participating in the group video session, and the virtual user being used to indicate that the user adopts a virtual reality display mode when participating in the group video session;
processing video data of the group video session according to the video display mode indicated by the user type of the user to obtain target video data of the user, the video display mode of the target video data matching the video display mode indicated by the user type of the user;
during the group video session, sending the target video data to a user device of the user, so that the user conducts the group video session.
In one aspect, a method for a group video session is provided, the method including:
receiving target video data of a group video session sent by a server, the video display mode of the target video data matching the video display mode indicated by the user type of a terminal user, the user type of the terminal user being an ordinary user, and the ordinary user being used to indicate that the terminal user adopts a two-dimensional display mode when participating in the group video session;
displaying the target video data, so that ordinary users in the group video session are displayed as two-dimensional characters, and virtual users in the group video session are displayed as two-dimensional virtual characters.
In one aspect, a method for a group video session is provided, the method including:
receiving target video data of a group video session sent by a server, the video display mode of the target video data matching the video display mode indicated by the user type of a VR device user, the user type of the VR device user being a virtual user, and the virtual user being used to indicate that the VR device user adopts a virtual reality display mode when participating in the group video session;
displaying the target video data, so that ordinary users in the group video session are displayed in the virtual environment as two-dimensional or three-dimensional characters, and virtual users in the group video session are displayed in the virtual environment as three-dimensional virtual characters.
In one aspect, an apparatus for a group video session is provided, the apparatus including:
a creation module, configured to create a group video session;
a determining module, configured to determine, for each user in the group video session, a user type of the user according to device information of the user, the user type including an ordinary user and a virtual user, the ordinary user being used to indicate that the user adopts a two-dimensional display mode when participating in the group video session, and the virtual user being used to indicate that the user adopts a virtual reality display mode when participating in the group video session;
a processing module, configured to process video data of the group video session according to the video display mode indicated by the user type of the user to obtain target video data of the user, the video display mode of the target video data matching the video display mode indicated by the user type of the user;
a sending module, configured to send the target video data to a user device of the user during the group video session, so that the user conducts the group video session.
In one aspect, an apparatus for a group video session is provided, the apparatus including:
a receiving module, configured to receive target video data of a group video session sent by a server, the video display mode of the target video data matching the video display mode indicated by the user type of a terminal user, the user type of the terminal user being an ordinary user, and the ordinary user being used to indicate that the terminal user adopts a two-dimensional display mode when participating in the group video session;
a display module, configured to display the target video data, so that ordinary users in the group video session are displayed as two-dimensional characters, and virtual users in the group video session are displayed as two-dimensional virtual characters.
In one aspect, an apparatus for a group video session is provided, the apparatus including:
a receiving module, configured to receive target video data of a group video session sent by a server, the video display mode of the target video data matching the video display mode indicated by the user type of a VR device user, the user type of the VR device user being a virtual user, and the virtual user being used to indicate that the VR device user adopts a virtual reality display mode when participating in the group video session;
a display module, configured to display the target video data, so that ordinary users in the group video session are displayed in the virtual environment as two-dimensional or three-dimensional characters, and virtual users in the group video session are displayed in the virtual environment as three-dimensional virtual characters.
In one aspect, a network device is provided, the network device including a memory and a processor, the memory being configured to store instructions, and the processor being configured to execute the instructions to perform the steps of the following method for a group video session:
creating a group video session;
for each user in the group video session, determining a user type of the user according to device information of the user, the user type including an ordinary user and a virtual user, the ordinary user being used to indicate that the user adopts a two-dimensional display mode when participating in the group video session, and the virtual user being used to indicate that the user adopts a virtual reality display mode when participating in the group video session;
processing video data of the group video session according to the video display mode indicated by the user type of the user to obtain target video data of the user, the video display mode of the target video data matching the video display mode indicated by the user type of the user;
during the group video session, sending the target video data to a user device of the user, so that the user conducts the group video session.
In one aspect, a terminal is provided, the terminal including a memory and a processor, the memory being configured to store instructions, and the processor being configured to execute the instructions to perform the steps of the following method for a group video session:
receiving target video data of a group video session sent by a network device, the video display mode of the target video data matching the video display mode indicated by the user type of a terminal user, the user type of the terminal user being an ordinary user, and the ordinary user being used to indicate that the terminal user adopts a two-dimensional display mode when participating in the group video session;
displaying the target video data, so that ordinary users in the group video session are displayed as two-dimensional characters, and virtual users in the group video session are displayed as two-dimensional virtual characters.
In one aspect, a virtual reality (VR) device is provided, the VR device including a memory and a processor, the memory being configured to store instructions, and the processor being configured to execute the instructions to perform the steps of the following method for a group video session:
receiving target video data of a group video session sent by a network device, the video display mode of the target video data matching the video display mode indicated by the user type of a VR device user, the user type of the VR device user being a virtual user, and the virtual user being used to indicate that the VR device user adopts a virtual reality display mode when participating in the group video session;
displaying the target video data, so that ordinary users in the group video session are displayed in the virtual environment as two-dimensional or three-dimensional characters, and virtual users in the group video session are displayed in the virtual environment as three-dimensional virtual characters.
In one aspect, a group video session system is provided, the system including:
a network device, configured to create a group video session; determine, for each user in the group video session, a user type of the user according to device information of the user, the user type including an ordinary user and a virtual user, the ordinary user being used to indicate that the user adopts a two-dimensional display mode when participating in the group video session, and the virtual user being used to indicate that the user adopts a virtual reality display mode when participating in the group video session; process video data of the group video session according to the video display mode indicated by the user type of the user to obtain target video data of the user, the video display mode of the target video data matching the video display mode indicated by the user type of the user; and, during the group video session, send the target video data to a user device of the user, so that the user conducts the group video session;
a terminal, configured to receive target video data of the group video session sent by the network device, the video display mode of the target video data matching the video display mode indicated by the user type of a terminal user, the user type of the terminal user being an ordinary user, and the ordinary user being used to indicate that the terminal user adopts a two-dimensional display mode when participating in the group video session; and display the target video data, so that ordinary users in the group video session are displayed as two-dimensional characters, and virtual users in the group video session are displayed as two-dimensional virtual characters;
a virtual reality (VR) device, configured to receive target video data of the group video session sent by the network device, the video display mode of the target video data matching the video display mode indicated by the user type of a VR device user, the user type of the VR device user being a virtual user, and the virtual user being used to indicate that the VR device user adopts a virtual reality display mode when participating in the group video session; and display the target video data, so that ordinary users in the group video session are displayed in the virtual environment as two-dimensional or three-dimensional characters, and virtual users in the group video session are displayed in the virtual environment as three-dimensional virtual characters.
In one aspect, a method for a group video session is provided, the method including:
acquiring a virtual character of a first user in a group video session, the virtual character of the first user being obtained at least according to head feature data of the first user and a limb model corresponding to the first user;
during the group video session, acquiring video data of the first user based on the virtual character of the first user and behavior feature data of the first user, the action of the virtual character of the first user in the video data matching the actual action of the first user;
sending the video data of the first user to the terminal where a second user participating in the group video session is located, to implement the group video session.
In one aspect, an apparatus for a group video session is provided, the apparatus including:
a virtual character acquisition module, configured to acquire a virtual character of a first user in a group video session, the virtual character of the first user being obtained at least according to head feature data of the first user and a limb model corresponding to the first user;
a video data acquisition module, configured to acquire, during the group video session, video data of the first user based on the virtual character of the first user and behavior feature data of the first user, the action of the virtual character of the first user in the video data matching the actual action of the first user;
a sending module, configured to send the video data of the first user to the terminal where a second user participating in the group video session is located, to implement the group video session.
In one aspect, a virtual reality (VR) device is provided, the VR device including a memory and a processor, the memory being configured to store instructions, and the processor being configured to execute the instructions to perform the steps of the following method for a group video session:
acquiring a virtual character of a first user in a group video session, the virtual character of the first user being obtained at least according to head feature data of the first user and a limb model corresponding to the first user;
during the group video session, acquiring video data of the first user based on the virtual character of the first user and behavior feature data of the first user, the action of the virtual character of the first user in the video data matching the actual action of the first user;
sending the video data of the first user to the terminal where a second user participating in the group video session is located, to implement the group video session.
In one aspect, a network device is provided, the network device including a memory and a processor, the memory being configured to store instructions, and the processor being configured to execute the instructions to perform the steps of the following method for a group video session:
acquiring a virtual character of a first user in a group video session, the virtual character of the first user being obtained at least according to head feature data of the first user and a limb model corresponding to the first user;
during the group video session, acquiring video data of the first user based on the virtual character of the first user and behavior feature data of the first user, the action of the virtual character of the first user in the video data matching the actual action of the first user;
sending the video data of the first user to the terminal where a second user participating in the group video session is located, to implement the group video session.
In one aspect, a method for a group video session is provided, the method including:
acquiring, during a group video session, a three-dimensional interaction model of a target object to be displayed;
processing the three-dimensional interaction model of the target object according to the perspective of each of multiple users in the group video session to obtain video data of each user, the video data of a user containing model data obtained by performing perspective transformation on the three-dimensional interaction model of the target object;
sending the video data of the multiple users to the terminals where the multiple users are located, respectively.
In one aspect, an apparatus for a group video session is provided, the apparatus including:
an interaction model acquisition module, configured to acquire, during a group video session, a three-dimensional interaction model of a target object to be displayed;
a processing module, configured to process the three-dimensional interaction model of the target object according to the perspective of each of multiple users in the group video session to obtain video data of each user, the video data of a user containing model data obtained by performing perspective transformation on the three-dimensional interaction model of the target object;
a sending module, configured to send the video data of the multiple users to the terminals where the multiple users are located, respectively.
In one aspect, a network device is provided, the network device including a memory and a processor, the memory being configured to store instructions, and the processor being configured to execute the instructions to perform the steps of the following method for a group video session:
acquiring, during a group video session, a three-dimensional interaction model of a target object to be displayed;
processing the three-dimensional interaction model of the target object according to the perspective of each of multiple users in the group video session to obtain video data of each user, the video data of a user containing model data obtained by performing perspective transformation on the three-dimensional interaction model of the target object;
sending the video data of the multiple users to the terminals where the multiple users are located, respectively.
In the embodiments of the present invention, by determining the user type of each user in a group video session and processing the video data of the group video session according to the user type, target video data matching the virtual reality display mode indicated by a virtual user can be obtained when the user type is a virtual user, and target video data matching the two-dimensional display mode indicated by an ordinary user can be obtained when the user type is an ordinary user, so that video data is displayed for different types of users in a reasonable display mode, different types of users can conduct a group video session without restriction, and the flexibility of group video sessions is improved.
附图说明
为了更清楚地说明本发明实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本发明实施例提供的一种群组视频会话的实施环境示意图;
图2是本发明实施例提供的一种群组视频会话的方法流程图;
图3是本发明实施例提供的一种用户显示位置的示意图;
图4是本发明实施例提供的一种群组视频会话场景的示意图;
图5是本发明实施例提供的一种显示场景示意图;
图6是本发明实施例提供的一种虚拟用户进行群组视频会话的流程图;
图7是本发明实施例提供的一种群组视频会话的装置框图;
图8是本发明实施例提供的一种群组视频会话的装置框图;
图9是本发明实施例提供的一种群组视频会话的装置框图;
图10是本发明实施例提供的一种群组视频会话的方法流程图;
图11是本发明实施例提供的一种获取虚拟人物的流程图;
图12是本发明实施例提供的一种获取头部方位数据的流程图;
图13是本发明实施例提供的一种获取视频数据的流程图;
图14是本发明实施例提供的一种群组视频会话的流程图;
图15是本发明实施例提供的一种显示视频数据的流程图;
图16是本发明实施例提供的一种群组视频会话的装置框图;
图17是本发明实施例提供的一种群组视频会话的方法流程图;
图18是本发明实施例提供的一种三维交互模型的示意图;
图19是本发明实施例提供的一种调整三维交互模型的流程图;
图20是本发明实施例提供的一种交互流程图;
图21是本发明实施例提供的一种群组视频会话的装置框图;
图22是本发明实施例提供的一种群组视频会话的装置框图;
图23是本发明实施例提供的一种群组视频会话的装置框图;
图24是本发明实施例提供的一种群组视频会话的装置框图;
图25是本发明实施例提供的一种群组视频会话的装置框图;
图26是本发明实施例提供的一种群组视频会话的装置框图;
图27示出了本发明一个示例性实施例提供的终端2700的结构框图;
图28是本发明实施例提供的一种网络设备的框图。
具体实施方式
为使本发明的目的、技术方案和优点更加清楚,下面将结合附图对本发明实施方式作进一步地详细描述。
图1是本发明实施例提供的一种群组视频会话的实施环境示意图。参见图 1,该实施环境中包括:
至少一个终端101(如,移动终端和平板电脑)、至少一个VR设备102和至少一个服务器103。其中,终端101、VR设备102和服务器103的交互过程可对应下述实施例中的群组视频会话的过程;服务器103用于为不同类型的用户创建群组视频会话、接收并处理终端101和VR设备102所发送的视频数据、将处理后的视频数据发送至终端101或VR设备102,使得不同类型的用户之间能够进行群组视频会话。终端101用于将摄像头拍摄到的视频数据实时发送至服务器103、接收并显示服务器103处理后的视频数据。VR设备102用于将传感设备采集到的用户的行为特征数据发送至服务器103、接收并显示服务器103处理后的视频数据。
其中,服务器103还可以用于获取使用终端101或VR设备102的用户的虚拟人物、基于该用户的虚拟人物和行为特征数据得到视频数据。终端101用于接收并显示服务器103发送的视频数据。VR设备102也可以还用于获取VR设备102的用户的虚拟人物、基于该用户的虚拟人物和行为特征数据得到视频数据。
另外,该服务器103还可以配置至少一个数据库,如五官模型数据库、肢体模型数据库、虚拟人物数据库、用户资料数据库和用户关系链数据库等等。该五官模型数据库用于存储卡通化的五官模型;肢体模型数据库用于存储卡通化的肢体模型,该肢体模型数据库还可以存储着装模型;虚拟人物数据库用于对应存储用户的用户标识和虚拟人物;用户资料数据库至少用于存储用户的年龄数据、性别数据和职业数据等用户属性;用户关系链数据库用于存储用户具有的用户关系链数据,如,用户关系链数据至少用于指示与该用户为好友关系或群组关系的用户。
需要说明的是,当VR设备102用于获取虚拟人物时,可以从服务器103配置的至少一个数据库中获取五官模型、肢体模型或虚拟人物。而且,本发明实施例中涉及的虚拟人物(包括头部模型和肢体模型)可以为三维形式。
图2是本发明实施例提供的一种群组视频会话的方法流程图。参见图2,该方法应用于服务器与终端、VR设备的交互过程。
201、服务器创建群组视频会话。
群组视频会话是指多个(两个或两个以上)用户基于服务器进行的视频会 话。其中,多个用户可以是该服务器对应的社交平台上的多个用户,该多个用户之间可能是群组关系或好友关系。
该步骤中,当服务器接收任一用户设备的群组视频会话请求时,可以创建群组视频会话。本发明实施例对该群组视频会话请求的发起方式不做限定。例如,由某用户在已建立的群组中对该群组中的所有用户发起群组视频会话请求,该举例中,群组视频会话请求可以携带该群组的群组标识,使得服务器可以根据群组标识获取该群组中每个用户的用户标识。又例如,该用户也可以从已建立的群组中或者用户关系链中选择一些用户后发起群组视频会话请求,该举例中,群组视频会话请求可以携带该用户和被选择用户的用户标识。服务器获取到用户标识后,可以将用户标识对应的用户添加到群组视频会话中,从而创建群组视频会话。
202、对于群组视频会话中的每个用户,服务器根据该用户的设备信息,确定该用户的用户类型。
设备信息可以是用户登录服务器所使用的用户设备的设备型号,设备型号的表现形式如:手机品牌+手机型号,使得服务器可以根据设备型号与设备类型的对应关系确定该用户设备的设备类型,设备类型可以为PC(Personal Computer,个人电脑)终端、移动终端或VR设备。
该步骤中,服务器可以通过多种方式获取设备信息,例如,用户设备向服务器发送登录请求时,登录请求可以携带用户标识和设备信息,使得服务器接收到登录请求时能够提取出用户标识和设备信息,并对应存储,或者,服务器向用户设备发送设备信息获取请求,使得用户设备将设备信息发送至服务器。
由于群组视频会话中的用户可能使用不同的用户设备登录服务器,不同的用户设备支持的视频显示模式不同(VR设备支持虚拟现实显示模式,终端支持二维显示模式)。因此,服务器需要为使用不同用户设备的用户采用不同的方式处理视频数据,以得到与用户设备支持的视频显示模式匹配的视频数据,而为了确定如何为某个用户处理视频数据,服务器需要先确定该用户的用户类型。用户类型包括普通用户和虚拟用户,普通用户用于指示用户在参与群组视频会话时采用二维显示模式,如果该用户为普通用户,说明该用户是使用非VR设备登录服务器的用户,非VR设备如移动终端、平板电脑等,虚拟用户用于指示用户在参与群组视频会话时采用虚拟现实显示模式,如果该用户为虚拟用户,说明该用户是使用VR设备登录服务器的用户。
该步骤中,服务器可以根据预先配置的设备信息、设备类型与用户类型的对应关系,查询与用户的设备信息对应的用户类型。该对应关系的举例参见表1:
表1
设备信息 设备类型 用户类型
XX thinkpad PC终端 普通用户
WW N7 移动终端 普通用户
UU VR VR设备 虚拟用户
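下面给出一个仅作示意的 Python 代码片段(并非本专利的实际实现,其中的设备型号与映射条目均为假设数据),演示按照表1的思路由设备信息查询设备类型与用户类型;实际系统中该对应关系可存放在配置文件或数据库中,并在处理登录请求时查询:

```python
# 设备信息 -> (设备类型, 用户类型) 的对应关系,条目仅为示例
DEVICE_TABLE = {
    "XX thinkpad": ("PC终端", "普通用户"),
    "WW N7": ("移动终端", "普通用户"),
    "UU VR": ("VR设备", "虚拟用户"),
}

def resolve_user_type(device_info: str) -> str:
    """根据登录请求携带的设备信息确定用户类型,未知设备默认按普通用户处理。"""
    _device_type, user_type = DEVICE_TABLE.get(device_info, ("未知设备", "普通用户"))
    return user_type

assert resolve_user_type("UU VR") == "虚拟用户"
```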
事实上,用户也可以自行设置设备信息,例如,在VR设备上提供设备信息设置页面,VR设备用户可以将当前的设备信息设置为“WW N7”,也可以保留默认设置的“UU VR”,使得服务器可以获取到VR设备用户所设置的设备信息,从而确定VR设备用户倾向于体验的用户类型。
203、服务器根据用户的用户类型所指示的视频显示模式,对群组视频会话的视频数据进行处理,得到用户的目标视频数据。
其中,目标视频数据的视频显示模式与用户的用户类型所指示的视频显示模式匹配。该步骤中,如果该用户的用户类型为普通用户,服务器确定该用户在参与本次群组视频会话时采用二维显示模式,并为该用户采用与二维显示模式对应的视频数据处理方式,如果该用户的用户类型为虚拟用户,服务器确定该用户在参与本次视频会话时采用虚拟现实显示模式,并为该用户采用与虚拟现实显示模式对应的视频数据处理方式。本发明实施例对具体的处理过程不做限定。下面,针对每种类型的用户对应的视频数据处理方式,分别进行介绍:
用户类型为普通用户时的处理过程如以下步骤203A-203C:
203A、如果该用户的用户类型为普通用户,服务器将群组视频会话中虚拟用户对应的三维虚拟人物转换为二维虚拟人物。
三维虚拟人物用于以三维图像数据表达虚拟用户的人物形象,使得在群组视频会话时可以将该用户显示为三维虚拟人物。该步骤中,服务器可以通过多种方式获取三维虚拟人物。例如,在虚拟用户确认进入群组视频会话之前,为虚拟用户提供多个三维虚拟人物,将虚拟用户所选择的三维虚拟人物作为该虚拟用户对应的三维虚拟人物。又例如,服务器获取该虚拟用户的用户属性,将与用户属性匹配的三维虚拟人物作为该虚拟用户对应的三维虚拟人物,该举例中,用户属性包括年龄、性别和职业等信息,以虚拟用户的用户属性是30岁 的女教师为例,服务器可以选择女教师形象的三维虚拟人物作为该虚拟用户对应的三维虚拟人物。
进一步地,服务器基于获取到的三维虚拟人物,可以将三维虚拟人物转换成二维虚拟人物,需要说明的是,该二维虚拟人物可以是静止的,也可以是动态的,本发明实施例对此不做限定。例如,为了节约服务器的运算资源,可以直接从三维虚拟人物对应的三维图像数据中提取出某一视角的二维图像数据,将该视角的二维图像数据作为二维虚拟人物,为了尽可能全面地表达虚拟用户,该视角可以是正面视角。又例如,为了形象地展示虚拟用户的行为,服务器可以获取三维虚拟人物和VR设备采集到的虚拟用户的行为特征数据,该行为特征数据包括虚拟用户的表情特征数据或肢体特征数据,进而,服务器可以根据行为特征数据确定三维虚拟人物的行为特征,生成与行为特征符合的三维虚拟人物,使得三维虚拟人物的行为与虚拟用户的行为同步,再将三维虚拟人物转换成二维虚拟人物,该具体处理过程可以参见如下述图10所示过程,在此不做详述。
203B、服务器对二维虚拟人物、虚拟用户选择的二维背景、以及虚拟用户对应的音频数据进行合成,得到第一二维视频数据。
基于步骤203A获取到的二维虚拟人物,为了给该用户提供更丰富的视觉效果,服务器还可以为该二维虚拟人物添加二维背景。该二维背景是指二维虚拟人物的背景,如二维会议背景和二维沙滩背景。服务器可以在为虚拟用户进入群组视频会话之前提供多个二维背景,或获取虚拟用户所选择的二维背景。事实上,服务器也可以通过其他方式获取该二维背景,例如,随机获取该虚拟用户对应的二维背景。又例如,为了尽可能给群组视频会话中的用户带来相同的体验效果,服务器可以该群组视频会话对应的虚拟环境所映射的二维图像数据作为二维背景,或者,服务器可以获取该虚拟环境的标签,将与该标签相同的二维图像数据作为二维背景,如,虚拟环境的标签为“森林”,服务器可以将标签为“森林”的二维图像数据作为二维背景,当然,该二维背景可以是静态的,也可以是动态的。
该步骤中,服务器可以确定二维虚拟人物在二维背景上的显示位置和合成尺寸,对二维虚拟人物原来的显示尺寸进行调整,得到符合合成尺寸的二维虚拟人物,将该二维虚拟人物合成至二维背景上对应的显示位置,且二维虚拟人物的图层在二维背景的图层之上,得到虚拟用户当前对应的图像数据。事实上, 服务器也可以确定二维背景上与显示位置和合成尺寸对应的显示区域,移除该显示区域内的像素点,并将该二维虚拟人物对应的图像数据嵌入该显示区域,从而将嵌入后的二维图像数据作为虚拟用户当前对应的图像数据。
在群组视频会话的过程中,当任一用户发言时,用户设备可以将所录制的音频数据实时发送至服务器,因此,当服务器接收到该虚拟用户对应的音频数据时,可以将当前的图像数据与音频数据进行合成,得到第一二维视频数据,以表达虚拟用户当前的言行。当然,如果服务器当前没有接收到该虚拟用户对应的音频数据,可以直接将当前的图像数据作为第一二维视频数据。
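作为参考,下面给出一个基于 Pillow 的合成示意代码(仅为示例,函数名、显示位置与合成尺寸均为假设参数,并非本专利限定的实现),演示将二维虚拟人物图层按显示位置叠加到二维背景之上:

```python
from PIL import Image

def compose_frame(background_path, avatar_path, position=(480, 200), size=(240, 360)):
    """把二维虚拟人物贴到二维背景的指定显示位置,人物图层位于背景图层之上。"""
    background = Image.open(background_path).convert("RGBA")
    avatar = Image.open(avatar_path).convert("RGBA").resize(size)
    # 以人物图像自身的透明通道作为蒙版,人物轮廓之外仍显示背景
    background.paste(avatar, position, mask=avatar)
    return background.convert("RGB")
```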
203C、服务器对至少一个第一二维视频数据与至少一个第二二维视频数据进行合成,得到该用户的目标视频数据。
第二二维视频数据是指群组视频会话中普通用户的二维视频数据。该步骤中,服务器确定群组视频会话中各个用户当前的二维视频数据的显示位置和合成尺寸,将各个用户当前的视频数据按照所确定的显示位置和合成尺寸,与虚拟环境合成为一份二维视频数据,且用户的二维视频数据的图层在虚拟环境的图层之上,将合成的二维视频数据作为该用户的目标视频数据。
需要说明的是,步骤203B和203C的两步合成过程也可以对应于一个合成过程,该合成过程中,服务器省略合成第一二维视频数据的步骤,直接对二维虚拟人物、二维背景、虚拟用户对应的音频数据和第二二维视频数据进行合成,从而得到目标视频数据。
用户类型为虚拟用户时的处理过程如以下步骤203D-203H:
203D、如果该用户的用户类型为虚拟用户,服务器确定群组视频会话对应的虚拟环境。
虚拟环境是指虚拟用户在群组视频会话时的三维背景,如,圆桌会议虚拟环境、沙滩虚拟环境和桌游虚拟环境等三维图像。本发明实施例对确定虚拟环境的具体方式不做限定。例如,服务器可以采用以下三种确定方式:
第一种确定方式、服务器将用户触发的虚拟环境选项对应的虚拟环境确定为用户在群组视频会话中对应的虚拟环境。
为使提供虚拟环境的过程更加人性化,服务器可以提供多样化的虚拟环境,并由用户自由选择群组视频会话时的虚拟环境。该确定方式中,服务器可以在VR设备(或者与VR设备绑定的终端)上提供至少一个虚拟环境选项和对应的虚拟环境缩略图,每个虚拟环境选项对应一个虚拟环境。当VR设备检 测到虚拟用户对某个虚拟环境选项的触发操作时,可以向服务器发送虚拟环境选项对应的虚拟环境标识,服务器获取到该虚拟环境标识时,可以将该虚拟环境标识对应的虚拟环境确定为该用户在群组视频会话时的虚拟环境。
第二种确定方式、根据群组视频会话中的用户数量,确定群组视频会话对应的虚拟环境的容量,将符合容量的虚拟环境确定为群组视频会话对应的虚拟环境。
为了给用户呈现合理的虚拟环境,以避免虚拟环境显得拥挤或者空旷,该确定方式中,服务器可以获取群组视频会话中的用户数量,从而确定虚拟环境应该具有的容量,该容量用于指示虚拟环境所能容纳的用户数量,例如,圆桌会议虚拟环境的容量对应于该虚拟环境中的座椅数量。进一步地,服务器根据所确定的容量,可以从已存储的多个虚拟环境中选择一个与该容量最相近的虚拟环境。例如,用户数量为12,服务器存储了三个圆桌会议虚拟环境,三个圆桌会议虚拟环境中的座椅数量分别为5、10和15,因此服务器可以将其中座椅数量与12最相近的圆桌会议虚拟环境确定为该用户在群组视频会话时对应的虚拟环境。
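以下 Python 片段示意“按容量与用户数量最相近选择虚拟环境”的做法(仅为示例,环境标识与座椅数量均为假设数据,实际系统还可叠加“容量不小于用户数量”等约束):

```python
def pick_environment(user_count, environments):
    """environments: [(环境标识, 容量), ...],返回容量与用户数量最接近的虚拟环境。"""
    return min(environments, key=lambda env: abs(env[1] - user_count))

# 示例:12 个用户,三个圆桌会议环境的座椅数分别为 5、10、15
print(pick_environment(12, [("round_5", 5), ("round_10", 10), ("round_15", 15)]))
```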
第三种确定方式、分析群组视频会话中的每个用户选择过的虚拟环境,得到每个虚拟环境的被选择次数,将被选择次数最多的虚拟环境确定为群组视频会话对应的虚拟环境。
该确定方式中,服务器通过综合分析每个用户选择过的虚拟环境,得出了更多用户所偏爱的虚拟环境。例如,群组视频会话中有5个用户,每个用户选择虚拟环境的情况如表2所示,因此,服务器通过表2可以确定该虚拟环境1被选择次数最多(4次),将虚拟环境1确定为该用户在群组视频会话时对应的虚拟环境。
表2
用户 虚拟环境
A 虚拟环境1、虚拟环境2、
B 虚拟环境3、
C 虚拟环境1、
D 虚拟环境1、虚拟环境3、
E 虚拟环境1、
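结合表2的数据,下面的 Python 片段示意统计每个虚拟环境被选择次数、并取被选择次数最多者的过程(仅作示例,数据即表2中的假设数据):

```python
from collections import Counter

# 表2:每个用户选择过的虚拟环境(示例数据)
history = {
    "A": ["虚拟环境1", "虚拟环境2"],
    "B": ["虚拟环境3"],
    "C": ["虚拟环境1"],
    "D": ["虚拟环境1", "虚拟环境3"],
    "E": ["虚拟环境1"],
}

counts = Counter(env for envs in history.values() for env in envs)
most_selected, times = counts.most_common(1)[0]
print(most_selected, times)   # 输出: 虚拟环境1 4
```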
需要说明的是,在以上三种确定方式中,为了节省服务器的运算资源,服务器为某一用户确定虚拟环境后,可以直接将该用户对应的虚拟环境确定为群 组视频会话中每个虚拟用户对应的虚拟环境。
事实上,以上三种确定方式中的任意两种或三种确定方式也可以相结合,本发明实施例对结合方式不做限定。例如,第一种确定方式和第三种确定方式结合,如果服务器接收到该用户触发的虚拟环境标识,则确定虚拟环境标识对应的虚拟环境,否则,服务器采用第三种确定方式。
203E、以虚拟环境为三维背景,服务器确定群组视频会话中的每个用户在虚拟环境中的显示位置。
该步骤中,为使群组视频会话中各个用户合理地融入虚拟环境,服务器需要确定每个用户在虚拟环境中的显示位置,该显示位置是指普通用户的视频数据的合成位置或虚拟用户的三维虚拟人物的合成位置。本发明实施例对确定显示位置的方式不做限定,例如,对于该用户来说,可以默认该用户的视角为正面视角,使该用户对应的三维虚拟人物的朝向与正面视角的朝向一致。因此,该用户可以在群组视频会话中显示,也可以不显示,如果显示,参见图3,该用户可以对应图3中箭头所指的显示位置。另外,对于其他用户来说,服务器可以采用以下五种确定方式(确定方式1-确定方式5)来确定显示位置。
确定方式1、根据该用户与群组视频会话中其他用户之间的社交数据,分析用户与其他用户之间的亲密度,按照亲密度高低顺序从该用户的任一侧开始排列其他用户的显示位置。
为了营造更逼真的会话场景,该确定方式顾及了各个用户实际会话时的社交倾向,依据亲密度确定各个用户的显示位置。其中,社交数据不限于聊天次数、成为好友的时长和评论点赞次数等数据。本发明实施例对分析亲密度的方法不做限定。例如,以C表示亲密度,聊天次数以chat表示,权重为0.4;成为好友的时长以time表示,权重为0.3;评论点赞次数以comment表示,权重为0.3,则亲密度可以表示为:
C=0.4*chat+0.3*time+0.3*comment
因此,假如其他用户分别为用户1、用户2、用户3和用户4,这些用户与该用户之间的社交数据参见表3,以C1、C2、C3和C4分别表示这些用户与该用户之间的亲密度,则C1为37、C2为4、C3为82、C4为76。因此,服务器可以将距离该用户最近的位置确定为用户3的显示位置,并按照亲密度高低依次排列用户4、用户1和用户2的显示位置。
表3
用户 chat(次) time(天) comment(次)
用户1 10 100天 10次
用户2 1 10天 2次
用户3 40 200天 20次
用户4 100 100天 20次
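结合表3的数据与上文公式 C=0.4*chat+0.3*time+0.3*comment,以下给出一个亲密度计算与排序的 Python 示意(仅为示例,权重与数据直接取自正文):

```python
# 表3:聊天次数、成为好友的时长(天)、评论点赞次数(示例数据)
social = {
    "用户1": (10, 100, 10),
    "用户2": (1, 10, 2),
    "用户3": (40, 200, 20),
    "用户4": (100, 100, 20),
}

def intimacy(chat, time, comment):
    """按正文公式 C = 0.4*chat + 0.3*time + 0.3*comment 计算亲密度。"""
    return 0.4 * chat + 0.3 * time + 0.3 * comment

# 亲密度从高到低排序,即为其他用户的座位排列顺序
order = sorted(social, key=lambda u: intimacy(*social[u]), reverse=True)
print(order)  # ['用户3', '用户4', '用户1', '用户2'],对应 C3=82、C4=76、C1=37、C2=4
```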
确定方式2、获取其他用户的用户身份,将该用户的对面位置确定为其他用户中用户身份最高的用户的显示位置,并随机确定其他用户中剩余用户的显示位置。
为了突出某些用户在群组视频会话时的主导作用,服务器可以依据用户身份确定显示位置。其中,用户身份用于指示该用户在本次群组视频会话中的重要程度。本发明实施例对衡量用户身份的标准不做限定。例如,如果其他用户中用户A是群组视频会话的发起用户,说明用户A很可能主导本次群组视频会话,因此将用户A确定为身份最高的用户。又例如,如果其他用户中用户B是该群组视频会话对应的群组中的管理员,也可以将用户B确定为身份最高的用户。
确定方式3、按照其他用户加入群组视频会话的时间先后顺序,从用户的任一侧开始排列其他用户的显示位置。
为了确定显示位置的过程更加简便,节约服务器的运算资源,可以直接依据用户加入群组视频会话的时间确定显示位置。一般地,由用户自行确认是否加入群组视频会话,因此,当用户设备检测到某一用户对加入群组视频会话的确认操作时,可以向服务器发送确认加入消息,当服务器接收到该群组视频会话中的第一个确认加入消息时,可以将该确认加入消息对应的用户排列在与该用户距离最近的显示位置,并依次排列之后接收到的确认加入消息对应的用户的显示位置。
确定方式4、根据该用户在虚拟环境中选择的位置,将该用户所选择的位置确定为用户在虚拟环境中的显示位置。
为了确定显示位置的过程更加人性化,服务器也支持用户自行选择显示位置。该确定方式中,服务器可以在群组视频会话开始之前向每个用户提供虚拟环境模板,由每个用户在虚拟环境模板上自行选择显示位置,当然,为了避免各个用户在选择显示位置时发生冲突,服务器理应实时更新当前已被选择的显示位置,例如,当某一显示位置被选择时,服务器可以为该显示位置添加不可选标记,使得各个用户在可选的显示位置中选择出显示位置。
确定方式5、将该用户的对面位置确定为普通用户的显示位置,并随机确定其他用户中剩余用户的显示位置。
考虑到普通用户一般以二维人物形式显示,在三维的虚拟环境中,为了避免该普通用户对应的二维视频数据失真,以尽可能展示普通用户的全貌,服务器可以将该用户的对面位置确定为普通用户的显示位置,并随机确定剩余用户的显示位置。
需要说明的是,每个用户理应对应一块显示区域,因此,当某一用户A选择一个显示位置时,服务器确定的是用户A所对应的显示区域。而且,为了在虚拟环境中显示各个用户时的间距更加均匀,服务器可以事先在虚拟环境中划分出显示区域,例如,对于圆桌会议虚拟环境,每个座椅处对应一块显示区域。
当然,以上五种确定方式中的任意两种或两种以上确定方式也可以相结合,例如,确定方式4和确定方式5结合,服务器先将该用户的对面位置确定为普通用户的显示位置,并向每个虚拟用户提供虚拟环境模板,且该虚拟环境模板上已为普通用户确定的显示位置处具有不可选标记,使得每个虚拟用户可以在可选的显示位置中自行选择一个显示位置。
203F、对于群组视频会话中的普通用户,服务器将普通用户的指定视频数据合成至该普通用户对应的显示位置。
指定视频数据是指基于接收到的普通用户的视频数据得到的符合虚拟现实显示模式的视频数据,该步骤中,由于普通用户包括第一普通用户和第二普通用户,第一普通用户是指使用双目摄像头的普通用户,第二普通用户是指使用单目摄像头的普通用户,两种普通用户的视频数据不同,因此服务器得到指定视频数据的方式也不同,本发明实施例以情况1和情况2进行说明:
情况1、如果普通用户包括第一普通用户,将第一普通用户的两路二维视频数据转换为第一三维视频数据,将第一三维视频数据作为指定视频数据,或,如果普通用户包括第一普通用户,将第一普通用户的两路二维视频数据作为指定视频数据。
该情况下,为了在虚拟环境中以三维人物的形式显示第一普通用户,服务器可以采用两种方式得到指定视频数据:
第一种方式、将两路二维视频数据转换成第一三维视频数据。由于两路二维视频数据分别对应从两个视角捕捉的普通用户的实际场景,以其中一路二维 视频数据的一个像素点为参照,确定另一路二维视频中与该像素点对应的像素点,这两个像素点对应实际场景中同一位置,从而确定两个像素点的视差,两路二维视频数据中的各个像素点经上述处理后,可以得到视差图,根据视差图构建出实际场景的三维图像数据。
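下面给出一个借助 OpenCV 计算视差图并由视差恢复深度的示意代码(仅为示例,假设两路二维视频帧已完成立体校正并转为灰度图,focal_px、baseline_m 为假设的相机参数,并非本专利限定的实现):

```python
import cv2
import numpy as np

def disparity_map(left_gray, right_gray):
    """对双目摄像头的两路图像计算视差图(输入为已校正的 8 位灰度图)。"""
    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    # StereoBM 输出为 16 倍定点数,除以 16 得到以像素为单位的视差
    return stereo.compute(left_gray, right_gray).astype(np.float32) / 16.0

def depth_from_disparity(disp, focal_px, baseline_m):
    """按 Z = f * B / d 由视差恢复深度,进而可重建实际场景的三维图像数据。"""
    valid = np.where(disp > 0, disp, np.nan)
    return focal_px * baseline_m / valid
```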
第二种方式、直接将两路二维视频数据作为指定视频数据,在将指定视频数据发送至VR设备时,也发送指定显示指令,该指定显示指令用于指示VR设备将两路二维视频数据分别渲染在左右眼屏幕中,通过将不同视角的两路二维视频数据分别渲染在左右眼屏幕中,可以在显示时形成视差,达到三维显示效果。
情况2、如果普通用户包括第二普通用户,将第二普通用户的二维视频数据作为指定视频数据。
需要说明的是,本发明实施例对确定普通用户的用户类型的方式不做限定。例如,如果服务器同时接收到一个普通用户的两路二维视频数据,可以确定该普通用户的用户类型为第一普通用户,否则,可以确定该普通用户为第二普通用户。
基于步骤203E确定的显示位置以及本步骤203F得到的指定视频数据,服务器可以将该指定视频数据合成至该普通用户对应的显示位置。当然,为了显示效果更加真实,在合成之前,服务器可以根据预先设置的合成尺寸,将指定视频数据对应的显示尺寸调整至该合成尺寸,该合成尺寸可以通过虚拟环境与真实人物的比例确定,每个虚拟环境可以对应一个合成尺寸。
需要说明的是,由于该指定视频数据仅是一个视角(对于第二普通用户)或两个视角(对于第一普通用户)的视频数据,在合成时该指定视频数据仅占据虚拟环境中的二维空间位置。而且,每个普通用户的显示位置不同,为了给用户提供更好的显示效果,服务器可以在合成时为指定视频数据的图层边缘添加边框,使得指定视频数据的显示效果为渲染在虚拟环境中的“虚拟屏幕”上。当然,如果两个或两个以上的指定视频数据的显示位置相邻,服务器也可以在合成时为这些指定视频数据的图层边缘添加边框,使得两个或两个以上的普通用户能够显示在一个“虚拟屏幕”中。参见图4,本发明实施例提供了一种群组视频会话场景的示意图,如图4中(a)图所示,一个普通用户在一个“虚拟屏幕”中显示,如图4中(b)图所示,两个普通用户在一个“虚拟屏幕”中显示。
203G、对于群组视频会话中的虚拟用户,服务器将虚拟用户的三维虚拟人物和音频数据合成至虚拟用户对应的显示位置。
该步骤中,服务器可以获取虚拟用户的三维虚拟人物(获取过程与步骤203A同理),将三维虚拟人物调整至合成尺寸,将调整后的三维虚拟人物合成至虚拟用户对应的显示位置,并将合成后的三维图像数据与获取到的虚拟用户的音频数据合成,得到该虚拟用户的音视频数据。
203H、服务器将合成后的视频数据作为用户的目标视频数据。
通过步骤203F和203G的合成过程,服务器最终可以得到目标视频数据,该目标视频数据中包括了群组视频会话中每个虚拟用户对应的虚拟人物以及每个普通用户的视频数据。
204、在群组视频会话的进行过程中,服务器向用户的用户设备发送目标视频数据,使该用户进行群组视频会话。
对于群组视频会话中的每个用户来说,如果该用户的用户类型为普通用户,服务器可以将步骤203A-203C所得到的目标视频数据发送至该用户的终端,如果该用户的用户类型为虚拟用户,服务器可以将步骤203D-203H所得到的目标视频数据发送至该用户的VR设备,使得每个用户都能够进行群组视频会话。参见图5,本发明实施例提供了一种显示场景示意图。其中,以终端登录服务器的用户为终端用户,以VR设备登录服务器的用户为VR设备用户。
需要说明的是,在群组视频会话的过程中的某些用户也可以具有指定管理权限,指定管理权限是指在群组视频会话的过程中邀请或移除用户的权限,本发明实施例对哪些用户具有指定管理权限不做限定。例如,服务器可以将该指定管理权限对群组视频会话的发起用户开放。如图6所示,本发明实施例提供了一种虚拟用户进行群组视频会话的流程图。该虚拟用户可以邀请群组视频会话之外的其他用户进入群组视频会话,也可以将某一用户从群组视频会话中移除,也可以向其他用户发送私聊请求,或者接受其他用户的私聊请求。
205、当终端接收到服务器发送群组视频会话的目标视频数据时,显示目标视频数据,使群组视频会话中的普通用户以二维人物形式显示,群组视频会话中的虚拟用户以二维虚拟人物的形式显示。
终端用户的用户类型为普通用户,因此,终端用户在参与群组视频会话时采用二维显示模式。
由于各个用户的二维视频数据已在服务器侧按照显示位置和显示尺寸进 行合成,当终端接收到目标视频数据时,可以在屏幕上渲染该目标视频数据,从而在屏幕上的各个区域显示出普通用户的二维人物或虚拟用户对应的二维虚拟人物。
206、当VR设备接收到服务器发送群组视频会话的目标视频数据时,显示目标视频数据,使群组视频会话中的普通用户在虚拟环境中以二维人物或三维人物的形式显示,群组视频会话中的虚拟用户在虚拟环境中以三维虚拟人物的形式显示。
VR设备用户的用户类型为虚拟用户,因此,VR设备用户在参与群组视频会话时采用虚拟现实显示模式。
由于普通用户的二维视频数据或三维视频数据、以及虚拟用户对应的三维虚拟人物已在服务器侧按照显示位置进行合成,当VR设备接收到目标视频数据时,可以在VR设备的左右眼屏幕中渲染该目标视频数据,使得VR设备能够在普通用户对应的显示位置上,显示普通用户的二维人物或三维人物,而且在虚拟用户对应的显示位置上,显示虚拟用户的三维虚拟人物。
另外,为了明确提示VR设备用户正在发言的用户,基于目标视频数据,如果VR设备检测到群组视频会话中任一用户正在发言,在该用户对应的显示位置上显示发言提示。其中,发言提示的表现形式不限于“正在发言”的文字提示、箭头图标或闪烁图标等。本发明实施例对检测用户是否发言的方式不做限定。例如,当VR设备从当前的目标视频数据中检测到该用户的音频数据时,确定该用户正在发言,并进一步确定该用户对应的显示位置,在其显示位置上显示发言提示。
本发明实施例通过确定群组视频会话中每个用户的用户类型,根据用户类型处理群组视频会话的视频数据,从而当用户类型为虚拟用户时,可以得到与虚拟用户所指示的虚拟现实显示模式匹配的目标视频数据,当用户类型为普通用户时,可以得到与普通用户所指示的二维显示模式匹配的目标视频数据,从而为不同类型的用户采用合理的显示模式显示视频数据,使得不同类型的用户之间能够不受限制地进行群组视频会话,提高了群组视频会话的灵活性。
另外,当用户的用户类型为普通用户时,将群组视频会话中虚拟用户对应的三维虚拟人物转换为二维虚拟人物,并将二维虚拟人物与二维背景、音频数据进行合成,得到该虚拟用户的二维视频数据,使得虚拟用户的二维视频数据与该用户对应的二维显示模式匹配,从而为该用户提供了处理群组视频会话中 虚拟用户的视频数据的具体方式。
另外,当用户的用户类型为虚拟用户时,可以确定群组视频会话中各个用户在虚拟环境中的显示位置,将普通用户的二维视频数据以及虚拟用户的三维虚拟人物分别合成至对应的显示位置,使得合成的视频数据与该用户对应的虚拟现实显示模式匹配,从而为该用户提供了处理群组视频会话中虚拟用户的视频数据的具体方式。
另外,对于第一普通用户和第二普通用户,提供了不同的获取指定视频数据的方式:将第一普通用户的两路二维视频数据处理成第一三维视频数据,或直接将两路二维视频数据获取为指定视频数据,并告知VR设备显示方式;将第二普通用户的二维视频数据作为指定视频数据。通过两种不同的获取方式,可以智能地提供与普通用户的用户类型对应的显示效果。
另外,提供了至少三种确定群组视频会话对应的虚拟环境的具体方法,既可以支持用户自行选择虚拟环境,也可以根据群组视频会话中的用户数量,选定容量与用户数量匹配的虚拟环境,还可以分析每个用户曾经选择过的虚拟环境,选定被选择次数最多的虚拟环境,使得确定虚拟环境的方式更加多样。
另外,提供了至少五种确定方式,以确定每个用户在虚拟环境中的显示位置:依据用户之间的亲密度、用户身份或用户加入群组视频会话的时间,由服务器智能地为每个用户选择座位,或者,更加人性化地由用户自行选择显示位置,或者,为了尽可能展示普通用户的全貌,将普通用户的显示位置与该用户的正面视角相对。
图7是本发明实施例提供的一种群组视频会话的装置框图。参见图7,该装置具体包括:
创建模块701,用于创建群组视频会话;
确定模块702,用于针对群组视频会话中的每个用户,根据用户的设备信息,确定用户的用户类型,用户类型包括普通用户和虚拟用户,普通用户用于指示用户在参与群组视频会话时采用二维显示模式,虚拟用户用于指示用户在参与群组视频会话时采用虚拟现实显示模式;
处理模块703,用于根据用户的用户类型所指示的视频显示模式,对群组视频会话的视频数据进行处理,得到用户的目标视频数据,目标视频数据的视频显示模式与用户的用户类型所指示的视频显示模式匹配;
发送模块704,用于在群组视频会话的进行过程中,向用户的用户设备发送目标视频数据,使用户进行群组视频会话。
本发明实施例通过确定群组视频会话中每个用户的用户类型,根据用户类型处理群组视频会话的视频数据,从而当用户类型为虚拟用户时,可以得到与虚拟用户所指示的虚拟现实显示模式匹配的目标视频数据,当用户类型为普通用户时,可以得到与普通用户所指示的二维显示模式匹配的目标视频数据,从而为不同类型的用户采用合理的显示模式显示视频数据,使得不同类型的用户之间能够不受限制地进行群组视频会话,提高了群组视频会话的灵活性。
在一种可能实现方式中,处理模块703用于:如果用户的用户类型为普通用户,将群组视频会话中虚拟用户对应的三维虚拟人物转换为二维虚拟人物;对二维虚拟人物、虚拟用户选择的二维背景、以及虚拟用户对应的音频数据进行合成,得到第一二维视频数据;对至少一个第一二维视频数据与至少一个第二二维视频数据进行合成,得到用户的目标视频数据,第二二维视频数据是指群组视频会话中普通用户的二维视频数据。
在一种可能实现方式中,处理模块703用于:如果用户的用户类型为虚拟用户,确定群组视频会话对应的虚拟环境;以虚拟环境为三维背景,确定群组视频会话中的每个用户在虚拟环境中的显示位置;对于群组视频会话中的普通用户,将普通用户的指定视频数据合成至普通用户对应的显示位置;对于群组视频会话中的虚拟用户,将虚拟用户的三维虚拟人物和音频数据合成至虚拟用户对应的显示位置;将合成后的视频数据作为用户的目标视频数据。
在一种可能实现方式中,处理模块703还用于:如果普通用户包括第一普通用户,将第一普通用户的两路二维视频数据转换为第一三维视频数据,将第一三维视频数据作为指定视频数据,第一普通用户是指使用双目摄像头的普通用户,或,如果普通用户包括第一普通用户,将第一普通用户的两路二维视频数据作为指定视频数据;如果普通用户包括第二普通用户,将第二普通用户的二维视频数据作为指定视频数据,第二普通用户是指使用单目摄像头的普通用户。
在一种可能实现方式中,处理模块703用于:将用户触发的虚拟环境选项对应的虚拟环境确定为用户在群组视频会话中对应的虚拟环境;或,
处理模块703用于:根据群组视频会话中的用户数量,确定群组视频会话对应的虚拟环境的容量,将符合容量的虚拟环境确定为群组视频会话对应的虚 拟环境;或,
处理模块703用于:分析群组视频会话中的每个用户选择过的虚拟环境,得到每个虚拟环境的被选择次数,将被选择次数最多的虚拟环境确定为群组视频会话对应的虚拟环境。
在一种可能实现方式中,处理模块703用于:根据用户与群组视频会话中其他用户之间的社交数据,分析用户与其他用户之间的亲密度,按照亲密度高低顺序从用户的任一侧开始排列其他用户的显示位置;或,
处理模块703用于:获取其他用户的用户身份,将用户的对面位置确定为其他用户中用户身份最高的用户的显示位置,并随机确定其他用户中剩余用户的显示位置;或,
处理模块703用于:按照其他用户加入群组视频会话的时间先后顺序,从用户的任一侧开始排列其他用户的显示位置;或,
处理模块703用于:根据用户在虚拟环境中选择的位置,将用户所选择的位置确定为用户在虚拟环境中的显示位置;或,
处理模块703用于:将用户的对面位置确定为普通用户的显示位置,并随机确定其他用户中剩余用户的显示位置。
上述所有可选技术方案,可以采用任意结合形成本发明的可选实施例,在此不再一一赘述。
图8是本发明实施例提供的一种群组视频会话的装置框图。参见图8,该装置具体包括:
接收模块801,用于接收服务器发送群组视频会话的目标视频数据,目标视频数据的视频显示模式与终端用户的用户类型所指示的视频显示模式匹配,终端用户的用户类型为普通用户,普通用户用于指示终端用户在参与群组视频会话时采用二维显示模式;
显示模块802,用于显示目标视频数据,使群组视频会话中的普通用户以二维人物形式显示,群组视频会话中的虚拟用户以二维虚拟人物的形式显示。
本发明实施例通过接收目标视频数据,由于目标视频数据是服务器根据用户类型处理得到,使得该目标视频数据与普通用户所指示的二维显示模式匹配,从而为终端用户采用合理的显示模式显示视频数据,使得不同类型的用户之间能够不受限制地进行群组视频会话,提高了群组视频会话的灵活性。
图9是本发明实施例提供的一种群组视频会话的装置框图。参见图9,该装置具体包括:
接收模块901,用于接收服务器发送群组视频会话的目标视频数据,目标视频数据的视频显示模式与VR设备用户的用户类型所指示的视频显示模式匹配,VR设备用户的用户类型为虚拟用户,虚拟用户用于指示VR设备用户在参与群组视频会话时采用虚拟现实显示模式;
显示模块902,用于显示目标视频数据,使群组视频会话中的普通用户在虚拟环境中以二维人物或三维人物的形式显示,群组视频会话中的虚拟用户在虚拟环境中以三维虚拟人物的形式显示。
本发明实施例通过接收目标视频数据,由于目标视频数据是服务器根据用户类型处理得到,使得该目标视频数据与虚拟用户所指示的虚拟现实显示模式匹配,从而为VR设备用户采用合理的显示模式显示视频数据,使得不同类型的用户之间能够不受限制地进行群组视频会话,提高了群组视频会话的灵活性。
在一种可能实现方式中,显示模块902用于:在普通用户对应的显示位置上,显示普通用户的二维人物或三维人物;在虚拟用户对应的显示位置上,显示虚拟用户的三维虚拟人物。
在一种可能实现方式中,显示模块902还用于:基于目标视频数据,如果检测到群组视频会话中任一用户正在发言,在用户对应的显示位置上显示发言提示。
上述所有可选技术方案,可以采用任意结合形成本发明的可选实施例,在此不再一一赘述。
需要说明的是:上述实施例提供的群组视频会话的装置在群组视频会话时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的群组视频会话的装置与群组视频会话的方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
进一步地,在进行群组视频会话时,每个虚拟用户的实际形象都有各自的特征,而VR设备上提供的虚拟人物有限,很可能与虚拟用户的真实形象大相径庭,导致虚拟人物表达虚拟用户的效果差,群组视频会话时的视觉效果差,为此,本实施例还提供了更符合用户实际形象和实际动作的会话方法,以增强群组视频会话时的视觉效果,该过程可以在上述实施例中对群组视频会话的视频数据进行处理,得到用户的目标视频数据的过程中进行,还可以在VR设备上生成用户的视频数据或是进行视频数据合成时进行,本发明实施例对此不做限定。
图10是本发明实施例提供的一种群组视频会话的方法流程图。参见图10,该方法可以应用于服务器或者VR设备,以服务器作为执行主体为例,该方法具体包括:
1001、服务器获取群组视频会话中第一用户的虚拟人物。
群组视频会话是指多个(两个或两个以上)用户基于服务器进行的视频会话。其中,多个用户可以是该服务器对应的社交平台上的多个用户,该多个用户之间可能是群组关系或好友关系。需要说明的是,该群组视频会话中的用户可以是使用VR设备的虚拟用户,也可以是使用传统终端(如,台式电脑、移动电话)的传统用户。
第一用户可以是该群组视频会话中的任一用户。第一用户的虚拟人物至少根据第一用户的头部特征数据和第一用户对应的肢体模型得到。本发明实施例对获取虚拟人物的时机不做限定。例如,当服务器为多个用户创建群组视频会话时,获取该多个用户中每个用户的虚拟人物。又例如,在群组视频会话的过程中,该第一用户接受该群组视频会话中某一用户的邀请,使得服务器确定该第一用户加入群组视频会话时,获取该第一用户的虚拟人物。
本发明实施例中,服务器可以根据第一用户的头部特征数据和对应的肢体模型,实时地为第一用户创建虚拟人物,从而获取到该虚拟人物。或者,服务器配置的虚拟人物数据库中也可能预先存储了第一用户的虚拟人物,因此服务器也可以根据第一用户的用户标识,在虚拟人物数据库中查询是否存在与用户标识对应的虚拟人物,如果是,则可以直接获取到该第一用户的虚拟人物,如果否,则可以实时地为该第一用户创建虚拟人物。事实上,虚拟人物数据库中预先存储的虚拟人物也是由服务器创建的,也即是,获取虚拟人物的过程包括创建过程。其中,基于创建过程获取到虚拟人物的过程可以采用以下步骤1001A-1001D进行获取:
1001A、服务器获取第一用户的头部特征数据。
该头部特征数据用于描述该第一用户的实际头部形象,可以用于指示该第一用户的头发区域、头发色调、脸部区域、脸部色调、五官位置和五官形态中的至少一项。其中,五官形态至少包括五官色调和五官轮廓。
本发明实施例对获取头部特征数据的方式不做限定。例如:
服务器获取第一用户的头部图像数据,对头部图像数据的色调分布进行分析,得到头部特征数据。该头部图像数据的来源可以有多种,如,第一用户的云相册中的头部图像数据(大头照),或者第一用户的摄像头当前拍摄的头部图像数据。当然,服务器也可以获取多张第一用户的头部图像,从而更全面地分析头部图像数据。事实上,服务器也可以提供拍摄提示,该拍摄提示用于提示用户以不同的拍摄角度进行拍摄,使得服务器能够获取到不同拍摄角度的头部图像数据,从而使得后续得到的头部模型与第一用户的实际形象更为匹配。
由于用户的头发、脸部和五官的色调各具特征(如,黄种人发色一般为黑色,脸部一般偏黄,眼部为黑白,嘴部为红色),且色调明暗(如,嘴部、鼻梁和脑门等相对突出的部分一般较亮,鼻翼和眼窝一般较暗)也各有不同,因此,服务器可以基于上述特征得到头部特征数据:
在确定脸部色调和脸部区域时,服务器可以将头部图像数据中的像素点的颜色值与已配置的多种肤色进行比较,如果超过第一比例的连续像素点的颜色值均与某一种肤色匹配,则可以将该肤色确定为脸部色调,并将匹配的连续像素点所构成的图像区域确定为脸部区域。
在确定头发色调和头发区域时,服务器可以将与脸部区域相邻的连续像素点确定为头发区域,并提取该连续像素点的颜色值作为头发色调。
在确定五官位置时,由于嘴部、眼睛和眉毛的色调与脸部色调不同,服务器可以将确定的脸部区域内的空心区域分别确定为嘴部、眼睛和眉毛位置。其中,眉毛的位置位于最上面,其次是眼睛,嘴部位于最下面。而且,由于耳部相对脸部向外侧突出,服务器可以确定脸部区域的两侧的边缘像素点,分析该边缘像素点的切线斜率,如果从像素点A到像素点B的切线斜率的变化率均满足预设变化率,则可以将像素点A至像素点B所在的区域确定为耳部位置。另外,由于鼻子相对脸部较为立体,一般在鼻子两侧和下面会形成阴影,且鼻梁亮度较高,因此服务器可以分析出脸部区域中明暗度高于第一明暗度的连续像素点,且位于该连续像素点两侧的连续像素点、下方的连续像素点的明暗度 低于第二明暗度,将这三部分连续像素点所在的区域确定为鼻子位置。根据上述确定的五官位置,服务器可以根据五官位置所在的边缘像素点所构成的形状确定为五官轮廓,将五官位置所在的像素点的颜色确定为五官色调,从而得到五官形态。当然,为了表征鼻子的立体程度,服务器可以记录鼻子位置中高于第一明暗度的像素点与低于第二明暗度的像素点的明暗度比例,该明暗度比例越高,表明第一用户的鼻子越立体。
事实上,以上获取头部特征数据的方式仅是示例性的,本发明实施例也可以采用任一种方式获取头部特征数据,例如,基于人脸模板的识别算法或利用神经网络进行识别的算法。
当然,服务器还可以继续对确定的头部特征数据进行修正,例如,根据该头部特征数据中的五官位置,确定五官比例,将该五官比例与已配置的正常五官比例进行比较,如果不符合正常五官比例,服务器可以适应性修正五官中某一部分的位置,使得五官比例符合正常五官比例。事实上,该正常五官比例用于指示正常的五官比例所处的范围,因此在比较过程中,该五官比例符合正常的五官比例所处的范围即可。
需要说明的是,为了节省服务器的运算资源,服务器也可以分析必要头部特征数据,必要头部特征数据用于简要地描述该第一用户的实际头部形象,如,必要头部特征数据可以用于指示脸部色调、五官位置和五官形态。
1001B、服务器根据头部特征数据,生成与头部特征数据匹配的头部模型。
基于步骤1001A获取到的头部特征数据,为了更细致地表达虚拟人物的头部模型,使其头部模型与第一用户的实际长相更为匹配,该步骤可以具体为:根据脸部区域和头发区域,确定头部轮廓模型,头部轮廓模型包括脸部轮廓模型和头发轮廓模型;根据脸部色调和头发色调,填充脸部轮廓模型和头发轮廓模型;获取与五官形态匹配的五官模型;按照五官位置,将五官模型合成至脸部轮廓模型,生成与头部特征数据匹配的头部模型。
例如,服务器确定脸部色调为乳白色、头发色调为棕色,则服务器可以根据脸部区域(头发区域)的边缘像素点构成的形状确定为脸部轮廓(头发轮廓),从而生成脸部轮廓模型(头发轮廓模型),从而确定头部轮廓模型,进而,服务器可以用乳白色填充脸部轮廓模型,得到脸部模型,用棕色填充头发轮廓模型,得到头发模型。进一步地,服务器可以将鼻子形态、嘴部形态等五官形态与五官模型数据库中卡通化的五官模型进行比较,获取与五官色调、五官轮廓 相似度最高的五官模型,并按照五官位置,将获取的五官模型分别合成至已填充的脸部轮廓模型上,按照脸部轮廓模型与头发轮廓模型的弧度,构建三维的头部模型,使得生成的头部模型与第一用户的实际头部形象匹配。
事实上,服务器也可以根据五官形态生成卡通化的五官模型,例如,用嘴部形态中的嘴部色调填充嘴部轮廓,并加深嘴部轮廓的两端连线上的像素点,生成嘴部模型,且嘴部模型呈“两瓣”效果。例如,眼部形态中的眼部色调至少包括两种,即眼球色调和眼白色调,眼白色调一般为偏白色调,因此,服务器是可以用眼部色调中的偏白色调填充眼部轮廓,用眼部色调中的另一色调填充眼部轮廓中的球型轮廓,该球型轮廓与眼部轮廓相切。
需要说明的是,为了更加细致地表达第一用户的头部形象,服务器还可以进一步处理该头部模型。例如,服务器为头发模型添加纹理,并获取该第一用户的年龄数据,在脸部模型上添加与该第一用户的年龄匹配的纹理。又例如,服务器获取该第一用户的性别数据,如果该第一用户为女性,则可以延长眼部模型上的睫毛长度,加强嘴部模型的亮度。又例如,服务器获取第一用户的职业数据,如果该第一用户为学生,则可以在脸部模型上添加眼镜模型。
1001C、服务器根据第一用户的用户属性,确定第一用户对应的肢体模型。
其中,用户属性不限于用户的性别、年龄和职业。一般地,用户会在社交平台上注册账号时填写用户属性,使得服务器能够得到用户属性,并将用户属性与用户标识对应存储。
由于用户的实际形象往往与性别、年龄、职业、身高、体重等用户属性密切相关,因此,为使虚拟人物更加符合第一用户的实际形象,服务器可以根据第一用户的用户标识,获取该用户标识对应的用户属性,进而,根据用户属性,从肢体模型数据库中选择与用户属性匹配的肢体模型。而且,服务器也会提供着装模型。
其中,本发明实施例对提供着装的方式不做限定。例如,该肢体模型中可以包括着装,或者,服务器也可以单独提供着装模型,该着装模型可以存储于肢体模型数据库,也可以存储于服务器配置的着装模型数据库。如果服务器单独提供着装模型,则可以将着装模型和对应的着装选项提供给第一用户,使得第一用户可以通过着装选项选择对应的着装模型。或者,服务器也可以获取第一用户的图像数据,确定图像数据中该第一用户所穿的服装,匹配出与第一用户所穿的服装对应的着装模型,将该着装模型提供给第一用户。在匹配着装模 型时,不限于根据服装颜色或形状进行匹配。或者,服务器可以根据用户属性确定该第一用户的着装模型,具体过程与下述确定肢体模型的过程类似。
另外,如果肢体模型中包括着装模型,服务器也可以采用以下至少三种用户属性确定肢体模型:
(1)、根据第一用户的性别数据,确定与第一用户的性别数据匹配的肢体模型。
一般地,男性身材较为强壮,女性身材较为弱小,因此,肢体模型数据库中可以针对男性和女性身材的特点,提供多种男性或女性专用的肢体模型,每个肢体模型对应一个性别标签,使得服务器可以根据性别标签,确定一个与该第一用户的性别数据匹配的肢体模型,而且,男性标签的肢体模型的着装可以为裤装,女性标签的肢体模型的着装可以为裙装。
(2)、根据第一用户的年龄数据,确定与第一用户的年龄数据匹配的肢体模型。
一般地,如果用户年龄越大,该用户的服装风格会更加成熟。因此,肢体模型数据库中可以针对用户所属的年龄段,提供多种服装风格的肢体模型,每个肢体模型对应一个年龄段标签,例如,着装上有漫画人物的肢体模型对应的年龄段标签为18岁以下,使得服务器可以根据年龄段标签,确定与该第一用户的年龄数据符合的肢体模型。
(3)、根据第一用户的职业数据,确定与第一用户的职业数据匹配的肢体模型。
在实际生活中,不同职业的用户的职业装也有所不同,因此,在肢体模型数据库中也可以提供多种身着职业装的肢体模型,每个肢体模型对应一个职业标签,例如,西装肢体模型对应的职业标签为白领,校服肢体模型对应的职业标签为学生,使得服务器可以根据职业标签,确定与该第一用户的职业数据符合的肢体模型。
需要说明的是,本发明实施例对每个肢体模型对应的标签的形式不做限定。例如,每个肢体模型可以同时对应上述至少两种标签,或者,每个肢体模型对应的一个标签同时具有两层含义,如,该标签为女教师标签。一旦肢体模型对应至少两种标签或对应的标签具有两层以上的含义,均可以使服务器可以根据至少两种用户属性,确定第一用户对应的肢体模型。例如,服务器根据第一用户的性别数据和职业数据,确定该第一用户为女医生,则可以从肢体模型 数据库中查找性别标签为女性、且职业标签为医生的肢体模型,或查找标签为女医生的肢体模型,均可以将查找到的肢体模型确定为该第一用户对应的肢体模型。
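以下 Python 片段示意按标签匹配肢体模型的一种做法(仅为示例,模型条目、标识与标签均为假设数据,并非肢体模型数据库的实际结构):

```python
# 肢体模型数据库的简化表示:每个模型带有若干标签(示例数据)
LIMB_MODELS = [
    {"id": "limb_suit",     "tags": {"男性", "白领"}},
    {"id": "limb_dress",    "tags": {"女性"}},
    {"id": "limb_uniform",  "tags": {"学生"}},
    {"id": "limb_doctor_f", "tags": {"女性", "医生"}},
]

def match_limb_model(user_tags):
    """返回与用户属性标签交集最大的肢体模型,可同时按性别、年龄段、职业等组合匹配。"""
    return max(LIMB_MODELS, key=lambda m: len(m["tags"] & set(user_tags)))

print(match_limb_model({"女性", "医生"})["id"])  # 输出: limb_doctor_f
```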
需要说明的是,在确定肢体模型时,除了根据用户属性,还可以参考群组视频会话对应的群组类型、群组视频会话中的虚拟环境以及当前的实际温度。该群组类型是指该群组视频会话中多个用户所属群组的群组类型。以下将分别说明参考上述三种数据确定肢体模型的具体方式:
确定方式1、服务器确定群组视频会话中多个用户所属群组的群组类型,将与群组类型匹配的肢体模型确定为与第一用户的肢体模型。例如,每个肢体模型对应一个群组类型标签,西装肢体模型可以对应公司群组标签,因此,当该群组类型为公司群组时,服务器可以查找到公司群组标签对应的西装肢体模型,将西装肢体模型确定为第一用户的肢体模型。
确定方式2、服务器确定群组视频会话对应的虚拟环境类型,将与虚拟环境类型匹配的肢体模型确定为第一用户的肢体模型。例如,该虚拟环境的类型为沙滩,则服务器可以将沙滩服肢体模型确定为该第一用户对应的肢体模型。
确定方式3、服务器获取当前的实际温度,将与当前的实际温度匹配的肢体模型确定为第一用户的肢体模型。例如,当前的实际温度为35度,则服务器可以将夏装肢体模型确定为该第一用户对应的肢体模型。
事实上,服务器确定第一用户的肢体模型时,也可以为第一用户提供调整选项。本发明实施例对调整选项和提供调整选项的方式不做具体限定。例如,服务器确定第一用户的初始肢体模型后,将初始肢体模型和调整选项提供给第一用户,该调整选项包括身高调整选项、体型调整选项和着装调整选项,第一用户可以通过触发身高调整选项调整身高的高低、触发体型调整选项调整体型的胖瘦、触发着装调整选项更换着装。
需要说明的是,该步骤1001C为本发明实施例的可选步骤,事实上,由于头部模型足以表征该第一用户的实际形象,为了实现过程简单,减少服务器的运算资源,也可以根据第一用户的性别数据随机从肢体模型数据库中选择一个与性别数据匹配的肢体模型即可。
另外,需要说明的是,本发明实施例对上述步骤1001A和1001C的时序不做限定。事实上,服务器也可以先确定肢体模型,或者,服务器同时确定头部模型和肢体模型。
1001D、服务器对头部模型和肢体模型进行合成,得到第一用户的虚拟人物。
通过步骤1001,服务器获取了用户的头部图像数据,进行了人脸和头发的识别处理,获得了人脸和五官定位,依据五官模型数据库和肢体模型数据库等生成头部模型,并确定肢体模型,将头部模型在肢体模型的上部进行合成,从而得到一个完整的虚拟人物。参见图11,本发明实施例提供了一种获取虚拟人物的流程图。
需要说明的是,为使得到的虚拟人物的视觉效果更好,服务器在合成时也可以结合头部模型与肢体模型的比例。例如,服务器按照第一用户的身高数据和已配置的正常人的头身比例数据,确定头部模型和肢体模型的合成尺寸,并将头部模型和肢体模型调整至确定的合成尺寸,再进行合成虚拟人物的过程,使得所得到的虚拟人物更加符合第一用户的实际形象。事实上,为使虚拟人物更具吸引力,服务器也可以合成“Q版”的虚拟人物,“Q版”的虚拟人物是指头身比例不符合正常人的头身比例的虚拟人物。一般地,为使“Q版”的虚拟人物更为可爱,其头身比例数据可以较为夸张,如,头身比例数据为1:1。服务器可以按照已配置的“Q版”的头身比例数据,确定头部模型和肢体模型的合成尺寸,并将头部模型和肢体模型调整至确定的合成尺寸,再进行合成,从而得到“Q版”的虚拟人物。
1002、服务器在群组视频会话的过程中,基于第一用户的虚拟人物和第一用户的行为特征数据,获取第一用户的视频数据。
其中,行为特征数据用于指示该第一用户的实际动作,至少包括表情特征数据、嘴型特征数据、头部方位特征数据和眼神方向特征数据中任一种。通过以上步骤1001,服务器获取到静态的虚拟人物,本发明实施例中,为使该虚拟人物动态化,服务器获取第一用户的视频数据,且该视频数据中第一用户的虚拟人物的动作与第一用户的实际动作匹配。本发明实施例对获取该视频数据的方式不做限定。例如,基于上述至少四种行为特征数据,本发明实施例提供了以下至少四种获取视频数据的方式:
获取方式1、行为特征数据包括表情特征数据时,当服务器检测到第一用户的表情特征数据为指定表情特征数据时,获取与指定表情特征数据对应的肢体特征数据;将指定表情特征数据实时映射至第一用户的虚拟人物的头部模型,并将肢体特征数据实时映射至第一用户的虚拟人物的肢体模型,得到第一 用户的视频数据。
为使虚拟人物更符合第一用户当前的实际形象,形象地表达第一用户的形态,服务器可以将指定表情特征数据和肢体特征数据联合映射至虚拟人物。该获取方式中,服务器可以实时获取第一用户的摄像头拍摄到的图像数据,标记并追踪图像数据中脸部区域和五官位置的像素点,或脸部区域和五官位置的关键像素点,从而捕获到该第一用户的表情特征数据,关键像素点用于基础性地描述五官位置和五官形态。进而,服务器可以比较该表情特征数据的像素点分布、与指定表情特征数据的像素点分布,该指定表情特征数据是指服务器已配置的表情特征数据,每个指定表情特征数据对应配置一个肢体特征数据,如果二者的相似度达到预设阈值,则检测到该表情特征数据为指定表情特征数据。
以指定表情特征数据为嘴部大张特征数据为例,若服务器捕获到的图像数据中的嘴部位置的像素点分布与嘴部大张特征数据的像素点分布匹配,可以获取与嘴部大张特征数据对应的手部捂嘴特征数据,因此,服务器可以为嘴部模型建立三维坐标,在三维坐标上根据嘴部大张特征数据指示的像素点分布调整嘴部模型的像素点分布,从而将嘴部大张特征数据映射至头部模型中的嘴部模型;同理,服务器也可以根据手部捂嘴特征数据指示的像素点分布调整手臂模型的像素点分布,从而将手部捂嘴特征数据映射至肢体模型中的手臂模型,使得虚拟人物动态化,进而得到第一用户的视频数据。
以指定表情特征数据为哭泣表情特征数据为例,若服务器捕获到的图像数据中的眼部位置的像素点分布与哭泣表情特征数据的像素点分布匹配,也可以获取与哭泣表情特征数据对应的手部揉眼特征数据,将哭泣表情特征数据映射至头部模型中的眼部模型,并根据手部揉眼特征数据指示的像素点分布调整手臂模型的像素点分布,从而将手部揉眼特征数据映射至肢体模型中的手臂模型。
需要说明的是,为使视频数据中的影像合理过渡,服务器也可以在连续多帧视频数据中渐次调整嘴部模型和手臂模型对应的像素点分布,从而得到能够反映虚拟人物动作变化的多帧视频数据。
该获取方式通过在检测到用户的实际人物形象的表情特征数据与已配置的指定表情特征数据匹配时,获取与指定表情特征数据匹配的肢体特征数据,并为该用户的虚拟人物赋予指定表情特征和肢体特征,从而得到视频数据,由于用户自身佩戴VR设备时不容易直接通过肢体动作表达自身情绪,该获取过 程不仅使得虚拟人物能够模拟用户的实际表情,更可以通过表情特征预测该用户的情绪,并以肢体特征突出表达用户的情绪,从而同时以表情和肢体动作联合的方式模拟用户的人物形象,使得虚拟人物的表现力和真实性更强。
获取方式2、行为特征数据包括嘴型特征数据时,服务器将第一用户的嘴型特征数据实时映射至第一用户的虚拟人物的头部模型,得到第一用户的视频数据。
为使第一用户的视频数据同步第一用户发言时的嘴部动作,当服务器接收到第一用户的音频数据时,获取已配置的嘴型特征数据,该嘴型特征数据用于指示嘴部持续处于开合状态,进而,服务器可以将该嘴型特征数据实时映射至头部模型中的嘴部模型,并将音频数据与映射后的虚拟人物进行合成,从而得到第一用户的视频数据,直到接收音频数据的过程结束,服务器取消映射嘴部模型的过程,并将嘴部模型恢复至默认状态,该默认状态是指嘴部模型保持闭合的状态。
获取方式3、行为特征数据包括头部方位特征数据时,服务器获取第一用户的传感器采集到的第一用户的头部方位数据;将第一用户的头部方位特征数据实时映射至第一用户的虚拟人物的头部模型,得到第一用户的视频数据。
为了使虚拟人物更加生动地表达第一用户的实际形象,服务器可以实时获取第一用户的传感器(如,VR设备上的九轴传感器)采集到的头部方位数据,该头部方位数据至少用于指示第一用户的俯仰角或左右旋转角,进而,服务器可以根据头部方位数据所指示的俯仰角或左右旋转角,相对该虚拟人物的肢体模型旋转该头部模型,从而将头部方位特征该数据实时映射至头部模型。
当然,为使获取的头部方位数据更加准确,服务器还可以结合第一用户的摄像头拍摄到的图像数据,参照图12,本发明实施例提供了一种获取头部方位数据的流程图。服务器可以获取摄像头捕获到的图像数据,根据图像数据中脸部区域的像素点变化,当脸部区域的像素点集中地向一侧偏移时,确定头部处于偏转状态,并将偏移方向的反方向确定为头部偏转方向(对于自拍的情况),并根据像素点的偏移量确定偏转角度,从而得到头部方位特征数据。在结合上述两种获取头部方位特征数据的方式时,服务器可以确定两项头部方位特征数据之间的数据误差,如果数据误差大于容错误差,可以重新进行获取头部方位特征数据的过程,如果数据误差小于容错误差,可以采用数据融合的方式得到头部特征数据,如,取头部特征数据的平均值作为正确的头部特征数据。
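下面用一个极简的 Python 函数示意上述两路头部方位数据的融合思路(仅为示例,容错误差阈值为假设参数,这里以左右旋转角为例):

```python
def fuse_head_yaw(sensor_yaw, camera_yaw, tolerance_deg=15.0):
    """融合传感器与摄像头两路头部偏转角:
    误差大于容错误差时返回 None,表示需要重新获取;
    否则按数据融合的方式取平均值作为头部方位特征数据。"""
    if abs(sensor_yaw - camera_yaw) > tolerance_deg:
        return None
    return (sensor_yaw + camera_yaw) / 2.0
```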
获取方式4、行为特征数据包括眼神方向特征数据,服务器获取第一用户的摄像头拍摄到的第一用户的眼部图像数据;根据第一用户的眼部图像数据,获取第一用户的眼神方向特征数据;将第一用户的眼神方向特征数据实时映射至第一用户的虚拟人物的头部模型,得到第一用户的视频数据。
为增强群组视频会话中各个用户之间的交互,服务器还可以获取眼神方向特征数据,该眼神方向特征数据用于指示第一用户的眼球相对眼部的位置,进而可以用于指示第一用户的眼神凝视方向。
由于眼球和眼白的色调不同,服务器可以锁定眼部图像数据中的眼球区域,并实时追踪眼球区域相对眼部的位置,从而获取到眼神方向特征数据。进一步地,服务器可以根据该眼神方向特征数据,调整眼部模型中的眼球位置,并生成得到视频数据,从而将眼神方向特征数据映射至头部模型中的眼部模型。
该获取方式通过拍摄到的眼部图像数据,获取用户的眼神方向特征数据,从而将用户的眼神方向特征数据实时映射至第一用户的虚拟人物的头部模型。不仅使得虚拟人物更加细致地表现用户的真实人物形象,使得虚拟人物与用户的真实人物形象更为匹配,而且能够在表现各个用户的眼神细节的基础上,增强各个用户在群组视频会话中的眼神交流,提高群组视频会话的效率。
事实上,步骤1002所得到的视频数据可作为第一用户的初始视频数据,为了给群组视频会话中的第二用户提供与其视角匹配的视频数据,服务器还可以进一步对初始视频数据进行处理,例如,参照图13,本发明实施例提供了一种获取视频数据的流程图,服务器获取第二用户的视角数据;按照第二用户的视角数据所指示的视角,对初始视频数据进行处理,得到与该视角匹配的第一用户的视频数据。
其中,本发明实施例对获取视角数据的方式不做限定。例如,服务器可以根据第二用户的传感器采集到的头部方位特征数据,得到第二用户的头部方位特征数据对应的视角数据。该举例中,服务器根据头部方位数据,可以确定旋转后的头部模型的朝向为第二用户的视角,从而获取到第二用户的视角数据。
又例如,服务器根据第二用户的摄像头拍摄到的眼部图像数据,获取第二用户的眼神方向特征数据,根据第二用户的眼神方向特征数据得到第二用户的视角数据。该举例中,服务器可以根据眼神方向特征数据所指示的眼球位置,以头部模型的中心指向眼球位置的方向确定为第二用户的视角,从而获取到该视角数据。
进而,服务器可以基于第二用户的视角数据,确定该视角数据所指示的视角在初始视频数据中的视野范围,从而提取出该视野范围内的视频数据作为第一用户的视频数据。参照图14,本发明实施例提供了一种群组视频会话的流程图,该群组视频会话中,服务器可以通过获取虚拟人物,并实时跟踪第一用户的人脸和五官,从而获取到实时的视频数据,并实时地将该视频数据发送至第二用户所在终端。
1003、服务器向参与群组视频会话的第二用户所在终端发送第一用户的视频数据,以实现群组视频会话。
本发明实施例中,对于群组视频会话中的任一用户,服务器均可以按照步骤1001和1002得到该用户的视频数据,因此,为了同步显示各个用户的虚拟人物,服务器可以合成群组视频会话中每个用户的视频数据,将合成后的视频数据发送至第二用户所在终端。当第二用户所在终端接收到视频数据时,可以实时显示视频数据,且该视频数据与第二用户的视角匹配,从而实现群组视频会话。参照图15,本发明实施例提供了一种显示视频数据的流程图,服务器通过获取初始视频数据,按照第二用户的视角数据处理初始视频数据,将处理得到的视频数据发送至第二用户所在终端,使得第二用户所在终端能够按照第二用户的视角实时显示视频数据。需要说明的是,当第一用户所在VR设备作为本发明实施例的执行主体时,可以将该视频数据发送至服务器,通过服务器将该视频数据发送至第二用户所在终端。
本发明实施例通过获取群组视频会话中第一用户的虚拟人物,且该虚拟人物根据第一用户的头部特征数据和对应的肢体模型得到,使得该虚拟人物能够与第一用户的实际形象相匹配,而且,基于该虚拟人物和行为特征数据得到了该第一用户的视频数据,使得第一用户的虚拟人物的动作能够实时模拟第一用户的实际动作,从而更加灵动地表达第一用户的实际形象,增强了群组视频会话时的视觉效果。
另外,提供了获取虚拟人物的具体方式,根据头部特征数据,生成与头部特征数据匹配的头部模型,且根据第一用户的用户属性,确定与第一用户对应的肢体模型,通过合成头部模型和肢体模型得到虚拟人物,细化了虚拟人物各部分的获取过程,使得虚拟人物具有更加细致的特征,从而更加细致地表达第一用户的实际形象。而且,该肢体模型根据用户属性得到,使虚拟人物更加贴 近用户的实际形象。
另外,提供了获取头部特征数据的具体方式,通过分析第一用户的头部图像的色调分布,确定第一用户的头部特征数据,且该头部特征数据可用于指示第一用户的头发区域、头发色调、脸部区域、脸部色调、五官位置和五官形态,从而得到了第一用户的实际头部形象的多项特征,可以更加细致、全面地描述第一用户的实际头部形象。
另外,提供了生成与头部特征数据匹配的头部模型的具体过程,根据脸部区域和头发区域确定脸部轮廓模型和头发轮廓模型,根据脸部色调和头发色调进行填充,并按照五官位置,将与五官形态匹配的五官模型合成至脸部轮廓模型,细化了生成头部模型的过程,且头部模型中每个部分的生成过程均与第一用户的实际头部形象相匹配,从而提高了虚拟人物与第一用户实际形象的匹配程度。
另外,提供了至少三种确定第一用户的肢体模型的方式,根据第一用户的性别、年龄或职业等用户属性,确定与第一用户的用户属性匹配的肢体模型,而且,这三种确定方式也可以相结合,不仅使肢体模型更加符合第一用户的实际形象,而且使确定肢体模型的方式更加多样化。
另外,具体说明了当行为特征数据包括表情特征数据时,获取第一用户的视频数据的具体方式,当检测到表情特征数据为指定表情特征数据时,可以获取与该指定表情特征数据对应的肢体特征数据,从而将指定表情特征数据映射至脸部,将肢体特征数据映射至肢体模型,使得第一用户的虚拟人物的表达形式更加生动。
另外,具体说明了当行为特征数据包括嘴型特征数据、头部方位特征数据以及眼神方向特征数据时,获取第一用户的视频数据的具体方式,不仅使虚拟人物能更加生动地表达第一用户的实际形象,而且使得获取第一视频数据的方式更加多样化。
另外,提供了按照第二用户的视角数据所指示的视角,处理初始视频数据的方式,从而得到与第二用户的视角匹配的第一用户的视频数据,使得为第二用户展示第一用户的虚拟人物的视角更符合实际的视觉效果。
另外,提供了至少两种获取第二用户的视角数据的方式,根据第二用户的传感器采集到的头部方位特征数据,或者根据第二用户的摄像头拍摄到的眼部图像数据,得到视角数据,不仅能够实时地获取第二用户的视角,而且使得获 取视角数据的方式多样化。
图16是本发明实施例提供的一种群组视频会话的装置框图。参见图16,该装置具体包括:
虚拟人物获取模块1601,用于获取群组视频会话中第一用户的虚拟人物,第一用户的虚拟人物至少根据第一用户的头部特征数据和第一用户对应的肢体模型得到;
视频数据获取模块1602,用于在群组视频会话的过程中,基于第一用户的虚拟人物和第一用户的行为特征数据,获取第一用户的视频数据,视频数据中第一用户的虚拟人物的动作与第一用户的实际动作匹配;
发送模块1603,用于向参与群组视频会话的第二用户所在终端发送第一用户的视频数据,以实现群组视频会话。
本发明实施例通过获取群组视频会话中第一用户的虚拟人物,且该虚拟人物根据第一用户的头部特征数据和对应的肢体模型得到,使得该虚拟人物能够与第一用户的实际形象相匹配,而且,基于该虚拟人物和行为特征数据得到了该第一用户的视频数据,使得第一用户的虚拟人物的动作能够实时模拟第一用户的实际动作,从而更加灵动地表达第一用户的实际形象,增强了群组视频会话时的视觉效果。
可选地,虚拟人物获取模块1601用于:获取第一用户的头部特征数据;根据头部特征数据,生成与头部特征数据匹配的头部模型;根据第一用户的用户属性,确定第一用户对应的肢体模型;对头部模型和肢体模型进行合成,得到第一用户的虚拟人物。
可选地,虚拟人物获取模块1601用于:获取第一用户的头部图像数据;对头部图像数据的色调分布进行分析,得到头部特征数据,头部特征数据用于指示第一用户的头发区域、头发色调、脸部区域、脸部色调、五官位置和五官形态。
可选地,虚拟人物获取模块1601用于:根据脸部区域和头发区域,确定头部轮廓模型,头部轮廓模型包括脸部轮廓模型和头发轮廓模型;根据脸部色调和头发色调,填充脸部轮廓模型和头发轮廓模型;获取与五官形态匹配的五官模型;按照五官位置,将五官模型合成至脸部轮廓模型,生成与头部特征数据匹配的头部模型。
可选地,虚拟人物获取模块1601用于:根据第一用户的性别数据,确定与第一用户的性别数据匹配的肢体模型;和/或,虚拟人物获取模块1601用于:根据第一用户的年龄数据,确定与第一用户的年龄数据匹配的肢体模型;和/或,虚拟人物获取模块1601用于:根据第一用户的职业数据,确定与第一用户的职业数据匹配的肢体模型。
可选地,行为特征数据包括表情特征数据,视频数据获取模块1602用于:当检测到第一用户的表情特征数据为指定表情特征数据时,获取与指定表情特征数据对应的肢体特征数据;将指定表情特征数据实时映射至第一用户的虚拟人物的头部模型,并将肢体特征数据实时映射至第一用户的虚拟人物的肢体模型,得到第一用户的视频数据。
可选地,行为特征数据包括嘴型特征数据,视频数据获取模块1602用于:将第一用户的嘴型特征数据实时映射至第一用户的虚拟人物的头部模型,得到第一用户的视频数据。
可选地,行为特征数据包括头部方位特征数据,视频数据获取模块1602用于:获取第一用户的传感器采集到的第一用户的头部方位数据;将第一用户的头部方位特征数据实时映射至第一用户的虚拟人物的头部模型,得到第一用户的视频数据。
可选地,行为特征数据包括眼神方向特征数据,视频数据获取模块1602用于:获取第一用户的摄像头拍摄到的第一用户的眼部图像数据;根据第一用户的眼部图像数据,获取第一用户的眼神方向特征数据;将第一用户的眼神方向特征数据实时映射至第一用户的虚拟人物的头部模型,得到第一用户的视频数据。
可选地,视频数据获取模块1602用于:基于第一用户的虚拟人物和第一用户的行为特征数据,获取第一用户的初始视频数据;获取第二用户的视角数据;按照第二用户的视角数据所指示的视角,对初始视频数据进行处理,得到与视角匹配的第一用户的视频数据。
可选地,视频数据获取模块1602用于:根据第二用户的传感器采集到的头部方位特征数据,得到第二用户的头部方位特征数据对应的视角数据;或,视频数据获取模块1602用于:根据第二用户的摄像头拍摄到的眼部图像数据,获取第二用户的眼神方向特征数据,根据第二用户的眼神方向特征数据得到第二用户的视角数据。
上述所有可选技术方案,可以采用任意结合形成本发明的可选实施例,在此不再一一赘述。
在进行群组视频会话过程中,不仅可以展示参与会话的各个用户的虚拟人物,还可以展示一些三维物体模型,并可以基于用户的一些操作来对三维物体模型进行一些角度变化等展示,参见下述图17所述的实施例:
图17是本发明实施例提供的一种群组视频会话的方法流程图。参见图17,该方法应用于服务器,具体包括:
1701、在群组视频会话过程中,服务器获取待展示的目标物的三维交互模型。
其中,群组视频会话是指多个(两个或两个以上)用户基于服务器进行的视频会话。其中,多个用户可以是该服务器对应的社交平台上的多个用户,该多个用户之间可能是群组关系或好友关系。目标物是指群组视频会话中某一用户想要展示的实物。三维交互模型是指根据目标物生成的三维模型,用于基于该群组视频会话中任一用户的控制展示在多个用户的视频数据中。例如,图18是本发明实施例提供的一种三维交互模型的示意图。参见图18,三维交互模型可以是三维几何模型、三维汽车模型和三维图表模型。
该步骤中,服务器可以通过多种方式获取三维交互模型。例如,服务器可以获取第五用户上传的三维物体模型。该举例中,三维交互模型可以是第五用户通过CAD(Computer Aided Design,计算机辅助设计)得到的模型,如,三维汽车模型。
又例如,服务器获取第六用户上传的二维表格,对二维表格进行处理,得到三维表格模型。该举例中,服务器可以通过EXCEL表格直接生成该二维表格对应的三维表格模型。或者,服务器也可以建立三维坐标模型(x,y,z)。例如,当二维表格中有两项参数时(如,班级和人数),服务器可以采用(x,y)平面上的不同平面区域表示不同的“班级”参数值,且将每个“班级”参数值对应的“人数”参数值确定为该“班级”参数值对应的z坐标,从而生成柱状图形式的三维表格模型。当然,参照上述举例,服务器也可以生成其他形式的三维表格模型,如饼状图和条形图。而且,在生成三维表格模型时,服务器也可以设置三维表格模型的色调,如,不同的参数对应不同的色调。
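以下给出一个把二维表格数据转换为柱状图形式三维表格模型参数的 Python 示意(仅为示例,函数与字段名均为假设,实际建模还需生成相应的网格数据):

```python
def table_to_bars(rows, bar_size=1.0, gap=0.5):
    """rows: [(班级, 人数), ...]。在 (x, y) 平面上为每个“班级”分配一块区域,
    以“人数”作为 z 坐标,输出每个柱体的底面起点与尺寸参数。"""
    bars = []
    for i, (label, value) in enumerate(rows):
        x0 = i * (bar_size + gap)
        bars.append({
            "label": label,
            "base": (x0, 0.0, 0.0),                        # 柱体底面起点
            "size": (bar_size, bar_size, float(value)),    # 长、宽、高(高即人数)
        })
    return bars

print(table_to_bars([("一班", 42), ("二班", 38), ("三班", 45)]))
```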
事实上,服务器可以基于用户上传的目标物对应的至少一个二维图像数 据,对该目标物进行三维建模,如,采用SFS(Shape From Shading,明暗恢复形状)算法,从而得到三维交互模型。
其中,第五用户或第六用户均可以为群组视频会话中的任一用户。进一步地,该第五用户或第六用户也可以是具有上传权限的用户。本发明实施例对具有上传权限的用户不做限定。例如,该具有上传权限的用户为群组视频会话的发起者、或者VIP(Very Important People,贵宾)用户。
1702、服务器根据群组视频会话中多个用户中每个用户的视角,对目标物的三维交互模型进行处理,得到该用户的视频数据,该用户的视频数据包含对目标物的三维交互模型进行视角变换得到的模型数据。
该步骤中,服务器可以获取群组视频会话中每个用户的视角数据,根据该用户的视角数据和该用户的虚拟人物的显示位置,确定该用户的视角,进而,服务器可以提取出该视角对应的三维交互模型的图像数据,将提取的图像数据与会话环境数据进行合成,对合成后的图像数据进行立体编码,从而得到该用户的一帧一帧的视频数据。其中,本发明实施例对立体编码的方法不做限定。例如,根据交错显示原理,服务器将合成后的图像数据编码为两个图场的视频数据,两个图场即单数描线所构成的单图场与偶数描线所构成的偶图场,使得VR设备接收到视频数据时,可以将两个图场的视频数据交错显示于左右眼屏幕中,从而使得用户双眼产生视差,达到三维显示效果。另外,会话环境数据不限于群组视频会话对应的虚拟环境、多个用户分别对应的虚拟人物、每个用户的音频数据等。
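下面的 Python 片段示意按交错显示原理把合成后的图像帧拆分为奇、偶两个图场的做法(仅为示例,帧数据以 NumPy 数组表示):

```python
import numpy as np

def split_fields(frame):
    """按交错显示原理把合成后的图像拆成两个图场:
    单数描线构成的单图场取第 1、3、5…… 行,偶数描线构成的偶图场取第 2、4、6…… 行,
    VR 设备可将两个图场交错渲染到左右眼屏幕以形成视差。"""
    odd_field = frame[0::2, ...]    # 第 1、3、5…… 行(0 起始索引的偶数下标)
    even_field = frame[1::2, ...]
    return odd_field, even_field
```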
需要说明的是,本发明实施例对获取视角数据的方式不做限定。例如,服务器可以根据用户的传感器采集到的头部方位特征数据,得到第二用户的头部方位特征数据对应的视角数据。又例如,服务器根据用户的摄像头拍摄到的眼部图像数据,获取用户的眼神方向特征数据,根据眼神方向特征数据所指示的眼球位置,确定该用户的视角数据。
事实上,为了更好地展示该三维交互模型,在得到视频数据之前,服务器还可以采用不同的方式确定该三维交互模型的显示位置。例如,服务器上配置有默认的显示位置,该默认的显示位置可以是多个用户对应的虚拟人物的对面位置。又例如,服务器将上传该三维交互模型的用户的旁边位置确定为显示位置,以方便该用户对三维交互模型进行演示说明。
本发明实施例中,为了进一步扩展群组视频会话中的交流方式,提高视频 会话的实际效率,当服务器接收到对三维交互模型的操作指令时,可以根据操作指令对应的操作方式对三维交互模型进行调整,并基于调整后的三维交互模型执行根据群组视频会话中多个用户中每个用户的视角进行处理和发送的步骤。其中,该操作指令用于指示按照对应的操作方式调整三维交互模型。本发明实施例对操作指令的获取方式不做限定。例如,服务器可以采用以下至少两种获取方式:
获取方式1、服务器获取第一用户的手势特征数据,当手势特征数据与三维交互模型的任一操作方式匹配时,确定接收到与操作方式对应的操作指令。
该手势特征数据用于表征该第一用户的手势,获取手势特征数据的方式可以有多种,如,摄像头或手势传感器。以第一用户的VR设备上的手势传感器为例,服务器可以获取该手势传感器采集到的手势特征数据,根据手势特征数据确定第一用户的手势,当该手势与预设手势(如,指向左方、右方、上方、或下方)匹配时,将预设手势对应的操作方式确定该手势匹配的操作方式,生成并获取与该操作方式对应的操作指令。本发明实施例对具体的操作方式不做限定。例如,参见表4,本发明实施例提供了一种预设手势和操作方式的对应关系:
表4
预设手势 操作方式
指向上方 向上移动三维交互模型
指向下方 向下移动三维交互模型
指向左方 向左旋转三维交互模型
指向右方 向右旋转三维交互模型
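结合表4,以下 Python 片段示意由预设手势生成操作指令的过程(仅为示例,移动与旋转的具体参数为假设值):

```python
# 表4的对应关系:预设手势 -> 操作方式(示例映射)
GESTURE_OPS = {
    "指向上方": ("translate", (0, +1)),   # 向上移动三维交互模型
    "指向下方": ("translate", (0, -1)),   # 向下移动三维交互模型
    "指向左方": ("rotate", -30),          # 向左旋转(假设默认旋转 30 度)
    "指向右方": ("rotate", +30),          # 向右旋转
}

def gesture_to_instruction(gesture):
    """手势与任一预设手势匹配时,生成对应的操作指令;否则不触发任何操作。"""
    return GESTURE_OPS.get(gesture)
```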
获取方式2、服务器获取第二用户对外接设备的操作信息,当操作信息与三维交互模型的任一操作方式匹配时,确定接收到操作方式对应的操作指令,外接设备与第二用户所在终端绑定。
该外接设备可以是鼠标或键盘。当服务器获取到第二用户对外接设备的操作信息时,可以判断是否存在与该操作信息对应的操作方式,如果是,则生成并获取与该操作方式对应的操作指令。参见表5,本发明实施例提供了一种操作信息和操作方式的对应关系:
表5
操作信息 操作方式
单击鼠标左键 放大三维交互模型
单击鼠标右键 缩小三维交互模型
长按鼠标左键进行移动 按鼠标移动方向旋转三维交互模型
当然,第一用户和第二用户可以是群组视频会话中的任一用户,也可以是对该三维交互模型具有操作权限的用户,本发明实施例对此不做限定。
在实际的应用场景中,为了智能地给用户提供交互服务,也可以提示用户可以操作三维交互模型、以及如何进行操作。本发明实施例对提示的时机不做限定。例如,在确定用户有操作三维交互模型的需求时,适时地进行提示:当服务器检测到第七用户对三维交互模型的凝视时长大于预设时长时,将操作提示信息发送至第七用户所在终端,操作提示信息用于提示第七用户能够对三维交互模型进行操作。
其中,对第七用户的说明与对第一用户的说明同理。上述举例中,服务器可以实时监测第七用户的眼神凝视方向,一旦检测到第七用户的眼神凝视方向对准该三维交互模型时,则进行计时,当计时的时长(即凝视时长)大于预设时长时,说明第七用户很可能有操作三维交互模型的需求,因此将操作提示信息发送至第七用户所在终端。其中,本发明实施例对操作提示信息包括的具体内容不做限定。以服务器支持鼠标进行操作为例,该操作提示信息可以包括“通过鼠标即可操作汽车模型”的文字提示信息、以及通过鼠标进行操作的具体方法,如,“单击鼠标左键可以放大汽车模型”和“单击鼠标右键可以缩小汽车模型”。
经过用户的操作过程,服务器可以获取到操作指令,并根据操作指令对应的操作方式对三维交互模型进行调整。本发明实施例对具体的调整过程不做限定。例如,操作指令分别为旋转操作指令、缩放操作指令和移位操作指令为例,对应的调整过程可以具体为:
调整过程1、当操作指令为旋转操作指令时,服务器获取旋转操作指令对应的旋转角度和旋转方向,按照旋转角度和旋转方向,旋转三维交互模型。
该调整过程中,服务器可以提取旋转操作指令中携带的旋转角度和旋转方向,并基于这两项参数和当前用户视角所见的三维交互模型,对三维交互模型进行旋转。其中,旋转角度和旋转方向在生成旋转操作指令时进行确定。本发明实施例对确定的具体方式不做限定。例如,当该旋转操作指令根据手势特征数据生成时,旋转方向可以与手势方向相同;旋转角度可以是默认的旋转角度,如,30度,或者,根据手势的持续时长进行确定,如,旋转角度=持续时长(秒)*30度。又例如,当该旋转操作指令根据操作信息生成时,旋转方向可以与外接设备的移动方向一致,旋转角度可以根据外接设备的移动距离确定,如,旋转角度=移动距离(厘米)*10度。
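按照上述示例中的换算关系,以下给出一个计算旋转参数的 Python 示意(仅为示例,换算系数直接取自正文示例):

```python
def rotation_from_gesture(direction, hold_seconds):
    """手势触发:旋转方向与手势方向相同,旋转角度 = 持续时长(秒) * 30 度。"""
    return direction, hold_seconds * 30.0

def rotation_from_mouse(direction, move_cm):
    """外接设备触发:旋转方向与设备移动方向一致,旋转角度 = 移动距离(厘米) * 10 度。"""
    return direction, move_cm * 10.0
```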
调整过程2、当操作指令为缩放操作指令时,服务器获取缩放操作指令对应的缩小比例或放大比例,按照缩小比例和放大比例,缩小或放大三维交互模型。
该调整过程中,服务器可以提取缩放操作指令中携带的缩小比例或放大比例,并基于缩放比例和当前用户视角所见的三维交互模型,对三维交互模型进行缩放。其中,缩放比例可以在生成缩放操作指令时进行确定。本发明实施例对确定的具体方式不做限定。例如,当该缩放操作指令根据操作信息生成时,每次操作可对应默认的缩放比例,如,一次单击鼠标左键对应放大三维交互模型的10%。
调整过程3、当操作指令为移位操作指令时,服务器获取移位操作指令对应的移位方向和移位距离,按照移位方向和移位距离,对三维交互模型进行移位操作。
该调整过程中,服务器可以提取移位操作指令中携带的移位方向和移位距离,并基于这两项参数和当前用户视角所见的三维交互模型,对三维交互模型进行移位。其中,移位方向和移位距离可以在生成移位操作指令时进行确定。本发明实施例对确定的具体方式不做限定。例如,当该移位操作指令根据手势特征数据生成时,移位方向可以与手势方向相同;移位距离可以根据手势的持续时长进行确定,如,移位距离=持续时长(秒)*三维交互模型长度的10%。又例如,当该移位操作指令根据操作信息生成时,移位方向可以与外接设备的移动方向一致,移位距离可以根据外接设备的移动距离确定,如,移位距离=移动距离(厘米)*三维交互模型长度的5%。
当然,服务器可能同时接收到以上至少两个操作指令,此时,服务器既可以串行进行至少两个调整过程,也可以并行进行至少两个调整过程。例如,服务器同时接收到旋转操作指令和移位操作指令时,为了更清楚地展示三维交互模型的变化过程,服务器可以对三维交互模型先进行旋转,再进行移位;或者,为使调整过程与用户的操作过程相衔接,服务器可以同时对三维交互模型进行旋转和移位。
需要说明的是,在调整三维交互模型过程中,服务器可以对应调整过程实时生成一帧一帧的视频数据,也即是,根据当前调整的三维交互模型,服务器按照用户当前的视角,将当前调整的三维交互模型与会话环境数据进行合成和编码,得到当前的一帧视频数据,从而为用户展示三维交互模型的动态调整过程。
另外,需要说明的是,以上调整过程可以是服务器单独为各个用户提供服务,即按照每个用户触发的操作指令处理三维交互模型,并得到该用户的视频数据;而在操作三维交互模型需要操作权限时,服务器也可以根据具有操作权限的用户触发的操作指令,按照各个用户的视角处理三维交互模型,从而得到各个用户的视频数据。为了清楚地说明调整过程的流程,参见图19,本发明实施例提供了一种调整三维交互模型的流程图,服务器从获取三维交互模型、监测用户的眼神凝视方向、获取操作信息、进而根据操作信息对应的操作方式调整三维交互模型。
在群组视频会话的过程中,为使多个用户的视频会话有序进行,并突出某一用户的发言过程,当服务器接收到第三用户的发言请求时,可以生成指定视频数据,该指定视频数据用于展示虚拟话筒从虚拟主持人传递至第三用户的虚拟人物的过程;基于指定视频数据,执行根据群组视频会话中多个用户中每个用户的视角进行处理和发送的步骤。
其中,该第三用户可以是群组视频会话中的任一用户。本发明实施例对发言请求的触发方式不做限定。例如,当服务器接收到第三用户的音频数据时自动触发,或者,检测到第三用户的指定操作信息时触发得到,该指定操作信息可以为连续双击鼠标左键。虚拟主持人可以是服务器从虚拟人物数据库中获取的虚拟人物,也可以是群组视频会话中某一用户的虚拟人物。本发明实施例对服务器获取虚拟主持人的方式不做限定。例如,服务器根据群组视频会话对应的群组的群组属性,获取与群组属性匹配的虚拟主持人,如,群组属性为班级时,匹配的虚拟主持人的着装为校服,群组属性为公司时,匹配的虚拟主持人的着装为西装。又例如,服务器随机指定一个用户的虚拟人物为虚拟主持人,或者,在群组视频会话开始时,服务器向VR设备发送用于票选虚拟主持人的投票信息,该投票信息至少包括多个用户的用户信息,由VR设备根据投票信息显示投票界面,当任一用户A选中投票界面上的某个用户信息b时,服务器可以确定该用户A为用户信息b对应的用户B投票,进而,服务器可以统计 出得票数最多的用户,将该用户的虚拟人物作为虚拟主持人。
基于上述说明,当服务器接收到第三用户的发言请求时,可以根据第三用户在虚拟环境中的显示位置C、以及虚拟话筒当前的显示位置D,确定虚拟话筒的移动路径,该移动路径可以是D到C的路径(或者,服务器再根据虚拟主持人的显示位置E,将D到E到C的路径确定为移动路径),进而,服务器可以根据虚拟话筒的移动路径生成一帧一帧的指定视频数据,以动态地表征虚拟话筒的传递过程,进一步地,服务器可以按照每个用户的视角处理并发送视频数据。当然,为了更合理地显示虚拟话筒,在虚拟话筒到达第三用户的显示位置时,服务器可以确定第三用户的虚拟人物的手臂模型的抬起路径,使得生成的至少一帧指定视频数据对应手臂模型抬起并握住虚拟话筒的过程。另外,在传递过程中,服务器可以将虚拟主持人的指定音频数据合成至指定视频数据,该指定音频数据用于指示第三用户将要发言,可以包括“现在由第三用户发言”的一段语音。
事实上,除了上述传递虚拟话筒的方法,还可以通过其他方法突出某一用户的发言过程。例如,当服务器接收到第三用户的发言请求时,降低第四用户的音频数据的音量,第四用户为群组视频会话中除第三用户以外的用户;基于调整后的音频数据,执行根据群组视频会话中多个用户中每个用户的视角进行处理和发送的步骤。该举例中,服务器可以根据第三用户的音频数据的音量V1,将第四用户的音频数据的音量V2调整至小于V1。
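以下 Python 函数示意在第三用户发言时压低其余用户音量的做法(仅为示例,衰减比例与上限系数为假设参数,仅需保证调整后的音量小于发言者音量):

```python
def duck_other_volumes(speaker_volume, other_volumes, ratio=0.5):
    """收到第三用户的发言请求后,把第四用户等其余用户音频的音量 V2 调整至小于发言者音量 V1。"""
    return [min(v * ratio, speaker_volume * 0.8) for v in other_volumes]
```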
需要说明的是,以上两种突出用户发言过程的方法也可以相结合,也即是,当服务器接收到第三用户的发言请求时,可以生成指定视频数据,该指定视频数据用于展示虚拟话筒从虚拟主持人传递至第三用户的虚拟人物的过程,且指定视频数据中第四用户的音频数据的音量被降低。
在实际的应用场景中,服务器有可能在第三用户发言时接收到第四用户的发言请求,此时,本发明实施例对服务器处理第四用户的发言请求的方式不做限定。例如,服务器暂存第四用户的发言请求,直到检测到第三用户的音频数据结束时,按照发言请求的接收顺序,以处理第三用户的发言请求的方式继续处理第四用户的发言请求。当然,在第四用户等待发言的过程中,服务器可以将发言提示信息发送至第四用户所在终端,该发言提示信息用于指示该第四用户何时发言,可以包括如“下一个发言的就是你哦”的文字信息。
本发明实施例中,为了进一步提高群组视频会话的效率,扩展群组视频会 话时的交互方式,当服务器接收到多媒体文件播放请求时,可以将与多媒体播放请求对应的多媒体文件合成至多个用户的视频数据。该多媒体文件如音频文件、视频文件或文本文件等。该多媒体文件播放请求可以直接携带该多媒体文件,也可以携带多媒体文件的文件标识,使得服务器从多媒体数据库或网络上获取到文件标识对应的多媒体文件。该扩展的交互方式中,本发明实施例对合成多媒体文件的方法不做限定。例如,当该多媒体文件为音频文件时,服务器可以将音频文件作为背景音频合成至视频数据中;当该多媒体文件为视频文件时,服务器可以按照每个用户的视角,将视频文件合成至该用户对面的虚拟环境中,使得视频文件以“屏幕播放”的方式嵌在虚拟环境中。
基于上述扩展的交互方式,参见图20,本发明实施例提供了一种交互流程图,服务器可以为用户1授权对三维交互模型的操作权限,授权用户2对多媒体文件的播放权限,因此,服务器可以基于用户1的操作信息调整三维交互模型,从而提供操作三维交互模型的服务,也可以基于用户2的多媒体文件播放请求将多媒体文件合成至视频数据,从而提供多媒体文件共享的服务。
1703、服务器将多个用户的视频数据分别发送至多个用户所在终端。
该步骤中,当终端接收到视频数据时,可以显示视频数据,由于该视频数据按照用户的视角进行处理,每个用户均可以从视频数据中看到自身视角的三维交互模型。
需要说明的是,当用户使用VR设备时,服务器可以直接将该视频数据发送至用户所在VR设备,当用户使用传统终端时,服务器可以在处理三维交互模型时,提取某一视角的二维视频数据,从而将二维视频数据发送至用户所在传统终端,使得多个用户可以不受设备类型的限制、自由交流。
本发明实施例通过获取待展示的目标物的三维交互模型,根据群组视频会话中每个用户的视角处理三维交互模型,得到对三维交互模型进行视角变换后的视频数据,并将该视频数据发送至多个用户所在终端,使得多个用户能够在群组视频会话时以自身视角体验同一三维交互模型,并通过三维交互模型进行交流,从而在扩展的交流方式的基础上提高视频会话的效率。
另外,当接收到对三维交互模型的操作指令时,可以按照操作指令对应的操作方式对三维交互模型进行调整,从而为用户提供了操作三维交互模型的服务,而且,可以基于调整后的三维交互模型将视频数据发送至多个用户,使得多个用户可以基于同一三维交互模型进行交互,进一步提高了视频会话的效率。
另外,提供了至少两种获取操作指令的方式,可以通过第一用户的手势特征数据,当手势特征数据与三维交互模型的任一操作方式匹配时,确定接收到与操作方式对应的操作指令,还可以通过第二用户对外接设备的操作信息,当操作信息与某一操作方式匹配时,确定接收到该操作方式对应的操作指令,既可以智能地根据用户手势触发操作指令,也可以根据用户的操作信息触发操作指令,从而提供了多样化的操作指令的获取方式,可操作性更强。
另外,提供了至少三个根据操作指令调整三维交互模型的过程,如,根据旋转操作指令旋转三维交互模型、根据缩放操作指令缩小或放大三维交互模型以及根据移位操作指令对三维交互模型进行移位,从而提供了多样化的调整方式,增加了视频会话的交互强度,进一步提高了视频会话的效率。
另外,为使群组视频会话有序进行,并突出某一用户的发言过程,提供了至少两种处理发言请求的方法,如,生成指定视频数据,该指定视频数据用于展示虚拟话筒从虚拟主持人传递至第三用户的虚拟人物,或者,降低第四用户的音频数据的音量。
另外,提供了至少两种获取三维交互模型的方式,如,获取第五用户上传的三维物体模型,或者,获取第六用户上传的二维表格,并处理得到三维表格模型,从而能够提供多样化的三维交互模型。
另外,进一步扩展了视频会话时的交流方式,如,当接收到多媒体文件播放请求时,可以将多媒体文件合成至多个用户的视频数据,使得多个用户可以共享多媒体文件。
另外,为了提供智能的交互服务,从而提示用户能够操作三维交互模型、以及如何进行操作,当检测到第七用户对三维交互模型的凝视时长大于预设时长时,说明第七用户很可能有操作三维交互模型的需求,因此,可以将操作提示信息发送至第七用户所在终端,从而适时地提示第七用户操作三维交互模型。
图21是本发明实施例提供的一种群组视频会话的装置框图。参见图21,该装置具体包括:
交互模型获取模块2101,用于在群组视频会话过程中,获取待展示的目标物的三维交互模型;
处理模块2102,用于根据群组视频会话中多个用户中每个用户的视角,对目标物的三维交互模型进行处理,得到用户的视频数据,用户的视频数据包含对目标物的三维交互模型进行视角变换得到的模型数据;
发送模块2103,用于将多个用户的视频数据分别发送至多个用户所在终端。
本发明实施例通过获取待展示的目标物的三维交互模型,根据群组视频会话中每个用户的视角处理三维交互模型,得到对三维交互模型进行视角变换后的视频数据,并将该视频数据发送至多个用户所在终端,使得多个用户能够在群组视频会话时以自身视角体验同一三维交互模型,并通过三维交互模型进行交流,从而在扩展的交流方式的基础上提高视频会话的效率。
在一种可能实现方式中,基于图21的装置组成,参见图22,该装置还包括:调整模块2104;
调整模块2104,用于当接收到对三维交互模型的操作指令时,根据操作指令对应的操作方式对三维交互模型进行调整;
处理模块2102,用于基于调整后的三维交互模型执行根据群组视频会话中多个用户中每个用户的视角进行处理的步骤;
发送模块2103,用于执行对处理模块根据群组视频会话中多个用户中每个用户的视角处理后的视频数据进行发送的步骤。
在一种可能实现方式中,基于图21的装置组成,参见图23,该装置还包括:
手势获取模块2105,用于获取第一用户的手势特征数据,当手势特征数据与三维交互模型的任一操作方式匹配时,确定接收到与操作方式对应的操作指令;或,
操作信息获取模块2106,用于获取第二用户对外接设备的操作信息,当操作信息与三维交互模型的任一操作方式匹配时,确定接收到操作方式对应的操作指令,外接设备与第二用户所在终端绑定。
在一种可能实现方式中,调整模块2104用于:当操作指令为旋转操作指令时,获取旋转操作指令对应的旋转角度和旋转方向,按照旋转角度和旋转方向,旋转三维交互模型;和/或,调整模块用于:当操作指令为缩放操作指令时,获取缩放操作指令对应的缩小比例或放大比例,按照缩小比例和放大比例,缩小或放大三维交互模型;和/或,调整模块用于:当操作指令为移位操作指令时, 获取移位操作指令对应的移位方向和移位距离,按照移位方向和移位距离,对三维交互模型进行移位操作。
在一种可能实现方式中,基于图21的装置组成,参见图24,该装置还包括:
生成模块2107,用于当接收到第三用户的发言请求时,生成指定视频数据,指定视频数据用于展示虚拟话筒从虚拟主持人传递至第三用户的虚拟人物的过程;
处理模块2102,用于基于指定视频数据,执行根据群组视频会话中多个用户中每个用户的视角进行处理的步骤;
发送模块2103,用于执行对处理模块根据群组视频会话中多个用户中每个用户的视角处理后的指定视频数据进行发送的步骤。
在一种可能实现方式中,基于图21的装置组成,参见图25,该装置还包括:
降低模块2108,用于当接收到第三用户的发言请求时,降低第四用户的音频数据的音量,第四用户为群组视频会话中除第三用户以外的用户;
处理模块2102,用于基于调整后的音频数据,执行根据群组视频会话中多个用户中每个用户的视角进行处理的步骤;
发送模块2103,用于执行对处理模块根据群组视频会话中多个用户中每个用户的视角处理后的视频数据进行发送的步骤。
在一种可能实现方式中,交互模型获取模块2101用于:获取第五用户上传的三维物体模型;或,交互模型获取模块2101用于:获取第六用户上传的二维表格,对二维表格进行处理,得到三维表格模型。
在一种可能实现方式中,基于图21的装置组成,参见图26,该装置还包括:合成模块2109,用于当接收到多媒体文件播放请求时,将与多媒体播放请求对应的多媒体文件合成至多个用户的视频数据。
在一种可能实现方式中,发送模块2103还用于:当检测到第七用户对三维交互模型的凝视时长大于预设时长时,将操作提示信息发送至第七用户所在终端,操作提示信息用于提示第七用户能够对三维交互模型进行操作。
上述所有可选技术方案,可以采用任意结合形成本发明的可选实施例,在此不再一一赘述。
需要说明的是:上述实施例提供的群组视频会话的装置在群组视频会话时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的群组视频会话的装置与群组视频会话的方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
图27示出了本发明一个示例性实施例提供的终端2700的结构框图。该终端2700可以是:智能手机、平板电脑、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、笔记本电脑或台式电脑。终端2700还可能被称为用户设备、便携式终端、膝上型终端、台式终端等其他名称。
通常,终端2700包括有:处理器2701和存储器2702。
处理器2701可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器2701可以采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。处理器2701也可以包括主处理器和协处理器,主处理器是用于对在唤醒状态下的数据进行处理的处理器,也称CPU(Central Processing Unit,中央处理器);协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中,处理器2701可以在集成有GPU(Graphics Processing Unit,图像处理器),GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中,处理器2701还可以包括AI(Artificial Intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器2702可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。存储器2702还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器2702中的非暂态的计算机可读存储介质用于存储至少一个指令,该至少一个指令用于被处理器2701所执行以实现本申请中方法实施例提供的群组视频会话的方法。
在一些实施例中,终端2700还可选包括有:外围设备接口2703和至少一个外围设备。处理器2701、存储器2702和外围设备接口2703之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口2703相连。具体地,外围设备包括:射频电路2704、触摸显示屏2705、摄像头2706、音频电路2707、定位组件2708和电源2709中的至少一种。
外围设备接口2703可被用于将I/O(Input/Output,输入/输出)相关的至少一个外围设备连接到处理器2701和存储器2702。在一些实施例中,处理器2701、存储器2702和外围设备接口2703被集成在同一芯片或电路板上;在一些其他实施例中,处理器2701、存储器2702和外围设备接口2703中的任意一个或两个可以在单独的芯片或电路板上实现,本实施例对此不加以限定。
射频电路2704用于接收和发射RF(Radio Frequency,射频)信号,也称电磁信号。射频电路2704通过电磁信号与通信网络以及其他通信设备进行通信。射频电路2704将电信号转换为电磁信号进行发送,或者,将接收到的电磁信号转换为电信号。可选地,射频电路2704包括:天线系统、RF收发器、一个或多个放大器、调谐器、振荡器、数字信号处理器、编解码芯片组、用户身份模块卡等等。射频电路2704可以通过至少一种无线通信协议来与其它终端进行通信。该无线通信协议包括但不限于:城域网、各代移动通信网络(2G、3G、4G及5G)、无线局域网和/或WiFi(Wireless Fidelity,无线保真)网络。在一些实施例中,射频电路2704还可以包括NFC(Near Field Communication,近距离无线通信)有关的电路,本申请对此不加以限定。
显示屏2705用于显示UI(User Interface,用户界面)。该UI可以包括图形、文本、图标、视频及其它们的任意组合。当显示屏2705是触摸显示屏时,显示屏2705还具有采集在显示屏2705的表面或表面上方的触摸信号的能力。该触摸信号可以作为控制信号输入至处理器2701进行处理。此时,显示屏2705还可以用于提供虚拟按钮和/或虚拟键盘,也称软按钮和/或软键盘。在一些实施例中,显示屏2705可以为一个,设置终端2700的前面板;在另一些实施例中,显示屏2705可以为至少两个,分别设置在终端2700的不同表面或呈折叠设计;在再一些实施例中,显示屏2705可以是柔性显示屏,设置在终端2700的弯曲表面上或折叠面上。甚至,显示屏2705还可以设置成非矩形的不规则图形,也即异形屏。显示屏2705可以采用LCD(Liquid Crystal Display,液晶显示屏)、OLED(Organic Light-Emitting Diode,有机发光二极管)等材质制备。
摄像头组件2706用于采集图像或视频。可选地,摄像头组件2706包括前置摄像头和后置摄像头。通常,前置摄像头设置在终端的前面板,后置摄像头设置在终端的背面。在一些实施例中,后置摄像头为至少两个,分别为主摄像头、景深摄像头、广角摄像头、长焦摄像头中的任意一种,以实现主摄像头和景深摄像头融合实现背景虚化功能、主摄像头和广角摄像头融合实现全景拍摄以及VR(Virtual Reality,虚拟现实)拍摄功能或者其它融合拍摄功能。在一些实施例中,摄像头组件2706还可以包括闪光灯。闪光灯可以是单色温闪光灯,也可以是双色温闪光灯。双色温闪光灯是指暖光闪光灯和冷光闪光灯的组合,可以用于不同色温下的光线补偿。
音频电路2707可以包括麦克风和扬声器。麦克风用于采集用户及环境的声波,并将声波转换为电信号输入至处理器2701进行处理,或者输入至射频电路2704以实现语音通信。出于立体声采集或降噪的目的,麦克风可以为多个,分别设置在终端2700的不同部位。麦克风还可以是阵列麦克风或全向采集型麦克风。扬声器则用于将来自处理器2701或射频电路2704的电信号转换为声波。扬声器可以是传统的薄膜扬声器,也可以是压电陶瓷扬声器。当扬声器是压电陶瓷扬声器时,不仅可以将电信号转换为人类可听见的声波,也可以将电信号转换为人类听不见的声波以进行测距等用途。在一些实施例中,音频电路2707还可以包括耳机插孔。
定位组件2708用于定位终端2700的当前地理位置,以实现导航或LBS(Location Based Service,基于位置的服务)。定位组件2708可以是基于美国的GPS(Global Positioning System,全球定位系统)、中国的北斗系统、俄罗斯的格洛纳斯系统或欧盟的伽利略系统的定位组件。
电源2709用于为终端2700中的各个组件进行供电。电源2709可以是交流电、直流电、一次性电池或可充电电池。当电源2709包括可充电电池时,该可充电电池可以支持有线充电或无线充电。该可充电电池还可以用于支持快充技术。
在一些实施例中,终端2700还包括有一个或多个传感器2710。该一个或多个传感器2710包括但不限于:加速度传感器2711、陀螺仪传感器2712、压力传感器2713、指纹传感器2714、光学传感器2715以及接近传感器2716。
加速度传感器2711可以检测以终端2700建立的坐标系的三个坐标轴上的加速度大小。比如,加速度传感器2711可以用于检测重力加速度在三个坐标 轴上的分量。处理器2701可以根据加速度传感器2711采集的重力加速度信号,控制触摸显示屏2705以横向视图或纵向视图进行用户界面的显示。加速度传感器2711还可以用于游戏或者用户的运动数据的采集。
陀螺仪传感器2712可以检测终端2700的机体方向及转动角度,陀螺仪传感器2712可以与加速度传感器2711协同采集用户对终端2700的3D动作。处理器2701根据陀螺仪传感器2712采集的数据,可以实现如下功能:动作感应(比如根据用户的倾斜操作来改变UI)、拍摄时的图像稳定、游戏控制以及惯性导航。
压力传感器2713可以设置在终端2700的侧边框和/或触摸显示屏2705的下层。当压力传感器2713设置在终端2700的侧边框时,可以检测用户对终端2700的握持信号,由处理器2701根据压力传感器2713采集的握持信号进行左右手识别或快捷操作。当压力传感器2713设置在触摸显示屏2705的下层时,由处理器2701根据用户对触摸显示屏2705的压力操作,实现对UI界面上的可操作性控件进行控制。可操作性控件包括按钮控件、滚动条控件、图标控件、菜单控件中的至少一种。
指纹传感器2714用于采集用户的指纹,由处理器2701根据指纹传感器2714采集到的指纹识别用户的身份,或者,由指纹传感器2714根据采集到的指纹识别用户的身份。在识别出用户的身份为可信身份时,由处理器2701授权该用户执行相关的敏感操作,该敏感操作包括解锁屏幕、查看加密信息、下载软件、支付及更改设置等。指纹传感器2714可以被设置终端2700的正面、背面或侧面。当终端2700上设置有物理按键或厂商Logo时,指纹传感器2714可以与物理按键或厂商Logo集成在一起。
光学传感器2715用于采集环境光强度。在一个实施例中,处理器2701可以根据光学传感器2715采集的环境光强度,控制触摸显示屏2705的显示亮度。具体地,当环境光强度较高时,调高触摸显示屏2705的显示亮度;当环境光强度较低时,调低触摸显示屏2705的显示亮度。在另一个实施例中,处理器2701还可以根据光学传感器2715采集的环境光强度,动态调整摄像头组件2706的拍摄参数。
接近传感器2716,也称距离传感器,通常设置在终端2700的前面板。接近传感器2716用于采集用户与终端2700的正面之间的距离。在一个实施例中,当接近传感器2716检测到用户与终端2700的正面之间的距离逐渐变小时,由 处理器2701控制触摸显示屏2705从亮屏状态切换为息屏状态;当接近传感器2716检测到用户与终端2700的正面之间的距离逐渐变大时,由处理器2701控制触摸显示屏2705从息屏状态切换为亮屏状态。
本领域技术人员可以理解,图27中示出的结构并不构成对终端2700的限定,可以包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
图28是本发明实施例提供的一种网络设备的结构示意图,该网络设备2800可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上处理器(central processing units,CPU)2801和一个或一个以上的存储器2802,其中,所述存储器2802中存储有至少一条指令,所述至少一条指令由所述处理器2801加载并执行以实现上述各个方法实施例提供的方法。当然,该网络设备还可以具有有线或无线网络接口、键盘以及输入输出接口等部件,以便进行输入输出,该网络设备还可以包括其他用于实现设备功能的部件,在此不做赘述。
在示例性实施例中,还提供了一种计算机可读存储介质,例如包括指令的存储器,上述指令可由终端中的处理器执行以完成上述实施例中的群组视频会话的方法。例如,所述计算机可读存储介质可以是ROM、随机存取存储器(RAM)、CD-ROM、磁带、软盘和光数据存储设备等。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本发明的较佳实施例,并不用以限制本发明,凡在本发明的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本发明的保护范围之内。

Claims (47)

  1. 一种群组视频会话的方法,其特征在于,应用于网络设备,所述方法包括:
    创建群组视频会话;
    对于所述群组视频会话中的每个用户,根据所述用户的设备信息,确定所述用户的用户类型,所述用户类型包括普通用户和虚拟用户,所述普通用户用于指示所述用户在参与所述群组视频会话时采用二维显示模式,所述虚拟用户用于指示所述用户在参与所述群组视频会话时采用虚拟现实显示模式;
    根据所述用户的用户类型所指示的视频显示模式,对所述群组视频会话的视频数据进行处理,得到所述用户的目标视频数据,所述目标视频数据的视频显示模式与所述用户的用户类型所指示的视频显示模式匹配;
    在所述群组视频会话的进行过程中,向所述用户的用户设备发送目标视频数据,使所述用户进行群组视频会话。
  2. 根据权利要求1所述的方法,其特征在于,所述根据所述用户的用户类型所指示的视频显示模式,对所述群组视频会话的视频数据进行处理,得到所述用户的目标视频数据包括:
    如果所述用户的用户类型为普通用户,将所述群组视频会话中虚拟用户对应的三维虚拟人物转换为二维虚拟人物;
    对所述二维虚拟人物、所述虚拟用户选择的二维背景、以及所述虚拟用户对应的音频数据进行合成,得到第一二维视频数据;
    对至少一个第一二维视频数据与至少一个第二二维视频数据进行合成,得到所述用户的目标视频数据,所述第二二维视频数据是指所述群组视频会话中普通用户的二维视频数据。
  3. 根据权利要求1所述的方法,其特征在于,所述根据所述用户的用户类型所指示的视频显示模式,对所述群组视频会话的视频数据进行处理,得到所述用户的目标视频数据包括:
    如果所述用户的用户类型为虚拟用户,确定所述群组视频会话对应的虚拟 环境;
    以所述虚拟环境为三维背景,确定所述群组视频会话中的每个用户在所述虚拟环境中的显示位置;
    对于所述群组视频会话中的普通用户,将所述普通用户的指定视频数据合成至所述普通用户对应的显示位置;
    对于所述群组视频会话中的虚拟用户,将所述虚拟用户的三维虚拟人物和音频数据合成至所述虚拟用户对应的显示位置;
    将合成后的视频数据作为所述用户的目标视频数据。
  4. 根据权利要求3所述的方法,其特征在于,所述对于所述群组视频会话中的普通用户,将所述普通用户的指定视频数据合成至所述普通用户对应的显示位置之前,所述方法还包括:
    如果所述普通用户包括第一普通用户,将所述第一普通用户的两路二维视频数据转换为第一三维视频数据,将所述第一三维视频数据作为所述指定视频数据,所述第一普通用户是指使用双目摄像头的普通用户,或,如果所述普通用户包括所述第一普通用户,将所述第一普通用户的两路二维视频数据作为所述指定视频数据;
    如果所述普通用户包括第二普通用户,将所述第二普通用户的二维视频数据作为所述指定视频数据,所述第二普通用户是指使用单目摄像头的普通用户。
  5. 根据权利要求3所述的方法,其特征在于,所述确定所述群组视频会话对应的虚拟环境包括:
    将所述用户触发的虚拟环境选项对应的虚拟环境确定为所述用户在所述群组视频会话中对应的虚拟环境;或,
    根据所述群组视频会话中的用户数量,确定所述群组视频会话对应的虚拟环境的容量,将符合所述容量的虚拟环境确定为所述群组视频会话对应的虚拟环境;或,
    分析所述群组视频会话中的每个用户选择过的虚拟环境,得到每个虚拟环境的被选择次数,将被选择次数最多的虚拟环境确定为所述群组视频会话对应的虚拟环境。
  6. 根据权利要求3所述的方法,其特征在于,所述确定所述群组视频会话中的每个用户在所述虚拟环境中的显示位置包括:
    根据所述用户与所述群组视频会话中其他用户之间的社交数据,分析所述用户与所述其他用户之间的亲密度,按照亲密度高低顺序从所述用户的任一侧开始排列所述其他用户的显示位置;或,
    获取所述其他用户的用户身份,将所述用户的对面位置确定为所述其他用户中用户身份最高的用户的显示位置,并随机确定所述其他用户中剩余用户的显示位置;或,
    按照所述其他用户加入所述群组视频会话的时间先后顺序,从所述用户的任一侧开始排列所述其他用户的显示位置;或,
    根据所述用户在所述虚拟环境中选择的位置,将所述用户所选择的位置确定为所述用户在所述虚拟环境中的显示位置;或,
    将所述用户的对面位置确定为所述普通用户的显示位置,并随机确定所述其他用户中剩余用户的显示位置。
  7. 根据权利要求1所述的方法,其特征在于,所述方法包括:
    在创建群组视频会话时,获取群组视频会话中第一用户的虚拟人物,所述第一用户的虚拟人物至少根据所述第一用户的头部特征数据和所述第一用户对应的肢体模型得到;
    在所述群组视频会话过程中,基于所述第一用户的虚拟人物和所述第一用户的行为特征数据,获取所述第一用户的视频数据,所述视频数据中所述第一用户的虚拟人物的动作与所述第一用户的实际动作匹配。
  8. 根据权利要求7所述的方法,其特征在于,所述获取群组视频会话中第一用户的虚拟人物包括:
    获取所述第一用户的头部特征数据;
    根据所述头部特征数据,生成与所述头部特征数据匹配的头部模型;
    根据所述第一用户的用户属性,确定所述第一用户对应的肢体模型;
    对所述头部模型和所述肢体模型进行合成,得到所述第一用户的虚拟人物。
  9. 根据权利要求8所述的方法,其特征在于,所述根据所述第一用户的用 户属性,确定所述第一用户对应的肢体模型包括:
    根据所述第一用户的性别数据,确定与所述第一用户的性别数据匹配的肢体模型;和/或,
    根据所述第一用户的年龄数据,确定与所述第一用户的年龄数据匹配的肢体模型;和/或,
    根据所述第一用户的职业数据,确定与所述第一用户的职业数据匹配的肢体模型。
  10. 根据权利要求7所述的方法,其特征在于,所述行为特征数据包括表情特征数据,所述基于所述第一用户的虚拟人物和所述第一用户的行为特征数据,获取所述第一用户的视频数据包括:
    当检测到所述第一用户的表情特征数据为指定表情特征数据时,获取与所述指定表情特征数据对应的肢体特征数据;
    将所述指定表情特征数据实时映射至所述第一用户的虚拟人物的头部模型,并将所述肢体特征数据实时映射至所述第一用户的虚拟人物的肢体模型,得到所述第一用户的视频数据。
  11. 根据权利要求7所述的方法,其特征在于,所述行为特征数据包括嘴型特征数据,所述基于所述第一用户的虚拟人物和所述第一用户的行为特征数据,获取所述第一用户的视频数据包括:
    将所述第一用户的嘴型特征数据实时映射至所述第一用户的虚拟人物的头部模型,得到所述第一用户的视频数据。
  12. 根据权利要求7所述的方法,其特征在于,所述行为特征数据包括头部方位特征数据,所述基于所述第一用户的虚拟人物和所述第一用户的行为特征数据,获取所述第一用户的视频数据包括:
    获取所述第一用户的传感器采集到的所述第一用户的头部方位数据;
    将所述第一用户的头部方位特征数据实时映射至所述第一用户的虚拟人物的头部模型,得到所述第一用户的视频数据。
  13. 根据权利要求7所述的方法,其特征在于,所述行为特征数据包括眼 神方向特征数据,所述基于所述第一用户的虚拟人物和所述第一用户的行为特征数据,获取所述第一用户的视频数据包括:
    获取所述第一用户的摄像头拍摄到的所述第一用户的眼部图像数据;
    根据所述第一用户的眼部图像数据,获取所述第一用户的眼神方向特征数据;
    将所述第一用户的眼神方向特征数据实时映射至所述第一用户的虚拟人物的头部模型,得到所述第一用户的视频数据。
  14. 根据权利要求7所述的方法,其特征在于,所述基于所述第一用户的虚拟人物和所述第一用户的行为特征数据,获取所述第一用户的视频数据包括:
    基于所述第一用户的虚拟人物和所述第一用户的行为特征数据,获取所述第一用户的初始视频数据;
    获取所述第二用户的视角数据;
    按照所述第二用户的视角数据所指示的视角,对所述初始视频数据进行处理,得到与所述视角匹配的所述第一用户的视频数据。
  15. 根据权利要求1所述的方法,其特征在于,所述方法包括:
    在群组视频会话过程中,获取待展示的目标物的三维交互模型;
    根据所述群组视频会话中多个用户中每个用户的视角,在所述群组视频会话过程中,对所述目标物的三维交互模型进行处理,得到所述用户的视频数据,所述用户的视频数据包含对所述目标物的三维交互模型进行视角变换得到的模型数据;
    将所述多个用户的视频数据分别发送至所述多个用户所在终端。
  16. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    当接收到第三用户的发言请求时,降低第四用户的音频数据的音量,所述第四用户为所述群组视频会话中除第三用户以外的用户。
  17. 根据权利要求15所述的方法,其特征在于,所述获取待展示的目标物的三维交互模型包括:
    获取第五用户上传的三维物体模型;或,
    获取第六用户上传的二维表格,对所述二维表格进行处理,得到三维表格模型。
  18. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    当接收到多媒体文件播放请求时,将与所述多媒体播放请求对应的多媒体文件合成至所述群组视频会话中多个用户的视频数据。
  19. 根据权利要求15所述的方法,其特征在于,在群组视频会话过程中,获取待展示的目标物的三维交互模型之后,所述方法还包括:
    当检测到第七用户对所述三维交互模型的凝视时长大于预设时长时,将操作提示信息发送至所述第七用户所在终端,所述操作提示信息用于提示所述第七用户能够对所述三维交互模型进行操作。
  20. 一种群组视频会话的方法,其特征在于,应用于终端,所述方法包括:
    接收网络设备发送群组视频会话的目标视频数据,所述目标视频数据的视频显示模式与终端用户的用户类型所指示的视频显示模式匹配,所述终端用户的用户类型为普通用户,所述普通用户用于指示所述终端用户在参与所述群组视频会话时采用二维显示模式;
    显示所述目标视频数据,使群组视频会话中的普通用户以二维人物形式显示,所述群组视频会话中的虚拟用户以二维虚拟人物的形式显示。
  21. 一种群组视频会话的方法,其特征在于,应用于虚拟现实VR设备,所述方法包括:
    接收网络设备发送群组视频会话的目标视频数据,所述目标视频数据的视频显示模式与VR设备用户的用户类型所指示的视频显示模式匹配,所述VR设备用户的用户类型为虚拟用户,所述虚拟用户用于指示所述VR设备用户在参与所述群组视频会话时采用虚拟现实显示模式;
    显示所述目标视频数据,使群组视频会话中的普通用户在虚拟环境中以二维人物或三维人物的形式显示,所述群组视频会话中的虚拟用户在所述虚拟环境中以三维虚拟人物的形式显示。
  22. 根据权利要求21所述的方法,其特征在于,所述显示所述目标视频数据包括:
    在所述普通用户对应的显示位置上,显示所述普通用户的二维人物或三维人物;
    在所述虚拟用户对应的显示位置上,显示所述虚拟用户的三维虚拟人物。
  23. 根据权利要求21所述的方法,其特征在于,所述方法还包括:
    基于所述目标视频数据,如果检测到所述群组视频会话中任一用户正在发言,在所述用户对应的显示位置上显示发言提示。
  24. 一种网络设备,其特征在于,所述网络设备包括存储器和处理器,所述存储器用于存储指令,所述处理器被配置为执行所述指令,以执行下述群组视频会话的方法的步骤:
    创建群组视频会话;
    对于所述群组视频会话中的每个用户,根据所述用户的设备信息,确定所述用户的用户类型,所述用户类型包括普通用户和虚拟用户,所述普通用户用于指示所述用户在参与所述群组视频会话时采用二维显示模式,所述虚拟用户用于指示所述用户在参与所述群组视频会话时采用虚拟现实显示模式;
    根据所述用户的用户类型所指示的视频显示模式,对所述群组视频会话的视频数据进行处理,得到所述用户的目标视频数据,所述目标视频数据的视频显示模式与所述用户的用户类型所指示的视频显示模式匹配;
    在所述群组视频会话的进行过程中,向所述用户的用户设备发送目标视频数据,使所述用户进行群组视频会话。
  25. 根据权利要求24所述的网络设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    如果所述用户的用户类型为普通用户,将所述群组视频会话中虚拟用户对应的三维虚拟人物转换为二维虚拟人物;
    对所述二维虚拟人物、所述虚拟用户选择的二维背景、以及所述虚拟用户对应的音频数据进行合成,得到第一二维视频数据;
    对至少一个第一二维视频数据与至少一个第二二维视频数据进行合成,得 到所述用户的目标视频数据,所述第二二维视频数据是指所述群组视频会话中普通用户的二维视频数据。
  26. 根据权利要求24所述的网络设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    如果所述用户的用户类型为虚拟用户,确定所述群组视频会话对应的虚拟环境;
    以所述虚拟环境为三维背景,确定所述群组视频会话中的每个用户在所述虚拟环境中的显示位置;
    对于所述群组视频会话中的普通用户,将所述普通用户的指定视频数据合成至所述普通用户对应的显示位置;
    对于所述群组视频会话中的虚拟用户,将所述虚拟用户的三维虚拟人物和音频数据合成至所述虚拟用户对应的显示位置;
    将合成后的视频数据作为所述用户的目标视频数据。
  27. 根据权利要求26所述的网络设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    如果所述普通用户包括第一普通用户,将所述第一普通用户的两路二维视频数据转换为第一三维视频数据,将所述第一三维视频数据作为所述指定视频数据,所述第一普通用户是指使用双目摄像头的普通用户,或,如果所述普通用户包括所述第一普通用户,将所述第一普通用户的两路二维视频数据作为所述指定视频数据;
    如果所述普通用户包括第二普通用户,将所述第二普通用户的二维视频数据作为所述指定视频数据,所述第二普通用户是指使用单目摄像头的普通用户。
  28. 根据权利要求26所述的网络设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    将所述用户触发的虚拟环境选项对应的虚拟环境确定为所述用户在所述群组视频会话中对应的虚拟环境;或,
    根据所述群组视频会话中的用户数量,确定所述群组视频会话对应的虚拟环境的容量,将符合所述容量的虚拟环境确定为所述群组视频会话对应的虚拟 环境;或,
    分析所述群组视频会话中的每个用户选择过的虚拟环境,得到每个虚拟环境的被选择次数,将被选择次数最多的虚拟环境确定为所述群组视频会话对应的虚拟环境。
  29. 根据权利要求26所述的网络设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    根据所述用户与所述群组视频会话中其他用户之间的社交数据,分析所述用户与所述其他用户之间的亲密度,按照亲密度高低顺序从所述用户的任一侧开始排列所述其他用户的显示位置;或,
    获取所述其他用户的用户身份,将所述用户的对面位置确定为所述其他用户中用户身份最高的用户的显示位置,并随机确定所述其他用户中剩余用户的显示位置;或,
    按照所述其他用户加入所述群组视频会话的时间先后顺序,从所述用户的任一侧开始排列所述其他用户的显示位置;或,
    根据所述用户在所述虚拟环境中选择的位置,将所述用户所选择的位置确定为所述用户在所述虚拟环境中的显示位置;或,
    将所述用户的对面位置确定为所述普通用户的显示位置,并随机确定所述其他用户中剩余用户的显示位置。
  30. 根据权利要求24所述的网络设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    在创建群组视频会话时,获取群组视频会话中第一用户的虚拟人物,所述第一用户的虚拟人物至少根据所述第一用户的头部特征数据和所述第一用户对应的肢体模型得到;
    在所述群组视频会话过程中,基于所述第一用户的虚拟人物和所述第一用户的行为特征数据,获取所述第一用户的视频数据,所述视频数据中所述第一用户的虚拟人物的动作与所述第一用户的实际动作匹配。
  31. 根据权利要求30所述的网络设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    获取所述第一用户的头部特征数据;
    根据所述头部特征数据,生成与所述头部特征数据匹配的头部模型;
    根据所述第一用户的用户属性,确定所述第一用户对应的肢体模型;
    对所述头部模型和所述肢体模型进行合成,得到所述第一用户的虚拟人物。
  32. 根据权利要求31所述的网络设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    根据所述第一用户的性别数据,确定与所述第一用户的性别数据匹配的肢体模型;和/或,
    根据所述第一用户的年龄数据,确定与所述第一用户的年龄数据匹配的肢体模型;和/或,
    根据所述第一用户的职业数据,确定与所述第一用户的职业数据匹配的肢体模型。
  33. 根据权利要求30所述的网络设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    当检测到所述第一用户的表情特征数据为指定表情特征数据时,获取与所述指定表情特征数据对应的肢体特征数据;
    将所述指定表情特征数据实时映射至所述第一用户的虚拟人物的头部模型,并将所述肢体特征数据实时映射至所述第一用户的虚拟人物的肢体模型,得到所述第一用户的视频数据。
  34. 根据权利要求30所述的网络设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    将所述第一用户的嘴型特征数据实时映射至所述第一用户的虚拟人物的头部模型,得到所述第一用户的视频数据。
  35. 根据权利要求30所述的网络设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    获取所述第一用户的传感器采集到的所述第一用户的头部方位数据;
将所述第一用户的头部方位特征数据实时映射至所述第一用户的虚拟人物的头部模型,得到所述第一用户的视频数据。
  36. 根据权利要求30所述的网络设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    获取所述第一用户的摄像头拍摄到的所述第一用户的眼部图像数据;
    根据所述第一用户的眼部图像数据,获取所述第一用户的眼神方向特征数据;
    将所述第一用户的眼神方向特征数据实时映射至所述第一用户的虚拟人物的头部模型,得到所述第一用户的视频数据。
  37. 根据权利要求30所述的网络设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    基于所述第一用户的虚拟人物和所述第一用户的行为特征数据,获取所述第一用户的初始视频数据;
    获取所述第二用户的视角数据;
    按照所述第二用户的视角数据所指示的视角,对所述初始视频数据进行处理,得到与所述视角匹配的所述第一用户的视频数据。
  38. 根据权利要求24所述的网络设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    在群组视频会话过程中,获取待展示的目标物的三维交互模型;
    根据所述群组视频会话中多个用户中每个用户的视角,在所述群组视频会话过程中,对所述目标物的三维交互模型进行处理,得到所述用户的视频数据,所述用户的视频数据包含对所述目标物的三维交互模型进行视角变换得到的模型数据;
    将所述多个用户的视频数据分别发送至所述多个用户所在终端。
  39. 根据权利要求24所述的网络设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    当接收到第三用户的发言请求时,降低第四用户的音频数据的音量,所述第四用户为所述群组视频会话中除第三用户以外的用户。
  40. 根据权利要求38所述的网络设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    获取第五用户上传的三维物体模型;或,
    获取第六用户上传的二维表格,对所述二维表格进行处理,得到三维表格模型。
  41. 根据权利要求24所述的网络设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    当接收到多媒体文件播放请求时,将与所述多媒体播放请求对应的多媒体文件合成至所述群组视频会话中多个用户的视频数据。
  42. 根据权利要求38所述的网络设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    当检测到第七用户对所述三维交互模型的凝视时长大于预设时长时,将操作提示信息发送至所述第七用户所在终端,所述操作提示信息用于提示所述第七用户能够对所述三维交互模型进行操作。
  43. 一种终端,其特征在于,所述终端包括存储器和处理器,所述存储器用于存储指令,所述处理器被配置为执行所述指令,以执行下述群组视频会话的方法的步骤:
    接收网络设备发送群组视频会话的目标视频数据,所述目标视频数据的视频显示模式与终端用户的用户类型所指示的视频显示模式匹配,所述终端用户的用户类型为普通用户,所述普通用户用于指示所述终端用户在参与所述群组视频会话时采用二维显示模式;
    显示所述目标视频数据,使群组视频会话中的普通用户以二维人物形式显示,所述群组视频会话中的虚拟用户以二维虚拟人物的形式显示。
  44. 一种虚拟现实VR设备,其特征在于,所述VR设备包括存储器和处理器,所述存储器用于存储指令,所述处理器被配置为执行所述指令,以执行下述群组视频会话的方法的步骤:
    接收网络设备发送群组视频会话的目标视频数据,所述目标视频数据的视频显示模式与VR设备用户的用户类型所指示的视频显示模式匹配,所述VR设备用户的用户类型为虚拟用户,所述虚拟用户用于指示所述VR设备用户在参与所述群组视频会话时采用虚拟现实显示模式;
    显示所述目标视频数据,使群组视频会话中的普通用户在虚拟环境中以二维人物或三维人物的形式显示,所述群组视频会话中的虚拟用户在所述虚拟环境中以三维虚拟人物的形式显示。
  45. 根据权利要求44所述的VR设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    在所述普通用户对应的显示位置上,显示所述普通用户的二维人物或三维人物;
    在所述虚拟用户对应的显示位置上,显示所述虚拟用户的三维虚拟人物。
  46. 根据权利要求44所述的VR设备,其特征在于,所述处理器被配置为执行所述指令,以执行下述步骤:
    基于所述目标视频数据,如果检测到所述群组视频会话中任一用户正在发言,在所述用户对应的显示位置上显示发言提示。
  47. 一种群组视频会话系统,其特征在于,所述系统包括:
    网络设备,被配置为创建群组视频会话;对于所述群组视频会话中的每个用户,根据所述用户的设备信息,确定所述用户的用户类型,所述用户类型包括普通用户和虚拟用户,所述普通用户用于指示所述用户在参与所述群组视频会话时采用二维显示模式,所述虚拟用户用于指示所述用户在参与所述群组视频会话时采用虚拟现实显示模式;根据所述用户的用户类型所指示的视频显示模式,对所述群组视频会话的视频数据进行处理,得到所述用户的目标视频数据,所述目标视频数据的视频显示模式与所述用户的用户类型所指示的视频显示模式匹配;在所述群组视频会话的进行过程中,向所述用户的用户设备发送目标视频数据,使所述用户进行群组视频会话;
    终端,被配置为接收网络设备发送群组视频会话的目标视频数据,所述目标视频数据的视频显示模式与终端用户的用户类型所指示的视频显示模式匹 配,所述终端用户的用户类型为普通用户,所述普通用户用于指示所述终端用户在参与所述群组视频会话时采用二维显示模式;显示所述目标视频数据,使群组视频会话中的普通用户以二维人物形式显示,所述群组视频会话中的虚拟用户以二维虚拟人物的形式显示;
    虚拟现实VR设备,被配置为接收网络设备发送群组视频会话的目标视频数据,所述目标视频数据的视频显示模式与VR设备用户的用户类型所指示的视频显示模式匹配,所述VR设备用户的用户类型为虚拟用户,所述虚拟用户用于指示所述VR设备用户在参与所述群组视频会话时采用虚拟现实显示模式;显示所述目标视频数据,使群组视频会话中的普通用户在虚拟环境中以二维人物或三维人物的形式显示,所述群组视频会话中的虚拟用户在所述虚拟环境中以三维虚拟人物的形式显示。
PCT/CN2018/075749 2017-02-24 2018-02-08 群组视频会话的方法及网络设备 WO2018153267A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/435,733 US10609334B2 (en) 2017-02-24 2019-06-10 Group video communication method and network device

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
CN201710104442.4A CN108513089B (zh) 2017-02-24 2017-02-24 群组视频会话的方法及装置
CN201710104669.9 2017-02-24
CN201710104439.2A CN108513088B (zh) 2017-02-24 2017-02-24 群组视频会话的方法及装置
CN201710104439.2 2017-02-24
CN201710104442.4 2017-02-24
CN201710104669.9A CN108513090B (zh) 2017-02-24 2017-02-24 群组视频会话的方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/435,733 Continuation US10609334B2 (en) 2017-02-24 2019-06-10 Group video communication method and network device

Publications (1)

Publication Number Publication Date
WO2018153267A1 true WO2018153267A1 (zh) 2018-08-30

Family

ID=63253513

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/075749 WO2018153267A1 (zh) 2017-02-24 2018-02-08 群组视频会话的方法及网络设备

Country Status (3)

Country Link
US (1) US10609334B2 (zh)
TW (1) TWI650675B (zh)
WO (1) WO2018153267A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113012501A (zh) * 2021-03-18 2021-06-22 郑州铁路职业技术学院 一种远程教学方法

Families Citing this family (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11176484B1 (en) * 2017-09-05 2021-11-16 Amazon Technologies, Inc. Artificial intelligence system for modeling emotions elicited by videos
CN108229308A (zh) * 2017-11-23 2018-06-29 北京市商汤科技开发有限公司 目标对象识别方法、装置、存储介质和电子设备
US11948242B2 (en) 2019-08-02 2024-04-02 Fmr Llc Intelligent smoothing of 3D alternative reality applications for secondary 2D viewing
US11138804B2 (en) * 2019-08-02 2021-10-05 Fmr Llc Intelligent smoothing of 3D alternative reality applications for secondary 2D viewing
CN110598043B (zh) * 2019-09-24 2024-02-09 腾讯科技(深圳)有限公司 一种视频处理方法、装置、计算机设备以及存储介质
US11012249B2 (en) * 2019-10-15 2021-05-18 Microsoft Technology Licensing, Llc Content feature based video stream subscriptions
TWI764319B (zh) * 2019-11-01 2022-05-11 華南商業銀行股份有限公司 基於深度學習的影像辨識系統
TWI764318B (zh) * 2019-11-01 2022-05-11 華南商業銀行股份有限公司 使用色彩空間轉換的影像辨識系統
TWI758904B (zh) * 2019-11-01 2022-03-21 華南商業銀行股份有限公司 具有多攝像裝置的影像辨識系統
TWI718743B (zh) * 2019-11-01 2021-02-11 華南商業銀行股份有限公司 影像辨識系統
CN113015000A (zh) * 2019-12-19 2021-06-22 中兴通讯股份有限公司 渲染和显示的方法、服务器、终端、计算机可读介质
CN113810203B (zh) * 2020-06-11 2023-11-07 腾讯科技(深圳)有限公司 主题会话处理方法、装置、计算机设备和存储介质
US11502861B2 (en) * 2020-08-17 2022-11-15 T-Mobile Usa, Inc. Simulated auditory space for online meetings
GB2598897A (en) * 2020-09-14 2022-03-23 Antser Holdings Ltd Virtual meeting platform
US11076128B1 (en) 2020-10-20 2021-07-27 Katmai Tech Holdings LLC Determining video stream quality based on relative position in a virtual space, and applications thereof
US11457178B2 (en) 2020-10-20 2022-09-27 Katmai Tech Inc. Three-dimensional modeling inside a virtual video conferencing environment with a navigable avatar, and applications thereof
US10979672B1 (en) 2020-10-20 2021-04-13 Katmai Tech Holdings LLC Web-based videoconference virtual environment with navigable avatars, and applications thereof
US11070768B1 (en) 2020-10-20 2021-07-20 Katmai Tech Holdings LLC Volume areas in a three-dimensional virtual conference space, and applications thereof
US10952006B1 (en) 2020-10-20 2021-03-16 Katmai Tech Holdings LLC Adjusting relative left-right sound to provide sense of an avatar's position in a virtual space, and applications thereof
US11095857B1 (en) 2020-10-20 2021-08-17 Katmai Tech Holdings LLC Presenter mode in a three-dimensional virtual conference space, and applications thereof
US11743430B2 (en) 2021-05-06 2023-08-29 Katmai Tech Inc. Providing awareness of who can hear audio in a virtual conference, and applications thereof
US11184362B1 (en) 2021-05-06 2021-11-23 Katmai Tech Holdings LLC Securing private audio in a virtual conference, and applications thereof
US20230008871A1 (en) * 2021-07-12 2023-01-12 Milestone Systems A/S Computer implemented method and apparatus for training an operator of a video management system
CN113784189B (zh) * 2021-08-31 2023-08-01 Oook(北京)教育科技有限责任公司 Method, apparatus, medium, and electronic device for generating a round-table video conference
US11656747B2 (en) 2021-09-21 2023-05-23 Microsoft Technology Licensing, Llc Established perspective user interface and user experience for video meetings
CN116668622A (zh) * 2022-02-18 2023-08-29 鸿富锦精密工业(深圳)有限公司 Voice control method and system for multi-party communication
US20230316663A1 (en) * 2022-03-30 2023-10-05 Tmrw Foundation Ip S. À R.L. Head-tracking based media selection for video communications in virtual environments
US20240020901A1 (en) * 2022-07-13 2024-01-18 Fd Ip & Licensing Llc Method and application for animating computer generated images
US12009938B2 (en) 2022-07-20 2024-06-11 Katmai Tech Inc. Access control in zones
US11928774B2 (en) 2022-07-20 2024-03-12 Katmai Tech Inc. Multi-screen presentation in a virtual videoconferencing environment
US11876630B1 (en) 2022-07-20 2024-01-16 Katmai Tech Inc. Architecture to control zones
US12022235B2 (en) 2022-07-20 2024-06-25 Katmai Tech Inc. Using zones in a three-dimensional virtual environment for limiting audio and video
US11651108B1 (en) 2022-07-20 2023-05-16 Katmai Tech Inc. Time access control in virtual environment application
US11700354B1 (en) 2022-07-21 2023-07-11 Katmai Tech Inc. Resituating avatars in a virtual environment
US11741664B1 (en) 2022-07-21 2023-08-29 Katmai Tech Inc. Resituating virtual cameras and avatars in a virtual environment
US11956571B2 (en) 2022-07-28 2024-04-09 Katmai Tech Inc. Scene freezing and unfreezing
US11682164B1 (en) 2022-07-28 2023-06-20 Katmai Tech Inc. Sampling shadow maps at an offset
US11562531B1 (en) 2022-07-28 2023-01-24 Katmai Tech Inc. Cascading shadow maps in areas of a three-dimensional environment
US11711494B1 (en) 2022-07-28 2023-07-25 Katmai Tech Inc. Automatic instancing for efficient rendering of three-dimensional virtual environment
US11704864B1 (en) 2022-07-28 2023-07-18 Katmai Tech Inc. Static rendering for a combination of background and foreground objects
US11776203B1 (en) 2022-07-28 2023-10-03 Katmai Tech Inc. Volumetric scattering effect in a three-dimensional virtual environment with navigable video avatars
US11593989B1 (en) 2022-07-28 2023-02-28 Katmai Tech Inc. Efficient shadows for alpha-mapped models
US20240062457A1 (en) * 2022-08-18 2024-02-22 Microsoft Technology Licensing, Llc Adaptive adjustments of perspective views for improving detail awareness for users associated with target entities of a virtual environment
CN115514729B (zh) * 2022-08-31 2024-04-05 同炎数智科技(重庆)有限公司 Instant discussion method and system based on a three-dimensional model
US11748939B1 (en) 2022-09-13 2023-09-05 Katmai Tech Inc. Selecting a point to navigate video avatars in a three-dimensional environment
US11875492B1 (en) 2023-05-01 2024-01-16 Fd Ip & Licensing Llc Systems and methods for digital compositing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102164265A (zh) * 2011-05-23 2011-08-24 宇龙计算机通信科技(深圳)有限公司 Three-dimensional video call method and system
CN103238317A (zh) * 2010-05-12 2013-08-07 布鲁珍视网络有限公司 System and method for a scalable distributed global infrastructure for real-time multimedia communication
US20140085406A1 (en) * 2012-09-27 2014-03-27 Avaya Inc. Integrated conference floor control
CN105721821A (zh) * 2016-04-01 2016-06-29 宇龙计算机通信科技(深圳)有限公司 Video call method and apparatus

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6219045B1 (en) * 1995-11-13 2001-04-17 Worlds, Inc. Scalable virtual world chat client-server system
US5884029A (en) * 1996-11-14 1999-03-16 International Business Machines Corporation User interaction with intelligent virtual objects, avatars, which interact with other avatars controlled by different users
US7908554B1 (en) * 2003-03-03 2011-03-15 Aol Inc. Modifying avatar behavior based on user action or mood
AU2006352758A1 (en) * 2006-04-10 2008-12-24 Avaworks Incorporated Talking Head Creation System and Method
US8683353B2 (en) 2006-12-12 2014-03-25 Motorola Mobility Llc Method and system for distributed collaborative communications
US20080263460A1 (en) * 2007-04-20 2008-10-23 Utbk, Inc. Methods and Systems to Connect People for Virtual Meeting in Virtual Reality
US20090132309A1 (en) * 2007-11-21 2009-05-21 International Business Machines Corporation Generation of a three-dimensional virtual reality environment from a business process model
US20090249226A1 (en) 2008-03-28 2009-10-01 Microsoft Corporation Collaborative tool use in virtual environment
US8108774B2 (en) * 2008-09-26 2012-01-31 International Business Machines Corporation Avatar appearance transformation in a virtual universe
TWM389406U (en) * 2010-05-25 2010-09-21 Avermedia Information Inc Network video conference system
US20140096036A1 (en) 2012-09-28 2014-04-03 Avaya Inc. Transporting avatars and meeting materials into virtual reality meeting rooms
US10073516B2 (en) 2014-12-29 2018-09-11 Sony Interactive Entertainment Inc. Methods and systems for user interaction within virtual reality scene using head mounted display
KR101577986B1 (ko) * 2015-03-24 2015-12-16 (주)해든브릿지 Two-way virtual reality implementation system
KR102523997B1 (ko) * 2016-02-12 2023-04-21 삼성전자주식회사 Method and apparatus for processing 360-degree images
US20180189554A1 (en) * 2016-12-31 2018-07-05 Facebook, Inc. Systems and methods to present reactions to media content in a virtual environment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103238317A (zh) * 2010-05-12 2013-08-07 布鲁珍视网络有限公司 System and method for a scalable distributed global infrastructure for real-time multimedia communication
CN102164265A (zh) * 2011-05-23 2011-08-24 宇龙计算机通信科技(深圳)有限公司 Three-dimensional video call method and system
US20140085406A1 (en) * 2012-09-27 2014-03-27 Avaya Inc. Integrated conference floor control
CN105721821A (zh) * 2016-04-01 2016-06-29 宇龙计算机通信科技(深圳)有限公司 Video call method and apparatus

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113012501A (zh) * 2021-03-18 2021-06-22 郑州铁路职业技术学院 Remote teaching method
CN113012501B (zh) * 2021-03-18 2023-05-16 深圳市天天学农网络科技有限公司 Remote teaching method

Also Published As

Publication number Publication date
US10609334B2 (en) 2020-03-31
TW201832051A (zh) 2018-09-01
TWI650675B (zh) 2019-02-11
US20190297304A1 (en) 2019-09-26

Similar Documents

Publication Publication Date Title
WO2018153267A1 (zh) Group video session method and network device
US11908243B2 (en) Menu hierarchy navigation on electronic mirroring devices
WO2019034142A1 (zh) Method and apparatus for displaying a three-dimensional virtual image, terminal, and storage medium
US11790614B2 (en) Inferring intent from pose and speech input
US20220206581A1 (en) Communication interface with haptic feedback response
US20240184372A1 (en) Virtual reality communication interface with haptic feedback response
US20220319061A1 (en) Transmitting metadata via invisible light
US20220319059A1 (en) User-defined contextual spaces
US11989348B2 (en) Media content items with haptic feedback augmentations
KR20230160918A (ko) Interface with haptic and audio feedback response
US20240015260A1 (en) Dynamically switching between rgb and ir capture
CN116320721A (zh) Photographing method and apparatus, terminal, and storage medium
US20220318303A1 (en) Transmitting metadata via inaudible frequencies
WO2023121896A1 (en) Real-time motion and appearance transfer
CN117409119A (zh) Avatar-based picture display method and apparatus, and electronic device
CN114004922B (zh) Skeletal animation display method and apparatus, device, medium, and computer program product
US20220377309A1 (en) Hardware encoder for stereo stitching
US11874960B2 (en) Pausing device operation based on facial movement
US11825276B2 (en) Selector input device to transmit audio signals
US11935442B1 (en) Controlling brightness based on eye tracking
US20240161242A1 (en) Real-time try-on using body landmarks
US20230343004A1 (en) Augmented reality experiences with dual cameras
US20220210336A1 (en) Selector input device to transmit media content items
US20220373791A1 (en) Automatic media capture using biometric sensor data
WO2022246373A1 (en) Hardware encoder for stereo stitching

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18758050

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18758050

Country of ref document: EP

Kind code of ref document: A1