CN111881807A - VR conference control system and method based on face modeling and expression tracking - Google Patents

VR conference control system and method based on face modeling and expression tracking

Info

Publication number
CN111881807A
Authority
CN
China
Prior art keywords
face
module
conference
information
expression tracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010717430.0A
Other languages
Chinese (zh)
Inventor
何宝华
沈睦生
严明月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Cadre Science And Technology Industry Co ltd
Original Assignee
Shenzhen Cadre Science And Technology Industry Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Cadre Science And Technology Industry Co ltd
Priority to CN202010717430.0A
Publication of CN111881807A
Legal status: Pending

Classifications

    • G06V40/161 (Recognition of human faces in image or video data): Detection; Localisation; Normalisation
    • G06V40/168 (Recognition of human faces in image or video data): Feature extraction; Face representation
    • G06V40/174 (Recognition of human faces in image or video data): Facial expression recognition
    • G06T17/00 (Image data processing): Three dimensional [3D] modelling, e.g. data description of 3D objects
    • H04N7/15 (Pictorial communication, e.g. television; two-way working): Conference systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a VR conference control system and method based on face modeling and expression tracking. The VR conference control system comprises a conference scene selection module, a face detection module, a feature point extraction module, a 3D face model generation module, a facial expression tracking module and an interaction module. The conference scene selection module is used for entering a conference scene according to the user's selection; the face detection module is used for identifying the face information of a wearer; the facial expression tracking module is used for acquiring a plurality of feature points from the face information at preset intervals and mapping them to the 3D face model in real time; and the interaction module is used for outputting the conference scene and the 3D face models. The invention can accurately update each feature point in the face information to the 3D face model in real time, which means that the expression changes of each participant in the conference can be observed in the virtual scene, so that information beyond sound is conveyed more vividly and clearly.

Description

VR conference control system and method based on face modeling and expression tracking
Technical Field
The invention belongs to the technical field of virtual scenes, and particularly relates to a VR conference control system and method based on face modeling and expression tracking.
Background
At present, existing VR systems mainly apply virtual reality technology to render the scenes of existing situational conference systems in 3D, combining video conferencing technology with 3D technology to realize a network video conference under the comprehensive combination of a voice system, a video system and a 3D virtual picture, so as to shorten the distance between people and realize cross-regional connection and interaction. Interactive exhibitions anywhere in the world can be simulated through the screen. However, person-to-person communication in such systems relies mainly on the transmission of sound, so the sense of realism is poor.
Therefore, the prior art remains to be improved.
Disclosure of Invention
The invention mainly aims to provide a VR conference control system and method based on face modeling and expression tracking, so as to solve the technical problems mentioned in the Background section.
The VR conference control system based on face modeling and expression tracking of the invention comprises a conference scene selection module, a face detection module, a feature point extraction module, a 3D face model generation module, a facial expression tracking module and an interaction module. The conference scene selection module is used for entering a conference scene according to the user's selection; the face detection module is used for identifying the face information of a wearer; the feature point extraction module is used for acquiring the face information and extracting the relative position information of a plurality of feature points; the 3D face model generation module is used for determining a 3D face model according to the relative position information; the facial expression tracking module is used for acquiring a plurality of feature points from the face information at preset intervals and mapping them to the 3D face model in real time; and the interaction module is used for outputting the conference scene and the 3D face model corresponding to each wearer participating in the conference scene.
Preferably, the interaction module comprises a VR head-mounted display device.
Preferably, the VR head-mounted display device is internally provided with a display module, a camera and a storage module.
Preferably, the storage module stores at least two conference scenes and at least two 3D face models computed with the ASM (Active Shape Model) algorithm.
The invention also provides a VR conference control method based on face modeling and expression tracking, which comprises the following steps:
step S10, after the wearer puts on the interaction module, receiving the wearer's selection and entering a conference scene;
step S20, capturing, by a camera in the interaction module, a real-time image inside the device, and identifying the wearer's face information from the real-time image;
step S30, identifying at least three feature points from the face information and calculating relative position information;
step S40, determining a 3D face model according to the relative position information;
and step S50, acquiring a plurality of feature points from the face information at preset intervals, and mapping them to the 3D face model in real time.
Preferably, the method further comprises the steps of:
step S60, the interaction module outputs a conference scene and a 3D face model corresponding to each wearer participating in the conference scene.
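To make the timing behaviour of steps S50 and S60 concrete, the following is a minimal Python sketch of the periodic tracking loop, written under stated assumptions: the patent specifies only the behaviour ("every preset time period"), so the helper callables (capture_frame, extract_feature_points, render) and the update method are hypothetical placeholders, not names from the patent.

```python
# Hypothetical sketch of the periodic expression-tracking loop (steps S50 and S60).
# All callables and the `update` method are placeholders; the patent names no API.
import time

def track_expressions(face_model, capture_frame, extract_feature_points,
                      render, period_s=0.1):
    """Every `period_s` seconds, re-acquire the wearer's feature points,
    map them onto the 3D face model, and output the updated model."""
    while True:  # runs for the lifetime of the conference session
        frame = capture_frame()                 # real-time image from the headset camera
        points = extract_feature_points(frame)  # S50: fresh feature points
        face_model.update(points)               # map them onto the 3D face model
        render(face_model)                      # S60: output into the VR conference scene
        time.sleep(period_s)                    # "every preset time period"
```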
Preferably, the face information comprises the face region in the real-time image, and the feature points include at least the nose, mouth, left eye, right eye and ears.
Preferably, the relative position information includes a first relative distance and a second relative distance, and the step S30 specifically includes:
step S31, identifying the nose, the left eye and the right eye from the face information;
step S32, establishing a coordinate system by taking the nose as a coordinate center, and acquiring left eye coordinate information and right eye coordinate information;
step S33, calculating a first relative distance and a second relative distance from the nose coordinate information, the left-eye coordinate information and the right-eye coordinate information, wherein the first relative distance represents the distance between the left eye and the nose, and the second relative distance represents the distance between the left eye and the right eye.
Preferably, step S40 specifically includes:
step S41, selecting a plurality of first correlation models from the storage module according to the first relative distance;
and step S42, determining a 3D face model from the first correlation models according to the second relative distance.
The VR conference control system and method based on face modeling and expression tracking of the invention can accurately update each feature point in the face information to the 3D face model in real time, which means that the expression changes of each participant in the conference can be observed in the virtual scene, so that information beyond sound is conveyed more vividly and clearly.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments described in the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a VR conference control method based on face modeling and expression tracking according to the present invention;
fig. 2 is a detailed flowchart of step S30 in the VR conference control method based on face modeling and expression tracking;
fig. 3 is a detailed flowchart of step S40 in the VR conference control method based on face modeling and expression tracking;
fig. 4 is a block diagram of a VR conference control system based on face modeling and expression tracking according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It is noted that relative terms such as "first," "second," and the like may be used to describe various components, but these terms do not limit the components; they are only used to distinguish one component from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present invention. The term "and/or" includes any and all combinations of one or more of the associated listed items.
As shown in fig. 4, fig. 4 is a block diagram of a VR conference control system based on face modeling and expression tracking according to the present invention. The VR conference control system based on face modeling and expression tracking of the invention comprises a conference scene selection module, a face detection module, a feature point extraction module, a 3D face model generation module, a facial expression tracking module and an interaction module. The conference scene selection module is used for entering a conference scene according to the user's selection; the face detection module is used for identifying the face information of a wearer; the feature point extraction module is used for acquiring the face information and extracting the relative position information of a plurality of feature points; the 3D face model generation module is used for determining a 3D face model according to the relative position information; the facial expression tracking module is used for acquiring a plurality of feature points from the face information at preset intervals and mapping them to the 3D face model in real time; and the interaction module is used for outputting the conference scene and the 3D face model corresponding to each wearer participating in the conference scene. The system and method can accurately update each feature point in the face information to the 3D face model in real time, which means that the expression changes of each participant in the conference can be observed in the virtual scene, so that information beyond sound is conveyed more vividly and clearly.
The interaction module comprises a VR head-mounted display device, so that a wearer who needs to participate in the conference can put it on conveniently. A display module, a camera and a storage module are built into the VR head-mounted display device. A handheld device is connected to the VR head-mounted display device; sensors and a keypad on the handheld device capture the user's hand commands. The display module is used for playing the VR video.
Preferably, the storage module stores at least two conference scenes and at least two 3D face models computed with the ASM algorithm. The conference scene can be selected according to the number of participants; the conference scenes include a virtual conference scene, and the display module is used for outputting the virtual conference scene and the 3D face model corresponding to each wearer participating in it. The ASM (Active Shape Model) algorithm is a feature point extraction method based on a statistical learning model: a set of training samples is selected, the shape of each object is described by a set of feature points, the sample shapes are registered, and the registered shape vectors are then statistically modelled by principal component analysis, yielding a statistical description of the object's shape. The resulting shape model can be used to search a new image for objects similar to the model. The specific steps are as follows. ASM training is performed on the key regions of the face: the feature points of the mouth, nose, eye corners and face boundary are labelled and their coordinates recorded. Because the labelled face samples differ in size, absolute position and angle, statistically modelling the sample set directly would be irregular; to build a model of the training set, an alignment operation is therefore performed through rotation, translation, enlargement and reduction of the images, eliminating the differences between them and establishing a geometric model of the face. After the shape model has been established, the target image is matched against the model by iteration, keeping the coordinate set of the target image similar to the training set, and the iteration is repeated until convergence.
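As a concrete illustration of the labelling, alignment and principal component analysis steps just described, the following Python sketch builds an ASM-style shape model from a set of labelled landmark shapes. It is a minimal sketch, not the patent's implementation; for brevity it omits the reflection-sign correction of full Procrustes alignment and the local grey-level profile search used during iterative matching.

```python
# Minimal ASM-style shape model: Procrustes-align the training shapes, then
# run PCA on the aligned shape vectors. Illustrative only; names are not from the patent.
import numpy as np

def align_shape(shape, reference):
    """Align one (N, 2) landmark shape to a reference shape by removing
    translation, scale and rotation (reflection correction omitted)."""
    s = shape - shape.mean(axis=0)              # remove translation
    r = reference - reference.mean(axis=0)
    s = s / np.linalg.norm(s)                   # remove scale
    r = r / np.linalg.norm(r)
    u, _, vt = np.linalg.svd(s.T @ r)           # optimal rotation (Kabsch)
    return s @ (u @ vt)

def build_shape_model(shapes, variance_kept=0.95):
    """Register all training shapes to the first one and model the
    registered shape vectors with principal component analysis."""
    aligned = np.array([align_shape(s, shapes[0]).ravel() for s in shapes])
    mean = aligned.mean(axis=0)
    cov = np.cov(aligned, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]           # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    k = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(),
                            variance_kept)) + 1
    return mean, eigvecs[:, :k]                 # any face: mean + P @ b
```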
As shown in fig. 1, the present invention further provides a VR conference control method based on face modeling and expression tracking, including the following steps:
step S10, after the wearer puts on the interaction module, receiving the wearer's selection and entering a conference scene;
step S20, capturing, by a camera in the interaction module, a real-time image inside the device, and identifying the wearer's face information from the real-time image;
step S30, identifying at least three feature points from the face information and calculating relative position information;
step S40, determining a 3D face model according to the relative position information;
and step S50, acquiring a plurality of feature points from the face information at preset intervals, and mapping them to the 3D face model in real time.
Preferably, after step S50, the method further comprises the following step:
step S60, the interaction module outputs the 3D face model into the VR video in real time.
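Steps S20 and S30 can be illustrated with off-the-shelf computer vision tooling. The sketch below uses OpenCV's stock Haar cascades as a stand-in for the face detection module; this is an assumed toolchain, since the patent does not name one, and in the described system the ASM fit sketched above would refine these coarse detections into labelled feature points. Inside a headset the camera sees the face at very close range, so cascades trained for that geometry would be needed in practice.

```python
# Illustrative stand-in for steps S20 and S30 using OpenCV's bundled Haar
# cascades (an assumed toolchain; the patent does not name one).
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def detect_face_and_eyes(frame):
    """Return the face rectangle and eye centres found in one camera frame,
    or None if no face is visible."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]                       # assume one wearer per headset camera
    roi = gray[y:y + h, x:x + w]
    eyes = eye_cascade.detectMultiScale(roi, scaleFactor=1.1, minNeighbors=5)
    centres = [(x + ex + ew // 2, y + ey + eh // 2)
               for (ex, ey, ew, eh) in eyes]    # back to full-frame coordinates
    return (x, y, w, h), centres
```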
Preferably, the face information comprises the face region in the real-time image, and the feature points include at least the nose, mouth, left eye, right eye and ears.
As shown in fig. 2, preferably, the relative position information includes a first relative distance and a second relative distance, and step S30 specifically includes:
step S31, identifying the nose, the left eye and the right eye from the face information, wherein the face information comprises the face region in the real-time image;
step S32, establishing a coordinate system by taking the nose as a coordinate center, and acquiring left eye coordinate information and right eye coordinate information;
step S33, calculating a first relative distance and a second relative distance from the nose coordinate information, the left-eye coordinate information and the right-eye coordinate information, wherein the first relative distance represents the distance between the left eye and the nose, and the second relative distance represents the distance between the left eye and the right eye.
Based on steps S31, S32 and S33, this preferred embodiment calculates the relative position information from three feature points, the nose, the left eye and the right eye, so that a 3D face model with a high degree of match to the wearer can be determined.
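A minimal sketch of the computation in steps S31 to S33, assuming the nose and eye coordinates have already been located (for example by the detector sketched above); the function name is illustrative.

```python
# Sketch of steps S31 to S33: a nose-centred coordinate system and the two
# relative distances used to pick a model. Landmark inputs are (x, y) pairs.
import math

def relative_distances(nose, left_eye, right_eye):
    """Return (first, second): left eye to nose, and left eye to right eye."""
    lx, ly = left_eye[0] - nose[0], left_eye[1] - nose[1]    # nose is the origin
    rx, ry = right_eye[0] - nose[0], right_eye[1] - nose[1]
    first = math.hypot(lx, ly)              # distance between left eye and nose
    second = math.hypot(lx - rx, ly - ry)   # distance between left and right eye
    return first, second
```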
as shown in fig. 3, preferably, step S40 specifically includes:
step S41, selecting a plurality of first correlation models from the storage module according to the first relative distance;
and step S42, determining a 3D face model from the first correlation models according to the second relative distance.
The storage module stores a plurality of 3D face models in advance; for example, the 3D face model for a child differs from that for an adult.
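Steps S41 and S42 then amount to a two-stage nearest-neighbour lookup over the stored models. The sketch below assumes each stored model is tagged with the two reference distances it was built from; this storage format is an assumption, as the patent does not describe how the models are indexed.

```python
# Hypothetical sketch of steps S41 and S42. `models` is assumed to be a list of
# (first_distance, second_distance, model_data) tuples; the patent does not
# specify how stored models are indexed.
def select_face_model(models, first, second, tolerance=0.1):
    """S41: keep models whose first distance is within `tolerance` (relative)
    of the measured one; S42: of those, return the model whose second
    distance is closest to the measured second distance."""
    candidates = [m for m in models
                  if abs(m[0] - first) <= tolerance * first]
    if not candidates:
        candidates = models                 # fall back to the full library
    return min(candidates, key=lambda m: abs(m[1] - second))[2]
```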
The VR conference control system and method based on face modeling and expression tracking of the invention can accurately update each feature point in the face information to the 3D face model in real time, which means that the expression changes of each participant in the conference can be observed in the virtual scene, so that information beyond sound is conveyed more vividly and clearly. The participants' sense of real presence is satisfied to the greatest extent: each participant can not only see the others but also feel as if they were right beside them. This vivid effect is achieved through wearable equipment, including VR head-mounted displays, force-feedback data gloves and other human-computer interaction devices.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (9)

1. A VR conference control system based on face modeling and expression tracking, characterized by comprising a conference scene selection module, a face detection module, a feature point extraction module, a 3D face model generation module, a facial expression tracking module and an interaction module, wherein the conference scene selection module is used for entering a conference scene according to the user's selection; the face detection module is used for identifying the face information of a wearer; the feature point extraction module is used for acquiring the face information and extracting the relative position information of a plurality of feature points; the 3D face model generation module is used for determining a 3D face model according to the relative position information; the facial expression tracking module is used for acquiring a plurality of feature points from the face information at preset intervals and mapping them to the 3D face model in real time; and the interaction module is used for outputting the conference scene and the 3D face model corresponding to each wearer participating in the conference scene.
2. The VR conference control system based on face modeling and expression tracking of claim 1, wherein the interaction module includes a VR head-mounted display.
3. The VR conference control system based on face modeling and expression tracking of claim 2, wherein the VR head-mounted display device has a display module, a camera and a storage module built in.
4. The VR conference control system based on face modeling and expression tracking of claim 3, wherein the storage module stores at least two conference scenes and at least two 3D face models computed using the ASM algorithm.
5. A VR conference control method based on face modeling and expression tracking is characterized by comprising the following steps:
step S10, after the wearer puts on the interaction module, receiving the wearer's selection and entering a conference scene;
step S20, capturing, by a camera in the interaction module, a real-time image inside the device, and identifying the wearer's face information from the real-time image;
step S30, identifying at least three feature points from the face information and calculating relative position information;
step S40, determining a 3D face model according to the relative position information;
and step S50, acquiring a plurality of feature points from the face information at preset intervals, and mapping them to the 3D face model in real time.
6. The VR conference control method based on face modeling and expression tracking of claim 5, further comprising the following step:
step S60, the interaction module outputs a conference scene and a 3D face model corresponding to each wearer participating in the conference scene.
7. The VR conference control method based on face modeling and expression tracking of claim 5, wherein the face information comprises the face region in the real-time image, and the feature points include at least the nose, mouth, left eye, right eye and ears.
8. The VR conference control method based on face modeling and expression tracking of claim 5, wherein the relative position information includes a first relative distance and a second relative distance, and the step S30 specifically includes:
step S31, identifying the nose, the left eye and the right eye from the face information;
step S32, establishing a coordinate system by taking the nose as a coordinate center, and acquiring left eye coordinate information and right eye coordinate information;
step S33, calculating a first relative distance and a second relative distance from the nose coordinate information, the left-eye coordinate information and the right-eye coordinate information, wherein the first relative distance represents the distance between the left eye and the nose, and the second relative distance represents the distance between the left eye and the right eye.
9. The VR conference control method based on face modeling and expression tracking as claimed in claim 8, wherein the step S40 specifically includes:
step S41, selecting a plurality of first correlation models from the storage module according to the first relative distance;
and step S42, determining a 3D face model from the first correlation models according to the second relative distance.
CN202010717430.0A 2020-07-23 2020-07-23 VR conference control system and method based on face modeling and expression tracking Pending CN111881807A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010717430.0A CN111881807A (en) 2020-07-23 2020-07-23 VR conference control system and method based on face modeling and expression tracking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010717430.0A CN111881807A (en) 2020-07-23 2020-07-23 VR conference control system and method based on face modeling and expression tracking

Publications (1)

Publication Number Publication Date
CN111881807A 2020-11-03

Family

ID=73154875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010717430.0A Pending CN111881807A (en) 2020-07-23 2020-07-23 VR conference control system and method based on face modeling and expression tracking

Country Status (1)

Country Link
CN (1) CN111881807A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114339134A (en) * 2022-03-15 2022-04-12 中瑞云软件(深圳)有限公司 Remote online conference system based on Internet and VR technology
CN114615455A (en) * 2022-01-24 2022-06-10 北京师范大学 Teleconference processing method, teleconference processing device, teleconference system, and storage medium

Similar Documents

Publication Publication Date Title
US11887234B2 (en) Avatar display device, avatar generating device, and program
US20130063560A1 (en) Combined stereo camera and stereo display interaction
US20040104935A1 (en) Virtual reality immersion system
CN110544301A (en) Three-dimensional human body action reconstruction system, method and action training system
KR101711736B1 (en) Feature extraction method for motion recognition in image and motion recognition method using skeleton information
JP2004537082A (en) Real-time virtual viewpoint in virtual reality environment
CN111710036A (en) Method, device and equipment for constructing three-dimensional face model and storage medium
JPWO2017094543A1 (en) Information processing apparatus, information processing system, information processing apparatus control method, and parameter setting method
JP5833526B2 (en) Video communication system and video communication method
CN102801994A (en) Physical image information fusion device and method
WO2004012141A2 (en) Virtual reality immersion system
CN111881807A (en) VR conference control system and method based on face modeling and expression tracking
Kawai et al. A support system for visually impaired persons to understand three-dimensional visual information using acoustic interface
CN117711066A (en) Three-dimensional human body posture estimation method, device, equipment and medium
CN113965550B (en) Intelligent interactive remote auxiliary video system
JP5759439B2 (en) Video communication system and video communication method
Siegl et al. An augmented reality human–computer interface for object localization in a cognitive vision system
WO2017147826A1 (en) Image processing method for use in smart device, and device
CN112416124A (en) Dance posture feedback method and device
JP5898036B2 (en) Video communication system and video communication method
Magee et al. Towards a multi-camera mouse-replacement interface
CN113342167B (en) Space interaction AR realization method and system based on multi-person visual angle positioning
Zhang et al. Behavior Recognition On Multiple View Dimension
CN116958353B (en) Holographic projection method based on dynamic capture and related device
JP2014086774A (en) Video communication system and video communication method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination