WO2018188277A1 - Line-of-sight correction method and apparatus, intelligent conference terminal, and storage medium - Google Patents

Line-of-sight correction method and apparatus, intelligent conference terminal, and storage medium

Info

Publication number
WO2018188277A1
WO2018188277A1 (PCT/CN2017/103270)
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
image
depth information
current
face
Prior art date
Application number
PCT/CN2017/103270
Other languages
English (en)
French (fr)
Inventor
杨铭
Original Assignee
广州视源电子科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州视源电子科技股份有限公司
Publication of WO2018188277A1 publication Critical patent/WO2018188277A1/zh

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10004 - Still image; Photographic image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30196 - Human being; Person
    • G06T2207/30201 - Face

Definitions

  • The present invention relates to the field of image processing technologies, and in particular to a line-of-sight correction method and apparatus, an intelligent conference terminal, and a storage medium.
  • With the development of technology, video conferencing has become widely used. Surveys show that if the parties to a video conference can make eye contact, the participants enjoy a much better conference experience.
  • In a video-conference scene, however, both parties look at the video picture, so the person shown on screen actually appears to be looking elsewhere; the two parties cannot make eye contact, which degrades the visual experience of the video conference.
  • Common line-of-sight correction schemes either modify the display device of the video equipment, for example using a half-silvered mirror or a translucent display, or use a special camera (such as an RGB-D camera) together with a matching algorithm. Although these schemes correct the line of sight well, they depend on special hardware or a special camera, which is costly and limits the range of applications.
  • Schemes that use an ordinary monocular camera with a matching algorithm have also been proposed, but most of them cannot synthesize high-quality images in real time, and because they rely mainly on a single ordinary camera, their correction accuracy is inferior to that of the schemes above.
  • The embodiments of the invention provide a line-of-sight correction method and apparatus, an intelligent conference terminal, and a storage medium that perform high-precision line-of-sight correction on the participants in a video conference, solving the problems of excessive correction cost and an overly narrow range of application.
  • In one aspect, an embodiment of the present invention provides a line-of-sight correction method, including: acquiring two current picture frames captured synchronously by dual cameras, determining depth information of each coincident scene point in the two frames, and merging them into one current live-scene frame; detecting the two-dimensional key points that constitute a face image in the current live-scene frame and determining their coordinate information; and correcting the face image in three-dimensional space, according to the depth information corresponding to the key points and the coordinate information, to obtain a two-dimensional front-view face image.
  • In another aspect, an embodiment of the present invention provides a line-of-sight correction apparatus, including:
  • a depth information determining module configured to acquire two current picture frames captured synchronously by dual cameras, and to determine depth information of each coincident scene point in the two frames;
  • an image stitching and synthesis module configured to merge the two current picture frames into one current live-scene frame;
  • a key point information determining module configured to detect the two-dimensional key points constituting a face image in the current live-scene frame, and to determine their coordinate information;
  • a line-of-sight correction module configured to correct the face image in three-dimensional space, according to the depth information corresponding to the key points and the coordinate information, to obtain a two-dimensional front-view face image.
  • In yet another aspect, an embodiment of the present invention provides an intelligent conference terminal, including: two cameras with parallel optical axes; one or more processors; and a storage device for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the processors to implement the line-of-sight correction method provided by the embodiments of the present invention.
  • An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the line-of-sight correction method provided by the embodiments of the present invention.
  • In the above method, apparatus, intelligent conference terminal, and storage medium, two current picture frames captured synchronously by dual cameras are first acquired, the depth information of each coincident scene point is determined, and the frames are merged into one current live-scene frame; the two-dimensional key points constituting a face image are then detected in that frame and their coordinates determined; finally the face image is corrected in three-dimensional space from the depth information and coordinates of the key points, yielding a two-dimensional front-view face image.
  • The solution of the invention needs no special hardware or special camera: two ordinary cameras suffice to efficiently correct the line of sight of the persons in the captured live-scene frames, at low cost and with a wide range of application, while the dual cameras also bring a wider field of view, enhancing the practical experience of the intelligent conference terminal.
  • FIG. 1 is a schematic flowchart of a line-of-sight correction method according to Embodiment 1 of the present invention;
  • FIG. 2a is a schematic flowchart of a line-of-sight correction method according to Embodiment 2 of the present invention;
  • FIGS. 2b to 2c are flowcharts of line-of-sight correction performed with the method provided by Embodiment 2 of the present invention;
  • FIG. 2d shows a set of first live-scene frames, containing one subject, whose line of sight is to be corrected;
  • FIG. 2e shows the correction result after line-of-sight correction of the set of first live-scene frames;
  • FIG. 2f shows a set of second live-scene frames, containing several subjects, whose lines of sight are to be corrected;
  • FIG. 2g shows the correction result after line-of-sight correction of the set of second live-scene frames;
  • FIG. 3 is a structural block diagram of a line-of-sight correction apparatus according to Embodiment 3 of the present invention;
  • FIG. 4 is a schematic diagram of the hardware structure of an intelligent conference terminal according to Embodiment 4 of the present invention.
  • FIG. 1 is a schematic flowchart of a line-of-sight correction method according to Embodiment 1 of the present invention.
  • The method is applicable to correcting the line of sight of the persons in captured picture frames during a video call; it may be performed by a line-of-sight correction apparatus, which can be implemented in software and/or hardware and is generally integrated into a smart terminal with a video-call function.
  • The smart terminal may be a smart mobile terminal such as a mobile phone, tablet computer, or notebook, or a stationary electronic device with a video-call function such as a desktop computer or an intelligent conference terminal.
  • This embodiment preferably assumes the application scenario of a video call made through a stationary smart terminal, and assumes that both parties look at the video picture during the call.
  • Under these conditions, the line-of-sight correction method provided by the present invention lets the two parties look at each other naturally and make eye contact during the video call.
  • The line-of-sight correction method provided by Embodiment 1 of the present invention includes the following operations:
  • During a video call, the picture information of the scene in which the caller is located is captured through the cameras of the smart terminal.
  • The smart terminal in this embodiment has two cameras with parallel optical axes.
  • That is, the smart terminal has dual cameras; during a video call, the dual cameras synchronously capture current picture frames of the current scene.
  • Because the two cameras are mounted at different positions, the two synchronously captured frames do not coincide completely, but some scene points are still captured in both current picture frames.
  • A scene point present in both current picture frames at the same time is called a coincident scene point.
  • The disparity value of each coincident scene point in the two current picture frames can be determined with a chosen stereo matching algorithm; then, from the focal length of the cameras, the disparity value of each coincident scene point in its frame, and the distance between the optical centers of the two cameras, the depth information of each coincident scene point can be determined.
  • The depth information is the depth value from a coincident scene point to the smart terminal.
  • The two captured frames can also be stitched together, merging the two current picture frames into one current live-scene frame.
  • A key point detection algorithm can detect whether a face image exists in the current live-scene frame and determine the two-dimensional key points that constitute it.
  • The key points constituting the face image can be detected in the current live-scene frame from the characteristic features of a face, and the specific coordinates of each key point in the frame can be determined.
  • The two eyes, the nose, and the two mouth corners can serve as the most basic facial features, from which five two-dimensional key points constituting the face image can be detected in the current frame.
  • The number of key points is not limited to five; it may be 8, 10, or even 63, and the more key points are detected, the more accurately the region of the face image is determined.
  • This embodiment assumes the dual cameras of the smart terminal can clearly capture the caller in the current scene; that is, the scene points constituting the caller's image (which may be a face image) can be assumed to belong to the coincident scene points, so the depth information of each two-dimensional key point of the face image can be obtained from the acquired depth information of the coincident scene points.
  • The line of sight of the face image can then be corrected from the determined depth information and coordinates of the key points.
  • Correcting the line of sight of a face image amounts to correcting its pose.
  • When the face image is corrected from an upward-looking, downward-looking, or sideways pose to a front view, the person's line of sight is corrected accordingly.
  • Generally, the current face image can be triangulated from the coordinates of the detected key points, and a standard triangulation can likewise be built from the key-point coordinates of a standard front-view face image.
  • A texture mapping between each actual triangle and the corresponding standard triangle can then be established from the correspondence between the detected key points and the key points of the standard face image, and finally, according to this texture mapping,
  • the current face image is warped into the standard front-view face image.
  • The above operation corrects the pose of the face image, but with limited accuracy.
  • Instead, a three-dimensional actual face model can be built in three-dimensional space from the depth information and coordinates of the key points; the model can then be corrected into a front-pose face model with geometric transformation matrices, and the front-pose model is finally projected to form a two-dimensional front-pose face image.
  • This front-pose face image serves as the corrected front-view face image of this embodiment.
  • Compared with existing line-of-sight correction schemes, the method of Embodiment 1 needs no special hardware or special camera: two ordinary cameras suffice to efficiently correct
  • the line of sight of the persons in the captured live-scene frames, at low cost and with a wide range of application, while the dual cameras also bring a wider field of view, enhancing the practical experience of the intelligent conference terminal.
  • FIG. 2a is a schematic flowchart of a line-of-sight correction method according to Embodiment 2 of the present invention.
  • Embodiment 2 of the present invention is an optimization based on the embodiment above.
  • The step of acquiring two current picture frames captured synchronously by the dual cameras, determining the depth information of each coincident scene point, and merging the frames into one current live-scene frame
  • is refined into: acquiring two current picture frames captured synchronously by the dual cameras in the current video scene; performing stereo matching on the two frames to obtain the disparity value of each coincident scene point; determining the depth information of each coincident scene point from its disparity value and a depth calculation formula; and merging the two frames into one seamless, high-resolution current live-scene frame according to a chosen image merging strategy.
  • After the depth information of each coincident scene point is determined, the method further includes: forming a depth map corresponding to the coincident scene points from their depth information; and smoothing the depth map with a chosen image smoothing algorithm to obtain optimized depth information for each coincident scene point.
  • After the front-view face image is obtained, the method further includes: replacing the face image in the current live-scene frame with the front-view face image to obtain a corrected live-scene frame; and performing edge blending on the corrected live-scene frame and displaying the processed corrected live-scene frame.
  • The correction step is further refined into: looking up the depth information of the coincident scene points to determine the depth information corresponding to the two-dimensional key points; fitting a preset three-dimensional face parameter model to the face image according to the depth information and the coordinate information, obtaining an actual three-dimensional face model of the face image in the current live-scene frame; and projecting the actual three-dimensional face model from its current pose into a two-dimensional front-view face image according to determined geometric transformation matrices.
  • The line-of-sight correction method provided by Embodiment 2 of the present invention specifically includes the following operations:
  • S201 to S204 describe how the depth information of the coincident scene points is obtained.
  • During a video call, the dual cameras with parallel optical axes mounted on the smart terminal capture images synchronously in the current video scene, which is equivalent to obtaining two current picture frames of the same scene from two different viewpoints.
  • S202: Perform stereo matching on the two current picture frames to obtain the disparity value of each coincident scene point in the two frames.
  • Stereo matching of the two current picture frames means finding matching corresponding points in two or more images captured from different viewpoints; the corresponding points are the coincident scene points of this embodiment.
  • After stereo matching, the disparity value of each coincident scene point can be determined.
  • Corresponding points can be matched with an area-based (window-based) binocular matching algorithm:
  • the two current picture frames are divided into a certain number of regions, and matching corresponding points are sought within each region. Corresponding points can also be matched with a feature-based binocular matching algorithm:
  • the two frames are divided into intervals containing real-world objects with distinctive features, and matching corresponding points are sought within each interval.
  • Each method has its own strengths and weaknesses. An area-based (window-based) binocular matching algorithm easily recovers the disparity of highly textured regions, but
  • produces many mismatches in low-texture regions, which blurs boundaries, and it handles occluded regions poorly.
  • A feature-based binocular matching method extracts feature points that are not very sensitive to noise, so
  • a fairly accurate match can be obtained; but because feature points are sparse in an image, this method yields only a sparse disparity map.
  • This embodiment does not restrict which binocular matching algorithm is used; any of the above can be applied, chosen according to the specific application scenario.
  • The depth calculation formula is Z = (b × f) / d, where Z denotes the depth value from the coincident scene point to the smart terminal, b denotes the distance between the optical centers of the two cameras, f denotes the focal length of the cameras, and d denotes the disparity value of the coincident scene point. Based on this formula and the determined disparity values, the depth information of each coincident scene point can be determined.
  • A depth map corresponding to the coincident scene points can then be formed.
  • S205: Smooth the depth map with the chosen image smoothing algorithm to obtain optimized depth information for each coincident scene point.
  • Because of the limitations of the stereo matching algorithm, the determined depth information is not very reliable, and the depth map formed from it contains many holes; the depth map therefore
  • needs to be optimized to fill the holes.
  • An image smoothing algorithm can be used for the smoothing optimization.
  • For example, the image smoothing algorithm may be a Laplacian smoothing algorithm or a two-dimensional adaptive filtering algorithm. The optimized depth information of the coincident scene points can then be used in the subsequent operation S208.
  • To speed up the optimization, only the foreground region of the current live-scene frame, where the face image usually lies, need be processed; this embodiment can identify the foreground region by examining the average depth value of the surrounding area.
  • The next step stitches the two current picture frames together: two partially overlapping images captured from different viewpoints are stitched into one seamless, high-resolution image with a wider field of view.
  • The image merging strategy in this step may be an area-based stitching algorithm or a feature-based stitching algorithm.
  • One implementation of the area-based stitching algorithm: one of the two current picture frames is taken as the image to be registered and the other as the reference image; then
  • the difference in gray values between a region of the image to be registered and a region of the same size in the reference image is computed by least squares or another mathematical method, and comparing these differences indicates how similar the overlapping regions of the two images to be stitched are, which yields the extent and position of the overlapping region in the two frames and thus their stitching.
  • Another implementation transforms the two frames from the spatial domain to the frequency domain with an FFT and then establishes a mapping between them: taking the difference in pixel gray values of the block regions in the two frames as the criterion,
  • the correlation coefficient of the gray values of two corresponding regions is computed; the larger the coefficient, the better the images in the two regions match, and
  • the best-matching regions are taken as the overlapping region, which likewise accomplishes the stitching of the two frames.
  • The feature-based stitching algorithm matches the overlapping images on the basis of features: rather than using the pixel values of the image in each frame directly, it derives the features of the image in each frame from the pixels and then, taking the image features as the criterion, finds the corresponding feature regions of the overlapping parts by search and matching, thereby stitching the two frames. This class of stitching algorithms is comparatively strong and robust.
  • Feature-based matching of overlapping images involves two stages: feature extraction and feature registration. First, features with pronounced gray-level changes, such as points, lines, and regions, are extracted from the two current picture frames; then a feature matching algorithm selects from the two frames' feature sets, as far as possible, the feature pairs that correspond to each other.
  • A range of image segmentation techniques are used for feature extraction and boundary detection, such as the Canny operator, the Laplacian of Gaussian operator, and region growing.
  • The extracted spatial features include closed boundaries, open boundaries, crossing lines, and other features.
  • Feature registration can be implemented with algorithms such as cross-correlation, distance transforms, dynamic programming, structural matching, and chain-code correlation.
  • This embodiment does not restrict which image stitching algorithm is used.
  • Any of the image stitching algorithms proposed above can be applied.
  • The specific choice can be made according to the specific application scenario.
  • This embodiment preferably detects the 63 two-dimensional key points constituting the face image in the current live-scene frame, and obtains the coordinates of each key point in the frame.
  • The depth information used in the lookup step may be the initial depth information obtained in S203 or the depth information optimized in S205; this embodiment preferably uses the optimized depth
  • information for the subsequent operations, which better improves the accuracy of the line-of-sight correction.
  • The depth information of each coincident scene point has already been determined.
  • Since the two-dimensional key points constituting the face image belong to the set of coincident scene points, the depth information corresponding to each key point can be found by lookup.
  • The line-of-sight correction of the face image is then achieved through S209 and S210 below.
  • From the depth information and coordinates of the key points, a three-dimensional face image can be fitted on a given three-dimensional face parameter model.
  • A three-dimensional face parameter model is a three-dimensional model with facial contours that, depending on its input parameters, can fit three-dimensional face models with different feature information and different poses. This step therefore fits, from the input depth information and coordinates of the two-dimensional key points, the actual three-dimensional face model corresponding to the face image in the current live-scene frame.
  • The pose of the fitted three-dimensional face model can be regarded as the pose of the face image in the current live-scene frame (looking up, looking down, and so on), and this step obtains
  • the front-view pose of the face image by geometric transformation of the actual three-dimensional face model. Specifically, this step may first multiply the actual three-dimensional face model by a first geometric transformation matrix to determine a front-view three-dimensional face model in three-dimensional space, and then multiply by a second geometric transformation matrix that projects the texture of the front-view model onto a two-dimensional plane, yielding the two-dimensional front-view face image. Alternatively, the first and second matrices may first be multiplied together into a third geometric transformation matrix, and the actual three-dimensional face model is then multiplied by this third matrix to obtain the two-dimensional front-view face image directly.
  • The first geometric transformation matrix is uniquely determined by the position of the person in the current live-scene frame relative to the screen of the smart terminal; since that position can be obtained from the depth information above, the specific value of the first transformation matrix can be uniquely determined from the depth information constituting the face image.
  • The second geometric transformation matrix is used for the dimension-reducing projection from three dimensions to two, and can be determined from a front-pose three-dimensional face model in three-dimensional space.
  • Replacing the face image in this step yields the corrected live-scene frame; the pose of the face image in the corrected frame is a front-view pose, which accomplishes the correction of the person's line of sight in the frames captured during the video call.
  • The corrected live-scene frame formed by the steps above achieves only a preliminary result: although the line of sight is corrected, the edges of the substituted, synthesized face are often noticeably inconsistent with the original frame, leaving rather visible
  • image-processing traces; those traces can be repaired in this step by edge blending.
  • For example, the region outside the contour of the face image in the corrected frame can be treated as the region to be cut; image segmentation then yields the best cutting
  • edge of that outer region, which is blended with the corrected frame to produce the final edge-processed corrected frame.
  • The processed corrected live-scene frame can finally be displayed on the local and remote screens.
  • FIGS. 2b to 2c show the processing flow of line-of-sight correction based on the method of Embodiment 2 of the present invention.
  • As shown in FIG. 2b, cameras 20 with parallel optical axes are mounted on the two sides of the smart terminal; in step S1 the cameras 20 synchronously capture two current picture frames 21; in step S2 the two frames 21 undergo
  • stereo matching, yielding the depth information 22 of the coincident scene points, and step S3 yields the optimized depth information 23; meanwhile step S4 stitches the two frames 21 into the current live-scene frame 24;
  • then, in step S5, the line of sight of the face image in the current live-scene frame 24 is corrected using the determined depth information 23 and the detected two-dimensional key points, yielding the corrected live-scene frame 25.
  • This embodiment also gives effect diagrams of line-of-sight correction based on the provided method.
  • FIG. 2d shows a set of first live-scene frames, containing one subject, whose line of sight is to be corrected.
  • FIG. 2e shows the correction result after correcting the set of first live-scene frames.
  • FIG. 2f shows a set of second live-scene frames, containing several subjects, whose lines of sight are to be corrected.
  • FIG. 2g shows the correction result after correcting the set of second live-scene frames.
  • The line-of-sight correction method provided by Embodiment 2 of the present invention describes in detail the determination of the depth information and the correction of the persons' line of sight in the picture frames, and further adds the optimization of the depth information and
  • the processing of the corrected frame formed after the correction.
  • The method needs no special hardware or special camera: two ordinary cameras suffice to efficiently correct the line of sight of the persons in the captured live-scene frames, at low cost and with a wide range of application, while the dual cameras also bring a wider field of view, enhancing the practical experience of the intelligent conference terminal.
  • FIG. 3 is a structural block diagram of a line-of-sight correction apparatus according to Embodiment 3 of the present invention.
  • The apparatus is suitable for correcting the line of sight of persons in captured picture frames during a video call; it can be implemented in software and/or hardware and is generally integrated into a smart terminal with a video-call function.
  • The apparatus includes: a depth information determining module 31, an image stitching and synthesis module 32, a key point information determining module 33, and a line-of-sight correction module 34.
  • The depth information determining module 31 is configured to acquire two current picture frames captured synchronously by the dual cameras and to determine the depth information of each coincident scene point in the two frames;
  • the image stitching and synthesis module 32 is configured to merge the two current picture frames into one current live-scene frame;
  • the key point information determining module 33 is configured to detect the two-dimensional key points constituting a face image in the current live-scene frame and to determine their coordinate information;
  • the line-of-sight correction module 34 is configured to correct the face image in three-dimensional space, according to the depth information corresponding to the key points and the coordinate information, to obtain a two-dimensional front-view face image.
  • In operation, the apparatus first acquires, through the depth information determining module 31, the two synchronously captured frames and determines the depth information of each coincident scene point;
  • the image stitching and synthesis module 32 then merges the two frames into one current live-scene frame;
  • the key point information determining module 33 detects the two-dimensional key points constituting the face image in the current live-scene frame and determines
  • their coordinate information; finally, the line-of-sight correction module 34 corrects the face image in three-dimensional space, from the depth information and coordinates of the key points, to obtain the two-dimensional front-view face image.
  • Compared with existing apparatuses, the line-of-sight correction apparatus of Embodiment 3 needs no special hardware or special camera: two ordinary cameras suffice to efficiently correct
  • the line of sight of the persons in the captured live-scene frames, at low cost and with a wide range of application, while the dual cameras also bring a wider field of view, enhancing the practical experience of the intelligent conference terminal.
  • The depth information determining module 31 is specifically configured to: acquire two current picture frames captured synchronously by the dual cameras in the current video scene; perform stereo matching on the two frames to obtain
  • the disparity value of each coincident scene point; and determine the depth information of each coincident scene point from its disparity value and the depth calculation formula.
  • Correspondingly, the image stitching and synthesis module 32 is specifically configured to merge the two current picture frames into one seamless, high-resolution current live-scene frame according to the chosen image merging strategy.
  • The apparatus is further refined by adding:
  • a depth map determining module 35 configured to form, after the depth information of each coincident scene point in the two frames is determined, a depth map corresponding to the coincident scene points from their depth information;
  • a depth information optimization module 36 configured to smooth the depth map with a chosen image smoothing algorithm, obtaining optimized depth information for each coincident scene point.
  • The apparatus further includes:
  • a face image replacement module 37 configured to replace, after the face image is corrected in three-dimensional space to obtain the two-dimensional front-view face image, the face image in the current live-scene frame with the front-view face image, obtaining a corrected live-scene frame;
  • a corrected image processing module 38 configured to perform edge blending on the corrected live-scene frame and to display the processed corrected live-scene frame.
  • The line-of-sight correction module 34 is specifically configured to: look up the depth information of the coincident scene points to determine the depth information corresponding to the two-dimensional key points; fit a preset three-dimensional face parameter model to the face image according to the depth information and the coordinate information, obtaining an actual three-dimensional face model of the face image in the current live-scene frame; and project the actual three-dimensional face model from its current pose into a two-dimensional front-view face image according to the determined geometric transformation matrices.
  • The intelligent conference terminal provided by Embodiment 4 of the present invention includes two cameras 41 with parallel optical axes, a processor 42, and a storage device 43.
  • The terminal may have one or more processors.
  • One processor 42 is taken as an example.
  • The two cameras 41 may be connected to the processor 42 and the storage device 43 respectively by a bus or other means,
  • and the processor 42 and the storage device 43 are likewise connected by a bus or other means; FIG. 4 takes the bus connection as an example.
  • The intelligent conference terminal is one kind of the smart terminals above and can conduct remote video-conference calls.
  • The processor 42 can control the two cameras 41 to capture images, and can also perform the required operations on the picture frames captured by the two cameras.
  • The frames captured by the two cameras 41 can also be stored in the storage device 43 to store the image data.
  • As a computer-readable storage medium, the storage device 43 can store one or more programs, which may be software programs, computer-executable programs, and modules, such as the
  • program instructions/modules corresponding to the line-of-sight correction method of the embodiments of the present invention (for example, the modules of the line-of-sight correction apparatus shown in FIG. 3: the depth information determining module 31, the image stitching and synthesis module 32, the key point information determining module 33, and the line-of-sight correction module 34).
  • By running the software programs, instructions, and modules stored in the storage device 43, the processor 42 executes the various functional applications and data processing of the intelligent conference terminal, that is, implements the line-of-sight correction method of the method embodiments above.
  • The storage device 43 may include a program storage area and a data storage area; the program storage area may store the operating system and the applications required by at least one function, and the data storage area may store data created through use of the device. The storage device 43 may include high-speed random-access memory and may also include non-volatile memory, such as at least one magnetic-disk storage device, flash-memory device, or other non-volatile solid-state storage device. In some examples, the storage device 43 may further include memory set up remotely from the processor 42 and connected to the device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local-area networks, mobile communication networks, and combinations thereof.
  • When the one or more programs included in the intelligent conference terminal are executed by the one or more processors 42, the programs perform the operations of the line-of-sight correction method: acquiring two current picture frames captured synchronously by the dual cameras, determining the depth information of each coincident scene point, and merging the frames into one current live-scene frame; detecting the two-dimensional key points constituting a face image in the current live-scene frame and determining their coordinate information; and correcting the face image in three-dimensional space to obtain a two-dimensional front-view face image.
  • An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a control apparatus, the program implements the line-of-sight correction method provided by Embodiment 1 or Embodiment 2, which includes: acquiring two current picture frames captured synchronously by dual cameras, determining the depth information of each coincident scene point in the two frames, and merging them into one current live-scene frame; detecting the two-dimensional key points constituting a face image in the current live-scene frame and determining their coordinate information; and correcting the face image in three-dimensional space, according to the depth information corresponding to the key points and the coordinate information, to obtain a two-dimensional front-view face image.
  • From the description above, those skilled in the art will understand that the present invention can be realized with software plus the necessary general-purpose hardware; it can also be realized with hardware, but in many cases the former is the better implementation.
  • On this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk,
  • read-only memory (ROM), random-access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes instructions that cause a computing device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Studio Devices (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present invention discloses a line-of-sight correction method and apparatus, an intelligent conference terminal, and a storage medium. The method includes: acquiring two current picture frames captured synchronously by dual cameras, determining depth information of each coincident scene point in the two frames, and merging them into one current live-scene frame; detecting the two-dimensional key points constituting a face image in the current live-scene frame and determining their coordinate information; and correcting the face image in three-dimensional space, according to the depth information corresponding to the key points and the coordinate information, to obtain a two-dimensional front-view face image. With this method, no special hardware or special camera is needed: two ordinary cameras suffice to efficiently correct the line of sight of the persons in the captured live-scene frames, at low cost and with a wide range of application, while the dual cameras also bring a wider field of view, better enhancing the practical experience of the intelligent conference terminal.

Description

Line-of-sight correction method and apparatus, intelligent conference terminal, and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular to a line-of-sight correction method and apparatus, an intelligent conference terminal, and a storage medium.
Background
With the development of technology, video conferencing has become widely used. Surveys show that if the two parties to a video conference can make eye contact, the participants enjoy a much better conference experience. In general, a participant only feels that the person in the picture is making eye contact when that person stares into the camera. In a video-conference scene, however, both parties look at the video picture, so the person shown on the screen actually appears to be looking elsewhere; the two parties cannot make eye contact, which degrades the visual experience of the video conference.
At present, technicians have proposed several line-of-sight correction schemes to preserve eye contact between the two parties of a video conference. Common schemes either modify the display device of the video equipment, for example using a half-silvered mirror or a translucent display, or use a special camera (such as an RGB-D camera) together with a matching algorithm. Although these schemes correct the line of sight well, they depend on special hardware or a special camera, so they are costly and their range of application is limited. In addition, schemes that use an ordinary monocular camera with a matching algorithm have also been proposed, but most of them cannot synthesize high-quality images under real-time constraints, and because they rely mainly on a single ordinary camera, their correction accuracy is inferior to that of the schemes above.
Summary
Embodiments of the present invention provide a line-of-sight correction method and apparatus, an intelligent conference terminal, and a storage medium that perform high-precision line-of-sight correction on the participants in a video conference, solving the problems of excessive correction cost and an overly narrow range of application.
In one aspect, an embodiment of the present invention provides a line-of-sight correction method, including:
acquiring two current picture frames captured synchronously by dual cameras, determining depth information of each coincident scene point in the two current picture frames, and merging the frames into one current live-scene frame;
detecting two-dimensional key points constituting a face image in the current live-scene frame, and determining coordinate information of the two-dimensional key points;
correcting the face image in three-dimensional space according to the depth information corresponding to the two-dimensional key points and the coordinate information, to obtain a two-dimensional front-view face image.
In another aspect, an embodiment of the present invention provides a line-of-sight correction apparatus, including:
a depth information determining module configured to acquire two current picture frames captured synchronously by dual cameras, and to determine depth information of each coincident scene point in the two current picture frames;
an image stitching and synthesis module configured to merge the two current picture frames into one current live-scene frame;
a key point information determining module configured to detect two-dimensional key points constituting a face image in the current live-scene frame, and to determine coordinate information of the two-dimensional key points;
a line-of-sight correction module configured to correct the face image in three-dimensional space according to the depth information corresponding to the two-dimensional key points and the coordinate information, to obtain a two-dimensional front-view face image.
In yet another aspect, an embodiment of the present invention provides an intelligent conference terminal, including:
two cameras with parallel optical axes;
one or more processors;
a storage device for storing one or more programs;
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the line-of-sight correction method provided by the embodiments of the present invention.
In a further aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the line-of-sight correction method provided by the embodiments of the present invention.
In the above line-of-sight correction method and apparatus, intelligent conference terminal, and storage medium, two current picture frames captured synchronously by dual cameras are first acquired, the depth information of each coincident scene point in the two frames is determined, and the frames are merged into one current live-scene frame; the two-dimensional key points constituting a face image are then detected in the current live-scene frame and their coordinate information determined; finally, the face image is corrected in three-dimensional space according to the depth information and coordinate information of the key points, yielding a two-dimensional front-view face image. Compared with existing line-of-sight correction schemes, the solution of the present invention needs no special hardware or special camera: two ordinary cameras suffice to efficiently correct the line of sight of the persons in the captured live-scene frames, at low cost and with a wide range of application, while the dual cameras also bring a wider field of view, better enhancing the practical experience of the intelligent conference terminal.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a line-of-sight correction method according to Embodiment 1 of the present invention;
FIG. 2a is a schematic flowchart of a line-of-sight correction method according to Embodiment 2 of the present invention;
FIGS. 2b to 2c are flowcharts of line-of-sight correction performed with the method provided by Embodiment 2 of the present invention;
FIG. 2d shows a set of first live-scene frames, containing one subject, whose line of sight is to be corrected;
FIG. 2e shows the correction result after line-of-sight correction of the set of first live-scene frames;
FIG. 2f shows a set of second live-scene frames, containing several subjects, whose lines of sight are to be corrected;
FIG. 2g shows the correction result after line-of-sight correction of the set of second live-scene frames;
FIG. 3 is a structural block diagram of a line-of-sight correction apparatus according to Embodiment 3 of the present invention;
FIG. 4 is a schematic diagram of the hardware structure of an intelligent conference terminal according to Embodiment 4 of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment 1
FIG. 1 is a schematic flowchart of a line-of-sight correction method according to Embodiment 1 of the present invention. The method is applicable to correcting the line of sight of the persons in captured picture frames during a video call; it may be performed by a line-of-sight correction apparatus, which can be implemented in software and/or hardware and is generally integrated into a smart terminal with a video-call function.
In this embodiment, the smart terminal may be a smart mobile terminal such as a mobile phone, tablet computer, or notebook, or a stationary electronic device with a video-call function such as a desktop computer or an intelligent conference terminal. This embodiment preferably assumes the application scenario of a video call made through a stationary smart terminal, and assumes that during the call both parties look at the video picture; under these conditions, the line-of-sight correction method provided by the present invention lets the two parties look at each other naturally and make eye contact during the video call.
As shown in FIG. 1, the line-of-sight correction method provided by Embodiment 1 of the present invention includes the following operations:
S101: Acquire two current picture frames captured synchronously by the dual cameras, determine the depth information of each coincident scene point in the two frames, and merge them into one current live-scene frame.
In this embodiment, during a video call made through a smart terminal, the picture information of the scene in which the caller is located is captured mainly through the cameras of the smart terminal. The smart terminal in this embodiment has two cameras with parallel optical axes, that is, dual cameras. During a video call, the dual cameras synchronously capture current picture frames of the current scene.
It will be appreciated that, because the two cameras are mounted at different positions on the smart terminal, the two synchronously captured frames of the current scene do not coincide completely; nevertheless, some scene points are captured in both frames. In this embodiment, a scene point present in both current picture frames at the same time is called a coincident scene point.
In this embodiment, the disparity value of each coincident scene point in the two current picture frames can be determined with a chosen stereo matching algorithm for picture frames; then, from the focal length of the cameras, the disparity value of each coincident scene point in its frame, and the distance between the optical centers of the two cameras, the depth information of each coincident scene point can be determined. Here, the depth information is the depth value from the coincident scene point to the smart terminal. In addition, this embodiment can stitch the two captured frames together, merging the two current picture frames into one current live-scene frame.
S102: Detect the two-dimensional key points constituting a face image in the current live-scene frame, and determine the coordinate information of the two-dimensional key points.
In this step, a key point detection algorithm can detect whether a face image exists in the current live-scene frame and determine the two-dimensional key points that constitute it. Specifically, the key points constituting the face image can be detected from the characteristic features of a face, and the coordinates of each key point in the current live-scene frame can be determined. Generally, the two eyes, the nose, and the two mouth corners serve as the most basic facial features, from which five two-dimensional key points constituting the face image can be detected in the current frame. The number of key points is not limited to five; it may be 8, 10, or even 63. Understandably, the more key points are detected, the more accurately the region of the face image in the current live-scene frame is determined. To ensure the accuracy of the face region, this embodiment preferably detects 63 two-dimensional key points, whose coordinate information is then determined in the current live-scene frame.
S103: Correct the face image in three-dimensional space according to the depth information corresponding to the two-dimensional key points and the coordinate information, obtaining a two-dimensional front-view face image.
It should be noted that this embodiment assumes both cameras of the smart terminal can clearly capture the caller in the current scene; that is, the scene points constituting the caller's image (which may be a face image) can be assumed to belong to the coincident scene points. The depth information of each two-dimensional key point of the face image can therefore be obtained from the acquired depth information of the coincident scene points.
In this step, the line of sight of the face image can be corrected from the determined depth information and coordinates of the key points. It should be noted that correcting the line of sight of a face image essentially amounts to correcting its pose: for example, when a face image is corrected from an upward-looking, downward-looking, or sideways pose to a front view, the person's line of sight is corrected accordingly.
Generally, the current face image can be triangulated from the coordinates of the detected two-dimensional key points, and a standard triangulation can likewise be built from the key-point coordinates of a standard front-view face image. A texture mapping between each actual triangle and the corresponding standard triangle can then be established from the correspondence between the detected key points and the key points of the standard face image, and the current face image is finally warped into the standard front-view face image according to this texture mapping.
The above operation corrects the pose of the face image, but with limited accuracy. In this step, instead, a three-dimensional actual face model can be built in three-dimensional space from the depth information and coordinates of the two-dimensional key points; the model can then be corrected into a front-pose face model with geometric transformation matrices, and the front-pose model is finally projected to form a two-dimensional front-pose face image, which serves as the corrected front-view face image of this embodiment.
Compared with existing line-of-sight correction schemes, the line-of-sight correction method provided by Embodiment 1 of the present invention needs no special hardware or special camera: two ordinary cameras suffice to efficiently correct the line of sight of the persons in the captured live-scene frames, at low cost and with a wide range of application, while the dual cameras also bring a wider field of view, better enhancing the practical experience of the intelligent conference terminal.
Embodiment 2
FIG. 2a is a schematic flowchart of a line-of-sight correction method according to Embodiment 2 of the present invention. Embodiment 2 is an optimization based on the embodiment above. In this embodiment, the step of acquiring two current picture frames captured synchronously by the dual cameras, determining the depth information of each coincident scene point in the two frames, and merging them into one current live-scene frame is further refined into: acquiring two current picture frames captured synchronously by the dual cameras in the current video scene; performing stereo matching on the two frames to obtain the disparity value of each coincident scene point in the two frames; determining the depth information of each coincident scene point from its disparity value and a depth calculation formula; and merging the two frames into one seamless, high-resolution current live-scene frame according to a chosen image merging strategy.
Further, after the depth information of each coincident scene point in the two frames is determined, the method additionally includes: forming a depth map corresponding to the coincident scene points from their depth information; and smoothing the depth map with a chosen image smoothing algorithm to obtain optimized depth information for each coincident scene point.
On the basis of the embodiment above, after the face image is corrected in three-dimensional space to obtain the two-dimensional front-view face image, the method additionally includes: replacing the face image in the current live-scene frame with the front-view face image to obtain a corrected live-scene frame; and performing edge blending on the corrected live-scene frame and displaying the processed corrected live-scene frame.
In addition, this embodiment refines the correction step into: looking up the depth information of the coincident scene points to determine the depth information corresponding to the two-dimensional key points; fitting a preset three-dimensional face parameter model to the face image according to the depth information and the coordinate information, obtaining an actual three-dimensional face model of the face image in the current live-scene frame; and projecting the actual three-dimensional face model from its current pose into a two-dimensional front-view face image according to determined geometric transformation matrices.
As shown in FIG. 2a, the line-of-sight correction method provided by Embodiment 2 of the present invention specifically includes the following operations:
In this embodiment, S201 to S204 describe how the depth information of the coincident scene points is obtained.
S201: Acquire two current picture frames captured synchronously by the dual cameras in the current video scene.
For example, during a video call, the dual cameras with parallel optical axes mounted on the smart terminal capture images synchronously in the current video scene, which is equivalent to obtaining two current picture frames of the same scene from two different viewpoints.
S202: Perform stereo matching on the two current picture frames to obtain the disparity value of each coincident scene point in the two frames.
In this embodiment, stereo matching of the two current picture frames means finding matching corresponding points in two or more images captured from different viewpoints; the corresponding points are the coincident scene points of this embodiment. After stereo matching of the two frames, the disparity value of each coincident scene point can be determined.
Specifically, this embodiment can match corresponding points with an area-based (window-based) binocular matching algorithm: for example, the two current picture frames are divided into a certain number of regions, and matching corresponding points are sought within each region. This embodiment can also match corresponding points with a feature-based binocular matching algorithm: for example, the two frames are divided into intervals containing real-world objects with distinctive features, and matching corresponding points are sought within each interval.
It should be noted that stereo matching can be implemented in several ways, each with its own strengths and weaknesses. An area-based (window-based) binocular matching algorithm easily recovers the disparity of highly textured regions, but produces many mismatches in low-texture regions, which blurs boundaries, and it handles occluded regions poorly. A feature-based binocular matching method extracts feature points that are not very sensitive to noise, so a fairly accurate match can be obtained; but because feature points are sparse in an image, this method yields only a sparse disparity map. This embodiment does not restrict which binocular matching algorithm is used; any of the above can be applied, chosen according to the specific application scenario.
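As an illustration of the area-based matching family discussed above, the following minimal Python sketch computes a dense disparity map with OpenCV's semi-global block matcher. The choice of StereoSGBM and every parameter value are assumptions made for the example; the embodiment itself leaves the matching algorithm open.

```python
# Minimal sketch: dense disparity from a rectified stereo pair (assumed inputs).
import cv2

def compute_disparity(left_gray, right_gray):
    """Return a float32 disparity map (pixels) for the left view."""
    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=128,   # search range; must be divisible by 16
        blockSize=5,          # matching window size
        P1=8 * 5 * 5,         # penalty for small disparity changes
        P2=32 * 5 * 5,        # penalty for large disparity changes
        uniquenessRatio=10,
        speckleWindowSize=100,
        speckleRange=2,
    )
    # StereoSGBM returns fixed-point disparities scaled by 16.
    return matcher.compute(left_gray, right_gray).astype("float32") / 16.0
```

Because the two cameras have parallel optical axes, the pair is treated as approximately rectified; in practice a calibration and rectification step would precede the matcher.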
S203: Determine the depth information of each coincident scene point according to its disparity value and the depth calculation formula.
In this embodiment, the depth calculation formula is expressed as Z = (b × f) / d, where Z denotes the depth value from the coincident scene point to the smart terminal, b denotes the distance between the optical centers of the two cameras, f denotes the focal length of the cameras, and d denotes the disparity value of the coincident scene point. Based on this formula and the determined disparity values, the depth information of each coincident scene point can be determined.
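The formula translates directly into code. The sketch below applies Z = (b × f) / d per pixel; the baseline and focal-length values are illustrative assumptions only.

```python
# Depth from disparity: Z = b * f / d (b in meters, f and d in pixels).
import numpy as np

def disparity_to_depth(disparity, baseline_m=0.12, focal_px=1400.0):
    """Convert a disparity map to a depth map in meters; 0 marks no match."""
    depth = np.zeros_like(disparity, dtype=np.float32)
    valid = disparity > 0                 # d <= 0 means no valid match
    depth[valid] = baseline_m * focal_px / disparity[valid]
    return depth
```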
S204: Form a depth map corresponding to the coincident scene points based on their depth information.
In this step, a depth map corresponding to the coincident scene points can be formed from their determined depth information and their pixel coordinates in the current live-scene frame.
S205: Smooth the depth map with the chosen image smoothing algorithm to obtain optimized depth information for each coincident scene point.
In this embodiment, because of the limitations of the stereo matching algorithm above, the determined depth information is not very reliable, and the depth map formed from it contains many holes; the depth map therefore needs to be optimized to fill those holes. This embodiment can use an image smoothing algorithm for the smoothing optimization; for example, the image smoothing algorithm may be a Laplacian smoothing algorithm or a two-dimensional adaptive filtering algorithm. The optimized depth information obtained for the coincident scene points can then be used in the subsequent operation S208.
It should be noted that, to speed up the optimization of the depth information in this embodiment, only the depth information covering the face image in the current live-scene frame need be optimized. This step does not require the exact region of the face image: since a face image generally lies in the foreground of the current live-scene frame, this embodiment can process only the foreground region, which can be identified by examining the average depth value of the surrounding area.
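As one possible realization of the hole filling and smoothing described here, the sketch below masks missing depth values and repairs them with OpenCV inpainting followed by a median filter. This substitutes for the Laplacian or two-dimensional adaptive filtering named in the embodiment and is offered only as an assumed alternative.

```python
# Illustrative hole filling and smoothing for a noisy depth map.
import cv2
import numpy as np

def optimize_depth(depth):
    """Fill zero-valued holes, then smooth; returns an 8-bit depth image."""
    hole_mask = (depth <= 0).astype(np.uint8)            # 1 where depth missing
    depth_8u = cv2.normalize(depth, None, 0, 255,
                             cv2.NORM_MINMAX).astype(np.uint8)
    filled = cv2.inpaint(depth_8u, hole_mask, 3, cv2.INPAINT_TELEA)
    return cv2.medianBlur(filled, 5)                     # suppress speckle
```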
S206: Merge the two current picture frames into one seamless, high-resolution current live-scene frame according to the chosen image merging strategy.
This step performs the stitching of the two current picture frames: two partially overlapping images captured from different viewpoints are stitched into one seamless, high-resolution image with a wider field of view. For example, the image merging strategy in this step may be an area-based stitching algorithm or a feature-based stitching algorithm.
Specifically, one implementation of the area-based stitching algorithm can be described as follows: one of the two current picture frames is taken as the image to be registered and the other as the reference image; the difference in gray values between a region of the image to be registered and a region of the same size in the reference image is computed by least squares or another mathematical method, and comparing these differences indicates how similar the overlapping regions of the two images to be stitched are, which yields the extent and position of the overlapping region in the two frames and thus accomplishes their stitching. Another implementation transforms the two frames from the spatial domain to the frequency domain with an FFT and then establishes a mapping between them: taking the difference in pixel gray values of the block regions in the two frames as the criterion, the correlation coefficient of the gray values of two corresponding regions is computed; the larger the coefficient, the better the images in the two regions match, and the best-matching regions are taken as the overlapping region, which likewise accomplishes the stitching of the two frames.
In addition, the feature-based stitching algorithm can be described as follows: the overlapping images are matched on the basis of features. The matching does not use the pixel values of the image in each current frame directly; rather, it derives the features of the image in each frame from the pixels and then, taking the image features as the criterion, finds the corresponding feature regions of the overlapping parts by search and matching, thereby stitching the two frames. This class of stitching algorithms is comparatively strong and robust.
It should be noted that feature-based matching of overlapping images involves two stages: feature extraction and feature registration. First, features with pronounced gray-level changes, such as points, lines, and regions, are extracted from the two current picture frames to form feature sets; then a feature matching algorithm selects from the two frames' feature sets, as far as possible, the feature pairs that correspond to each other. A range of image segmentation techniques are used for feature extraction and boundary detection, such as the Canny operator, the Laplacian of Gaussian operator, and region growing. The extracted spatial features include closed boundaries, open boundaries, crossing lines, and other features. Feature registration can be implemented with algorithms such as cross-correlation, distance transforms, dynamic programming, structural matching, and chain-code correlation.
Note that this embodiment does not restrict which image stitching algorithm is used; any of the algorithms proposed above can be applied, chosen according to the specific application scenario.
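By way of example, the feature-based strategy can be sketched with ORB features, brute-force matching, and a RANSAC-fitted homography. The choice of ORB and all parameter values are assumptions for illustration, not the embodiment's prescribed algorithm.

```python
# Minimal feature-based stitching sketch: warp the right frame into the
# left frame's plane via a RANSAC homography.
import cv2
import numpy as np

def stitch_pair(left, right):
    g1 = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(g1, None)
    k2, d2 = orb.detectAndCompute(g2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    matches = sorted(matches, key=lambda m: m.distance)[:200]  # best pairs
    src = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = left.shape[:2]
    pano = cv2.warpPerspective(right, H, (w * 2, h))  # right into left's plane
    pano[0:h, 0:w] = left                             # overlay the left frame
    return pano
```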
S207: Detect the two-dimensional key points constituting the face image in the current live-scene frame, and determine their coordinate information.
For example, this embodiment preferably detects the 63 two-dimensional key points constituting the face image in the current live-scene frame, and obtains the coordinate information of each key point in the frame.
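A landmark detector of this kind might be sketched as follows with dlib. The stock dlib predictor returns 68 points rather than the 63 preferred by this embodiment, and the model file path is an assumption; it stands in only as a readily available example.

```python
# Minimal facial key point detection sketch (68-point dlib model as stand-in).
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_keypoints(gray_frame):
    """Return, per detected face, a list of (x, y) landmark coordinates."""
    return [[(p.x, p.y) for p in predictor(gray_frame, face).parts()]
            for face in detector(gray_frame)]
```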
S208: Look up the depth information of the coincident scene points, and determine the depth information corresponding to the two-dimensional key points.
It should be noted that the depth information used in this step may be the initial depth information obtained in S203 or the depth information optimized in S205; this embodiment preferably uses the optimized depth information for the subsequent operations, which better improves the accuracy of the line-of-sight correction.
This step works from the already determined depth information of the coincident scene points: this embodiment assumes that the two-dimensional key points constituting the face image belong to the set of coincident scene points, so the depth information corresponding to each key point can be found by lookup.
In this embodiment, the line-of-sight correction of the face image is achieved through S209 and S210 below.
S209: Fit the preset three-dimensional face parameter model to the face image according to the depth information and the coordinate information, obtaining the actual three-dimensional face model of the face image in the current live-scene frame.
Specifically, from the determined depth information and coordinates of the two-dimensional key points, a three-dimensional face image can be fitted on a given three-dimensional face parameter model. A three-dimensional face parameter model is a three-dimensional model with facial contours that, depending on its input parameters, can fit three-dimensional face models with different feature information and different poses. This step therefore fits, from the input depth information and coordinates of the two-dimensional key points, the actual three-dimensional face model corresponding to the face image in the current live-scene frame.
S210: Project the actual three-dimensional face model from its current pose into a two-dimensional front-view face image according to the determined geometric transformation matrices.
In this embodiment, the pose of the fitted actual three-dimensional face model can be regarded as the pose of the face image in the current live-scene frame (looking up, looking down, and so on); this step obtains the front-view pose of the face image by geometric transformation of the actual three-dimensional face model. Specifically, this step may first multiply the actual three-dimensional face model by a first geometric transformation matrix to determine a front-view three-dimensional face model in three-dimensional space, and then multiply by a second geometric transformation matrix that projects the texture of the front-view three-dimensional model onto a two-dimensional plane, yielding the two-dimensional front-view face image. Alternatively, this step may first multiply the first and second geometric transformation matrices to obtain a third geometric transformation matrix, and then multiply the actual three-dimensional face model by this third matrix to obtain the two-dimensional front-view face image directly.
It should be noted that the first geometric transformation matrix in this embodiment is uniquely determined by the position of the person in the current live-scene frame relative to the screen of the smart terminal; since that position can be obtained from the depth information above, the specific value of the first transformation matrix can be uniquely determined from the depth information constituting the face image. The second geometric transformation matrix in this embodiment is used for the dimension-reducing projection from three dimensions to two, and can be determined from a front-pose three-dimensional face model in three-dimensional space.
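A numeric sketch of the two matrices described here: a first rotation matrix undoes the head pose of the fitted model, and a second matrix performs the 3-D to 2-D projection. The pose angles and focal length are illustrative assumptions; in the embodiment the first matrix is derived from the subject's position via the depth information.

```python
# Sketch: rotate fitted 3-D face vertices to a frontal pose, then project.
import numpy as np

def frontalize(points_3d, yaw=0.2, pitch=-0.15, focal_px=1400.0):
    """points_3d: (N, 3) vertices with z > 0 (in front of the camera)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    # First matrix: composite rotation Rx(pitch) @ Ry(yaw) that undoes the pose.
    R = np.array([[cy,       0.0,  sy],
                  [sp * sy,  cp,  -sp * cy],
                  [-cp * sy, sp,   cp * cy]])
    frontal = points_3d @ R.T
    # Second matrix: scaled projection that drops the depth axis.
    P = np.array([[focal_px, 0.0, 0.0],
                  [0.0, focal_px, 0.0]])
    return (frontal @ P.T) / frontal[:, 2:3]   # perspective divide by z
```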
S211: Replace the face image in the current live-scene frame with the front-view face image, obtaining a corrected live-scene frame.
After the front-view face image is obtained through the steps above, this step performs the face image replacement to obtain the corrected live-scene frame. It follows that the pose of the face image in the corrected live-scene frame is a front-view pose, which accomplishes the correction of the person's line of sight in the frames captured during the video call.
S212: Perform edge blending on the corrected live-scene frame, and display the processed corrected live-scene frame.
It should be noted that the corrected live-scene frame formed by the steps above achieves only a preliminary correction: although the line of sight is corrected, the edges of the substituted, synthesized face are often noticeably inconsistent with the original live-scene frame, leaving visible image-processing traces. This step therefore repairs the traces left by the steps above through edge blending.
Edge blending in this step can be done in several ways. For example, the region outside the contour of the face image in the corrected live-scene frame can be treated as the region to be cut; image segmentation then yields the best cutting edge of that outer region, which is blended with the corrected frame to produce the final edge-processed corrected frame. This embodiment can finally display the processed corrected live-scene frame on the local and remote screens.
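One way to realize the edge blending of S212 is Poisson-based seamless cloning, shown below. Treating OpenCV's seamlessClone as the edge fusion of this step is an assumption; any comparable blending method would serve.

```python
# Sketch: blend the synthesized frontal face back into the corrected frame.
import cv2
import numpy as np

def blend_face(frame, frontal_face, face_mask, center_xy):
    """frontal_face: rendered face patch; face_mask: its binary mask;
    center_xy: (x, y) centre of the patch inside the frame."""
    mask = (face_mask > 0).astype(np.uint8) * 255
    return cv2.seamlessClone(frontal_face, frame, mask, center_xy,
                             cv2.NORMAL_CLONE)
```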
On the basis of the embodiment above, this embodiment further illustrates the line-of-sight correction process. Specifically, FIGS. 2b to 2c show the processing flow of line-of-sight correction based on the method provided by Embodiment 2 of the present invention. As shown in FIG. 2b, cameras 20 with parallel optical axes are mounted on the two sides of the smart terminal; in step S1 the cameras 20 synchronously capture two current picture frames 21; in step S2 stereo matching of the two frames 21 yields the depth information 22 of the coincident scene points, and step S3 yields the optimized depth information 23; meanwhile, step S4 stitches the two frames 21 into the current live-scene frame 24. Then, in step S5, the line of sight of the face image in the current live-scene frame 24 is corrected using the determined depth information 23 and the detected two-dimensional key points, yielding the corrected live-scene frame 25. It can be seen that the forehead region 26 of the face image in the corrected frame 25 shows processing traces, so step S6 applies edge blending to the corrected frame 25 to obtain the processed corrected frame 27; the forehead region 28 of the face image in the processed frame 27 is displayed smoothly, with the processing traces well repaired. Finally, in step S7, the corrected live-scene frame 29 is displayed in real time on the remote smart terminal and/or the local smart terminal.
Further, this embodiment also gives effect diagrams of line-of-sight correction based on the provided method. FIG. 2d shows a set of first live-scene frames, containing one subject, whose line of sight is to be corrected; FIG. 2e shows the correction result after correcting the set of first live-scene frames. Comparing FIG. 2d and FIG. 2e shows that the subject is displayed in a front-view pose after correction, and the remote viewer can make eye contact with the subject in that pose.
In addition, FIG. 2f shows a set of second live-scene frames, containing several subjects, whose lines of sight are to be corrected; FIG. 2g shows the correction result after correcting the set of second live-scene frames. Comparing FIG. 2f and FIG. 2g shows that both subjects are displayed in a front-view pose after correction, and the remote viewer can make eye contact with either subject in that pose.
The line-of-sight correction method provided by Embodiment 2 of the present invention describes in detail the determination of the depth information and the correction of the persons' line of sight in the picture frames, and further adds the optimization of the depth information and the processing of the corrected frame formed after the correction. With this method, the depth information of each scene point can be determined from the two frames captured by the dual cameras, and the subjects' line of sight is then corrected from the depth information and the detected face key point information. Compared with existing methods, this method needs no special hardware or special camera: two ordinary cameras suffice to efficiently correct the line of sight of the persons in the captured live-scene frames, at low cost and with a wide range of application, while the dual cameras also bring a wider field of view, better enhancing the practical experience of the intelligent conference terminal.
Embodiment 3
FIG. 3 is a structural block diagram of a line-of-sight correction apparatus according to Embodiment 3 of the present invention. The apparatus is suitable for correcting the line of sight of the persons in captured picture frames during a video call; it can be implemented in software and/or hardware and is generally integrated into a smart terminal with a video-call function. As shown in FIG. 3, the apparatus includes: a depth information determining module 31, an image stitching and synthesis module 32, a key point information determining module 33, and a line-of-sight correction module 34.
The depth information determining module 31 is configured to acquire two current picture frames captured synchronously by the dual cameras, and to determine the depth information of each coincident scene point in the two frames;
the image stitching and synthesis module 32 is configured to merge the two current picture frames into one current live-scene frame;
the key point information determining module 33 is configured to detect the two-dimensional key points constituting a face image in the current live-scene frame, and to determine their coordinate information;
the line-of-sight correction module 34 is configured to correct the face image in three-dimensional space, according to the depth information corresponding to the two-dimensional key points and the coordinate information, to obtain a two-dimensional front-view face image.
In this implementation, the apparatus first acquires, through the depth information determining module 31, the two current picture frames captured synchronously by the dual cameras and determines the depth information of each coincident scene point; the image stitching and synthesis module 32 then merges the two frames into one current live-scene frame; the key point information determining module 33 detects the two-dimensional key points constituting the face image in the current live-scene frame and determines their coordinates; finally, the line-of-sight correction module 34 corrects the face image in three-dimensional space, from the depth information and coordinates of the key points, to obtain the two-dimensional front-view face image.
Compared with existing line-of-sight correction apparatuses, the apparatus provided by Embodiment 3 of the present invention needs no special hardware or special camera: two ordinary cameras suffice to efficiently correct the line of sight of the persons in the captured live-scene frames, at low cost and with a wide range of application, while the dual cameras also bring a wider field of view, better enhancing the practical experience of the intelligent conference terminal.
Further, the depth information determining module 31 is specifically configured to: acquire two current picture frames captured synchronously by the dual cameras in the current video scene; perform stereo matching on the two frames to obtain the disparity value of each coincident scene point; and determine the depth information of each coincident scene point from its disparity value and the depth calculation formula.
Correspondingly, the image stitching and synthesis module 32 is specifically configured to merge the two current picture frames into one seamless, high-resolution current live-scene frame according to the chosen image merging strategy.
Further, the apparatus is refined by adding:
a depth map determining module 35 configured to form, after the depth information of each coincident scene point in the two frames is determined, a depth map corresponding to the coincident scene points from their depth information;
a depth information optimization module 36 configured to smooth the depth map with a chosen image smoothing algorithm, obtaining optimized depth information for each coincident scene point.
Further, the apparatus also includes:
a face image replacement module 37 configured to replace, after the face image is corrected in three-dimensional space to obtain the two-dimensional front-view face image, the face image in the current live-scene frame with the front-view face image, obtaining a corrected live-scene frame;
a corrected image processing module 38 configured to perform edge blending on the corrected live-scene frame, and to display the processed corrected live-scene frame.
On the basis of the refinements above, the line-of-sight correction module 34 is specifically configured to:
look up the depth information of the coincident scene points and determine the depth information corresponding to the two-dimensional key points; fit the preset three-dimensional face parameter model to the face image according to the depth information and the coordinate information, obtaining the actual three-dimensional face model of the face image in the current live-scene frame; and project the actual three-dimensional face model from its current pose into a two-dimensional front-view face image according to the determined geometric transformation matrices.
Embodiment 4
FIG. 4 is a schematic diagram of the hardware structure of an intelligent conference terminal according to Embodiment 4 of the present invention. As shown in FIG. 4, the intelligent conference terminal provided by Embodiment 4 includes: two cameras 41 with parallel optical axes, a processor 42, and a storage device 43. The terminal may have one or more processors; FIG. 4 takes one processor 42 as an example. The two cameras 41 may be connected to the processor 42 and the storage device 43 respectively by a bus or other means, and the processor 42 and the storage device 43 are likewise connected by a bus or other means; FIG. 4 takes the bus connection as an example.
It will be appreciated that the intelligent conference terminal is one kind of the smart terminals above and can conduct remote video-conference calls. In this embodiment, the processor 42 of the terminal can control the two cameras 41 to capture images, and can also perform the required operations on the picture frames captured by the two cameras; in addition, the frames captured by the two cameras 41 can be stored in the storage device 43 to store the image data.
As a computer-readable storage medium, the storage device 43 of the terminal can store one or more programs, which may be software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the line-of-sight correction method of the embodiments of the present invention (for example, the modules of the line-of-sight correction apparatus shown in FIG. 3: the depth information determining module 31, the image stitching and synthesis module 32, the key point information determining module 33, and the line-of-sight correction module 34). By running the software programs, instructions, and modules stored in the storage device 43, the processor 42 executes the various functional applications and data processing of the intelligent conference terminal, that is, implements the line-of-sight correction method of the method embodiments above.
The storage device 43 may include a program storage area and a data storage area; the program storage area may store the operating system and the applications required by at least one function, and the data storage area may store data created through use of the device, and so on. In addition, the storage device 43 may include high-speed random-access memory and may also include non-volatile memory, such as at least one magnetic-disk storage device, flash-memory device, or other non-volatile solid-state storage device. In some examples, the storage device 43 may further include memory set up remotely from the processor 42; such remote memory may be connected to the device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local-area networks, mobile communication networks, and combinations thereof.
Moreover, when the one or more programs included in the intelligent conference terminal are executed by the one or more processors 42, the programs perform the following operations:
acquiring two current picture frames captured synchronously by the dual cameras, determining the depth information of each coincident scene point in the two frames, and merging them into one current live-scene frame; detecting the two-dimensional key points constituting a face image in the current live-scene frame and determining their coordinate information; and correcting the face image in three-dimensional space, according to the depth information corresponding to the key points and the coordinate information, to obtain a two-dimensional front-view face image.
In addition, an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a control apparatus, the program implements the line-of-sight correction method provided by Embodiment 1 or Embodiment 2 of the present invention, which includes: acquiring two current picture frames captured synchronously by dual cameras, determining the depth information of each coincident scene point in the two frames, and merging them into one current live-scene frame; detecting the two-dimensional key points constituting a face image in the current live-scene frame and determining their coordinate information; and correcting the face image in three-dimensional space, according to the depth information corresponding to the key points and the coordinate information, to obtain a two-dimensional front-view face image.
From the description of the implementations above, those skilled in the art will clearly understand that the present invention can be realized with software plus the necessary general-purpose hardware; it can of course also be realized with hardware, but in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random-access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes instructions that cause a computing device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
Note that the above are merely preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here; various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in some detail through the embodiments above, it is not limited to those embodiments and may include more equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the appended claims.

Claims (10)

  1. 一种视线校正方法,其特征在于,包括:
    获取双摄像头同步捕获的两张当前画面帧,确定所述两张当前画面帧中各重合被摄点的深度信息,并合并形成一幅当前实景画面帧;
    检测所述当前实景画面帧中构成人脸图像的二维关键点,并确定所述二维关键点的坐标信息;
    根据所述二维关键点对应的深度信息及所述坐标信息,在三维空间中校正所述人脸图像获得二维的人脸正视图像。
  2. The method according to claim 1, characterized in that said acquiring two current picture frames synchronously captured by dual cameras, determining the depth information of each overlapping photographed point in the two current picture frames, and merging them into one current real-scene picture frame comprises:
    acquiring two current picture frames synchronously captured by the dual cameras in the current video scene;
    performing stereo matching on the two current picture frames to obtain the disparity value of each overlapping photographed point in the two current picture frames;
    determining the depth information of each overlapping photographed point according to its disparity value and a depth calculation formula; and
    merging the two current picture frames into one seamless, high-resolution current real-scene picture frame according to a set image merging strategy.
  3. The method according to claim 1, characterized by further comprising, after said determining the depth information of each overlapping photographed point in the two current picture frames:
    forming, based on the depth information of the overlapping photographed points, a depth map corresponding to the overlapping photographed points; and
    smoothing the depth map based on a set image smoothing algorithm, to obtain optimized depth information corresponding to each overlapping photographed point.
  4. The method according to claim 1, characterized by further comprising, after correcting the face image in three-dimensional space according to the depth information corresponding to the two-dimensional key points and the coordinate information to obtain the two-dimensional frontal face image:
    replacing the face image in the current real-scene picture frame with the frontal face image, to obtain a corrected real-scene picture frame; and
    performing edge blending on the corrected real-scene picture frame, and displaying the processed corrected real-scene picture frame.
  5. The method according to any one of claims 1-4, characterized in that said correcting the face image in three-dimensional space according to the depth information corresponding to the two-dimensional key points and the coordinate information to obtain a two-dimensional frontal face image comprises:
    looking up the depth information of each overlapping photographed point, to determine the depth information corresponding to the two-dimensional key points;
    fitting a face image to a preset three-dimensional face parameter model according to the depth information and the coordinate information, to obtain an actual three-dimensional face model of the face image in the current real-scene picture frame; and
    transforming and projecting, according to a determined geometric transformation matrix, the actual three-dimensional face model from its current pose into a two-dimensional frontal face image.
  6. A line-of-sight correction apparatus, characterized by comprising:
    a depth information determination module, configured to acquire two current picture frames synchronously captured by dual cameras and determine the depth information of each overlapping photographed point in the two current picture frames;
    an image stitching and composition module, configured to merge the two current picture frames into one current real-scene picture frame;
    a key-point information determination module, configured to detect the two-dimensional key points constituting a face image in the current real-scene picture frame and determine the coordinate information of the two-dimensional key points; and
    a person gaze correction module, configured to correct the face image in three-dimensional space according to the depth information corresponding to the two-dimensional key points and the coordinate information, to obtain a two-dimensional frontal face image.
  7. The apparatus according to claim 6, characterized by further comprising:
    a depth map determination module, configured to form, after the depth information of each overlapping photographed point in the two current picture frames has been determined, a depth map corresponding to the overlapping photographed points based on their depth information; and
    a depth information optimization module, configured to smooth the depth map based on a set image smoothing algorithm, to obtain optimized depth information corresponding to each overlapping photographed point.
  8. The apparatus according to claim 6, characterized by further comprising:
    a face image replacement module, configured to replace, after the face image has been corrected in three-dimensional space according to the depth information corresponding to the two-dimensional key points and the coordinate information to obtain the two-dimensional frontal face image, the face image in the current real-scene picture frame with the frontal face image, to obtain a corrected real-scene picture frame; and
    a corrected image processing module, configured to perform edge blending on the corrected real-scene picture frame and display the processed corrected real-scene picture frame.
  9. An intelligent conference terminal, characterized by comprising: two cameras with parallel optical axes;
    one or more processors; and
    a storage device, configured to store one or more programs;
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the line-of-sight correction method according to any one of claims 1-5.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the program implements the line-of-sight correction method according to any one of claims 1-5.
PCT/CN2017/103270 2017-04-14 2017-09-25 Line-of-sight correction method and apparatus, intelligent conference terminal, and storage medium WO2018188277A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710245026.6A CN106981078B (zh) 2017-04-14 2017-04-14 Line-of-sight correction method and apparatus, intelligent conference terminal, and storage medium
CN201710245026.6 2017-04-14

Publications (1)

Publication Number Publication Date
WO2018188277A1 (zh)

Family

ID=59345693

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/103270 WO2018188277A1 (zh) 2017-04-14 2017-09-25 Line-of-sight correction method and apparatus, intelligent conference terminal, and storage medium

Country Status (2)

Country Link
CN (1) CN106981078B (zh)
WO (1) WO2018188277A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106981078B (zh) 2017-04-14 2019-12-31 广州视源电子科技股份有限公司 Line-of-sight correction method and apparatus, intelligent conference terminal, and storage medium
CN108196667A (zh) * 2017-09-30 2018-06-22 苏州美房云客软件科技股份有限公司 Storage device, computer equipment, and room selection method based on virtual reality technology
CN108960097B (zh) * 2018-06-22 2021-01-08 维沃移动通信有限公司 Method and apparatus for acquiring face depth information
CN111368608B (zh) * 2018-12-26 2023-10-13 杭州海康威视数字技术股份有限公司 Face recognition method, apparatus, and system
WO2020210937A1 (en) * 2019-04-15 2020-10-22 Shanghai New York University Systems and methods for interpolative three-dimensional imaging within the viewing zone of a display
CN112085647B (zh) * 2019-06-14 2024-01-19 华为技术有限公司 Face correction method and electronic device
CN113191197B (zh) * 2021-04-01 2024-02-09 杭州海康威视系统技术有限公司 Image restoration method and apparatus

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150228081A1 (en) * 2014-02-10 2015-08-13 Electronics And Telecommunications Research Institute Method and apparatus for reconstructing 3d face with stereo camera
CN104978548A (zh) * 2014-04-02 2015-10-14 汉王科技股份有限公司 一种基于三维主动形状模型的视线估计方法与装置
CN105763829A (zh) * 2014-12-18 2016-07-13 联想(北京)有限公司 一种图像处理方法及电子设备
CN105787884A (zh) * 2014-12-18 2016-07-20 联想(北京)有限公司 一种图像处理方法及电子设备
CN106503671A (zh) * 2016-11-03 2017-03-15 厦门中控生物识别信息技术有限公司 确定人脸姿态的方法和装置
CN106981078A (zh) * 2017-04-14 2017-07-25 广州视源电子科技股份有限公司 视线校正方法、装置、智能会议终端及存储介质

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886246A (zh) * 2019-03-04 2019-06-14 上海像我信息科技有限公司 一种人物注意力判断方法、装置、系统、设备和存储介质
CN109886246B (zh) * 2019-03-04 2023-05-23 上海像我信息科技有限公司 一种人物注意力判断方法、装置、系统、设备和存储介质
CN111985280A (zh) * 2019-05-24 2020-11-24 北京小米移动软件有限公司 图像处理方法及装置
CN111985280B (zh) * 2019-05-24 2023-12-29 北京小米移动软件有限公司 图像处理方法及装置

Also Published As

Publication number Publication date
CN106981078A (zh) 2017-07-25
CN106981078B (zh) 2019-12-31

Similar Documents

Publication Publication Date Title
WO2018188277A1 (zh) Line-of-sight correction method and apparatus, intelligent conference terminal, and storage medium
US11830141B2 (en) Systems and methods for 3D facial modeling
US10609282B2 (en) Wide-area image acquiring method and apparatus
TWI712918B (zh) Image display method, apparatus, and device for augmented reality
US10853625B2 (en) Facial signature methods, systems and software
US9684953B2 (en) Method and system for image processing in video conferencing
WO2019101113A1 (zh) Image fusion method and device, storage medium, and terminal
JP4198054B2 (ja) 3D video conference system
JP4069855B2 (ja) Image processing apparatus and method
Eng et al. Gaze correction for 3D tele-immersive communication system
CN103034330B (zh) Eye contact interaction method and system for video conferencing
WO2015139454A1 (zh) Method and apparatus for high dynamic range image synthesis
WO2010028559A1 (zh) Image stitching method and apparatus
US11068699B2 (en) Image processing device, image processing method, and telecommunication system to generate an output image for telecommunication
US9613404B2 (en) Image processing method, image processing apparatus and electronic device
Yang et al. Eye gaze correction with stereovision for video-teleconferencing
US9380263B2 (en) Systems and methods for real-time view-synthesis in a multi-camera setup
WO2018032841A1 (zh) Method, device, and system for rendering three-dimensional images
Seo et al. Automatic Gaze Correction based on Deep Learning and Image Warping
WO2018232630A1 (zh) Three-dimensional image preprocessing method and apparatus, and head-mounted display device
JP2006024141A (ja) Image processing apparatus and method, and program
EP3182367A1 (en) Apparatus and method for generating and visualizing a 3d model of an object
CN111080689B (zh) Method and apparatus for determining a facial depth map
JP2024062935A (ja) Method and apparatus for generating stereoscopic display content
JP2006024142A (ja) Image processing apparatus and method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17905706

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19.02.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 17905706

Country of ref document: EP

Kind code of ref document: A1