WO2020172783A1 - A human head posture tracking system for transcranial magnetic stimulation diagnosis and treatment - Google Patents

A human head posture tracking system for transcranial magnetic stimulation diagnosis and treatment

Info

Publication number
WO2020172783A1
WO2020172783A1 (PCT/CN2019/076104)
Authority
WO
WIPO (PCT)
Prior art keywords
camera
face
algorithm
pose
module
Prior art date
Application number
PCT/CN2019/076104
Other languages
English (en)
French (fr)
Inventor
孙聪
王波
蔡胜安
Original Assignee
武汉资联虹康科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 武汉资联虹康科技股份有限公司
Priority to CN201980001096.4A (CN110268444A)
Priority to PCT/CN2019/076104
Publication of WO2020172783A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85Stereo camera calibration
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20228Disparity calculation for image-based rendering

Definitions

  • the invention relates to the technical fields of computer vision, face recognition and the like, and in particular to a human head posture tracking system for transcranial magnetic stimulation diagnosis and treatment.
  • Binocular stereo vision is an important branch widely studied and applied in the field of computer vision.
  • Such a system emulates the principle of the human binocular vision system.
  • The stereo matching algorithm calculates the disparity between corresponding image points of the two images; combining the disparity map with the camera calibration parameters yields the three-dimensional coordinates of each point of the measured object in the scene, from which the three-dimensional structure of the scene is reconstructed and the depth of each corresponding point is obtained.
  • The depth value is the actual distance between the camera and the measured object.
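The disparity-to-depth relation just described can be sketched in a few lines: for a rectified binocular pair, the depth of a point is inversely proportional to its disparity. This is a minimal illustration; the focal length, baseline, and disparity values below are assumed, not taken from the patent.

```python
# For rectified cameras with focal length f (in pixels) and baseline B
# (in metres), a pixel pair with disparity d lies at depth Z = f * B / d.

def depth_from_disparity(f_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth (metres) of a point from its stereo disparity (pixels)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return f_px * baseline_m / disparity_px

# Illustrative values: f = 800 px, baseline = 0.12 m, disparity = 32 px
z = depth_from_disparity(800.0, 0.12, 32.0)
print(z)  # depth in metres
```

Note that larger disparities correspond to closer points, which is why the low-texture regions that defeat stereo matching translate directly into missing or wrong depth.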
  • Head posture parameter measurement is an important part of human-computer interaction, and it has high application value in computer vision, face recognition and driver fatigue detection.
  • Head posture measurement methods are broadly sensor-based or image-based. The sensor-based approach attaches a sensor to the head to output the corresponding posture data directly, but the attached sensor hinders the patient's movement and the sensor itself is expensive, so the approach is poor in economy and practicality; it is suitable only for high-precision measurement, not for widespread application.
  • The image-based approach uses a computer to process the acquired images to obtain the posture parameters. It demands little of the equipment, needing only a camera and a computer to complete the measurement, and is currently the most practical measurement method. Examples include the Chinese patent with publication number CN103558910B, "A smart display system for automatically tracking head posture", and the Chinese patent with publication number CN104106262B, "Head posture tracking using depth camera".
  • Head posture measurement methods based on machine vision combine a camera and a computer to complete the measurement.
  • The more common approaches are based on statistical learning and on registration tracking.
  • The former assumes a specific correspondence between the head posture and certain features of the face image, but this correspondence cannot be accurately described with traditional mathematical methods.
  • Statistical-learning methods therefore collect a large number of facial images in different poses for training and then establish the correspondence between head pose and facial image features.
  • Beymer of MIT proposed a human head gesture recognition algorithm based on template matching.
  • In this algorithm, multiple head images in different poses must first be extracted as samples; in operation, only a single head image needs to be input, and the head posture in it is analyzed and calculated by template matching. Normally this method requires a huge number of training samples; if the number of samples is limited, an interpolation step is also needed to calculate the head posture, which not only leads to a huge amount of calculation but also leaves the accuracy of the results unguaranteed.
  • Vision-based head posture measurement technology can also be divided into methods based on stereo vision and methods based on monocular vision.
  • R. G. Yang et al. proposed a model-based stereo vision head pose tracking method with good robustness that runs in real time on an ordinary computer. It uses a personalized three-dimensional head model plus the epipolar constraint of the stereo image pair, which greatly improves the robustness of head tracking.
  • This method can track the six-degree-of-freedom motion of a rigid body's head, and can be applied to the fields of human-computer interaction and sight correction in video conferences.
  • K. Terada proposed a head tracking system based on a stereo camera, applying a particle filter to the sequence of depth images collected from the stereo camera.
  • The advantage of using depth images is that they are insensitive to background clutter and lighting changes.
  • Head posture measurement methods based on monocular vision approximate the human head with common geometric structures such as a plane, a cylinder, or an ellipsoid. Exploiting the characteristics of each structure, a correspondence between it and the head image can be established, and various spatial posture parameters of the head can then be calculated by geometric derivation.
  • Q. Ji proposed a method for estimating and tracking the three-dimensional pose of a human face.
  • The method assumes that the three-dimensional face is approximately an ellipse whose aspect ratio is known, and the pupils of the two eyes are used to constrain the face ellipse.
  • The angle estimation error of this method, however, is relatively large.
  • S. Birchfield proposed an algorithm for tracking the human head.
  • the projection of the human head on the imaging plane is modeled as a two-dimensional ellipse.
  • the position of the head is obtained by color histogram or image gradient.
  • The advantage of this method is its fast processing speed, which permits real-time operation; however, changes in illumination and differences in skin color can cause tracking failure.
  • Another disadvantage of this method is that it cannot provide the head posture.
  • Liang Guoyuan of Peking University proposed a method to calculate head pose parameters using only one camera. It is a model-based method whose core idea is to build a three-dimensional head model with a three-dimensional scanner and then use the model to measure head posture parameters from a monocular image sequence. For two consecutive frames, an affine transformation is used to calculate the pose parameters of the previous frame, which serve as the reference pose, and constraints derived from the generated model are applied to the next frame to obtain the current pose parameters.
  • The method measures head posture parameters well, but its complex algorithm and high equipment requirements make it unsuitable for practical measurement.
  • Liu Kun and others at Tsinghua University proposed an image-based method that obtains posture features from the gradient histogram and principal component analysis of the image, classifies these features, and uses an SVM classifier to identify the acquired image and obtain the head pose parameters. The method is robust to illumination changes, but the error of the obtained posture parameters is large.
  • Ma Bingpeng and others at the Chinese Academy of Sciences proposed a method that obtains head pose parameters from the apparent features of the image, applying a one-dimensional Gabor filter for feature extraction and then analyzing and discriminating the extracted features to obtain the pose parameters. The method runs fast, but when the pose changes greatly it cannot estimate the pose well.
  • Transcranial magnetic stimulation (TMS) is realized as a fast current pulse passing through a stimulation coil to generate a strong instantaneous magnetic field, which passes through the skull, induces secondary currents in nearby nerve tissue, depolarizes local neurons, and produces physiological effects.
  • The biological effect it produces can last for a period of time after the stimulation stops, and the technique is non-invasive and painless.
  • It is a biostimulation technology that uses a time-varying magnetic field to generate induced currents and affects the action potential, blood flow, and metabolism of cerebral cortex neurons. It has been applied to the clinical treatment of schizophrenia.
  • In view of the problems in the prior art, the purpose of the present invention is to provide a human head posture tracking system for transcranial magnetic stimulation diagnosis and treatment that, based on machine vision technology, combines a camera and a computer to complete the measurement and tracking of human head posture.
  • A human head posture tracking system for transcranial magnetic stimulation diagnosis and treatment comprises a photographing device, an intelligent terminal, and a computer program with program modules executed by the intelligent terminal.
  • The photographing device includes a binocular camera and a fixture that holds the binocular camera so that the human head falls entirely within the shooting range.
  • The program modules include: a camera calibration module, which determines the internal and external parameters of the binocular camera and the spatial relationship between the two cameras; a stereo matching module, which, from the two images of the same scene acquired by the binocular camera from different angles, uses a stereo matching algorithm to calculate the disparity map between corresponding pixels of the two images; a face detection module, which eliminates the non-face areas in the input image; and a pose estimation module, which restores the three-dimensional coordinates of the face in the binocular camera coordinate system from the disparity map and the internal and external camera parameters and uses an iterative closest point algorithm to calculate the head pose.
  • The system of the present invention calibrates and rectifies the binocular camera, collects images of the template pose with the binocular camera, obtains the disparity of the feature-point pixels between the left and right views through ASM feature point detection, and calculates the three-dimensional information of the feature points and the pose of the template pose relative to the camera coordinate system; it then calculates the pose relationship between the template pose and the target pose through an improved iterative closest point algorithm.
  • The system of the present invention uses a binocular camera to obtain head posture images, processes them on a computer to obtain accurate head postures, and feeds the obtained postures back in real time to the mechanical control equipment of the transcranial magnetic stimulation system, so that the treatment coil is kept over the area to be treated on the patient's head, improving the positioning accuracy of the treatment target in transcranial magnetic stimulation diagnosis and treatment.
  • the camera calibration module includes sub-modules for the following operations:
  • the working principle of the camera calibration module specifically includes:
  • The world coordinate system (O_w-X_wY_wZ_w) is adopted as the reference coordinate system of the system; coordinate calculations through the world coordinate system allow mutual conversion with the other coordinate systems.
  • The world coordinate system is used to establish the relative pose between the camera and the target in the binocular vision system.
  • When the binocular vision system performs pose calculation, it first converts the target's position in image coordinates into a real physical position, establishing a physical image coordinate system (O_1-xy) in actual units (e.g. mm).
  • the perspective projection model is used as the camera imaging model.
  • The equivalent plane and the imaging plane are symmetric about the origin, and the pinhole plane is the plane where the optical center of the lens actually lies.
  • Point O is the optical center of the camera, and the focal length of the lens is F.
  • Since the focal lengths in the X and Y axis directions may differ, they are expressed as F_x and F_y respectively.
  • The left camera coordinate system is O-XYZ and is assumed to coincide exactly with the world coordinate system; the physical image coordinate system of the left camera is O_il-x_il y_il, and its effective focal length is F_l.
  • The right camera coordinate system is O_cr-X_cr Y_cr Z_cr; its physical image coordinate system is O_ir-x_ir y_ir, and its effective focal length is F_r.
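As a hedged illustration of the perspective projection model just described, the sketch below projects a point given in the camera coordinate system to pixel coordinates using separate focal lengths F_x and F_y. The principal point and all numeric values are assumed for illustration only.

```python
# Pinhole (perspective projection) camera model: a point (X, Y, Z) in the
# camera frame maps to pixel (u, v) = (Fx * X / Z + cx, Fy * Y / Z + cy),
# where (cx, cy) is the principal point.

def project(point, fx, fy, cx, cy):
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera")
    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    return u, v

# Illustrative intrinsics and point
u, v = project((0.3, -0.1, 2.0), fx=800.0, fy=790.0, cx=320.0, cy=240.0)
print(u, v)
```

This linear-in-homogeneous-coordinates model is what lets calibration express each camera as an intrinsic matrix combined with the extrinsic rotation and translation.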
  • The fundamental matrix F integrates all the parameters of the system, including the camera intrinsic parameters and the rotation R and translation T describing the spatial relationship between the two cameras, and links the pixel coordinates of the two views.
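The role of the fundamental matrix can be checked numerically. The sketch below is a deliberately simplified, hedged case: with identity intrinsics and a pure translation t along the X axis (i.e., an already rectified pair), F reduces, up to scale, to the skew-symmetric cross-product matrix of t, and the epipolar constraint x2ᵀ F x1 = 0 holds exactly when corresponding points share an image row. All coordinates are invented for illustration.

```python
# Epipolar constraint check for a rectified pair: F ~ [t]_x when both
# intrinsic matrices are the identity and R is the identity.

def cross_matrix(t):
    """Skew-symmetric matrix [t]_x such that [t]_x v = t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]

def epipolar_residual(F, x1, x2):
    """x2^T F x1 for homogeneous pixel coordinates x1, x2."""
    Fx1 = [sum(F[i][j] * x1[j] for j in range(3)) for i in range(3)]
    return sum(x2[i] * Fx1[i] for i in range(3))

F = cross_matrix((0.1, 0.0, 0.0))  # baseline along X (rectified setup)
same_row = epipolar_residual(F, (120.0, 80.0, 1.0), (95.0, 80.0, 1.0))
off_row = epipolar_residual(F, (120.0, 80.0, 1.0), (95.0, 83.0, 1.0))
print(same_row, off_row)
```

In the general (unrectified) case F also absorbs the two intrinsic matrices, which is what the text above means by "integrates all the parameters in the system".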
  • Calibration determines the geometric parameters (R, T) between the stereo image pair, and the Bouguet algorithm is used for stereo rectification: the rotation R is split into two half-rotations r_l and r_r applied to the left and right cameras. After this rotation the imaging planes are coplanar but the rows are not yet aligned; achieving row alignment requires a rotation matrix R_rect that maps the epipole of the image to infinity. R_rect can be described by equation (20), where
  • e_1 is the unit vector along the displacement vector t,
  • e_2 is orthogonal to e_1 and to the principal ray, and
  • e_3 = e_1 × e_2, completing the orthonormal basis, as shown in the following formula.
  • R_rect rotates each image around its principal point so that the epipolar lines become parallel and the epipoles are located at infinity.
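The construction of R_rect described above can be sketched directly. This is a hedged illustration following the standard Bouguet construction (e1 along t; e2 orthogonal to e1 and to the optical axis; e3 = e1 × e2); the baseline vector below is an assumed illustrative value, not a calibration result from the patent.

```python
import math

def rect_rotation(t):
    """Rectifying rotation whose rows are e1, e2, e3 (Bouguet-style)."""
    tx, ty, tz = t
    n = math.sqrt(tx * tx + ty * ty + tz * tz)
    e1 = (tx / n, ty / n, tz / n)          # unit vector along the baseline
    m = math.sqrt(tx * tx + ty * ty)
    e2 = (-ty / m, tx / m, 0.0)            # orthogonal to e1 and to (0, 0, 1)
    e3 = (e1[1] * e2[2] - e1[2] * e2[1],   # e3 = e1 x e2
          e1[2] * e2[0] - e1[0] * e2[2],
          e1[0] * e2[1] - e1[1] * e2[0])
    return [list(e1), list(e2), list(e3)]

R = rect_rotation((-0.12, 0.004, 0.002))   # near-horizontal baseline (assumed)
dot12 = sum(R[0][i] * R[1][i] for i in range(3))
print(R[0], dot12)                         # first row, and e1.e2 (should be ~0)
```

Applying this rotation to both (half-rotated) cameras makes the epipolar lines horizontal, which is what allows the stereo matcher to search along rows only.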
  • The cameras in the binocular system can then achieve row alignment, as follows:
  • the projection matrices that achieve image row alignment are given below, where
  • M_re_l and M_re_r are the corrected intrinsic parameter matrices,
  • P_re_l and P_re_r are the corrected reprojection matrices;
  • the camera pixel coordinates can be calculated from the above formulas.
  • The stereo matching module adopts a cross-scale cost aggregation stereo matching algorithm based on the epipolar distance transform to obtain the disparity map.
  • With the method of the present invention, a better disparity map can be obtained in the face region.
  • the stereo matching module includes sub-modules for the following operations:
  • the cross-scale cost aggregation algorithm is used to calculate the disparity map after fusion.
  • The core idea of combining multi-scale thinking with the epipolar distance transform is to perform the epipolar distance transform on images at different scales under a fixed search window ΔS_w.
  • For small-scale, high-resolution images, high-texture regions are more abundant, so the initial ΔS_w is kept appropriately small, preserving in high-texture regions the "soft segmentation" of the image produced by the epipolar distance transform.
  • For large-scale images the search window ΔS_w is relatively large, which satisfies the requirement of a sufficiently large search window in low-texture regions.
  • The disparity map after fusion is then calculated through the cross-scale cost aggregation algorithm.
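The cross-scale fusion idea can be illustrated with a hedged, one-dimensional toy sketch: matching costs are computed at a fine scale and at a coarse (downsampled) scale, the coarse cost is looked up per fine pixel, and the fused cost drives the winner-take-all disparity choice. Real implementations aggregate 2-D windows under the epipolar distance transform; the signals, the nearest-pixel upsampling, and the fusion weight here are all illustrative simplifications.

```python
def cost_volume(left, right, max_d):
    """cost[x][d] = |left[x] - right[x - d]| (absolute-difference cost)."""
    vol = []
    for x in range(len(left)):
        row = []
        for d in range(max_d + 1):
            row.append(abs(left[x] - right[x - d]) if x - d >= 0 else 1e9)
        vol.append(row)
    return vol

def fuse_and_select(fine, coarse, w=0.3):
    """Blend fine and (nearest) coarse costs, then winner-take-all."""
    out = []
    for x in range(len(fine)):
        cx = coarse[min(x // 2, len(coarse) - 1)]   # nearest coarse pixel
        fused = [(1 - w) * f + w * c for f, c in zip(fine[x], cx)]
        out.append(min(range(len(fused)), key=fused.__getitem__))
    return out

left = [10, 10, 50, 50, 10, 10, 10, 10]
right = [10, 50, 50, 10, 10, 10, 10, 10]   # scene shifted left by one pixel
fine = cost_volume(left, right, 2)
coarse = cost_volume(left[::2], right[::2], 2)
print(fuse_and_select(fine, coarse))
```

The coarse term acts as the "large search window" regularizer the text describes: in flat regions the fine costs are ambiguous and the coarse vote breaks the tie.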
  • the face detection module adopts an improved AdaBoost algorithm for face detection.
  • The AdaBoost algorithm first uses Haar-like features to characterize faces and uses integral images to speed up the evaluation of the Haar-like features; it then uses AdaBoost to select the best face rectangle features, each of which is called a weak classifier; finally, these classifiers are cascaded to form a strong classifier that detects faces. The method is also not very sensitive to changes in illumination, so it meets the face detection requirements of the system of the present invention.
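The integral image mentioned above is what makes Haar-like features cheap: after one pass over the image, any rectangle sum costs four lookups. The sketch below is a minimal illustration on an invented 3×4 image, with a two-rectangle (edge-type) Haar-like feature as an example.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and cols < x (padded form)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of img[y:y+h, x:x+w] via four integral-image lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)
# Two-rectangle (edge) Haar-like feature: left half minus right half
feature = rect_sum(ii, 0, 0, 2, 3) - rect_sum(ii, 2, 0, 2, 3)
print(feature)
```

Because the cost is independent of rectangle size, the detector can evaluate thousands of candidate features per window fast enough for cascaded rejection.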
  • the face detection module includes sub-modules for the following operations:
  • The sample set consists of positive samples containing a face and negative samples without a face.
  • The positive samples use face images covering different lighting conditions and postures, while the negative samples use images of various other categories.
  • the pose estimation module includes sub-modules for the following operations:
  • the face point cloud in the initial pose is used as the template point cloud, and the iterative nearest point algorithm with initial value estimation is used to match the template point cloud to the target point cloud to obtain an accurate pose estimation result.
  • The advantages of the present invention are: 1) a binocular camera is used to obtain head posture images, a computer processes the images to obtain accurate head postures, and the obtained postures are fed back in real time to the mechanical control equipment,
  • so that the treatment coil is kept over the area to be treated on the patient's head, improving the positioning accuracy of the treatment target in transcranial magnetic stimulation diagnosis and treatment;
  • 2) the cross-scale cost aggregation stereo matching algorithm performs cost volume fusion to obtain the disparity map between the template pose image and the target pose image;
  • this algorithm achieves correct matching in low-texture areas of the face and obtains a better disparity map; 3) the invention uses the AdaBoost algorithm to detect the face area and eliminates image content outside it, reducing the amount of stereo matching calculation and the interference in head pose estimation; 4) to address the problems that the traditional ICP algorithm can fall into a local minimum under poor initial value estimation and involves a large amount of calculation, a method is proposed that estimates the initial value with the ASM algorithm and eliminates the farthest points with a weighting method to reduce the calculation, improving the stability of the traditional ICP algorithm.
  • Figure 1 is a flow chart of the stereo matching algorithm
  • Figure 2 is a schematic diagram of the results of the algorithm of the present invention.
  • Figure 3 is a schematic diagram of the effect of the algorithm of the present invention in a large area of low texture
  • Figure 4 is a schematic diagram of the face region matching result of the algorithm of the present invention.
  • Figure 5 is a schematic diagram of the comparison between the algorithm of the present invention and the Yang algorithm
  • Figure 6 is a schematic diagram of the training process of the face classifier
  • Figure 7 shows some of the experimental face images.
  • Fig. 8 is a flow chart of the algorithm for estimation of head pose of the present invention.
  • Figure 9 is a schematic diagram of template posture feature point detection
  • Figure 10 is a schematic diagram of the initial value estimation for the iterative closest point algorithm
  • Figure 11 is a schematic diagram of attaching the template to the target point cloud in the improved ICP algorithm.
  • A human head posture tracking system for transcranial magnetic stimulation diagnosis and treatment comprises a photographing device, an intelligent terminal, and a computer program with program modules executed by the intelligent terminal.
  • The photographing device includes a binocular camera and a fixture that holds the binocular camera so that the human head falls entirely within the shooting range.
  • The program modules include: a camera calibration module, which determines the internal and external parameters of the binocular camera and the spatial relationship between the two cameras; a stereo matching module, which, from the two images of the same scene acquired by the binocular camera from different angles, uses a stereo matching algorithm to calculate the disparity map between corresponding pixels of the two images; a face detection module, which eliminates the non-face areas in the input image; and a pose estimation module, which restores the three-dimensional coordinates of the face in the binocular camera coordinate system from the disparity map and the internal and external camera parameters and uses an iterative closest point algorithm to calculate the head pose.
  • The camera calibration module includes sub-modules for the following operations: establishing a reference coordinate system and, based on it, the relative pose between the camera and the target in the binocular vision system; establishing an imaging model so that targets in the scene have a linear relationship with the image obtained by the camera; establishing the binocular vision measurement model; calculating the intrinsic parameters of the binocular camera and the rotation matrix and translation vector between the two cameras; and using the Bouguet algorithm to perform stereo rectification of the binocular vision.
  • the binocular camera here consists of two cameras of the same type, and the camera hardware parameters are shown in Table 1:
  • the left and right camera parameters can be obtained respectively, and the internal parameters of the left camera in this system are calculated as:
  • the radial distortion coefficient is:
  • the parameters in the right camera are:
  • the radial distortion coefficient is:
  • the fundamental matrix of the binocular camera is:
  • the essential matrix of the binocular camera is:
  • The working principle of the stereo matching module: matching is accelerated with image Gaussian pyramids, the multi-scale cost volumes are fused by exploiting the fact that different scales carry different image frequencies, and the epipolar distance transform is adopted.
  • The cross-scale cost aggregation stereo matching algorithm realizes stereo matching between the different views, obtains the disparity, and resolves the trade-off between disparity quality and computation speed.
  • Cross-scale cost aggregation based on the epipolar distance transform changes the gray value of the matching primitive to F(O_L), and multi-scale fusion is performed on the cost volumes obtained after cost aggregation.
  • the algorithm flow chart is shown in Figure 1.
  • the image data is collected by a binocular system composed of ordinary web cameras.
  • Yang's cross-scale cost aggregation is used for comparison with the algorithm of the present invention. It can be seen that Yang's method mismatches in the low-resolution face area; the wrong points are then carried into the subsequent depth mapping as erroneous point cloud data, which causes considerable trouble for pose estimation.
  • The algorithm of the present invention matches continuous disparity in the face area; the effect is shown in Figure 5.
  • The face detection module first uses Haar-like features to characterize the face and speeds up the evaluation of the Haar-like features with the help of an integral image; AdaBoost is then used to select the best face rectangle features, each called a weak classifier; finally, these classifiers are cascaded to form a strong classifier that detects faces.
  • the training flowchart of the face classifier is shown in Figure 6.
  • AdaBoost's ability to detect multi-pose face images depends on whether the training samples contain positive samples of faces in multiple poses; whether the sample selection is reasonable directly affects the performance of the classifier.
  • the training samples are divided into positive samples of human faces and negative samples of non-human faces.
  • The selected sample images should be as rich and diverse as possible.
  • The positive samples need to include faces in different environments and states, such as different lighting conditions, changing expressions, and the presence or absence of accessories.
  • The training process is shown in Figure 6. AdaBoost training is completed according to the following process to obtain a classifier capable of detecting multi-pose faces; multi-pose face detection is then performed with the generated cascade classifier.
  • Weak classifiers are obtained through training, and the strong classifier is then constructed in a weighted-voting form.
  • p_j represents the polarity of the inequality, taking the value 1 or -1, and θ_j represents the threshold; positive samples are labeled 1 and negative samples 0;
  • after T rounds of training, a strong classifier constructed from T weak classifiers is finally obtained, as follows:
  • h_t(x) represents a weak classifier.
  • The strong classifier of this embodiment is composed of T weak classifiers weighted by α_t and superimposed in cascade form, so as to detect human faces accurately and quickly.
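The voting scheme above can be sketched concretely. This is a hedged illustration of the decision rule only, not of training: each weak classifier thresholds one feature value f_j(x) with polarity p_j (h_j(x) = 1 if p_j·f_j(x) < p_j·θ_j, else 0), and the strong classifier accepts when the α-weighted vote reaches half the total weight. The features, thresholds, and weights below are invented.

```python
def weak(f, p, theta):
    """Single weak classifier: thresholded feature with polarity p (+1/-1)."""
    return 1 if p * f < p * theta else 0

def strong(features, weaks):
    """weaks: list of (feature_index, polarity, theta, alpha)."""
    vote = sum(a * weak(features[i], p, t) for i, p, t, a in weaks)
    return 1 if vote >= 0.5 * sum(a for _, _, _, a in weaks) else 0

weaks = [(0, 1, 0.5, 0.9),   # fires when feature 0 is below 0.5
         (1, -1, 0.2, 0.4),  # fires when feature 1 is above 0.2
         (0, 1, 0.8, 0.2)]
print(strong([0.3, 0.6], weaks), strong([0.9, 0.1], weaks))
```

In the full detector, several such strong classifiers are chained into a cascade so that most non-face windows are rejected by the cheap early stages.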
  • This embodiment uses the CMU PIE face database of Carnegie Mellon University to verify the face recognition rates of the original AdaBoost algorithm (a) and the improved AdaBoost algorithm (b) under strong light, low light, and pose deflection. Some of the experimental face images are shown in Figure 7, and the experimental results are shown in Table 3.
  • The improved AdaBoost algorithm achieves recognition rates of 97%, 94%, and 92% under strong light, low light, and pose deflection respectively, all higher than those of the original AdaBoost algorithm.
  • the average recognition rate of the improved algorithm is 94.33%.
  • the experimental results show that the improved Adaboost algorithm has a higher recognition rate and real-time performance.
  • To address the problems that the traditional ICP algorithm can fall into a local minimum when the initial value estimate is poor and that its amount of calculation is large, the present invention proposes estimating the initial value with the ASM algorithm and eliminating the farthest points with a weighting method to reduce the calculation, which improves the stability of the traditional ICP algorithm.
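The farthest-point elimination idea can be illustrated with a hedged 2-D sketch of a single ICP step: nearest-neighbour pairs are formed, the pairs with the largest residuals are dropped (weighted to zero), and only the kept pairs drive the update. To keep the sketch short it estimates a translation only, whereas the real algorithm estimates a full 3-D rigid transform over face point clouds; all point sets are invented.

```python
def nearest(p, cloud):
    """Nearest neighbour of p in cloud (brute force, fine for a sketch)."""
    return min(cloud, key=lambda q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def icp_translation_step(src, dst, keep_ratio=0.8):
    """One ICP update: pair, drop the farthest pairs, average the offsets."""
    pairs = [(p, nearest(p, dst)) for p in src]
    pairs.sort(key=lambda pq: (pq[0][0] - pq[1][0]) ** 2
               + (pq[0][1] - pq[1][1]) ** 2)
    kept = pairs[: max(1, int(len(pairs) * keep_ratio))]   # reject outliers
    tx = sum(q[0] - p[0] for p, q in kept) / len(kept)
    ty = sum(q[1] - p[1] for p, q in kept) / len(kept)
    return tx, ty

template = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (9.0, 9.0)]
target = [(0.4, 0.2), (1.4, 0.2), (0.4, 1.2), (1.4, 1.2)]
print(icp_translation_step(template, target))
```

With the outlier (9, 9) kept (keep_ratio = 1.0) the estimated translation is badly skewed; dropping the farthest pair recovers the true (0.4, 0.2) shift, which is the stabilizing effect the weighting method aims for.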
  • the system model is Windows 7 64-bit
  • the memory is 8GB
  • the processor is an Intel Core i3 dual-core at 2.30 GHz
  • the experimental platform is Visual Studio 2012.
  • a USB camera is used to form a binocular camera to capture images.
  • the algorithm flow chart is shown in Figure 8:
  • The template pose should face the camera lens as directly as possible.
  • ASM feature point detection obtains the disparity of the feature-point pixels between the left and right views; the feature point detection result under the template pose is shown in Figure 9.
  • The three-dimensional information of the feature points and the pose of the template pose relative to the camera coordinate system are then calculated.
  • the initial value data obtained is shown in Table 4.
  • Figure 10 shows the detection results of the feature points of the three sets of template poses and target poses and the disparity map.
  • Figure (a) shows a posture rotated mainly about the Z axis of the camera coordinate system;
  • Figures (b) and (c) show postures rotated about the X, Y, and Z axes of the camera coordinate system.
  • Table 5 shows the initial estimation data of the three groups of attitudes in the above figure:
  • The face disparity map obtained by face detection and stereo matching, together with the relationship in the binocular vision measurement model between the coordinates of any spatial point in the main camera coordinate system and the binocular camera pixel coordinates, is used to calculate the corresponding point cloud of the disparity map; the improved iterative closest point algorithm is then used to calculate the pose relationship between the point cloud in the template pose and the point cloud in the target pose.
  • Figure 11 describes the use of the traditional iterative closest point algorithm and the improved iterative closest point algorithm to estimate the pose relationship between the template pose and the target pose.
  • (a) is the template point cloud;
  • (b)-(d) show the registration of the template point cloud to the target point cloud:
  • the images in (b)-(d) are, respectively, the target pose point cloud, the result of registering the traditional iterative closest point algorithm to the target pose, and the result of registering the improved iterative closest point algorithm to the target pose.
  • In summary, the present invention first performs binocular camera calibration and rectification, and then applies a face detection algorithm to the rectified binocular images to obtain the face area. The key points of the face are then obtained with the ASM feature point detection algorithm. Using the internal and external parameters of the binocular camera, this group of feature points can be mapped to a sparse three-dimensional point cloud, and the pose relationship of the feature points can be obtained through an initial value estimation based on singular value decomposition; this pose relationship serves as the initial value estimate. Cross-scale cost aggregation based on the epipolar distance transform is used to obtain a dense face disparity map, and a dense face point cloud is calculated through the internal and external parameters of the binocular camera.
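The singular-value-decomposition initial estimate mentioned above can be illustrated with a hedged 2-D sketch: given a few corresponding feature points, the least-squares rigid transform (rotation plus translation) has a closed form. In 2-D the SVD step collapses to a single atan2; the actual method works on 3-D ASM landmarks, and the points and transform below are invented for illustration.

```python
import math

def rigid_fit_2d(src, dst):
    """Least-squares rotation angle and translation mapping src -> dst."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n
    cdy = sum(p[1] for p in dst) / n
    num = den = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - csx, sy - csy        # centred source point
        bx, by = dx - cdx, dy - cdy        # centred target point
        num += ax * by - ay * bx           # cross terms -> sin(theta)
        den += ax * bx + ay * by           # dot terms  -> cos(theta)
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    tx = cdx - (c * csx - s * csy)         # t = centroid_dst - R * centroid_src
    ty = cdy - (s * csx + c * csy)
    return theta, tx, ty

src = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
ang = math.radians(30)
c, s = math.cos(ang), math.sin(ang)
dst = [(c * x - s * y + 0.5, s * x + c * y - 0.25) for x, y in src]
theta, tx, ty = rigid_fit_2d(src, dst)
print(math.degrees(theta), tx, ty)
```

Because this estimate is closed-form and uses only the sparse ASM landmarks, it is cheap, and it places the dense ICP refinement near the global optimum instead of a local minimum.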
  • The face point cloud in the initial pose serves as the template point cloud, and the iterative closest point algorithm, seeded with the initial estimate, registers the template point cloud to the target point cloud to obtain an accurate pose estimate.
  • The binocular cameras capture head-pose images in real time; stereo matching, face detection, and pose estimation are performed on each captured image to obtain the head pose in real time and so track it.
  • The resulting pose estimates are fed back in real time to the mechanical control equipment of the transcranial magnetic stimulation system, which adjusts continuously to keep the TMS coil over the effective area to be treated, improving the accuracy of target localization in TMS treatment.
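The singular-value-decomposition initial estimate mentioned above can be sketched as a standard Kabsch/Umeyama-style rigid alignment over corresponding 3D feature points. This is an illustrative implementation, not the patent's code; the function name and test data are made up.

```python
import numpy as np

def rigid_transform_svd(src, dst):
    """Estimate rotation R and translation t with dst ~= R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D feature points, e.g.
    ASM landmarks triangulated in the template and target poses.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centered point sets
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) solution
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t
```

The returned (R, t) is what would seed the iterative closest point refinement described in the text.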

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Processing (AREA)

Abstract

A head pose tracking system for transcranial magnetic stimulation (TMS) diagnosis and treatment, comprising a photographing device, an intelligent terminal, and a computer program whose program modules are executed by the intelligent terminal. The photographing device comprises a binocular camera and a fixture that holds the binocular camera so that the whole head falls within its field of view. The intelligent terminal is driven by the program modules of the computer program, which comprise: a camera calibration module; a stereo matching module; a face detection module; and a pose estimation module. Based on machine vision technology, the system measures and tracks the head pose using cameras and a computer.

Description

A head pose tracking system for transcranial magnetic stimulation diagnosis and treatment — Technical Field
The present invention relates to the technical fields of computer vision and face recognition, and in particular to a head pose tracking system for transcranial magnetic stimulation diagnosis and treatment.
Background
Since the late 1980s, driven by ever-growing practical demand, computer vision has become an extremely important research field in the computer industry, and the interplay between applications and theory has brought great progress across many industries. Binocular stereo vision, a widely studied and applied branch of computer vision, imitates the human visual system: two cameras capture two digital images of the same scene from different angles, a stereo matching algorithm computes the disparity between corresponding image points, and the disparity map combined with the parameters obtained from camera calibration yields the three-dimensional coordinates of each measured point in the scene. This reconstructs the three-dimensional structure of the scene and gives the depth of each point, i.e., the actual distance between the camera and the measured object.
Head pose measurement is an important part of human-computer interaction, with high application value in computer vision, face recognition, and driver fatigue detection. Current methods for obtaining head pose parameters are either sensor-based or image-based. Sensor-based methods attach sensors to the head that directly output pose data, but the attached sensors inconvenience the patient's movement and the sensors themselves are expensive and uneconomical, so this approach suits only high-precision measurement and not wide deployment. Image-based methods use a computer to process captured images to obtain pose parameters; they demand little of the equipment, requiring only a camera and a computer, and are currently the more practical approach. Examples include Chinese patent CN103558910B, "An intelligent display system that automatically tracks head pose", and Chinese patent CN104106262B, "Head pose tracking using a depth camera".
After years of careful research at home and abroad, many machine-vision-based head pose measurement methods now exist, i.e., methods that measure head pose with cameras and a computer. These can be categorized by different criteria; common approaches include methods based on statistical learning and methods based on registration and tracking. The former assume specific correspondences between head pose and certain features of the face image, correspondences that cannot be precisely described by traditional mathematics. Statistical learning methods must collect a large number of face images in different poses for training in order to establish the correspondence between head pose and facial image features. D. J. Beymer at MIT proposed a template-matching head pose recognition algorithm. It requires many head images in different poses as samples, but at run time a single head image suffices: template matching computes the head pose information of the input image. The method normally needs a huge number of training samples; with limited samples, interpolation is also required to compute the head pose, which not only causes a large computational load but also leaves the accuracy of the result unguaranteed.
By the number of cameras used during measurement, vision-based head pose measurement divides into stereo-vision methods and monocular methods. R. G. Yang et al. proposed a robust model-based stereo head pose tracking method that runs in real time on an ordinary PC; it uses a personalized 3D head model plus the epipolar constraint of the stereo image pair, greatly improving tracking robustness. It tracks the six degrees of freedom of the rigid head and suits human-computer interaction and gaze correction in video conferencing. K. Terada proposed a stereo-camera head tracking system that applies particle filtering to the sequence of depth images from the stereo camera; depth images have the advantage of insensitivity to background clutter and illumination change. Monocular methods usually approximate the head with a common geometric model such as a plane, cylinder, or ellipsoid; exploiting the properties of each shape, a correspondence with the head image is established and the head's spatial pose parameters are derived geometrically. Q. Ji proposed a method for estimating and tracking the 3D face pose that assumes the face approximates an ellipse with known aspect ratio, using the two pupils to constrain the facial ellipse; for noisy images, however, the angle estimation error is large. S. Birchfield proposed a head tracking algorithm that models the head's projection on the image plane as a 2D ellipse, locating the head via color histograms or image gradients; it is fast and real-time, but illumination changes and skin-color differences cause tracking failure, and it cannot provide the head pose. R. Wooju et al. proposed a fast 3D head tracking method using a 3D cylindrical head model that works under complex conditions such as fast pose changes and can recognize gestures such as nodding, head shaking, and blinking; since a cylinder only roughly approximates head geometry, rotation estimation is not very accurate, and when the head is far from the camera, small rotations cannot be distinguished from translations. Cao Wanpeng studied stereo-vision-based 3D motion measurement and proposed a method, based on discrete feature marker rods, for measuring rigid 3D motion and the spin center, solving key problems such as modeling rigid moving targets and computing motion parameters, detecting and extracting feature target edges in motion image sequences, matching corresponding features between stereo image sequences, and extracting the centers of circular feature targets. Liang Guoyuan of Peking University proposed a model-based method that computes head pose parameters with only one camera: its core idea is to build a 3D head model with a 3D scanner and use this model to measure head pose from a monocular image sequence. For two consecutive frames, an affine transform computes the pose of the previous frame as a reference pose, and the generated model plus certain constraints yields the pose of the following frame; the method measures head pose well, but its algorithmic complexity and equipment demands make it unsuitable for practical measurement. Liu Kun et al. of Tsinghua University proposed an image-based method that obtains pose features from the image gradient histogram and principal component analysis, classifies the features, and recognizes the images with an SVM classifier to approximate the head pose parameters; it is robust to illumination change, but the resulting pose parameters have large errors. Ma Bingpeng et al. of the Chinese Academy of Sciences proposed obtaining head pose parameters from image appearance features, using a one-dimensional Gabor filter for feature extraction and analyzing the extracted features to obtain the pose; it is fast, but it cannot estimate the pose well when the pose change is large.
Transcranial magnetic stimulation (TMS) works by inducing an electric field with a time-varying magnetic field: a fast current pulse through a stimulation coil produces a strong instantaneous magnetic field that passes through the skull and induces secondary currents in nearby neural tissue, depolarizing local neurons and producing physiological effects. These biological effects can persist for some time after stimulation stops, and the technique is non-invasive and painless. As a biological stimulation technique that uses a time-varying magnetic field to induce currents affecting the action potentials, blood flow, and metabolism of cortical neurons, it is already applied in the clinical treatment of schizophrenia. At present, during transcranial magnetic therapy, medical staff must fix the coil over the region of the patient's head to be treated; for better results the patient should keep the head pose unchanged throughout treatment. But the neck and shoulder discomfort of holding one posture for a long time easily makes patients change head posture, so medical staff must constantly check whether the device and the patient's head remain aligned. This approach is costly, time-consuming, and error-prone, so medical staff need a fast, accurate head pose tracking system that faithfully reports changes in the patient's head pose.
Summary of the invention
The object of the present invention is to address the problems of the prior art by providing a head pose tracking system for transcranial magnetic stimulation diagnosis and treatment that, based on machine vision technology, measures and tracks the head pose with cameras and a computer.
To achieve the above object, the present invention adopts the following technical solution:
A head pose tracking system for transcranial magnetic stimulation diagnosis and treatment comprises a photographing device, an intelligent terminal, and a computer program whose program modules are executed by the intelligent terminal. The photographing device comprises a binocular camera and a fixture that holds the binocular camera so that the whole head falls within its field of view; the intelligent terminal is driven by the program modules of the computer program, which comprise: a camera calibration module that calibrates the binocular cameras to obtain each camera's intrinsic and extrinsic parameters and the parameters relating the two cameras; a stereo matching module that, from the two images of the same scene captured by the binocular cameras from different angles, computes the disparity map between corresponding image points using a stereo matching algorithm; a face detection module that removes non-facial regions from the input images; and a pose estimation module that recovers the three-dimensional coordinates of the face in the binocular camera coordinate system from the disparity map and the camera parameters and computes the head pose with an iterative closest point algorithm.
The system of the present invention calibrates and rectifies the binocular cameras, captures images of the template pose with them, obtains the disparities of feature point pixels between the left and right views by ASM feature point detection, and computes the 3D information of the feature points and the pose of the template relative to the camera coordinate system. An improved iterative closest point algorithm then computes the pose relationship between the template pose and the target pose: after the initial pose estimate between the two is obtained, the face disparity map produced by face detection and stereo matching is converted into the corresponding point cloud, the improved iterative closest point algorithm computes the pose relationship between the template-pose point cloud and the target-pose point cloud, and the template point cloud is registered onto the target point cloud to obtain an accurate pose estimate. The system acquires head pose images with the binocular cameras, processes them with a computer to obtain an accurate head pose, and feeds the pose back in real time to the mechanical control equipment of the TMS system, keeping the treatment coil over the region of the patient's head to be treated and improving the localization accuracy of the treatment target in TMS diagnosis and treatment.
Preferably, the camera calibration module comprises sub-modules for:
establishing a reference coordinate system and, based on it, the relative pose between the cameras and the target in the binocular vision system;
establishing an imaging model so that targets in the scene relate linearly to the images captured by the cameras;
establishing the binocular vision measurement model;
computing the intrinsic parameters of the binocular cameras and the rotation matrix and translation vector between them;
performing stereo rectification of the binocular vision with the Bouguet algorithm.
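The rectifying rotation used by a Bouguet-style rectification step can be sketched in NumPy. This is a minimal illustration assuming the standard construction from the stereo translation vector (the rotation whose rows map the baseline direction to the image x axis, sending the epipoles to infinity); it is not the patent's implementation.

```python
import numpy as np

def rect_rotation(t):
    """Rotation R_rect that sends the epipole to infinity, built from the
    stereo translation vector t.

    Rows: e1 along the baseline, e2 orthogonal to e1 and the principal
    ray (z axis), e3 completing the right-handed orthonormal frame.
    """
    t = np.asarray(t, dtype=float)
    e1 = t / np.linalg.norm(t)
    e2 = np.array([-t[1], t[0], 0.0]) / np.hypot(t[0], t[1])
    e3 = np.cross(e1, e2)
    return np.vstack([e1, e2, e3])
```

Applying this rotation (composed with the half-rotations r_l, r_r of the calibration) row-aligns the two images so that corresponding points share the same image row.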
Specifically, the working principle of the camera calibration module is as follows:
The world coordinate system (O_w X_w Y_w Z_w) serves as the reference coordinate system of the system; coordinate computations in the world system allow conversion to and from the other coordinate systems. Before target calibration, the world coordinate system is used to establish the relative pose between the cameras and the target in the binocular vision system. When the binocular vision system solves for pose, the target's position is first converted to its true physical position, establishing a physical coordinate system (O_1 x y) in real units (e.g. mm). Let (u, v) be a point's position in the pixel coordinate system (O_p u_p v_p) and let (O_i x_i y_i) be the camera physical coordinate system; the conversion between the coordinate systems is given by equation (1) [reconstructed here from the standard pixel/physical relationship, the original equation image not being reproduced; dx, dy are the pixel sizes and (u_0, v_0) the principal point]:

  [u]   [1/dx   0    u_0] [x]
  [v] = [ 0    1/dy  v_0] [y]   (1)
  [1]   [ 0     0     1 ] [1]

A perspective projection model is adopted as the camera imaging model. The equivalent plane and the imaging plane are symmetric about the origin, and the pinhole plane represents the plane of the lens's optical center. O is the camera's optical center and F the focal length of the lens. In practice, because of errors, the focal lengths along the X and Y axes differ and are written F_x and F_y. For a spatial point P(X, Y, Z) and its corresponding projection p(x, y), the model relates P to p through simple similar triangles.
In practice, taking into account the various influences of camera manufacture and mounting, the actual binocular vision measurement model is constructed. The left camera coordinate system is O-XYZ and is assumed to coincide exactly with the world coordinate system; the left camera physical coordinate system is O_il-x_il y_il and its effective focal length is F_l. The right camera coordinate system is O_cr-X_cr Y_cr Z_cr, its physical coordinate system O_ir-x_ir y_ir, and its effective focal length F_r.
From the existing binocular vision measurement model, the camera intrinsic matrix M is obtained [reconstructed as the standard pinhole intrinsic matrix, the original equation image not being reproduced]:

  M = [F_x   0   u_0]
      [ 0   F_y  v_0]   (2)
      [ 0    0    1 ]

The coordinates of a spatial point P in the pixel coordinate systems satisfy the constraint:

  q_r^T (M_r^-1)^T E M_l^-1 q_l = 0   (3)

F = (M_r^-1)^T E M_l^-1 is called the fundamental matrix of the system. The fundamental matrix F fuses all of the system's parameters, including the camera intrinsics and the rotation R and translation T describing the spatial relationship of the two cameras, and it links the pixel coordinates of the two views.
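The epipolar constraint of equation (3) can be checked numerically by assembling the fundamental matrix from assumed intrinsics and an assumed (R, t); all numeric values below are illustrative, not the patent's calibration results.

```python
import numpy as np

def skew(v):
    """3x3 cross-product matrix [v]_x such that skew(v) @ w == cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental(Ml, Mr, R, t):
    """F = (Mr^-1)^T E Ml^-1 with essential matrix E = [t]_x R,
    for the convention P_r = R @ P_l + t."""
    E = skew(t) @ R
    return np.linalg.inv(Mr).T @ E @ np.linalg.inv(Ml)
```

For any 3D point projected into both cameras, the pixel coordinates then satisfy q_r^T F q_l = 0 up to floating-point error.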
Calibration determines the geometric structure parameters (R, T) between the stereo image pair, and the Bouguet algorithm performs stereo rectification. Both cameras are rotated, the matrix R being split into r_r and r_l. After this rotation the imaging planes are coplanar but not yet row-aligned. To achieve row alignment, the rotation matrix R_rect that maps the image epipoles to infinity must be found; R_rect is described by equation (4) [reconstructed from the standard construction, the original equation image not being reproduced]:

  R_rect = [e_1^T]
           [e_2^T]   (4)
           [e_3^T]

where e_1 is the unit vector of the translation vector t, e_2 is orthogonal to e_1 and the principal ray, and e_3 = e_2 × e_1, as shown below:

  e_1 = t / ||t||,   e_2 = (-t_y, t_x, 0)^T / sqrt(t_x^2 + t_y^2)   (5)

R_rect rotates the images about the principal point so that the epipolar lines become parallel and the epipoles lie at infinity. The cameras of the binocular system can then be row-aligned, as follows:

  R_l = R_rect r_l,  R_r = R_rect r_r   (6)

The projection matrices that achieve image row alignment are:

  [Equation (7): image not reproduced in the source]

where M_re_r and M_re_l are the rectified intrinsic matrices and P_re_l, P_re_r the rectified reprojection matrices;

  [Equation (8): image not reproduced in the source]

then gives the rectified intrinsic and projection matrices. Any point in space is transformed into the camera pixel coordinate system by:

  [Equation (9): image not reproduced in the source]

from which the camera pixel coordinates can be computed:

  [Equation (10): image not reproduced in the source]
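After rectification, depth follows from disparity through the row-aligned geometry Z = f·b/d, and pixel coordinates map back to camera-frame 3D points. The following small reprojection sketch is illustrative; the focal length, baseline, and principal point used in the example are made-up values, not the patent's calibration results.

```python
import numpy as np

def disparity_to_points(disp, f, b, u0, v0):
    """Reproject a rectified disparity map (in pixels) to 3D camera coords.

    Z = f*b/d, X = (u - u0)*Z/f, Y = (v - v0)*Z/f.
    Invalid pixels (d <= 0) become NaN. Returns an (H, W, 3) array.
    """
    v, u = np.indices(disp.shape)
    with np.errstate(divide="ignore", invalid="ignore"):
        Z = np.where(disp > 0, f * b / disp, np.nan)
    X = (u - u0) * Z / f
    Y = (v - v0) * Z / f
    return np.dstack([X, Y, Z])
```

This is the step that turns the dense face disparity map into the dense face point cloud used by the pose estimation module.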
Preferably, the stereo matching module obtains the disparity map with a cross-scale cost aggregation stereo matching algorithm based on the epipolar distance transform. With the method of the present invention, a better disparity map is obtained in the face region.
Preferably, the stereo matching module comprises sub-modules for:
computing the matching cost;
computing the fused disparity map with the cross-scale cost aggregation algorithm.
Specifically, the core idea of combining the multi-scale approach with the epipolar distance transform is that, with the search window σ_Sw held fixed, the epipolar distance transform is applied to the image at each scale. For small-scale, high-resolution images, high-texture regions are richer; when the initial σ_Sw is suitably small, the "soft segmentation" property of the epipolar distance transform is preserved in high-texture regions. Large-scale, low-resolution images have already lost their high-frequency components and carry little information in high-texture regions, while the search window σ_Sw is relatively large for them, satisfying the requirement of a sufficiently large search window in low-texture regions. Finally, the cross-scale cost aggregation algorithm computes the fused disparity map.
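The cross-scale fusion step can be sketched as follows. This is a deliberately simplified stand-in: it upsamples per-scale cost volumes to the finest resolution and averages them with equal weights, followed by winner-takes-all disparity selection. The exact per-scale weighting and the epipolar distance transform of the patent are not reproduced here.

```python
import numpy as np

def fuse_cost_volumes(costs):
    """Fuse per-scale cost volumes (list of (H_s, W_s, D) arrays, finest
    first) by nearest-neighbour upsampling to the finest resolution and
    averaging; scale heights/widths are assumed to divide the finest."""
    H, W, D = costs[0].shape
    acc = np.zeros((H, W, D))
    for c in costs:
        ry, rx = H // c.shape[0], W // c.shape[1]
        acc += np.repeat(np.repeat(c, ry, axis=0), rx, axis=1)[:H, :W]
    return acc / len(costs)

def wta_disparity(cost):
    """Winner-takes-all: pick the disparity with minimal fused cost."""
    return np.argmin(cost, axis=2)
```

In the full algorithm the per-scale volumes would be built from epipolar-distance-transformed images and aggregated within each scale before fusion.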
Preferably, the face detection module performs face detection with an improved AdaBoost algorithm.
Specifically, the AdaBoost algorithm first represents the face with Haar-like features, using the integral image to speed up the evaluation of Haar-like features; AdaBoost then selects the best face rectangle features, which are called weak classifiers; finally these classifiers are cascaded into a strong classifier to detect faces. The method is also insensitive to illumination changes and therefore meets the face detection requirements of the system of the present invention.
Preferably, the face detection module comprises sub-modules for:
loading existing training samples, consisting of positive samples containing faces and negative samples without faces; the positive samples are face images covering different illuminations and poses, while the negative samples are varied images of other categories;
computing the Haar-like features at different positions and scales in the positive and negative sample images, forming a weak classifier for each feature;
selecting the best weak classifiers with the AdaBoost-based iterative algorithm to build a strong classifier;
traversing the whole image under test with search windows of different sizes to find the faces possibly present in the image; when a face is found, it is marked with a rectangular box and extracted.
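The integral-image acceleration of Haar-like features mentioned above can be sketched as follows: with a summed-area table, the sum over any rectangle takes four lookups, so a two-rectangle feature costs eight. The feature layout shown (left half minus right half) is one illustrative Haar-like feature, not the patent's full feature set.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row/column prepended, so that the sum
    over rows r0..r1-1, cols c0..c1-1 is read with four lookups."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from the integral image ii."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

def haar_two_rect_h(ii, r0, c0, h, w):
    """Two-rectangle horizontal Haar-like feature: left half minus right."""
    return (rect_sum(ii, r0, c0, r0 + h, c0 + w // 2)
            - rect_sum(ii, r0, c0 + w // 2, r0 + h, c0 + w))
```

Because every feature value is a constant number of lookups regardless of window size, the search windows of different sizes described above stay cheap to evaluate.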
Preferably, the pose estimation module comprises sub-modules for:
obtaining the facial feature points with the ASM feature point detection algorithm;
mapping the feature points to a sparse three-dimensional point cloud using the intrinsic and extrinsic parameters of the binocular cameras;
obtaining the pose relationship of the feature points through singular value decomposition and using this pose relationship as the initial estimate;
obtaining the dense face disparity map by cross-scale cost aggregation based on the epipolar distance transform and computing the dense face point cloud from the binocular camera intrinsic and extrinsic parameters;
taking the face point cloud in the initial pose as the template point cloud and registering it to the target point cloud with the iterative closest point algorithm seeded with the initial estimate, obtaining an accurate pose estimate.
Compared with the prior art, the beneficial effects of the present invention are: 1) the invention acquires head pose images with binocular cameras, processes them with a computer to obtain an accurate head pose, and feeds the pose back in real time to the mechanical control equipment of the TMS system, keeping the treatment coil over the region of the patient's head to be treated and improving the localization accuracy of the treatment target in TMS diagnosis and treatment; 2) the invention fuses cost volumes with a cross-scale cost aggregation stereo matching algorithm based on the epipolar distance transform to obtain the disparity map between the template pose image and the target pose image; the algorithm matches correctly in low-texture facial regions and yields a better disparity map; 3) the invention detects the face region with the AdaBoost algorithm and removes the image outside the facial region, reducing the computation of stereo matching and the interference in head pose estimation; 4) addressing the problems that the initial estimate of the traditional ICP algorithm falls into local minima and that its computation is heavy, the invention estimates the initial value with the ASM algorithm and removes the farthest points by a weighting method to reduce computation, improving the stability of the traditional ICP algorithm.
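Point 4) above, ICP seeded with an initial transform and trimmed of its farthest correspondences, can be sketched as follows. The brute-force nearest-neighbour search and the fixed trim ratio are illustrative choices, not the patent's exact weighting scheme.

```python
import numpy as np

def rigid_fit(src, dst):
    """Least-squares rigid transform (R, t) with dst ~= R @ src + t."""
    sc, dc = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - sc).T @ (dst - dc))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, dc - R @ sc

def trimmed_icp(src, dst, R0, t0, iters=20, keep=0.8):
    """Refine an initial estimate (R0, t0) so R @ src + t approaches dst,
    rejecting the farthest (1 - keep) fraction of matches each iteration."""
    R, t = R0, t0
    for _ in range(iters):
        moved = src @ R.T + t
        # Brute-force nearest neighbour in the target cloud
        d = np.linalg.norm(moved[:, None, :] - dst[None, :, :], axis=2)
        nn = d.argmin(axis=1)
        err = d[np.arange(len(src)), nn]
        # Keep only the closest fraction of correspondences
        sel = np.argsort(err)[: int(keep * len(src))]
        R, t = rigid_fit(src[sel], dst[nn[sel]])
    return R, t
```

In the system, (R0, t0) would come from the SVD initial estimate over ASM feature points, and src/dst from the dense face point clouds.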
Brief description of the drawings
Figure 1 is the flow chart of the stereo matching algorithm;
Figure 2 shows the results of the algorithm of the present invention;
Figure 3 shows the effect of the algorithm of the present invention on large low-texture regions;
Figure 4 shows the face-region matching results of the algorithm of the present invention;
Figure 5 compares the algorithm of the present invention with Yang's algorithm;
Figure 6 is the flow chart of face classifier training;
Figure 7 shows some of the experimental face images;
Figure 8 is the flow chart of the head pose estimation algorithm of the present invention;
Figure 9 illustrates feature point detection in the template pose;
Figure 10 illustrates the initial estimate of the iterative closest point;
Figure 11 illustrates fitting the template to the target point cloud in the improved ICP algorithm.
Detailed description of the embodiments
The technical solution of the present invention will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the invention without creative effort fall within the scope of protection of the invention.
A head pose tracking system for transcranial magnetic stimulation diagnosis and treatment comprises a photographing device, an intelligent terminal, and a computer program whose program modules are executed by the intelligent terminal. The photographing device comprises a binocular camera and a fixture that holds the binocular camera so that the whole head falls within its field of view; the intelligent terminal is driven by the program modules of the computer program, which comprise: a camera calibration module that calibrates the binocular cameras to obtain each camera's intrinsic and extrinsic parameters and the parameters relating the two cameras; a stereo matching module that, from the two images of the same scene captured by the binocular cameras from different angles, computes the disparity map between corresponding image points using a stereo matching algorithm; a face detection module that removes non-facial regions from the input images; and a pose estimation module that recovers the three-dimensional coordinates of the face in the binocular camera coordinate system from the disparity map and the camera parameters and computes the head pose with an iterative closest point algorithm.
As one embodiment, the camera calibration module comprises sub-modules for: establishing a reference coordinate system and, based on it, the relative pose between the cameras and the target in the binocular vision system; establishing an imaging model so that targets in the scene relate linearly to the captured images; establishing the binocular vision measurement model; computing the intrinsic parameters of the binocular cameras and the rotation matrix and translation vector between them; and performing stereo rectification of the binocular vision with the Bouguet algorithm. Here the binocular camera consists of two cameras of the same model; the camera hardware parameters are listed in Table 1:
Table 1. Camera hardware parameters
[Table 1: image not reproduced in the source]
Capturing target images from different angles with the left and right cameras gives the parameters of each camera. The intrinsic parameters of the left camera of this system are:
[Matrix image not reproduced in the source]
Its radial distortion coefficients are:
[k_L1, k_L2, k_L3] = [0.227, -1.607, 3.534]   (2)
The intrinsic parameters of the right camera are:
[Matrix image not reproduced in the source]
Its radial distortion coefficients are:
[k_R1, k_R2, k_R3] = [0.161, -0.373, -1.488]   (4)
The rotation matrix from the right camera coordinate system to the left camera coordinate system is:
[Matrix image not reproduced in the source]
The translation vector from the right camera coordinate system to the left camera coordinate system is:
[Vector image not reproduced in the source]
The fundamental matrix of the binocular cameras is:
[Matrix image not reproduced in the source]
The essential matrix of the binocular cameras is:
[Matrix image not reproduced in the source]
The calibration reprojection errors of the two cameras are shown in Table 2:
Table 2. Reprojection errors of binocular camera calibration
[Table 2: image not reproduced in the source]
As one embodiment, the stereo matching module works as follows: building on the accelerated matching of the image Gaussian pyramid, it fuses the multi-scale cost volumes by exploiting the fact that different scales carry different image frequencies, and performs stereo matching of the two views with the cross-scale cost aggregation algorithm based on the epipolar distance transform to obtain the disparity, resolving the conflict between disparity quality and computation speed. Setting
[Expression image not reproduced in the source]
to a fixed value T gives
[Equation image not reproduced in the source]
where w_{S+1} = w_S / η, η is the sampling scale (taken here as 2) and σ_0 is taken as 0.1; then:
[Equation image not reproduced in the source]
Cross-scale cost aggregation based on the epipolar distance transform replaces the grey value of the matching primitive with F(O_L) and applies a multi-scale fusion operation to the cost volume obtained after cost aggregation. The algorithm flow chart is shown in Figure 1.
As shown in Figure 2, comparing the disparity map produced by the algorithm flow with an ordinary block matching algorithm, the present algorithm matches correctly even in low-texture regions, whereas the ordinary window algorithm (Fixed Window) performs unsatisfactorily there.
In large low-texture regions, the comparison between the ordinary region matching algorithm and the present algorithm is shown in Figure 3; the present algorithm clearly outperforms the ordinary region matching algorithm.
As shown in Figure 4, in stereo matching on existing face images, the present algorithm matches better than other algorithms at low camera resolution and in low-texture facial regions.
Image data were collected with a binocular system built from ordinary webcams. After image rectification, Yang's cross-scale cost aggregation was compared with the present algorithm: Yang's method already mismatches in low-resolution face regions, which would later map erroneous point cloud data in the depth mapping and cause considerable trouble for pose estimation, whereas the present algorithm matches continuous disparities in the face region, as shown in Figure 5.
As one embodiment, the face detection module first represents the face with Haar-like features, using the integral image to speed up Haar-like feature evaluation. AdaBoost then selects the best face rectangle features, called weak classifiers, which are finally cascaded into a strong classifier to detect faces. The training flow chart of the face classifier is shown in Figure 6.
Specifically:
(1) Collection of training samples
Because AdaBoost's detection of multi-pose face images depends on whether the training samples include positive samples of faces in multiple poses, the choice of samples directly affects classifier performance. The training samples are divided into positive face samples and negative non-face samples; the sample images should be as rich and varied as possible, and the positive samples should include faces in different environments and states, e.g. different illumination, varying expressions, and with or without accessories.
(2) Training of the face classifier
The training flow is shown in Figure 6. Completing AdaBoost training along the following flow yields a classifier able to detect multi-pose faces, and the generated cascade classifier then performs multi-pose face detection.
After the feature values are obtained with the integral image method, weak classifiers are obtained by training, and a strong classifier is then constructed by weighted voting. Suppose the m input training samples are (x_1, y_1), (x_2, y_2), …, (x_m, y_m), where y_i ∈ {0, 1}, i = 1, 2, …, m; y_i = 0 denotes a negative sample and y_i = 1 a positive sample. The procedure is as follows [the equation images were not reproduced in the source; the standard AdaBoost formulas matching the surrounding description are restored below]:
First the weights of all samples are initialized; at the start of training the samples are given a uniform distribution, e.g.:

  w_{1,i} = 1/m

Then all samples go through T rounds of training, t = 1, 2, …, T (T is the number of weak classifiers):
(1) Normalization:

  w_{t,i} ← w_{t,i} / Σ_{j=1..m} w_{t,j}

(2) Train the weak classifiers h_j(x), e.g.:

  h_j(x) = 1 if p_j f_j(x) < p_j θ_j, otherwise 0

where p_j is the parity (sign) of the inequality, taking the value 1 or -1, and θ_j is the threshold; positive samples are assigned 1 and negative samples 0.
(3) Add to the strong classifier the weak classifier with the lowest weighted error:

  ε_t = min_j Σ_i w_{t,i} |h_j(x_i) - y_i|

(4) Then fine-tune the weights of all samples according to the minimal weighted detection error:

  w_{t+1,i} = w_{t,i} β_t^{1-e_i}

where

  β_t = ε_t / (1 - ε_t)

and e_i is the classification result: e_i = 0 indicates a correct classification.
Finally, after the T rounds of training, a strong classifier built from the T weak classifiers is obtained:

  h(x) = 1 if Σ_{t=1..T} α_t h_t(x) ≥ (1/2) Σ_{t=1..T} α_t, otherwise 0

where

  α_t = log(1 / β_t)

and h_t(x) denotes a weak classifier. When Σ_t α_t h_t(x) ≥ (1/2) Σ_t α_t, h(x) = 1 and x is marked as a positive sample. The strong classifier of this embodiment is built from T weak classifiers weighted through ε_t and stacked in cascade, so that faces are detected accurately and quickly.
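The training loop above can be sketched end to end over simple one-dimensional threshold stumps. This is an illustrative toy implementation of the described AdaBoost procedure (not the patent's cascade over Haar-like features); the data in the example are made up.

```python
import numpy as np

def train_adaboost(X, y, rounds=10):
    """AdaBoost over 1D threshold stumps, following the steps (1)-(4).

    X: (m, n) feature matrix, y: (m,) labels in {0, 1}.
    Returns a list of (feature, parity, theta, alpha) weak classifiers.
    """
    m, n = X.shape
    w = np.full(m, 1.0 / m)            # uniform initial weights
    clf = []
    for _ in range(rounds):
        w = w / w.sum()                # (1) normalize
        best = None
        for j in range(n):             # (2) best stump per feature
            for theta in np.unique(X[:, j]):
                for p in (1, -1):
                    h = (p * X[:, j] < p * theta).astype(float)
                    err = np.sum(w * np.abs(h - y))
                    if best is None or err < best[0]:
                        best = (err, j, p, theta, h)
        err, j, p, theta, h = best     # (3) lowest weighted error
        beta = max(err, 1e-10) / max(1 - err, 1e-10)
        w = w * beta ** (1 - np.abs(h - y))   # (4) shrink correct samples
        clf.append((j, p, theta, np.log(1 / beta)))
    return clf

def predict(clf, X):
    """Strong classifier: weighted vote against half the total weight."""
    s = sum(a * (p * X[:, j] < p * t) for j, p, t, a in clf)
    return (s >= 0.5 * sum(a for *_, a in clf)).astype(int)
```

In the real detector, each stump's feature f_j(x) would be a Haar-like feature evaluated through the integral image rather than a raw coordinate.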
In this embodiment, experiments were performed on the CMU PIE face database of Carnegie Mellon University, verifying the face recognition rates of the AdaBoost algorithm (a) and the improved AdaBoost algorithm (b) under strong light, weak light, and pose deflection; some of the experimental face images are shown in Figure 7, and the results are given in Table 3.
Table 3. Experimental results under different illumination conditions
[Table 3: image not reproduced in the source]
Table 3 shows that the improved AdaBoost algorithm achieves recognition rates of 97%, 94%, and 92% under strong illumination, weak illumination, and pose deflection respectively, all higher than the original AdaBoost algorithm. The improved algorithm's average recognition rate is 94.33%; the experimental results show that the improved AdaBoost algorithm achieves a high recognition rate while remaining real-time.
As one embodiment, for head pose estimation, the present invention addresses the problems that the initial estimate of the traditional ICP algorithm falls into local minima and that its computation is heavy: it estimates the initial value with the ASM algorithm and removes the farthest points by a weighting method to reduce computation, improving the stability of the traditional ICP algorithm.
Embodiment
The experimental environment for all algorithms in this embodiment: Windows 7 64-bit, 8 GB of memory, an Intel Core i3 dual-core 2.30 GHz processor, and Visual Studio 2012 as the experimental platform. In this embodiment a binocular camera built from USB webcams captures the images. The algorithm flow chart is shown in Figure 8.
First the binocular cameras are calibrated and rectified.
After calibration and rectification, images of the template pose are captured first; in the template pose the face should face the camera lenses as squarely as possible. ASM feature point detection gives the disparities of the feature point pixels between the left and right views; the feature point results in the template pose are shown in Figure 9.
The 3D information of the feature points and the pose of the template relative to the camera coordinate system are then computed; the resulting initial-value data are shown in Table 4, with Euler angles (Yaw, Pitch, Roll):
Table 4. Template pose rectification data
[Table 4: image not reproduced in the source]
After the relationship between the camera coordinate system and the template pose is computed, the improved iterative closest point method computes the pose relationship between the template pose and the target pose. First an initial pose estimate is made from the face template feature points and the target feature points; Figure 10 shows the feature point detection results and disparity maps for three pairs of template and target poses, where (a) is a pose rotated mainly about the Z axis of the camera coordinate system and (b) and (c) are poses rotated about the X, Y, and Z axes of the camera coordinate system.
Through ASM feature point detection and the coordinate relations of the binocular vision measurement model, point pairs of 3D facial feature points in the template and target poses are obtained; from the initial-value estimation the rotation and translation estimates of each group of point pairs are computed. Table 5 shows the initial estimates for the three poses above:
Table 5. Initial estimates for each pose
[Table 5: image not reproduced in the source]
After the initial pose estimate between the template pose and the target pose is obtained, the face disparity map produced by face detection and stereo matching is converted into the corresponding point cloud through the relationship, in the binocular vision measurement model, between the main-camera coordinates of any point in space and its binocular pixel coordinates, and the improved iterative closest point algorithm computes the pose relationship between the template-pose point cloud and the target-pose point cloud.
Figure 11 shows the traditional and the improved iterative closest point algorithms estimating the pose relationship between the template pose and the target pose, where (a) is the template point cloud and (b)–(d) show the registration of the template point cloud to the target point cloud: respectively, the target-pose point cloud, the registration result of the traditional iterative closest point algorithm against the target pose, and the registration result of the improved iterative closest point algorithm against the target pose.
The traditional iterative closest point algorithm fails to register the template-pose point cloud well onto the target-pose point cloud. Observation shows that for the target pose in (b), rotated mainly about the Z axis, the traditional and improved algorithms differ little, but in the slightly more complex cases of (c) and (d) the traditional algorithm clearly registers the template point cloud beyond the extent of the target point cloud, while the improved iterative closest point algorithm markedly reduces the registration errors between the point clouds. Table 6 gives the pose estimates computed by the improved ICP.
Table 6. Results of the improved iterative closest point algorithm
[Table 6: image not reproduced in the source]
The present invention first calibrates and rectifies the binocular cameras, then applies a face detection algorithm to the rectified binocular images to obtain the face region. The ASM feature point detection algorithm then locates the key points on the face; from the intrinsic and extrinsic parameters of the binocular cameras this group of feature points can be mapped to a sparse 3D point cloud, and singular value decomposition yields the pose relationship of the group, which serves as the initial estimate. Cross-scale cost aggregation based on the epipolar distance transform produces the dense face disparity map, and the dense face point cloud is computed from the binocular camera parameters. The face point cloud in the initial pose serves as the template point cloud, and the iterative closest point algorithm seeded with the initial estimate registers the template point cloud to the target point cloud, yielding an accurate pose estimate. The binocular cameras capture head pose images in real time, and stereo matching, face detection, and pose estimation are applied to each captured image to obtain and track the head pose in real time. The pose estimates are fed back in real time to the mechanical control equipment of the TMS system, which adjusts continuously to keep the TMS coil within the effective area to be treated, improving the accuracy of target localization in TMS treatment.
Although embodiments of the present invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principle and spirit of the invention; the scope of the invention is defined by the appended claims and their equivalents.

Claims (7)

  1. A head pose tracking system for transcranial magnetic stimulation diagnosis and treatment, characterized in that it comprises a photographing device, an intelligent terminal, and a computer program whose program modules are executed by the intelligent terminal; the photographing device comprises a binocular camera and a fixture that holds the binocular camera so that the whole head falls within its field of view; the intelligent terminal is driven by the program modules of the computer program; the program modules of the computer program comprise: a camera calibration module that calibrates the binocular cameras to obtain each camera's intrinsic and extrinsic parameters and the parameters relating the cameras; a stereo matching module that, from the two images of the same scene captured by the binocular cameras from different angles, computes the disparity map between corresponding image points using a stereo matching algorithm; a face detection module for removing non-facial regions from the input images; and a pose estimation module that recovers the three-dimensional coordinates of the face in the binocular camera coordinate system from the disparity map and the camera intrinsic and extrinsic parameters and computes the head pose with an iterative closest point algorithm.
  2. The head pose tracking system for transcranial magnetic stimulation diagnosis and treatment of claim 1, characterized in that the camera calibration module comprises sub-modules for:
    establishing a reference coordinate system and, based on it, the relative pose between the cameras and the target in the binocular vision system;
    establishing an imaging model so that targets in the scene relate linearly to the images captured by the cameras;
    establishing the binocular vision measurement model;
    computing the intrinsic parameters of the binocular cameras and the rotation matrix and translation vector between them;
    performing stereo rectification of the binocular vision with the Bouguet algorithm.
  3. The head pose tracking system for transcranial magnetic stimulation diagnosis and treatment of claim 1, characterized in that the stereo matching module obtains the disparity map with a cross-scale cost aggregation stereo matching algorithm based on the epipolar distance transform.
  4. The head pose tracking system for transcranial magnetic stimulation diagnosis and treatment of claim 3, characterized in that the stereo matching module comprises sub-modules for:
    computing the matching cost;
    applying the epipolar distance transform to the images at different scales while keeping the search window fixed;
    computing the fused disparity map with the cross-scale cost aggregation algorithm.
  5. The head pose tracking system for transcranial magnetic stimulation diagnosis and treatment of claim 1, characterized in that the face detection module performs face detection with an improved AdaBoost algorithm.
  6. The head pose tracking system for transcranial magnetic stimulation diagnosis and treatment of claim 5, characterized in that the face detection module comprises sub-modules for:
    loading existing training samples, consisting of positive samples containing faces and negative samples without faces; the positive samples are face images covering different illuminations and poses, while the negative samples are varied images of other categories;
    computing the Haar-like features at different positions and scales in the positive and negative sample images, forming a weak classifier for each feature;
    selecting the best weak classifiers with the AdaBoost-based iterative algorithm to build a strong classifier;
    traversing the whole image under test with search windows of different sizes to find the faces possibly present in the image; when a face is found, it is marked with a rectangular box and extracted.
  7. The head pose tracking system for transcranial magnetic stimulation diagnosis and treatment of claim 1, characterized in that the pose estimation module comprises sub-modules for:
    obtaining the facial feature points with the ASM feature point detection algorithm;
    mapping the feature points to a sparse three-dimensional point cloud using the intrinsic and extrinsic parameters of the binocular cameras;
    obtaining the pose relationship of the feature points through singular value decomposition and using this pose relationship as the initial estimate;
    obtaining the dense face disparity map by cross-scale cost aggregation based on the epipolar distance transform and computing the dense face point cloud from the binocular camera intrinsic and extrinsic parameters;
    taking the face point cloud in the initial pose as the template point cloud and registering it to the target point cloud with the iterative closest point algorithm seeded with the initial estimate, obtaining an accurate pose estimate.
PCT/CN2019/076104 2019-02-26 2019-02-26 一种用于经颅磁刺激诊疗的人头姿态跟踪系统 WO2020172783A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980001096.4A CN110268444A (zh) 2019-02-26 2019-02-26 一种用于经颅磁刺激诊疗的人头姿态跟踪系统
PCT/CN2019/076104 WO2020172783A1 (zh) 2019-02-26 2019-02-26 一种用于经颅磁刺激诊疗的人头姿态跟踪系统

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/076104 WO2020172783A1 (zh) 2019-02-26 2019-02-26 一种用于经颅磁刺激诊疗的人头姿态跟踪系统

Publications (1)

Publication Number Publication Date
WO2020172783A1 true WO2020172783A1 (zh) 2020-09-03

Family

ID=67912983

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/076104 WO2020172783A1 (zh) 2019-02-26 2019-02-26 一种用于经颅磁刺激诊疗的人头姿态跟踪系统

Country Status (2)

Country Link
CN (1) CN110268444A (zh)
WO (1) WO2020172783A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112419389A (zh) * 2020-11-25 2021-02-26 中科融合感知智能研究院(苏州工业园区)有限公司 一种实现双目增量视差匹配算法的方法及装置
CN113627261A (zh) * 2021-07-12 2021-11-09 深圳市瑞立视多媒体科技有限公司 一种恢复头部刚体正确位姿的方法及其装置、设备、存储介质
CN113689555A (zh) * 2021-09-09 2021-11-23 武汉惟景三维科技有限公司 一种双目图像特征匹配方法及系统
CN114155289A (zh) * 2021-12-08 2022-03-08 电子科技大学 基于双目视觉的电点火系统电火花轮廓尺寸测量方法
CN115880783A (zh) * 2023-02-21 2023-03-31 山东泰合心康医疗科技有限公司 用于儿科保健的儿童运动姿态识别方法
CN116630382A (zh) * 2023-07-18 2023-08-22 杭州安劼医学科技有限公司 神经调控图像监测配准系统和控制方法
CN116883945A (zh) * 2023-07-21 2023-10-13 江苏省特种设备安全监督检验研究院 一种融合目标边缘检测和尺度不变特征变换的人员识别定位方法

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807585A (zh) * 2019-10-30 2020-02-18 山东商业职业技术学院 一种学员课堂学习状态在线评估方法及系统
CN111611913A (zh) * 2020-05-20 2020-09-01 北京海月水母科技有限公司 一种单目人脸识别探头人形定位技术
CN111672029A (zh) * 2020-06-04 2020-09-18 杭州师范大学 基于颅表解剖标志的智能导航方法、导航系统及导航仪
CN111729200B (zh) * 2020-07-27 2022-06-17 浙江大学 基于深度相机和磁共振的经颅磁刺激自动导航系统和方法
CN112489113B (zh) * 2020-11-25 2024-06-11 深圳地平线机器人科技有限公司 相机外参标定方法、装置及相机外参标定系统
CN114299120B (zh) * 2021-12-31 2023-08-04 北京银河方圆科技有限公司 补偿方法、注册方法和可读存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345751A (zh) * 2013-07-02 2013-10-09 北京邮电大学 一种基于鲁棒特征跟踪的视觉定位方法
US20140002605A1 (en) * 2012-06-27 2014-01-02 Imec Taiwan Co. Imaging system and method
CN104036488A (zh) * 2014-05-04 2014-09-10 北方工业大学 一种基于双目视觉的人体姿态动作研究方法
CN106851252A (zh) * 2017-03-29 2017-06-13 武汉嫦娥医学抗衰机器人股份有限公司 自适应变基线双目立体相机系统
CN108416791A (zh) * 2018-03-01 2018-08-17 燕山大学 一种基于双目视觉的并联机构动平台位姿监测与跟踪方法

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108074266A (zh) * 2016-11-09 2018-05-25 哈尔滨工大天才智能科技有限公司 一种机器人的机器视觉构造方法
CN108749819B (zh) * 2018-04-03 2019-09-03 吉林大学 基于双目视觉的轮胎垂向力估算系统及估算方法


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANG HUANG : "Research of Human Head Pose Estimation Based on Binocular Vision", THESES , 31 January 2019 (2019-01-31), CN, pages 1 - 69, XP009522810 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112419389A (zh) * 2020-11-25 2021-02-26 中科融合感知智能研究院(苏州工业园区)有限公司 一种实现双目增量视差匹配算法的方法及装置
CN112419389B (zh) * 2020-11-25 2024-01-23 中科融合感知智能研究院(苏州工业园区)有限公司 一种实现双目增量视差匹配算法的方法及装置
CN113627261A (zh) * 2021-07-12 2021-11-09 深圳市瑞立视多媒体科技有限公司 一种恢复头部刚体正确位姿的方法及其装置、设备、存储介质
CN113689555A (zh) * 2021-09-09 2021-11-23 武汉惟景三维科技有限公司 一种双目图像特征匹配方法及系统
CN113689555B (zh) * 2021-09-09 2023-08-22 武汉惟景三维科技有限公司 一种双目图像特征匹配方法及系统
CN114155289A (zh) * 2021-12-08 2022-03-08 电子科技大学 基于双目视觉的电点火系统电火花轮廓尺寸测量方法
CN115880783A (zh) * 2023-02-21 2023-03-31 山东泰合心康医疗科技有限公司 用于儿科保健的儿童运动姿态识别方法
CN116630382A (zh) * 2023-07-18 2023-08-22 杭州安劼医学科技有限公司 神经调控图像监测配准系统和控制方法
CN116630382B (zh) * 2023-07-18 2023-10-03 杭州安劼医学科技有限公司 神经调控图像监测配准系统和控制方法
CN116883945A (zh) * 2023-07-21 2023-10-13 江苏省特种设备安全监督检验研究院 一种融合目标边缘检测和尺度不变特征变换的人员识别定位方法
CN116883945B (zh) * 2023-07-21 2024-02-06 江苏省特种设备安全监督检验研究院 一种融合目标边缘检测和尺度不变特征变换的人员识别定位方法

Also Published As

Publication number Publication date
CN110268444A (zh) 2019-09-20

Similar Documents

Publication Publication Date Title
WO2020172783A1 (zh) 一种用于经颅磁刺激诊疗的人头姿态跟踪系统
CN111414798B (zh) 基于rgb-d图像的头部姿态检测方法及系统
US10082868B2 (en) Calculation method of line-of-sight direction based on analysis and match of iris contour in human eye image
Papazov et al. Real-time 3D head pose and facial landmark estimation from depth images using triangular surface patch features
WO2017211066A1 (zh) 基于虹膜与瞳孔的用于头戴式设备的视线估计方法
CN102697508B (zh) 采用单目视觉的三维重建来进行步态识别的方法
CN109598196B (zh) 一种多形变多姿态人脸序列的特征点定位方法
CN106796449A (zh) 视线追踪方法及装置
Lu et al. Appearance-based gaze estimation via uncalibrated gaze pattern recovery
CN104821010A (zh) 基于双目视觉的人手三维信息实时提取方法及系统
CN107563323A (zh) 一种视频人脸特征点定位方法
CN113077519A (zh) 一种基于人体骨架提取的多相机外参自动标定方法
CN111486798B (zh) 图像测距方法、图像测距系统及终端设备
CN112069986A (zh) 高龄老人眼动机器视觉跟踪方法及装置
CN111582036B (zh) 可穿戴设备下基于形状和姿态的跨视角人物识别方法
CN115830675A (zh) 一种注视点跟踪方法、装置、智能眼镜及存储介质
CN109993116B (zh) 一种基于人体骨骼相互学习的行人再识别方法
Darujati et al. Facial motion capture with 3D active appearance models
CN113256789A (zh) 一种三维实时人体姿态重建方法
Strupczewski Commodity camera eye gaze tracking
Ma et al. Research on kinect-based gesture recognition
CN108694348B (zh) 一种基于自然特征的跟踪注册方法及装置
Yang Face feature tracking algorithm of aerobics athletes based on Kalman filter and mean shift
Cui et al. Trajectory simulation of badminton robot based on fractal brown motion
Wang et al. Camper’s Plane Localization and Head Pose Estimation Based on Multi-View RGBD Sensors

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19917284

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19917284

Country of ref document: EP

Kind code of ref document: A1