WO2008070701A2 - Fast human pose estimation using appearance and motion via multi-dimensional boosting regression - Google Patents
Fast human pose estimation using appearance and motion via multi-dimensional boosting regression
- Publication number
- WO2008070701A2 (PCT/US2007/086458)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- mapping function
- representations
- image sequence
- pose
- training image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/7747—Organisation of the process, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2148—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
Definitions
- the invention generally relates to computer vision, and more specifically, to fast human pose estimation for motion tracking.
- An important problem in modern computer vision is full body tracking of humans in video sequences.
- Applications for human tracking include video surveillance, gesture analysis, human-computer interfaces, and computer animation.
- 3D motion tracking is important in analyzing and solving problems relating to the movement of human joints.
- in conventional motion capture systems, subjects wear suits with special markers and perform motions recorded by complex 3D capture systems.
- 3D capture systems are expensive due to the special equipment and significant studio time required.
- conventional 3D motion capture systems require considerable postprocessing work which adds to the time and cost associated with traditional 3D tracking methods.
- a training module determines a mapping function between an input image sequence and pose representations of a subject in the input image sequence.
- the training module receives a sequence of training images and a set of known poses of a subject in the images.
- the training module generates image representations of the sequence of training images.
- the image representations comprise appearance patches representing the appearance of the subject and motion patches representing movement of the subject between image frames.
- Features are then extracted from the image representations.
- the set of features comprise Haar-like features computed at a variety of orientations.
- the training module learns a multidimensional regression function.
- the multidimensional regression function provides a mapping between the image representations and a multidimensional vector output corresponding to the known poses.
- the multidimensional vector output comprises a vector of joint angles completely describing the pose.
- a testing module receives a test image sequence comprising a subject in unknown pose configurations.
- the learned mapping function from the training stage is applied to the received test image sequence.
- the learned mapping function outputs a multidimensional vector providing a pose estimation of the subject.
- FIG. 1 is an example computer system in accordance with an embodiment of the present invention.
- FIG. 2 is a block diagram illustrating an embodiment of a pose estimation module.
- FIG. 3 is a flowchart illustrating an embodiment of a process for learning a mapping function for fast human pose estimation.
- FIG. 4 is a flowchart illustrating an embodiment of a process for generating appearance and motion patches.
- FIG. 5 is a flowchart illustrating an embodiment of a process for extracting features from the image representations.
- FIGS. 6A-C illustrate examples of Haar features at a variety of orientations.
- FIG. 7 is a flowchart illustrating an embodiment of a process for learning a mapping function.
- FIG. 8 is a flowchart illustrating an embodiment of a process for fast human pose estimation of a test image sequence.
- the present invention provides a fast body pose estimator for human tracking applications that estimates a three dimensional (3D) body pose from a two dimensional (2D) input image sequence.
- the pose estimator can be used to initialize a conventional tracking module, and re-initialize the tracker when tracking is lost.
- the pose estimator can provide a pose estimation at each frame of the image sequence and the sequence of pose estimations itself can function as the tracker.
- the pose estimation module of the present invention is fast enough to run at every frame of a video and can be used for real-time tracking applications. Furthermore, the pose estimator operates with improved accuracy by exploiting both appearance and motion information from the image sequence.
- FIG. 1 is an illustration of a computer system 100 in which an embodiment of the present invention may operate.
- the computer system 100 includes a processor 110, an input controller 102, an output controller 108, and a memory 104.
- the processor 110 processes data signals and may comprise various computing architectures such as a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture implementing a combination of instruction sets. Although only a single processor is shown in FIG. 1, multiple processors may be included.
- the processor 110 may comprise an arithmetic logic unit, a microprocessor, a general purpose computer, or some other information appliance equipped to transmit, receive and process electronic data signals from the memory 104, the input controller 102, or the output controller 108.
- the input controller 102 is any device configured to provide input (e.g., a video input) to the computer system 100.
- the input controller 102 is configured to receive an input image sequence from one or more of a network 120, a database 130, and an image capture unit 140 (e.g., a video camera).
- the output controller 108 represents any device equipped to output processed data to one or more of a database 150, a network 160, and a display 170 (e.g., an organic light emitting diode display (OLED), a liquid crystal display (LCD), or a cathode ray tube (CRT) display).
- the memory 104 stores data and/or instructions that may be executed by processor 110.
- the instructions may comprise code for performing any and/or all of the techniques described herein.
- Memory 104 may be a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, Flash RAM (non-volatile storage), combinations of the above, or some other memory device known in the art.
- the memory 104 comprises a data store 107 and a pose estimation module 106, and is adapted to communicate with the processor 110, the input controller 102, and/or the output controller 108.
- the pose estimation module 106 comprises computer executable instructions for carrying out the pose estimation processes described below.
- computer system 100 may include more or fewer components than those shown in FIG. 1 without departing from the scope of the present invention.
- computer system 100 may include additional memory, such as, for example, a first or second level cache, or one or more application specific integrated circuits (ASICs).
- computer system 100 may include additional input or output devices.
- FIG. 2 is a high-level block diagram illustrating an embodiment of the pose estimation module 106.
- the pose estimation module 106 comprises computer executable instructions that are executed by the processor 110 of the computer system 100.
- the pose estimation module 106 may further utilize data stored in data store 107 or data received by the input controller 102. Output data and intermediate data used by the pose estimation module 106 may be outputted by output controller 108 and/or stored in data store 107.
- alternative embodiments of the pose estimation module 106 can be implemented in any combination of firmware, hardware, or software.
- the pose estimation module 106 comprises a training module 202 and a testing module 204.
- the training module 202 receives a sequence of 2D training images from, for example, an external database 130, network 120, or image capture unit 140.
- the training images 206 contain humans having known pose configurations 208 that are also inputted to the training module 202.
- the training images 206 may comprise, for example, walking sequences of one or more subjects, or any number of other common motions.
- the pose configurations 208 can comprise, for example, a vector of joint angles or any other set of information that completely describes the 3D pose.
- the pose configurations 208 may be obtained using any conventional 3D motion capture technique.
- the training module 202 learns a mapping function 210 that describes the relationship between the information in the training images 206 and the known 3D pose configurations 208.
- the training module 202 may operate on many different training image sequences 206 corresponding to different motions.
- multiple mapping functions 210 are learned, with each mapping function 210 corresponding to a different type of motion.
- the training stage is executed in an offline mode so that the mapping function 210 is only learned once.
- the mapping function 210 can be stored in data store 107 for use by the testing module 204.
- the learned mapping function 210 is used by the testing module 204 to generate a sequence of 3D pose estimations 212 of a human subject that is detected in an input test image sequence 214.
- the testing module 204 receives the test image sequence 214 having humans in unknown pose configurations, applies the mapping function 210, and outputs the pose estimation 212.
- the pose estimation 212 comprises a multidimensional vector representation of the pose of a subject (e.g., a human) in the images.
- the 3D pose estimations 212 may comprise a vector of joint angles describing the poses.
- the testing module 204 estimates the 3D pose 212 from the 2D test image sequence 214 without utilizing markers or special motion capture cameras. In one embodiment, the testing module 204 can operate fast enough to generate the pose estimations 212 in real-time as each test image in the test image sequence 214 is received. Thus, it is possible, for example, to provide a pose estimate at each frame of a video.
- FIG. 3 is a flow diagram illustrating an embodiment of a process for learning a mapping function 210 for fast human pose estimation.
- the training module 202 receives 302 a training image sequence 206 and generates 304 image representations from the image sequence 206.
- the image representations comprise motion and appearance patches derived from the training image sequence 206.
- An appearance patch comprises information from an image frame representing the appearance of a subject in the image frame.
- a motion patch comprises information representing movement of the subject between image frames.
- the training module 202 extracts 306 features from the image representations (e.g., motion and appearance patches).
- the features describe characteristics of the images, such as, for example, edges and/or lines at various orientations.
- a process for feature extraction is described in more detail below with reference to FIG. 5.
- the training module 202 then learns 308 the mapping function 210.
- the mapping function 210 maps the image representations to the known body pose configurations 208 based in part on the extracted features. For example, in one embodiment, the mapping function 210 describes the relationship between an input vector of motion and appearance patches and a multidimensional vector of joint angles representing the pose. A process for learning 308 the mapping function 210 is described in more detail below with reference to FIG. 7.
- Referring now to FIG. 4, a flow diagram illustrates an embodiment of a process for generating 304 image representations of the training image sequence 206.
- the training module 202 first detects 402 a human in an image frame received from the training image sequence 206.
- Human detection processes are known in the art and an example process is described in more detail in P. Viola, et al., "Detecting Pedestrians Using Patterns of Motion and Appearance, " ICCV, p. 734-741, 2003, the content of which is incorporated by reference herein in its entirety.
- the detection step 402 outputs a bounding box that bounds the detected human body in the image frame.
- Using the bounding boxes, the process then extracts 404 an image patch containing the human body from the image frame.
- the patches can be normalized according to different variables such as, for example, intensity value and resolution (e.g., patches can be scaled to 64 x 64 pixels).
- the exact patch size can be chosen based on visual inspection and should ensure that the patch contains enough information for a human observer to distinguish between poses.
- the silhouette of the human body can be extracted 408 using a background subtraction technique to mask out the background pixels. In some instances, this can improve learning speed and generalization performance.
- in an alternative embodiment, the step 408 is omitted.
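- As an illustration of steps 402-408, the following minimal Python sketch crops the detected bounding box, optionally masks background pixels, and normalizes the patch to 64 x 64. The helper name, the use of histogram equalization for intensity normalization, and the fixed difference threshold are assumptions made for illustration, not details taken from the patent:

```python
import cv2
import numpy as np

def extract_appearance_patch(frame, bbox, bg_model=None, size=(64, 64)):
    """frame: grayscale uint8 image; bbox: (x, y, w, h) from a person detector."""
    x, y, w, h = bbox
    patch = frame[y:y + h, x:x + w]
    if bg_model is not None:
        # Optional step 408: mask out background pixels by simple
        # background subtraction against a static background model.
        fg_mask = cv2.absdiff(patch, bg_model[y:y + h, x:x + w]) > 25
        patch = patch * fg_mask.astype(patch.dtype)
    patch = cv2.resize(patch, size)      # normalize resolution (e.g., 64 x 64)
    return cv2.equalizeHist(patch)       # normalize intensity values (assumed)
```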
- the result of steps 402-406 (and optionally 408) is an appearance patch, denoted herein by I_t, representing the appearance of a human in an image frame.
- the appearance patch I_t is represented by a 2D matrix of pixel intensity values.
- motion information is computed 410 from the appearance patches by computing the absolute difference of image values between adjacent frames. This information is denoted as an image patch A_t and is given by: A_t = |I_t - I_{t+1}|.
- the direction of motion can be determined by taking the difference of the first image with a shifted version of the second, using a technique similar to that described by Viola, et al., referenced above. For example, image patch I_{t+1} can be shifted upward by one pixel and the difference between the shifted image patch I_{t+1} and the previous image patch I_t can be determined. Similarly, the image patch I_{t+1} can be shifted leftward, rightward, or downward and compared to I_t. Based on the differences, the most likely direction of motion can be determined. In order to limit the number of features considered by the training module 202, this additional source of information can optionally be omitted.
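- A minimal sketch of the motion computation 410, assuming I_t and I_t1 are consecutive normalized appearance patches as float arrays (np.roll wraps at the borders; zero padding would be used in practice):

```python
import numpy as np

def motion_patches(I_t, I_t1):
    A_t = np.abs(I_t - I_t1)                     # A_t = |I_t - I_{t+1}|
    # Shift I_{t+1} by one pixel in each direction and compare with I_t;
    # the direction with the smallest residual is the most likely motion.
    shifted = {'up':    np.roll(I_t1, -1, axis=0),
               'down':  np.roll(I_t1,  1, axis=0),
               'left':  np.roll(I_t1, -1, axis=1),
               'right': np.roll(I_t1,  1, axis=1)}
    residuals = {d: np.abs(I_t - s) for d, s in shifted.items()}
    direction = min(residuals, key=lambda d: residuals[d].sum())
    return A_t, residuals, direction
```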
- Haar-like features are extracted from the images similar to the features described by Viola, et al., referenced above.
- Haar features are extracted by applying a set of filters to images that measure the difference between rectangular areas in the image with different sizes, positions, and aspect ratios.
- the features can be computed very efficiently from the integral image.
- the Haar filters applied in Viola, et al. are used for detecting either faces or pedestrians and are not used for full body pose detection.
- for face or pedestrian detection, a small image patch of about 20 pixels per side is large enough to discriminate the object from the background.
- full body pose estimation, in contrast, uses higher resolution patches (e.g., 64x64 pixels). This prevents the description of limbs from being limited to an area of only a few pixels.
- an increase in patch size also increases the number of basic Haar features that fit in the patch (growing approximately quadratically with the patch area) and increases the level of computation used in feature extraction.
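- The integral-image trick behind this efficiency is easy to state in code. The sketch below shows the standard construction (general technique, not specific to the patent), in which the sum over any axis-aligned rectangle costs four lookups regardless of its size:

```python
import numpy as np

def integral_image(patch):
    # ii[y, x] holds the sum of patch[:y, :x]; a zero row/column is prepended.
    return np.pad(patch.astype(np.int64), ((1, 0), (1, 0))).cumsum(0).cumsum(1)

def rect_sum(ii, x, y, w, h):
    # Sum of the w x h rectangle with top-left corner (x, y): four lookups.
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def edge_feature(ii, x, y, w, h):
    # Two-rectangle Haar edge feature (FIG. 6A style): left half minus right.
    return rect_sum(ii, x, y, w // 2, h) - rect_sum(ii, x + w // 2, y, w // 2, h)
```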
- Referring to FIG. 5, a process for feature extraction 306 is illustrated in accordance with an embodiment of the present invention.
- the process uses a set of differential filters tailored to the human body to extract temporal and spatial information from the images.
- a large pool of features is created for use in a boosting process that learns the mapping function 210 from image frames to the 3D pose estimations.
- the set of filters is generated 502.
- the process extends the set of basic vertical Haar features by introducing rotated versions computed at a few major orientations as illustrated in FIGS. 6A-C. This allows the features to isolate limbs having any arbitrary orientation.
- one type of edge feature (FIG. 6A) and two types of line features (FIG. 6B and FIG. 6C) are used, where each feature can assume any of 18 equally spaced orientations in the range [0, π].
- the features in FIG. 6C are suitable to match body limbs, while the features in 6A and 6B are suitable to match trunk, head, and full body.
- the features can have any position inside the patch.
- each rectangle of the filter set can be restricted to have a minimum area (e.g., 80 pixels) and/or can be restricted in their distance from the border (e.g., rectangles not closer than 8 pixels from the border). In addition, rectangles can be limited to those having even width and even height.
- a number K of filters from the filter set is randomly selected 504 by uniform sampling.
- the set of filters is applied 506 to the appearance and motion patches to extract features. Using this approach, oriented features can be extracted very efficiently from integral images computed on rotated versions of the image patch.
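- A minimal sketch of steps 502-506 under the constraints listed above (18 orientations in [0, π], even rectangle sides, minimum area of 80 pixels, 8-pixel border margin); the sampling ranges and helper names are illustrative assumptions. Oriented responses would then be obtained by evaluating each filter as axis-aligned rectangle sums on an integral image of the patch rotated by -θ:

```python
import numpy as np

rng = np.random.default_rng(0)
ORIENTATIONS = np.linspace(0.0, np.pi, 18, endpoint=False)

def sample_filters(K, patch_size=64, border=8, min_area=80):
    """Uniformly sample K oriented Haar-style filters (step 504)."""
    filters = []
    while len(filters) < K:
        w, h = 2 * rng.integers(2, 16, size=2)         # even width and height
        if w * h < min_area:
            continue                                   # enforce minimum area
        x = int(rng.integers(border, patch_size - border - w + 1))
        y = int(rng.integers(border, patch_size - border - h + 1))
        kind = rng.choice(['edge', 'line2', 'line3'])  # FIG. 6A/6B/6C types
        theta = float(rng.choice(ORIENTATIONS))
        filters.append((kind, x, y, int(w), int(h), theta))
    return filters
```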
- a boosting regression process provides a way to automatically select from the large pool of features the most informative ones to be used as basic elements for building the mapping function 210.
- Examples of boosting regression techniques are described in J.H. Friedman, "Greedy Function Approximation: A Gradient Boosting Machine," Annals of Statistics, 29:1189-1232, 2001; A. Torralba, "Sharing Features: Efficient Boosting Procedures for Multiclass Object Detection," CVPR, 2004; and S. K.
- each basic function h(x; a_m, R_m) is an L-terminal node Classification and Regression Tree (CART), where each internal node splits the partition associated with its parent node by comparing a feature response to a threshold, and the leaves describe the final values a_m.
- CART is described in further detail in L. Breiman, et al., "Classification and Regression Trees," Wadsworth & Brooks, 1984, the content of which is incorporated by reference herein in its entirety.
- Eq. (2) can be solved by a greedy stagewise approach where, at each step m, the parameters of the basic function h(x; a_m, R_m) are determined that maximally decrease the loss function: (a_m, R_m) = argmin_{a,R} Σ_i Ψ(y_i, F_{m-1}(x_i) + h(x_i; a, R)).
- an extension to the Gradient TreeBoost process described above is provided in order to efficiently handle multidimensional maps in accordance with an embodiment of the present invention. Given a training set {y_i, x_i} with vector inputs x_i ∈ R^d and vector outputs y_i ∈ R^p, the method determines the function F(x): R^d → R^p that minimizes the loss Σ_i Ψ(y_i, F(x_i)).
- p represents the number of joint angles.
- the number of joint angles is given by the number of joints multiplied by 3, as each joint is represented by a set of 3 angles.
- the input x_i is the normalized appearance and motion patches previously derived from the training image sequence 206.
- the output y_i is the vector of known pose configurations 208 corresponding to the image sequence 206 and may be, for example, a vector of joint angles describing the poses.
- both the input x_i and the output y_i are vectors.
- the function can provide the complete vector of joint angles for a given input rather than using multiple mapping functions to derive the joint angles.
- the Multidimensional Treeboost process assumes that the mapping function F(x) can be expressed as a sum of basic piecewise constant (vector) functions: F(x) = F_0(x) + Σ_{m=1}^{M} h(x; a_m, R_m) (12)
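- Written out in consistent notation (a reconstruction following the Friedman-style gradient boosting formulation cited above, not a quotation of the patent's own equations), the model and its stagewise fit are:

```latex
\begin{aligned}
F(\mathbf{x}) &= F_0(\mathbf{x}) + \sum_{m=1}^{M} h(\mathbf{x};\,\mathbf{a}_m, R_m),
  \qquad F:\mathbb{R}^d \to \mathbb{R}^p,\\
(\mathbf{a}_m, R_m) &= \arg\min_{\mathbf{a},R}\ \sum_{i=1}^{N}
  \Psi\bigl(\mathbf{y}_i,\ F_{m-1}(\mathbf{x}_i) + h(\mathbf{x}_i;\,\mathbf{a}, R)\bigr),\\
F_m(\mathbf{x}) &= F_{m-1}(\mathbf{x}) + \nu\, h(\mathbf{x};\,\mathbf{a}_m, R_m),
  \qquad 0 < \nu \le 1 .
\end{aligned}
```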
- using decision stumps on Haar feature responses as basic learners, a process for Multidimensional Gradient Treeboost is illustrated in FIG. 7 and described below in accordance with an embodiment of the present invention.
- the process can be implemented using, for example, Least Squares (LS) or Least Absolute Deviation (LAD) as loss functions.
- the described process derives the mapping function 210 using an iterative approach. In each iteration, the process updates the mapping function 210 until a stopping criterion is reached.
- the mapping function derived by the m-th iteration will be denoted by F_m(x).
- the process first initializes 702 the mapping function 210 to a constant function F_0(x) that minimizes the loss function Ψ(y, F(x)). If an LS loss function is used, the constant function F_0(x) is initialized to the mean of the training outputs y_i (i.e., the known pose configurations 208). If an LAD loss function is used, the constant function F_0(x) is initialized to the median of the training outputs y_i: F_0(x) = mean_i{y_i} (LS) or F_0(x) = median_i{y_i} (LAD).
- the training module 202 then computes 704 the pseudo-residual vectors ỹ_im. If an LS loss function is used, the pseudo-residuals are computed 704 from the training residuals, ỹ_im = y_i - F_{m-1}(x_i). If an LAD loss function is used, the pseudo-residuals are computed from the signs of the training residuals: ỹ_im = sign(y_i - F_{m-1}(x_i)).
- the pseudo-residuals describe an error between the known pose configuration 208 and the output of the current mapping function F_{m-1}(x) (i.e., the mapping function derived in the previous iteration) applied to the training input 206.
- the regions R_jm are computed 706 by finding the optimal feature k_m and the associated threshold value θ_m.
- the input space is partitioned into regions Rj m using decision trees or decision stumps.
- the decision trees (and stumps) partition input vectors into several regions (i.e., areas). These regions can in turn be further partitioned using stumps or information can be gathered at the leaf nodes.
- the least-squares approximation error to the pseudo-residuals ỹ_im is computed using vector stumps h_s whose inputs are the filter responses f_k(x_i), and the feature with the lowest error is chosen. Notice that the least-squares criterion allows the values a_j to be found efficiently, since the mean of the outputs is computed incrementally, sorted by feature value, while searching for the optimal threshold θ_m.
- Eq. 16 finds 708 the two vector parameters a_1, a_2 of the basic stump learner h_s, which are the constant predictions of the residuals in the two regions found in the previous step 706.
- a_1m = mean{ y_i - F_{m-1}(x_i) : f_{k_m}(x_i) ≤ θ_m } and a_2m = mean{ y_i - F_{m-1}(x_i) : f_{k_m}(x_i) > θ_m } (LS).
- the parameters a_1, a_2 are computed as the means of the sample residuals in each region. If an LAD loss function is used, the parameters a_1, a_2 are computed as the medians of the sample residuals.
- the stump learner function h_s is then added 710 to the current mapping function F_{m-1}(x), scaled by the learning rate ν, to compute the updated mapping function F_m(x):
- F_m(x) = F_{m-1}(x) + ν · h_s(x; a_1m, a_2m, k_m, θ_m)
- M is a predetermined constant. In another embodiment, M is the number of iterations until the changes to the pseudo-residuals, ỹ_im, become negligible.
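- The full loop of FIG. 7 with an LS loss fits in a short NumPy sketch. Here X holds precomputed filter responses (one column per Haar feature) and Y holds the known joint-angle vectors; the exhaustive quantile search below stands in for the incremental sorted-mean search described above, and all names are illustrative rather than taken from the patent:

```python
import numpy as np

def fit_vector_stump(X, R, n_thresholds=16):
    """Steps 706-708: choose the feature k and threshold theta whose two-region
    vector stump (a1 where f_k <= theta, a2 elsewhere) best fits residuals R."""
    best_err, best = np.inf, None
    for k in range(X.shape[1]):
        f = X[:, k]
        for theta in np.quantile(f, np.linspace(0.05, 0.95, n_thresholds)):
            left = f <= theta
            if left.all() or not left.any():
                continue                 # both regions must be non-empty
            a1, a2 = R[left].mean(axis=0), R[~left].mean(axis=0)  # LS: means
            err = ((R - np.where(left[:, None], a1, a2)) ** 2).sum()
            if err < best_err:
                best_err, best = err, (k, theta, a1, a2)
    return best

def boost(X, Y, M=50, nu=0.5):
    """Multidimensional LS TreeBoost: every stump shares one feature across all
    output dimensions and predicts a full vector of joint-angle residuals."""
    F0 = Y.mean(axis=0)                          # step 702: mean for LS
    pred = np.tile(F0, (len(Y), 1))
    stumps = []
    for _ in range(M):
        R = Y - pred                             # step 704: LS pseudo-residuals
        k, theta, a1, a2 = fit_vector_stump(X, R)
        stumps.append((k, theta, a1, a2))
        pred += nu * np.where((X[:, k] <= theta)[:, None], a1, a2)  # step 710
    return F0, stumps

def predict(x, F0, stumps, nu=0.5):
    y = F0.copy()
    for k, theta, a1, a2 in stumps:
        y = y + nu * (a1 if x[k] <= theta else a2)
    return y
```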
- the process of FIG. 7 is not limited to stumps but can be formulated for arbitrary decision trees. For example, Classification and Regression Trees (CART) can be applied as basic functions h(x). These are decision trees modeling a piecewise constant function, where each node of the tree uses a feature f_k and a threshold θ to recursively split the current region of the input space in two, and the terminal leaves define the input space partition R_jm.
- the disclosed method advantageously provides a gradient boosting technique that derives a multidimensional regression function. Instead of learning a separate regression function for each joint angle, a vector function is learned that maps features to sets of joint angles representing full body poses.
- One advantage of learning multidimensional maps is that it allows the joint angle estimators to share the same set of features. This is beneficial because of the high degree of correlation between joint angles for natural human poses.
- the resulting pose estimator is considerably faster than the collection of scalar counterparts, since it uses a number of features which grows with the effective dimension of the target space instead of with the number of joint angles.
- the described embodiments are well suited to fit multidimensional maps having components at different scales, and can be extended to include more complex basic functions such as regression trees.
- the testing module 204 receives 802 an input test image 214 and generates 804 image representations of the test images.
- the image representations comprise motion and appearance patches generated according to the process of FIG. 4 described above.
- the learned mapping function 210 is then applied 806 to the image representations.
- the mapping function 210 outputs 808 a pose estimation comprising, for example, a vector of joint angles describing the pose of a subject in the test image 214.
- application of the mapping function 210 generates a vector output completely describing the pose.
- the testing module executes the process of FIG. 8 quickly enough to provide pose estimations at every frame of an input video having a standard frame rate (e.g., 30 frames/second).
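- Putting the pieces together, a per-frame test loop might look like the following sketch, which reuses the illustrative helpers defined above (extract_appearance_patch, motion_patches, predict); detect_person and compute_features are hypothetical placeholders standing in for the detector of step 402 and the filter bank of FIG. 5:

```python
import numpy as np

def estimate_poses(frames, filters, F0, stumps):
    """Sketch of FIG. 8: one pose estimate per frame (after the first)."""
    poses, prev = [], None
    for frame in frames:                               # grayscale frames
        bbox = detect_person(frame)                    # hypothetical (step 402)
        patch = extract_appearance_patch(frame, bbox)  # steps 404-406
        if prev is not None:
            A_t, _, _ = motion_patches(prev.astype(float), patch.astype(float))
            x = compute_features(patch, A_t, filters)  # hypothetical (step 806)
            poses.append(predict(x, F0, stumps))       # vector of joint angles
        prev = patch
    return poses
```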
- Certain aspects of the present invention include process steps and instructions described herein in the form of an algorithm. It should be noted that the process steps and instructions of the present invention could be embodied in software, firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by a variety of operating systems.
- the present invention also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer.
- Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.
- the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Social Psychology (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Psychiatry (AREA)
- Software Systems (AREA)
- Human Computer Interaction (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- General Engineering & Computer Science (AREA)
- Image Analysis (AREA)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2009540439A JP4677046B2 (ja) | 2006-12-06 | 2007-12-05 | 多次元ブースト回帰を経た外観及び動作を使用する高速人間姿勢推定 |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US86883006P | 2006-12-06 | 2006-12-06 | |
| US60/868,830 | 2006-12-06 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2008070701A2 (en) | 2008-06-12 |
| WO2008070701A3 (en) | 2008-10-23 |
Family
ID=39493047
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2007/086458 Ceased WO2008070701A2 (en) | 2006-12-06 | 2007-12-05 | Fast human pose estimation using appearance and motion via multi-dimensional boosting regression |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US7778446B2 (en) |
| JP (1) | JP4677046B2 (en) |
| WO (1) | WO2008070701A2 (en) |
Families Citing this family (35)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR100682987B1 (ko) * | 2005-12-08 | 2007-02-15 | 한국전자통신연구원 | 선형판별 분석기법을 이용한 3차원 동작인식 장치 및 그방법 |
| US20090232365A1 (en) * | 2008-03-11 | 2009-09-17 | Cognimatics Ab | Method and device for face recognition |
| JP5098981B2 (ja) * | 2008-12-08 | 2012-12-12 | トヨタ自動車株式会社 | 顔部位検出装置 |
| US8565476B2 (en) * | 2009-01-30 | 2013-10-22 | Microsoft Corporation | Visual target tracking |
| US8682028B2 (en) | 2009-01-30 | 2014-03-25 | Microsoft Corporation | Visual target tracking |
| US8565477B2 (en) * | 2009-01-30 | 2013-10-22 | Microsoft Corporation | Visual target tracking |
| US8588465B2 (en) | 2009-01-30 | 2013-11-19 | Microsoft Corporation | Visual target tracking |
| US8577085B2 (en) | 2009-01-30 | 2013-11-05 | Microsoft Corporation | Visual target tracking |
| US8577084B2 (en) | 2009-01-30 | 2013-11-05 | Microsoft Corporation | Visual target tracking |
| US8267781B2 (en) | 2009-01-30 | 2012-09-18 | Microsoft Corporation | Visual target tracking |
| US8773355B2 (en) * | 2009-03-16 | 2014-07-08 | Microsoft Corporation | Adaptive cursor sizing |
| US8503720B2 (en) | 2009-05-01 | 2013-08-06 | Microsoft Corporation | Human body pose estimation |
| US8638985B2 (en) | 2009-05-01 | 2014-01-28 | Microsoft Corporation | Human body pose estimation |
| US8358839B2 (en) * | 2009-11-30 | 2013-01-22 | Xerox Corporation | Local regression methods and systems for image processing systems |
| US8811743B2 (en) | 2010-06-09 | 2014-08-19 | Microsoft Corporation | Resource-aware computer vision |
| JP5671928B2 (ja) * | 2010-10-12 | 2015-02-18 | ソニー株式会社 | 学習装置、学習方法、識別装置、識別方法、およびプログラム |
| US8942917B2 (en) | 2011-02-14 | 2015-01-27 | Microsoft Corporation | Change invariant scene recognition by an agent |
| US8620026B2 (en) * | 2011-04-13 | 2013-12-31 | International Business Machines Corporation | Video-based detection of multiple object types under varying poses |
| US9076227B2 (en) * | 2012-10-01 | 2015-07-07 | Mitsubishi Electric Research Laboratories, Inc. | 3D object tracking in multiple 2D sequences |
| US9857470B2 (en) | 2012-12-28 | 2018-01-02 | Microsoft Technology Licensing, Llc | Using photometric stereo for 3D environment modeling |
| US9940553B2 (en) | 2013-02-22 | 2018-04-10 | Microsoft Technology Licensing, Llc | Camera/object pose from predicted coordinates |
| WO2014149827A1 (en) * | 2013-03-15 | 2014-09-25 | REMTCS Inc. | Artificial neural network interface and methods of training the same for various use cases |
| CN103679677B (zh) * | 2013-12-12 | 2016-11-09 | 杭州电子科技大学 | 一种基于模型互更新的双模图像决策级融合跟踪方法 |
| CN105096304B (zh) | 2014-05-22 | 2018-01-02 | 华为技术有限公司 | 一种图像特征的估计方法和设备 |
| US9832373B2 (en) | 2014-06-24 | 2017-11-28 | Cyberlink Corp. | Systems and methods for automatically capturing digital images based on adaptive image-capturing templates |
| JP6628494B2 (ja) * | 2015-04-17 | 2020-01-08 | Kddi株式会社 | 実空間情報によって学習する識別器を用いて物体を追跡する装置、プログラム及び方法 |
| CN105631861B (zh) * | 2015-12-21 | 2019-10-01 | 浙江大学 | 结合高度图从无标记单目图像中恢复三维人体姿态的方法 |
| CN107786867A (zh) | 2016-08-26 | 2018-03-09 | 原相科技股份有限公司 | 基于深度学习架构的图像辨识方法及系统 |
| US10726573B2 (en) * | 2016-08-26 | 2020-07-28 | Pixart Imaging Inc. | Object detection method and system based on machine learning |
| WO2018058419A1 (zh) * | 2016-09-29 | 2018-04-05 | 中国科学院自动化研究所 | 二维图像人体关节点定位模型的构建方法及定位方法 |
| US10235771B2 (en) | 2016-11-11 | 2019-03-19 | Qualcomm Incorporated | Methods and systems of performing object pose estimation |
| US10863206B2 (en) * | 2018-11-08 | 2020-12-08 | Alibaba Group Holding Limited | Content-weighted deep residual learning for video in-loop filtering |
| CN112906438B (zh) * | 2019-12-04 | 2023-05-02 | 内蒙古科技大学 | 人体动作行为的预测方法以及计算机设备 |
| CN116507276A (zh) * | 2020-09-11 | 2023-07-28 | 爱荷华大学研究基金会 | 用于机器学习以从图像分析肌肉骨骼康复的方法和设备 |
| CN115661929B (zh) * | 2022-10-28 | 2023-11-17 | 北京此刻启动科技有限公司 | 一种时序特征编码方法、装置、电子设备及存储介质 |
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5404167A (en) * | 1993-03-12 | 1995-04-04 | At&T Corp. | Subband color video coding using a reduced motion information subband |
| US6009210A (en) * | 1997-03-05 | 1999-12-28 | Digital Equipment Corporation | Hands-free interface to a virtual reality environment using head tracking |
| US6741756B1 (en) * | 1999-09-30 | 2004-05-25 | Microsoft Corp. | System and method for estimating the orientation of an object |
| JP2003514309A (ja) * | 1999-11-09 | 2003-04-15 | ザ・ビクトリア・ユニバーシテイ・オブ・マンチエスター | 物体の種類の識別、検証あるいは物体の画像合成 |
| KR100507780B1 (ko) * | 2002-12-20 | 2005-08-17 | 한국전자통신연구원 | 고속 마커프리 모션 캡쳐 장치 및 방법 |
| GB0308943D0 (en) * | 2003-04-17 | 2003-05-28 | Univ Dundee | A system for determining the body pose of a person from images |
| JP4546956B2 (ja) * | 2003-06-12 | 2010-09-22 | 本田技研工業株式会社 | 奥行き検出を用いた対象の向きの推定 |
| US7894647B2 (en) * | 2004-06-21 | 2011-02-22 | Siemens Medical Solutions Usa, Inc. | System and method for 3D contour tracking of anatomical structures |
| US7804999B2 (en) * | 2005-03-17 | 2010-09-28 | Siemens Medical Solutions Usa, Inc. | Method for performing image based regression using boosting |
- 2007
- 2007-12-05 US US11/950,662 patent/US7778446B2/en active Active
- 2007-12-05 WO PCT/US2007/086458 patent/WO2008070701A2/en not_active Ceased
- 2007-12-05 JP JP2009540439A patent/JP4677046B2/ja not_active Expired - Fee Related
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101794515A (zh) * | 2010-03-29 | 2010-08-04 | 河海大学 | 基于协方差和二叉树支持向量机的目标检测系统及方法 |
| CN101794515B (zh) * | 2010-03-29 | 2012-01-04 | 河海大学 | 基于协方差和二叉树支持向量机的目标检测系统及方法 |
| CN101976345A (zh) * | 2010-09-30 | 2011-02-16 | 哈尔滨工程大学 | 一种噪声条件下图像尺度不变模式识别方法 |
| CN105975923A (zh) * | 2016-05-03 | 2016-09-28 | 湖南拓视觉信息技术有限公司 | 用于跟踪人体对象的方法和系统 |
| CN109949368A (zh) * | 2019-03-14 | 2019-06-28 | 郑州大学 | 一种基于图像检索的人体三维姿态估计方法 |
Also Published As
| Publication number | Publication date |
|---|---|
| US20080137956A1 (en) | 2008-06-12 |
| JP4677046B2 (ja) | 2011-04-27 |
| JP2010512581A (ja) | 2010-04-22 |
| US7778446B2 (en) | 2010-08-17 |
| WO2008070701A3 (en) | 2008-10-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US7778446B2 (en) | Fast human pose estimation using appearance and motion via multi-dimensional boosting regression | |
| Bissacco et al. | Fast human pose estimation using appearance and motion via multi-dimensional boosting regression | |
| Jeyakar et al. | Robust object tracking with background-weighted local kernels | |
| Zhang et al. | Structural sparse tracking | |
| US9134399B2 (en) | Attribute-based person tracking across multiple cameras | |
| Simo-Serra et al. | Single image 3D human pose estimation from noisy observations | |
| US7620204B2 (en) | Method for tracking objects in videos using covariance matrices | |
| US20070086621A1 (en) | Flexible layer tracking with weak online appearance model | |
| Battiato et al. | An integrated system for vehicle tracking and classification | |
| CN110998594A (zh) | 检测动作的方法和系统 | |
| Bešić et al. | Dynamic object removal and spatio-temporal RGB-D inpainting via geometry-aware adversarial learning | |
| US7369682B2 (en) | Adaptive discriminative generative model and application to visual tracking | |
| Demir et al. | Co-difference based object tracking algorithm for infrared videos | |
| Shen et al. | Adaptive pedestrian tracking via patch-based features and spatial–temporal similarity measurement | |
| Karavasilis et al. | Visual tracking using the Earth Mover's Distance between Gaussian mixtures and Kalman filtering | |
| Ji et al. | Detect foreground objects via adaptive fusing model in a hybrid feature space | |
| Holzer et al. | Online learning of linear predictors for real-time tracking | |
| Wang et al. | Tracking by third-order tensor representation | |
| Tran et al. | Robust object trackinng wvith regional affine invariant features | |
| Chang et al. | Single-shot person re-identification based on improved random-walk pedestrian segmentation | |
| Dederscheck et al. | Illumination invariance for driving scene optical flow using comparagram preselection | |
| Zhang et al. | Target tracking for mobile robot platforms via object matching and background anti-matching | |
| Chen et al. | Sequentially adaptive active appearance model with regression-based online reference appearance template | |
| Cuzzolin et al. | Belief modeling regression for pose estimation | |
| Bart et al. | Class-based matching of object parts |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 07854945 Country of ref document: EP Kind code of ref document: A2 |
|
| ENP | Entry into the national phase |
Ref document number: 2009540439 Country of ref document: JP Kind code of ref document: A |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 07854945 Country of ref document: EP Kind code of ref document: A2 |