CN106897697A - A person and pose detection method based on a visual compiler - Google Patents
- Publication number
- CN106897697A (CN201710103927.1A)
- Authority
- CN
- China
- Prior art keywords
- network
- pedestrian
- posture
- compiler
- detection
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The present invention proposes a person and pose detection method based on a visual compiler. Its main contents are: data synthesis from the scene description, learning the network from synthetic data, defining the network with basic blocks, and joint localization with the pose network (Pose Net). The process is as follows. A scene description is first used as input to the visual compiler, and ground-truth annotations are used to train the pedestrian detection system and the pose estimation system. The network is then learned from synthetic data. Next, the network is defined using two basic units, the residual module and the spatial belief module. Finally, the pose network localizes pedestrians. The invention can automatically obtain annotations for detection, body part locations and segmentation masks, and it uses a camera to localize pedestrians, estimate their poses and analyze their activity. It reduces the effect of illumination, occlusion and similar factors on detection and effectively improves recognition efficiency.
Description
Technical field
The present invention relates to the field of human pose detection, and more particularly to a person and pose detection method based on a visual compiler.
Background
Human action and pose detection is widely applied in video surveillance, virtual reality, intelligent interpersonal interaction and other fields, and has become a research hotspot in computer vision. It can be used for intelligent monitoring of public places and for monitoring dangerous postures in crowds. Although research on human pose detection has made important progress in recent years, the high complexity and variability of human poses mean that recognition accuracy and efficiency do not yet fully meet the requirements of the relevant industries. Differences in illumination, viewing angle and background cause the same human behavior to differ in pose and characteristics. Self-occlusion, partial occlusion, individual differences between bodies and multi-person recognition all reflect the spatial complexity of human pose detection. Person and pose detection methods therefore need further research.
The present invention proposes a person and pose detection method based on a visual compiler. A scene description is first used as input to the visual compiler, and ground-truth annotations are used to train the pedestrian detection system and the pose estimation system. The network is then learned from synthetic data. Next, the network is defined using two basic units, the residual module and the spatial belief module. Finally, the pose network localizes pedestrians. The invention can automatically obtain annotations for detection, body part locations and segmentation masks, and it uses a camera to localize pedestrians, estimate their poses and analyze their activity. It reduces the effect of illumination, occlusion and similar factors on detection and effectively improves recognition efficiency.
Summary of the invention
To address the adverse effects of illumination, occlusion and similar factors, the object of the present invention is to provide a person and pose detection method based on a visual compiler. A scene description is first used as input to the visual compiler, and ground-truth annotations are used to train the pedestrian detection system and the pose estimation system. The network is then learned from synthetic data. Next, the network is defined using two basic units, the residual module and the spatial belief module. Finally, the pose network localizes pedestrians.
To solve the above problems, the present invention provides a person and pose detection method based on a visual compiler, whose main contents are:
(1) data synthesis from the scene description;
(2) learning the network from synthetic data;
(3) defining the network with basic blocks;
(4) joint localization with the pose network (Pose Net).
The visual compiler is used to generate a scene-specific human detection and pose estimation system. Its known information comprises:
(1) the intrinsic and extrinsic parameters of the camera;
(2) the rough physical geometric layout of the scene (regions where people can walk, sit or stand), together with regions that may be occluded (obstacles) or that do not physically exist (walls);
(3) the pose and orientation of pedestrians in each region of the scene.
Together with a single image, the scene description serves as input to the compiler, which synthesizes physically grounded and geometrically accurate people in the valid regions of the scene. The compiler learns a set of region-specific models for person detection, pose estimation and segmentation. At inference time, each of these region-specific models runs simultaneously on its corresponding region.
Data synthesis from the scene description addresses the need for high-quality ground-truth annotations to train the pedestrian detection system and the pose estimation system. Without a complicated manual labeling process, the visual compiler uses the scene description to simulate pedestrian appearance suited to each region of the scene, and so scales to a large number of scenes.
Further, given a scene description, the compiler first generates a planar 3D model of the scene to enclose obstacles, that is, it fits the ground plane, planar walls and cuboids. The camera parameters are then used to account for lens characteristics (for example, perspective distortion in wide-angle cameras) and to render the scene with geometrically accurate people. Besides rendering the appearance of a person at every "valid pedestrian location" in the scene, the rendering pipeline can also precisely control variations in human appearance such as gender, height, width, orientation and pose. The virtual human database contains 139 different models covering gender, clothing color and ethnicity. Orientation can range from 0 to 360 degrees, or can be guided by any available prior information.
To generate ground-truth annotations for the rendered images, the following labels are first attached as attributes to each 3D virtual model: the 3D positions of 27 body parts, the segmentation mask, and the center of the person used for detection. The 2D labels for training are then extracted automatically from the 3D annotations and the camera projection parameters; this process yields consistent, noise-free labels. In addition, all variations in appearance, orientation, pose and position can be covered uniformly.
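The extraction of 2D training labels from 3D annotations and camera parameters can be illustrated with a minimal sketch. It assumes a plain pinhole camera without lens distortion, which is a simplification of what the text describes; the function name `project_point`, the camera values and the part names are hypothetical and do not come from the patent.

```python
def project_point(point_3d, K, R, t):
    """Project a 3D world point to pixel coordinates with a pinhole model."""
    # World -> camera coordinates: X_cam = R * X_world + t
    cam = [sum(R[i][j] * point_3d[j] for j in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 0:
        return None  # point is behind the camera, so it has no valid 2D label
    # Perspective division, then intrinsics (focal lengths and principal point)
    u = K[0][0] * cam[0] / cam[2] + K[0][2]
    v = K[1][1] * cam[1] / cam[2] + K[1][2]
    return (u, v)

# Hypothetical camera: 500 px focal length, principal point (320, 240),
# identity rotation, world origin 5 units in front of the camera.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 5.0]

# Hypothetical 3D part annotations attached to one virtual human model.
parts_3d = {"head": (0.0, -0.8, 0.0), "left_foot": (-0.1, 0.9, 0.0)}
labels_2d = {name: project_point(p, K, R, t) for name, p in parts_3d.items()}
```

Because every 2D label is computed from the same 3D annotation and camera parameters, the labels are consistent across renderings, which is the property the text emphasizes. A wide-angle camera would additionally need a distortion model, which this sketch omits.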
In learning the network from synthetic data, the visual compiler uses the generated scene-specific data to produce a visual program in the form of a deep neural network, trained with standard procedures according to the scene description.
The visual program generated by the visual compiler jointly performs the following tasks: localizing pedestrians, locating the landmarks that define their pose, and segmenting the pixels that belong to them. To predict pedestrian locations, poses and segmentation masks, the network must model the overall appearance of the pedestrian, the local appearance of the landmarks, and the prior over valid spatial configurations of these parts. To capture the appearance of whole pedestrians and of local landmarks, the network learns a mapping from the RGB input, treating the accurate localization of pedestrians, local landmarks and segmentation masks as a heatmap regression problem. The prior over the spatial relationships between part locations is learned by the spatial belief (SB) module, which accounts for the correlations among the heatmaps of pedestrians, local landmarks and segmentation masks. This particular instantiation of the visual program is called the pose network (Pose Net).
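Heatmap regression needs a target heatmap for every annotated location. The sketch below builds one such target under the assumption that it is a 2D Gaussian centered on the annotated pixel. The text only speaks of "ideal heatmaps", so the Gaussian shape, the `sigma` value and the name `gaussian_heatmap` are illustrative assumptions.

```python
import math

def gaussian_heatmap(height, width, center, sigma=1.5):
    """Return a height x width heatmap that peaks at center = (row, col)."""
    cy, cx = center
    return [
        [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

# Target for a hypothetical landmark annotated at row 3, column 5 of an 8 x 8 map.
target = gaussian_heatmap(8, 8, center=(3, 5))
```

A network trained by regression against such targets outputs one map per pedestrian center, per landmark and per segmentation mask, and the peak of each map gives the predicted location.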
Further, human pose estimation systems generally treat detection and pose estimation as independent, sequential tasks, with detection followed by pose estimation. Such systems either assume ground-truth person detections or rely on an off-the-shelf detector for coarse detection. However, detection and part localization are highly complementary processes: detection can greatly influence pose estimation, and accurate localization of the parts strengthens the belief that a person is present at the corresponding location. The pose network model therefore couples these tasks, improving the efficiency of pedestrian detection and pose estimation.
In defining the network with basic blocks, the network is defined using two basic units, the residual module and the spatial belief module. The residual unit is introduced to address the vanishing gradient problem in training deep convolutional networks. This basic unit serves as the building block of the network, and the spatial belief (SB) module is built on top of it.
Further, the spatial belief module maps the input features of a block to part localization beliefs (heatmaps), while processing the input features and part localization beliefs from the previous block. The image features and part localization beliefs generated by a block are concatenated to form the input of the next block. Given the input x to the SB module, the output y is given by:
[Equation not reproduced in this text: y expressed in terms of x, r and b.]
Here the operator in the equation denotes concatenation, $r = f_{res}(x)$ is the operation of the non-identity branch of the residual unit, and $b = f_{belief}(x)$ maps the input x to the desired heatmaps (person detection, part detection and segmentation mask) through a series of 1 × 1 convolutions. The SB unit lets the network take contextual information into account in its detection beliefs. The part localization belief $b_i$ from the i-th SB unit propagates to the next, (i+1)-th, SB block and is processed by the non-identity path, which captures the correlations among the heatmaps of the various parts. Applying the SB unit recursively gives:
[Equation not reproduced in this text: recursive expansion of the SB unit.]
Because of the concatenation, the identity shortcut and $f_{res}(\cdot)$ in each SB unit process the beliefs from all previous SB units. In addition, the detection belief maps generated in each SB unit also take into account the part localization beliefs of all previous SB units, and each SB unit computes with a different receptive field. The network therefore uses detection belief maps from multiple stages and multiple receptive field sizes.
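The way beliefs travel from one SB unit to the next can be sketched in a few lines. Since the defining equation is missing from this text, the form used here, the residual output (x + r) concatenated with b along the channel axis, is an assumption made for illustration. A single 1 × 1 convolution also stands in for both $f_{res}$ and $f_{belief}$, and the names `conv1x1`, `add` and `sb_unit` are hypothetical.

```python
def conv1x1(x, weights):
    """1x1 convolution: mix channels independently at every pixel.
    x is a list of channels (each H x W); weights has shape [out_ch][in_ch]."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wk * x[k][i][j] for k, wk in enumerate(row)) for j in range(w)]
         for i in range(h)]
        for row in weights
    ]

def add(a, b):
    """Element-wise sum of two feature maps with identical shapes."""
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]

def sb_unit(x, w_res, w_belief):
    """Assumed SB unit: (x + f_res(x)) concatenated with f_belief(x)."""
    r = conv1x1(x, w_res)      # stand-in for the non-identity residual branch
    b = conv1x1(x, w_belief)   # belief heatmaps from a 1x1 convolution
    features = add(x, r)       # identity shortcut plus residual branch
    return features + b, b     # list concatenation acts as channel concatenation

# Two input channels on a 2 x 2 grid; all weights are toy values.
x = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]]
y1, b1 = sb_unit(x, [[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0]])
# The second unit sees three channels: two feature channels plus b1.
y2, b2 = sb_unit(y1, [[0.0] * 3 for _ in range(3)], [[0.0, 0.0, 1.0]])
```

In this toy run the second unit's belief weights select the third input channel, which is exactly `b1`, so the belief produced by the first unit is available to the second. That is the propagation property the text attributes to the concatenation.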
In joint localization with the pose network (Pose Net), given an input image, the pose network jointly localizes pedestrians, locating body parts and pedestrians in the form of heatmaps. The network is fully convolutional, which preserves spatial context while improving computational efficiency. To achieve accurate pedestrian localization and pose estimation, dense heatmap prediction is used throughout the network, which prevents the information loss caused by subsampling (pooling).
The input image passes through convolutional layers with 5 × 5 filters and 3 × 3 filters, following the design of residual networks for object recognition. These are followed by 3 SB units, each with convolution filters of large receptive field, which enlarge the receptive field of the network while still performing dense prediction. The SB units are followed by two 1 × 1 convolutional layers that map the image features to heatmaps. Finally, skip connections fuse information from multiple different contextual regions, combining features from receptive fields at various scales. Heatmaps are inferred for the bounding box used for detection, the joint locations, the body center and the segmentation.
The network is optimized by minimizing the multi-task mean squared error loss L between the network predictions and the ideal heatmaps for pedestrian detection, part localization and the segmentation mask, defined as follows:
[Equation not reproduced in this text: definition of the loss L.]
Here α, β and γ are hyperparameters that trade off the different loss terms.
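A multi-task mean squared error loss with three trade-off weights can be sketched as follows. The defining equation is not available in this text, so the weighted sum used here, α·MSE(detection) + β·MSE(parts) + γ·MSE(segmentation), is an assumed form that is consistent with the description, not the patent's exact formula. The dictionary keys `det`, `part` and `seg` are hypothetical names.

```python
def mse(pred, target):
    """Mean squared error between two heatmaps given as nested lists (H x W)."""
    diffs = [(p - t) ** 2 for rp, rt in zip(pred, target) for p, t in zip(rp, rt)]
    return sum(diffs) / len(diffs)

def multitask_loss(pred, ideal, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of per-task MSE terms (assumed form of the loss L)."""
    return (alpha * mse(pred["det"], ideal["det"])
            + beta * mse(pred["part"], ideal["part"])
            + gamma * mse(pred["seg"], ideal["seg"]))

# Toy 2 x 2 heatmaps for the three tasks.
pred = {"det": [[0.0, 0.0], [0.0, 0.0]],
        "part": [[1.0, 1.0], [1.0, 1.0]],
        "seg": [[0.0, 1.0], [1.0, 0.0]]}
ideal = {"det": [[1.0, 0.0], [0.0, 0.0]],
         "part": [[0.0, 0.0], [0.0, 0.0]],
         "seg": [[0.0, 1.0], [1.0, 0.0]]}
loss = multitask_loss(pred, ideal, alpha=2.0, beta=0.5, gamma=1.0)
```

Raising one weight relative to the others makes the optimizer favor that task, which is the trade-off role the text assigns to α, β and γ.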
Further, for the pose network, the visual compiler uses high-quality synthetic images of pedestrian appearance in the scene to learn a fully convolutional neural network that is scene-specific and region-specific and that varies spatially. The network performs pedestrian detection, pose estimation and segmentation simultaneously, and it can be trained from scratch on synthetic data.
Brief description of the drawings
Fig. 1 is a system flowchart of the person and pose detection method based on a visual compiler according to the present invention.
Fig. 2 shows the visual compiler of the person and pose detection method based on a visual compiler according to the present invention.
Fig. 3 shows the definition of the network with basic blocks in the person and pose detection method based on a visual compiler according to the present invention.
Fig. 4 shows joint localization with the pose network (Pose Net) in the person and pose detection method based on a visual compiler according to the present invention.
Detailed description of the embodiments
It should be noted that, provided there is no conflict, the embodiments of this application and the features of those embodiments may be combined with one another. The invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a system flowchart of the person and pose detection method based on a visual compiler according to the present invention. The method mainly comprises data synthesis from the scene description, learning the network from synthetic data, defining the network with basic blocks, and joint localization with the pose network (Pose Net).
Data synthesis from the scene description addresses the need for high-quality ground-truth annotations to train the pedestrian detection system and the pose estimation system. Without a complicated manual labeling process, the visual compiler uses the scene description to simulate pedestrian appearance suited to each region of the scene, and so scales to a large number of scenes.
Given a scene description, the compiler first generates a planar 3D model of the scene to enclose obstacles, that is, it fits the ground plane, planar walls and cuboids. The camera parameters are then used to account for lens characteristics (for example, perspective distortion in wide-angle cameras) and to render the scene with geometrically accurate people. Besides rendering the appearance of a person at every "valid pedestrian location" in the scene, the rendering pipeline can also precisely control variations in human appearance such as gender, height, width, orientation and pose. The virtual human database contains 139 different models covering gender, clothing color and ethnicity. Orientation can range from 0 to 360 degrees, or can be guided by any available prior information.
To generate ground-truth annotations for the rendered images, the following labels are first attached as attributes to each 3D virtual model: the 3D positions of 27 body parts, the segmentation mask, and the center of the person used for detection. The 2D labels for training are then extracted automatically from the 3D annotations and the camera projection parameters; this process yields consistent, noise-free labels. In addition, all variations in appearance, orientation, pose and position can be covered uniformly.
In learning the network from synthetic data, the visual compiler uses the generated scene-specific data to produce a visual program in the form of a deep neural network, trained with standard procedures according to the scene description.
The visual program generated by the visual compiler jointly performs the following tasks: localizing pedestrians, locating the landmarks that define their pose, and segmenting the pixels that belong to them. To predict pedestrian locations, poses and segmentation masks, the network must model the overall appearance of the pedestrian, the local appearance of the landmarks, and the prior over valid spatial configurations of these parts. To capture the appearance of whole pedestrians and of local landmarks, the network learns a mapping from the RGB input, treating the accurate localization of pedestrians, local landmarks and segmentation masks as a heatmap regression problem. The prior over the spatial relationships between part locations is learned by the spatial belief (SB) module, which accounts for the correlations among the heatmaps of pedestrians, local landmarks and segmentation masks. This particular instantiation of the visual program is called the pose network (Pose Net).
Human pose estimation systems generally treat detection and pose estimation as independent, sequential tasks, with detection followed by pose estimation. Such systems either assume ground-truth person detections or rely on an off-the-shelf detector for coarse detection. However, detection and part localization are highly complementary processes: detection can greatly influence pose estimation, and accurate localization of the parts strengthens the belief that a person is present at the corresponding location. The pose network model therefore couples these tasks, improving the efficiency of pedestrian detection and pose estimation.
Fig. 2 shows the visual compiler of the person and pose detection method based on a visual compiler according to the present invention. The visual compiler is used to generate a scene-specific human detection and pose estimation system. Its known information comprises:
(1) the intrinsic and extrinsic parameters of the camera;
(2) the rough physical geometric layout of the scene (regions where people can walk, sit or stand), together with regions that may be occluded (obstacles) or that do not physically exist (walls);
(3) the pose and orientation of pedestrians in each region of the scene.
Together with a single image, the scene description serves as input to the compiler, which synthesizes physically grounded and geometrically accurate people in the valid regions of the scene. The compiler learns a set of region-specific models for person detection, pose estimation and segmentation. At inference time, each of these region-specific models runs simultaneously on its corresponding region.
Fig. 3 shows the definition of the network with basic blocks in the person and pose detection method based on a visual compiler according to the present invention. The network is defined using two basic units, the residual module and the spatial belief module. The residual unit is introduced to address the vanishing gradient problem in training deep convolutional networks. This basic unit serves as the building block of the network, and the spatial belief (SB) module is built on top of it.
The spatial belief module maps the input features of a block to part localization beliefs (heatmaps), while processing the input features and part localization beliefs from the previous block. The image features and part localization beliefs generated by a block are concatenated to form the input of the next block. Given the input x to the SB module, the output y is given by:
[Equation not reproduced in this text: y expressed in terms of x, r and b.]
Here the operator in the equation denotes concatenation, $r = f_{res}(x)$ is the operation of the non-identity branch of the residual unit, and $b = f_{belief}(x)$ maps the input x to the desired heatmaps (person detection, part detection and segmentation mask) through a series of 1 × 1 convolutions. The SB unit lets the network take contextual information into account in its detection beliefs. The part localization belief $b_i$ from the i-th SB unit propagates to the next, (i+1)-th, SB block and is processed by the non-identity path, which captures the correlations among the heatmaps of the various parts. Applying the SB unit recursively gives:
[Equation not reproduced in this text: recursive expansion of the SB unit.]
Because of the concatenation, the identity shortcut and $f_{res}(\cdot)$ in each SB unit process the beliefs from all previous SB units. In addition, the detection belief maps generated in each SB unit also take into account the part localization beliefs of all previous SB units, and each SB unit computes with a different receptive field. The network therefore uses detection belief maps from multiple stages and multiple receptive field sizes.
Fig. 4 shows joint localization with the pose network (Pose Net) in the person and pose detection method based on a visual compiler according to the present invention. Given an input image, the pose network jointly localizes pedestrians, locating body parts and pedestrians in the form of heatmaps. The network is fully convolutional, which preserves spatial context while improving computational efficiency. To achieve accurate pedestrian localization and pose estimation, dense heatmap prediction is used throughout the network, which prevents the information loss caused by subsampling (pooling).
The input image passes through convolutional layers with 5 × 5 filters and 3 × 3 filters, following the design of residual networks for object recognition. These are followed by 3 SB units, each with convolution filters of large receptive field, which enlarge the receptive field of the network while still performing dense prediction. The SB units are followed by two 1 × 1 convolutional layers that map the image features to heatmaps. Finally, skip connections fuse information from multiple different contextual regions, combining features from receptive fields at various scales. Heatmaps are inferred for the bounding box used for detection, the joint locations, the body center and the segmentation.
The network is optimized by minimizing the multi-task mean squared error loss L between the network predictions and the ideal heatmaps for pedestrian detection, part localization and the segmentation mask, defined as follows:
[Equation not reproduced in this text: definition of the loss L.]
Here α, β and γ are hyperparameters that trade off the different loss terms.
For the pose network, the visual compiler uses high-quality synthetic images of pedestrian appearance in the scene to learn a fully convolutional neural network that is scene-specific and region-specific and that varies spatially. The network performs pedestrian detection, pose estimation and segmentation simultaneously, and it can be trained from scratch on synthetic data.
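Once the network has produced a heatmap, a location still has to be read out of it. The text does not describe this decoding step, so the sketch below is only one common choice: take the strongest response and accept it if it exceeds a threshold. The function name `heatmap_peak` and the threshold value are assumptions.

```python
def heatmap_peak(heatmap, threshold=0.5):
    """Return (row, col, score) of the strongest response, or None if too weak."""
    score, r, c = max(
        ((val, r, c) for r, row in enumerate(heatmap) for c, val in enumerate(row)),
        key=lambda item: item[0],
    )
    return (r, c, score) if score >= threshold else None

# Toy 3 x 3 heatmap for one hypothetical body part.
scores = [[0.1, 0.2, 0.1],
          [0.3, 0.9, 0.2],
          [0.0, 0.1, 0.1]]
peak = heatmap_peak(scores)
```

Applied to the pedestrian-center map this yields a detection, and applied to each part map it yields the joint positions. A scene with several pedestrians would need local maxima instead of a single global peak.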
For those skilled in the art, the present invention is not limited to the details of the above embodiments, and it can be implemented in other specific forms without departing from its spirit or scope. Furthermore, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope, and such improvements and modifications shall also be regarded as falling within the scope of protection of the invention. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.
Claims (10)
1. A person and pose detection method based on a visual compiler, characterized in that it mainly comprises: data synthesis from the scene description (1); learning the network from synthetic data (2); defining the network with basic blocks (3); and joint localization with the pose network (Pose Net) (4).
2. The visual compiler according to claim 1, characterized in that it is used to generate a scene-specific human detection and pose estimation system, its known information comprising:
(1) the intrinsic and extrinsic parameters of the camera;
(2) the rough physical geometric layout of the scene (regions where people can walk, sit or stand), together with regions that may be occluded (obstacles) or that do not physically exist (walls);
(3) the pose and orientation of pedestrians in each region of the scene;
together with a single image, the scene description serves as input to the compiler, which synthesizes physically grounded and geometrically accurate people in the valid regions of the scene; the compiler learns a set of region-specific models for person detection, pose estimation and segmentation; at inference time, each of these region-specific models runs simultaneously on its corresponding region.
3. The data synthesis from the scene description (1) according to claim 1, characterized in that high-quality ground-truth annotations are needed to train the pedestrian detection system and the pose estimation system; without a complicated manual labeling process, the visual compiler uses the scene description to simulate pedestrian appearance suited to each region of the scene, and so scales to a large number of scenes.
4. The scene description according to claim 3, characterized in that, given a scene description, the compiler first generates a planar 3D model of the scene to enclose obstacles, that is, it fits the ground plane, planar walls and cuboids; the camera parameters are then used to account for lens characteristics (for example, perspective distortion in wide-angle cameras) and to render the scene with geometrically accurate people; besides rendering the appearance of a person at every "valid pedestrian location" in the scene, the rendering pipeline can also precisely control variations in human appearance such as gender, height, width, orientation and pose; the virtual human database contains 139 different models covering gender, clothing color and ethnicity; orientation can range from 0 to 360 degrees, or can be guided by any available prior information;
to generate ground-truth annotations for the rendered images, the following labels are first attached as attributes to each 3D virtual model: the 3D positions of 27 body parts, the segmentation mask, and the center of the person used for detection; the 2D labels for training are then extracted automatically from the 3D annotations and the camera projection parameters, and this process yields consistent, noise-free labels; in addition, all variations in appearance, orientation, pose and position can be covered uniformly.
5. The learning of the network from synthetic data (2) according to claim 1, characterized in that the visual compiler uses the generated scene-specific data to produce a visual program in the form of a deep neural network, trained with standard procedures according to the scene description;
the visual program generated by the visual compiler jointly performs the following tasks: localizing pedestrians, locating the landmarks that define their pose, and segmenting the pixels that belong to them; to predict pedestrian locations, poses and segmentation masks, the network must model the overall appearance of the pedestrian, the local appearance of the landmarks, and the prior over valid spatial configurations of these parts; to capture the appearance of whole pedestrians and of local landmarks, the network learns a mapping from the RGB input, treating the accurate localization of pedestrians, local landmarks and segmentation masks as a heatmap regression problem; the prior over the spatial relationships between part locations is learned by the spatial belief (SB) module, which accounts for the correlations among the heatmaps of pedestrians, local landmarks and segmentation masks; this particular instantiation of the visual program is called the pose network (Pose Net).
6. The human pose estimation system according to claim 5, characterized in that detection and pose estimation are generally treated as independent, sequential tasks, with detection followed by pose estimation; such systems either assume ground-truth person detections or rely on an off-the-shelf detector for coarse detection; however, detection and part localization are highly complementary processes; detection can greatly influence pose estimation, and accurate localization of the parts strengthens the belief that a person is present at the corresponding location; the pose network model therefore couples these tasks, improving the efficiency of pedestrian detection and pose estimation.
7. The defining of the network with basic blocks (3) according to claim 1, characterized in that the network is defined using two basic units, the residual module and the spatial belief module; the residual unit is introduced to address the vanishing gradient problem in training deep convolutional networks; this basic unit serves as the building block of the network, and the spatial belief (SB) module is built on top of it.
8. The spatial belief module according to claim 7, characterized in that it maps the input features of the block to part-localization beliefs (heatmaps), while processing the input features and part-localization beliefs from the previous block; the image features generated by the block and the part-localization beliefs are concatenated to form the input of the next block; given an input x to the SB module, the output y is given by:
where the operator in the formula denotes concatenation, r = f_res(x) is the operation performed by the non-identity branch of the residual unit, and b = f_belief(x) maps the input x to the desired heatmaps (person detection, part detection, and segmentation mask) through a series of 1 × 1 convolutions; the SB unit lets the network take contextual information into account when computing detection beliefs; the part-localization belief b_i from the i-th SB unit propagates to the next, (i+1)-th, SB block and is processed by the non-identity path, which captures the correlations among the individual part heatmaps; applying the SB unit recursively shows that:
because of the concatenation operation, both the identity shortcut and f_res(·) in each SB unit process the beliefs from all previous SB units; moreover, the detection belief map generated in each SB unit also takes into account the part-localization beliefs of all previous SB units, and each SB unit computes with a different receptive field; the network therefore uses detection belief maps from multiple stages and multiple receptive-field sizes.
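The data flow of the SB module in claim 8 can be sketched in NumPy. The formula for y is not reproduced in this text, so the combination used here (identity shortcut plus residual branch, concatenated with the heatmaps along the channel axis) is an assumption, as are the names `conv1x1`, `sb_unit`, and `run_stack`, the random weights, and the reduction of the residual branch to a single 1 × 1 convolution. The sketch only shows how the heatmaps of one unit become part of the input of the next.

```python
import numpy as np


def conv1x1(x, w):
    """1x1 convolution: x has shape (C_in, H, W), w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def sb_unit(x, w_res, w_belief):
    """Hypothetical SB unit (assumed form, not taken from the patent).

    r = f_res(x):    non-identity residual branch (here one 1x1 conv + ReLU).
    b = f_belief(x): 1x1 conv mapping the features to heatmaps.
    The output concatenates the residual features (x + r) with b along channels.
    """
    r = np.maximum(conv1x1(x, w_res), 0.0)
    b = conv1x1(x, w_belief)
    y = np.concatenate([x + r, b], axis=0)
    return y, b


def run_stack(x, num_units=3, num_heatmaps=3, seed=0):
    """Apply SB units recursively; each unit sees the heatmaps of all earlier units."""
    rng = np.random.default_rng(seed)
    beliefs = []
    for _ in range(num_units):
        c = x.shape[0]
        w_res = 0.1 * rng.standard_normal((c, c))
        w_belief = 0.1 * rng.standard_normal((num_heatmaps, c))
        x, b = sb_unit(x, w_res, w_belief)
        beliefs.append(b)
    return x, beliefs


features, beliefs = run_stack(np.ones((8, 16, 16)))
print(features.shape, [b.shape for b in beliefs])
```

Because each unit appends its heatmaps to the feature channels, the channel count grows with depth while the spatial resolution stays fixed, which matches the dense, multi-stage prediction described in the claim.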
9. The posture network (Pose Net) joint localization (4) according to claim 1, characterized in that, given an input image, the posture network jointly localizes pedestrians and body parts in the form of heatmaps; the network consists entirely of convolutional layers, which preserves spatial context while improving computational efficiency; to achieve accurate pedestrian localization and pose estimation, dense heatmap prediction is used throughout the network, preventing the information loss caused by subsampling (pooling);
The input image passes through convolutional layers with 5 × 5 filters and 3 × 3 filters, following the design of residual networks for object recognition; these are followed by 3 SB units, each with convolution filters of large receptive field, which enlarge the receptive field of the network while performing dense prediction; the SB units are followed by two 1 × 1 convolutional layers that map image features to heatmaps; finally, skip connections are used to fuse information from multiple different context regions, combining features from receptive fields at various scales; for detection, the bounding box is inferred from the heatmaps of the localized joints, the body center, and the segmentation;
The network is optimized by minimizing the multi-task mean squared error loss L between the network predictions and the ideal heatmaps for pedestrian detection, part localization, and segmentation masks, defined as follows,
where α, β and γ are hyperparameters that trade off the different loss terms.
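The multi-task loss of claim 9 can be sketched as a weighted sum of per-task mean squared errors. The loss formula itself is not reproduced in this text, so the weighted-sum form, the dictionary keys, the function names, and the use of a Gaussian peak as the "ideal" heatmap are all assumptions made for illustration.

```python
import numpy as np


def gaussian_heatmap(height, width, center, sigma=2.0):
    """Assumed ideal heatmap: a 2-D Gaussian peak at center = (row, col)."""
    rows, cols = np.mgrid[0:height, 0:width]
    dist2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-dist2 / (2.0 * sigma ** 2))


def mse(pred, target):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((pred - target) ** 2))


def multitask_loss(pred, target, alpha=1.0, beta=1.0, gamma=1.0):
    """Assumed form: alpha * L_detection + beta * L_parts + gamma * L_segmentation."""
    return (alpha * mse(pred["detection"], target["detection"])
            + beta * mse(pred["parts"], target["parts"])
            + gamma * mse(pred["segmentation"], target["segmentation"]))


# Toy targets: one pedestrian-center heatmap, two part heatmaps, one binary mask.
target = {
    "detection": gaussian_heatmap(32, 32, (16, 16)),
    "parts": np.stack([gaussian_heatmap(32, 32, (10, 16)),
                       gaussian_heatmap(32, 32, (22, 16))]),
    "segmentation": (gaussian_heatmap(32, 32, (16, 16), sigma=6.0) > 0.5).astype(float),
}
# An all-zero prediction stands in for an untrained network.
pred = {name: np.zeros_like(value) for name, value in target.items()}
print(multitask_loss(pred, target, alpha=1.0, beta=0.5, gamma=0.5))
```

Raising one of the weights makes errors on that task dominate the gradient, which is the trade-off the hyperparameters α, β and γ control.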
10. The posture network according to claim 9, characterized in that it is a fully convolutional neural network trained on high-quality synthetic images of pedestrian appearance in the scene, through which the visualization compiler learns the spatial variations specific to the scene and region; it performs detection, pose estimation, and segmentation of pedestrians simultaneously; it can be trained from scratch on synthetic data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710103927.1A CN106897697A (en) | 2017-02-24 | 2017-02-24 | A kind of personage and pose detection method based on visualization compiler |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710103927.1A CN106897697A (en) | 2017-02-24 | 2017-02-24 | A kind of personage and pose detection method based on visualization compiler |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106897697A (en) | 2017-06-27 |
Family
ID=59184211
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710103927.1A Withdrawn CN106897697A (en) | 2017-02-24 | 2017-02-24 | A kind of personage and pose detection method based on visualization compiler |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106897697A (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107481263A (en) * | 2017-08-10 | 2017-12-15 | 上海体育学院 | Table tennis method for tracking target, device, storage medium and computer equipment |
CN107767419A (en) * | 2017-11-07 | 2018-03-06 | 广州深域信息科技有限公司 | A kind of skeleton critical point detection method and device |
CN108038465A (en) * | 2017-12-25 | 2018-05-15 | 深圳市唯特视科技有限公司 | A kind of three-dimensional more personage's Attitude estimations based on generated data collection |
CN108549844A (en) * | 2018-03-22 | 2018-09-18 | 华侨大学 | A kind of more people's Attitude estimation methods based on multi-layer fractal network and joint relatives' pattern |
CN108717531A (en) * | 2018-05-21 | 2018-10-30 | 西安电子科技大学 | Estimation method of human posture based on Faster R-CNN |
CN108900788A (en) * | 2018-07-12 | 2018-11-27 | 北京市商汤科技开发有限公司 | Video generation method, video-generating device, electronic device and storage medium |
CN109190537A (en) * | 2018-08-23 | 2019-01-11 | 浙江工商大学 | A kind of more personage's Attitude estimation methods based on mask perceived depth intensified learning |
CN109215080A (en) * | 2018-09-25 | 2019-01-15 | 清华大学 | 6D Attitude estimation network training method and device based on deep learning Iterative matching |
CN109784296A (en) * | 2019-01-27 | 2019-05-21 | 武汉星巡智能科技有限公司 | Bus occupant quantity statistics method, device and computer readable storage medium |
CN110008915A (en) * | 2019-04-11 | 2019-07-12 | 电子科技大学 | The system and method for dense human body attitude estimation is carried out based on mask-RCNN |
CN110799991A (en) * | 2017-06-28 | 2020-02-14 | 奇跃公司 | Method and system for performing simultaneous localization and mapping using a convolutional image transform |
CN111950321A (en) * | 2019-05-14 | 2020-11-17 | 杭州海康威视数字技术股份有限公司 | Gait recognition method and device, computer equipment and storage medium |
CN112336342A (en) * | 2020-10-29 | 2021-02-09 | 深圳市优必选科技股份有限公司 | Hand key point detection method and device and terminal equipment |
CN113255420A (en) * | 2020-02-11 | 2021-08-13 | 辉达公司 | 3D body pose estimation using unlabeled multi-view data trained models |
CN113408433A (en) * | 2021-06-22 | 2021-09-17 | 华侨大学 | Intelligent monitoring gesture recognition method, device, equipment and storage medium |
CN113436058A (en) * | 2021-06-24 | 2021-09-24 | 深圳市赛维网络科技有限公司 | Character virtual clothes changing method, terminal equipment and storage medium |
CN114322946A (en) * | 2021-12-30 | 2022-04-12 | 杭州环木信息科技有限责任公司 | Method for converting optical data into inertial data with high fidelity |
DE102022119865A1 (en) | 2022-08-08 | 2024-02-08 | Audi Aktiengesellschaft | Method for estimating positions of pivot points and control device for a motor vehicle |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110799991A (en) * | 2017-06-28 | 2020-02-14 | 奇跃公司 | Method and system for performing simultaneous localization and mapping using a convolutional image transform |
CN110799991B (en) * | 2017-06-28 | 2023-09-05 | 奇跃公司 | Method and system for performing simultaneous localization and mapping using convolution image transformations |
CN107481263A (en) * | 2017-08-10 | 2017-12-15 | 上海体育学院 | Table tennis method for tracking target, device, storage medium and computer equipment |
CN107481263B (en) * | 2017-08-10 | 2020-05-19 | 上海体育学院 | Table tennis target tracking method, device, storage medium and computer equipment |
CN107767419A (en) * | 2017-11-07 | 2018-03-06 | 广州深域信息科技有限公司 | A kind of skeleton critical point detection method and device |
CN108038465A (en) * | 2017-12-25 | 2018-05-15 | 深圳市唯特视科技有限公司 | A kind of three-dimensional more personage's Attitude estimations based on generated data collection |
CN108549844A (en) * | 2018-03-22 | 2018-09-18 | 华侨大学 | A kind of more people's Attitude estimation methods based on multi-layer fractal network and joint relatives' pattern |
CN108549844B (en) * | 2018-03-22 | 2021-10-26 | 华侨大学 | Multi-person posture estimation method based on fractal network and joint relative mode |
CN108717531B (en) * | 2018-05-21 | 2021-06-08 | 西安电子科技大学 | Human body posture estimation method based on Faster R-CNN |
CN108717531A (en) * | 2018-05-21 | 2018-10-30 | 西安电子科技大学 | Estimation method of human posture based on Faster R-CNN |
CN108900788A (en) * | 2018-07-12 | 2018-11-27 | 北京市商汤科技开发有限公司 | Video generation method, video-generating device, electronic device and storage medium |
CN109190537B (en) * | 2018-08-23 | 2020-09-29 | 浙江工商大学 | Mask perception depth reinforcement learning-based multi-person attitude estimation method |
CN109190537A (en) * | 2018-08-23 | 2019-01-11 | 浙江工商大学 | A kind of more personage's Attitude estimation methods based on mask perceived depth intensified learning |
CN109215080A (en) * | 2018-09-25 | 2019-01-15 | 清华大学 | 6D Attitude estimation network training method and device based on deep learning Iterative matching |
US11200696B2 (en) | 2018-09-25 | 2021-12-14 | Tsinghua University | Method and apparatus for training 6D pose estimation network based on deep learning iterative matching |
CN109784296A (en) * | 2019-01-27 | 2019-05-21 | 武汉星巡智能科技有限公司 | Bus occupant quantity statistics method, device and computer readable storage medium |
CN110008915A (en) * | 2019-04-11 | 2019-07-12 | 电子科技大学 | The system and method for dense human body attitude estimation is carried out based on mask-RCNN |
CN110008915B (en) * | 2019-04-11 | 2023-02-03 | 电子科技大学 | System and method for estimating dense human body posture based on mask-RCNN |
CN111950321A (en) * | 2019-05-14 | 2020-11-17 | 杭州海康威视数字技术股份有限公司 | Gait recognition method and device, computer equipment and storage medium |
CN111950321B (en) * | 2019-05-14 | 2023-12-05 | 杭州海康威视数字技术股份有限公司 | Gait recognition method, device, computer equipment and storage medium |
CN113255420A (en) * | 2020-02-11 | 2021-08-13 | 辉达公司 | 3D body pose estimation using unlabeled multi-view data trained models |
CN112336342A (en) * | 2020-10-29 | 2021-02-09 | 深圳市优必选科技股份有限公司 | Hand key point detection method and device and terminal equipment |
CN112336342B (en) * | 2020-10-29 | 2023-10-24 | 深圳市优必选科技股份有限公司 | Hand key point detection method and device and terminal equipment |
CN113408433A (en) * | 2021-06-22 | 2021-09-17 | 华侨大学 | Intelligent monitoring gesture recognition method, device, equipment and storage medium |
CN113408433B (en) * | 2021-06-22 | 2023-12-05 | 华侨大学 | Intelligent monitoring gesture recognition method, device, equipment and storage medium |
CN113436058A (en) * | 2021-06-24 | 2021-09-24 | 深圳市赛维网络科技有限公司 | Character virtual clothes changing method, terminal equipment and storage medium |
CN113436058B (en) * | 2021-06-24 | 2023-10-20 | 深圳市赛维网络科技有限公司 | Character virtual clothes changing method, terminal equipment and storage medium |
CN114322946A (en) * | 2021-12-30 | 2022-04-12 | 杭州环木信息科技有限责任公司 | Method for converting optical data into inertial data with high fidelity |
CN114322946B (en) * | 2021-12-30 | 2024-01-09 | 杭州环木信息科技有限责任公司 | Method for converting optical data into inertial data with high fidelity |
DE102022119865A1 (en) | 2022-08-08 | 2024-02-08 | Audi Aktiengesellschaft | Method for estimating positions of pivot points and control device for a motor vehicle |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN106897697A (en) | A kind of personage and pose detection method based on visualization compiler | |
US11816907B2 (en) | Systems and methods for extracting information about objects from scene information | |
Häne et al. | Dense semantic 3d reconstruction | |
KR20200040665A (en) | Systems and methods for detecting a point of interest change using a convolutional neural network | |
CN109176512A (en) | A kind of method, robot and the control device of motion sensing control robot | |
CN101243470A (en) | Object tracking system | |
Miclea et al. | Monocular depth estimation with improved long-range accuracy for UAV environment perception | |
CN109752855A (en) | A kind of method of hot spot emitter and detection geometry hot spot | |
US10885708B2 (en) | Automated costume augmentation using shape estimation | |
Wang et al. | Robust AUV visual loop-closure detection based on variational autoencoder network | |
Zhong et al. | WF-SLAM: A robust VSLAM for dynamic scenarios via weighted features | |
Zhao et al. | Real-time visual-inertial localization using semantic segmentation towards dynamic environments | |
Zhu et al. | Fusing panoptic segmentation and geometry information for robust visual slam in dynamic environments | |
Singh et al. | Fast semantic-aware motion state detection for visual slam in dynamic environment | |
Kim et al. | CT-Loc: Cross-domain visual localization with a channel-wise transformer | |
Lee et al. | Visual compiler: synthesizing a scene-specific pedestrian detector and pose estimator | |
JP7304235B2 (en) | Trained model, learning device, learning method, and learning program | |
Li et al. | Dynamic objects recognizing and masking for RGB-D SLAM | |
Xu et al. | Indoor localization using region-based convolutional neural network | |
Doğan et al. | An augmented crowd simulation system using automatic determination of navigable areas | |
Rimboux et al. | Smart IoT cameras for crowd analysis based on augmentation for automatic pedestrian detection, simulation and annotation | |
Guo et al. | 3D object detection and tracking based on streaming data | |
CN113191462A (en) | Information acquisition method, image processing method and device and electronic equipment | |
Zeng | Deep Building Recognition: Interior Layouts and Exterior Planes | |
CN113642395B (en) | Building scene structure extraction method for city augmented reality information labeling |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20170627 |