CN106897697A - A person and pose detection method based on a visual compiler - Google Patents

Info

Publication number
CN106897697A
Authority
CN
China
Prior art keywords
network
pedestrian
posture
compiler
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201710103927.1A
Other languages
Chinese (zh)
Inventor
夏春秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Vision Technology Co Ltd
Original Assignee
Shenzhen Vision Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Vision Technology Co Ltd
Priority to CN201710103927.1A
Publication of CN106897697A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The present invention proposes a person and pose detection method based on a visual compiler. Its main contents are: data synthesis from a scene description, learning the network from synthetic data, defining the network using basic blocks, and joint localization with the pose network (Pose Net). The process is as follows: a scene description is first used as the input of the visual compiler, and the pedestrian detection and pose estimation systems are trained with ground-truth annotations; the network is then learned from synthetic data; two basic units, the residual module and the spatial belief module, are then used to define the network; finally, the pose network localizes pedestrians. The invention obtains detection annotations, body part locations, and segmentation masks automatically, localizes pedestrians from camera images, estimates their poses, and performs activity analysis. It reduces the influence of illumination, occlusion, and similar factors on detection and effectively improves recognition efficiency.

Description

A person and pose detection method based on a visual compiler
Technical field
The present invention relates to the field of human pose detection, and more particularly to a person and pose detection method based on a visual compiler.
Background
Human action and pose detection is widely used in fields such as video surveillance, virtual reality, and intelligent human-computer interaction, and has become a research hotspot in computer vision. It can be used, for example, for intelligent monitoring of public places and for monitoring dangerous postures in crowds. Although research on human pose detection has made significant progress in recent years, the high complexity and variability of human poses mean that recognition accuracy and efficiency do not yet fully meet the requirements of the relevant industries. Different illumination, viewing angles, and backgrounds cause the same human behavior to differ in pose and appearance. In addition, self-occlusion, partial occlusion, individual differences between people, and multi-person recognition all reflect the spatial complexity of human pose detection. Person and pose detection methods therefore need further research.
The present invention proposes a person and pose detection method based on a visual compiler. A scene description is first used as the input of the visual compiler, and the pedestrian detection and pose estimation systems are trained with ground-truth annotations; the network is then learned from synthetic data. Two basic units, the residual module and the spatial belief module, are used to define the network, and finally the pose network localizes pedestrians. The invention obtains detection annotations, body part locations, and segmentation masks automatically, localizes pedestrians from camera images, estimates their poses, and performs activity analysis. It reduces the influence of illumination, occlusion, and similar factors on detection and effectively improves recognition efficiency.
Summary of the invention
To address the influence of illumination, occlusion, and similar factors on detection, the object of the present invention is to provide a person and pose detection method based on a visual compiler. A scene description is first used as the input of the visual compiler, and the pedestrian detection and pose estimation systems are trained with ground-truth annotations; the network is then learned from synthetic data. Two basic units, the residual module and the spatial belief module, are used to define the network, and finally the pose network localizes pedestrians.
To solve the above problems, the present invention provides a person and pose detection method based on a visual compiler, whose main contents are:
(1) data synthesis from a scene description;
(2) learning the network from synthetic data;
(3) defining the network using basic blocks;
(4) joint localization with the pose network (Pose Net).
The visual compiler is used to generate a scene-specific human detection and pose estimation system. Its known information comprises:
(1) the intrinsic and extrinsic parameters of the camera;
(2) the rough physical geometric layout of the scene, namely the regions where people may walk, sit, or stand, and the regions that may be occluded (obstacles) or that are physically inaccessible (walls);
(3) the poses and orientations of pedestrians in each region of the scene.
Together with a single image, the scene description serves as the input of the compiler, which synthesizes physically grounded and geometrically accurate people in the valid regions of the scene. The compiler learns a set of region-specific models for person detection, pose estimation, and segmentation. At inference time, each of these region-specific models is run simultaneously on its corresponding region.
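The three kinds of known information listed above can be pictured as a small data structure. The following is a minimal sketch under stated assumptions: the class names `Region` and `SceneDescription`, the field names, the region labels, and the example values are all hypothetical and are not defined by the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Region kinds in which synthetic people may be placed (assumed labels).
VALID_KINDS = {"walk", "sit", "stand"}


@dataclass
class Region:
    name: str
    kind: str                           # "walk", "sit", "stand", "obstacle" or "wall"
    polygon: List[Tuple[float, float]]  # ground-plane outline (hypothetical units: metres)
    orientation_deg: Tuple[float, float] = (0.0, 360.0)  # allowed facing directions


@dataclass
class SceneDescription:
    intrinsics: List[List[float]]       # 3x3 camera matrix K
    extrinsics: List[List[float]]       # 3x4 matrix [R | t], world -> camera
    regions: List[Region] = field(default_factory=list)

    def valid_regions(self) -> List[Region]:
        """Regions where people can be rendered, i.e. not obstacles or walls."""
        return [r for r in self.regions if r.kind in VALID_KINDS]


scene = SceneDescription(
    intrinsics=[[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]],
    extrinsics=[[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 5.0]],
    regions=[
        Region("aisle", "walk", [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0)]),
        Region("bench", "sit", [(0.0, 1.0), (2.0, 1.0), (2.0, 2.0), (0.0, 2.0)], (90.0, 270.0)),
        Region("pillar", "obstacle", [(3.0, 1.0), (4.0, 1.0), (4.0, 2.0), (3.0, 2.0)]),
    ],
)
print([r.name for r in scene.valid_regions()])
```

Keeping obstacles and walls in the same list as the walkable regions lets one structure serve both purposes: choosing where to render people and deciding what may occlude them.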
Data synthesis from the scene description: training pedestrian detection and pose estimation systems requires high-quality ground-truth annotations. Instead of relying on a laborious manual labeling process, the visual compiler uses the scene description to simulate pedestrian appearance suited to each region of the scene, so that the approach scales to a large number of scenes.
Further, given the scene description, the compiler first generates a planar 3D model of the scene, fitting a ground plane, planar walls, and cubes that enclose obstacles. Camera parameters are then used to account for lens characteristics (for example, perspective distortion in wide-angle cameras) and to render geometrically accurate people in the scene. Besides rendering the appearance of a person at every "valid pedestrian position" in the scene, the rendering pipeline can precisely control variations in human appearance such as gender, height, width, orientation, and pose. The virtual human database contains 139 different models covering gender, clothing color, and ethnicity. The compiler can render orientations anywhere from 0 to 360 degrees, or be guided by any available prior information;
To generate ground-truth labels for the rendered images, the following annotations are first associated with each 3D virtual human model: the segmentation mask, the 3D positions of 27 body parts, and the center of the person for detection. The 2D labels used for training are then extracted automatically from the 3D annotations and the camera projection parameters. This process yields consistent, noise-free labels, and the variations in appearance, orientation, pose, and position can be covered uniformly.
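The step from 3D annotations to 2D training labels can be illustrated with a plain pinhole projection. This is a sketch under assumptions, not the patent's rendering pipeline: lens distortion and skew are ignored, and the function names `project_point` and `labels_2d` are hypothetical.

```python
from typing import List, Sequence, Tuple


def project_point(K: Sequence[Sequence[float]],
                  R: Sequence[Sequence[float]],
                  t: Sequence[float],
                  p_world: Sequence[float]) -> Tuple[float, float]:
    """Project one 3D point into pixel coordinates with a pinhole camera."""
    # World -> camera coordinates: p_cam = R @ p_world + t
    p_cam = [sum(R[i][j] * p_world[j] for j in range(3)) + t[i] for i in range(3)]
    if p_cam[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division followed by the intrinsics (focal lengths, principal point)
    u = K[0][0] * p_cam[0] / p_cam[2] + K[0][2]
    v = K[1][1] * p_cam[1] / p_cam[2] + K[1][2]
    return (u, v)


def labels_2d(K, R, t, parts_3d: List[Sequence[float]]) -> List[Tuple[float, float]]:
    """2D labels for all annotated 3D part positions of one virtual person."""
    return [project_point(K, R, t, p) for p in parts_3d]


# Example camera: identity rotation, no translation (illustrative values only).
K = [[100.0, 0.0, 50.0], [0.0, 100.0, 40.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
print(labels_2d(K, R, t, [(1.0, 2.0, 10.0), (0.0, 0.0, 5.0)]))
```

Because the labels are computed from the same camera parameters that rendered the image, they agree with the pixels exactly, which is the sense in which the generated labels are noise-free.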
Learning the network from synthetic data: using the scene-specific data produced above, the visual compiler generates a visual program in the form of a deep neural network, which is trained with standard procedures according to the scene description;
The visual program generated by the visual compiler performs the following tasks jointly: localizing pedestrians, localizing the landmarks that define their pose, and segmenting the pixels that define them. To predict pedestrian location, pose, and segmentation mask, the network must model the holistic appearance of the pedestrian, the local appearance of the landmarks, and the prior over valid spatial configurations of these parts. To capture the appearance of whole pedestrians and of local landmarks, the network learns a mapping from the RGB input to heatmaps, formulated as a heatmap regression problem, for precise localization of pedestrians, local landmarks, and segmentation masks. The prior over spatial relationships between part locations is learned by the spatial belief (SB) module, which accounts for the correlations among the heatmaps of pedestrians, local landmarks, and segmentation masks. This particular instantiation of the visual program is called the pose network (Pose Net).
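Heatmap regression needs a target heatmap for every labeled location. The patent does not say how the ideal heatmaps are shaped, so the sketch below assumes a common choice, a 2D Gaussian centered on the label; the function names and the value of `sigma` are hypothetical.

```python
import math
from typing import List, Tuple


def gaussian_heatmap(height: int, width: int,
                     center: Tuple[float, float],
                     sigma: float = 1.5) -> List[List[float]]:
    """Target heatmap with value 1.0 at `center` (x, y), decaying as a Gaussian."""
    cx, cy = center
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def argmax2d(heatmap: List[List[float]]) -> Tuple[int, int]:
    """Pixel (x, y) of the strongest response, i.e. the decoded location."""
    _, x, y = max((v, x, y) for y, row in enumerate(heatmap) for x, v in enumerate(row))
    return (x, y)


target = gaussian_heatmap(8, 10, center=(3, 5))
print(argmax2d(target))
```

At inference time the same `argmax2d` step can be applied to a predicted heatmap to read off a landmark position, which is why localization can be posed as regression onto such maps.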
Further, human pose estimation systems generally treat detection and pose estimation as independent, sequential tasks, with detection followed by pose estimation. Such systems either expect ground-truth human detections or use an off-the-shelf detector for rough detection. However, detection and part localization are highly complementary processes: detection can greatly influence pose estimation, and accurate localization of parts strengthens the belief that a person is present at the corresponding position. The pose network model therefore couples these tasks, improving the efficiency of pedestrian detection and pose estimation.
Defining the network using basic blocks: the network is defined from two basic units, the residual module and the spatial belief module. The residual unit is introduced to solve the vanishing-gradient problem when training deep convolutional networks. It is used as a building block of the network, and the spatial belief (SB) module is built on top of it.
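The effect of the residual unit on gradients can be made concrete with the standard residual formulation. It is given here as general background rather than as a formula from this patent; the symbols x, F, and the loss \ell are illustrative.

```latex
y = x + F(x), \qquad
\frac{\partial \ell}{\partial x}
  = \frac{\partial \ell}{\partial y}\left(I + \frac{\partial F}{\partial x}\right)
```

Because of the identity term I, part of the gradient reaches x unchanged even when \partial F / \partial x is small, which is why stacking such units eases the vanishing-gradient problem.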
Further, the spatial belief module maps the input features of the block to part localization beliefs (heatmaps), while also processing the input features and part localization beliefs from the previous block. The image features and part localization beliefs generated by the block are concatenated to form the input of the next block. Given an input x to the SB module, the output y is given by:
where the operator joining the terms denotes concatenation, r = f_rea(x) is the operation through the non-identity branch of the residual unit, and b = f_belief(x) denotes the mapping from the input x to the desired heatmaps (person detection, part detection, and segmentation mask) through a series of 1 × 1 convolutions. The SB unit lets the network take contextual information into account when computing detection beliefs. The part localization belief b_i from the i-th SB unit is propagated to the (i+1)-th SB block and processed by the non-identity path, which captures the correlations among the heatmaps of the different parts. Applying the SB unit equation recursively shows that:
Owing to the concatenation, the identity shortcut and f_rea(·) in each SB unit process the beliefs from all previous SB units. In addition, the detection belief maps generated in each SB unit take into account the part localization beliefs of all previous SB units, and each SB unit computes them with a different receptive field. The network therefore uses detection belief maps from multiple stages and multiple receptive field sizes.
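The formulas referred to above are not reproduced in this text. The following is one form consistent with the stated definitions, offered only as an assumption: the symbol \oplus for concatenation, the indices, and the way x, r, and b are combined are illustrative and are not taken from the patent.

```latex
% Assumed form of a single SB unit (not confirmed by the source text)
y = (x + r) \oplus b, \qquad r = f_{rea}(x), \quad b = f_{belief}(x)

% Assumed recursion: the output of unit i is the input of unit i+1
x_{i+1} = y_i = \bigl(x_i + f_{rea}(x_i)\bigr) \oplus b_i, \qquad b_i = f_{belief}(x_i)
```

Under this reading, x_{i+1} contains b_i, so both the identity shortcut and f_rea in unit i+1 operate on the beliefs of unit i and, by induction, on those of all earlier units.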
Joint localization with the pose network (Pose Net): given an input image, the pose network jointly localizes pedestrians and their body parts in the form of heatmaps. The network consists entirely of convolutional layers, which preserves spatial context and improves computational efficiency. To localize pedestrians and estimate pose accurately, dense heatmap prediction is used throughout the network, preventing the information loss caused by subsampling (pooling);
The input image passes through a convolutional layer with 5 × 5 filters and a convolutional layer with 3 × 3 filters, following the design of residual networks for object recognition. These are followed by 3 SB units, each with convolution filters of large receptive field, which enlarge the receptive field of the network while still performing dense prediction. The SB units are followed by two 1 × 1 convolutional layers that map the image features to heatmaps. Finally, skip connections fuse information from multiple different contextual regions, combining features from receptive fields at various scales. The bounding box for detection is inferred from the heatmaps of the joints, the body center, and the segmentation;
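The last step, inferring a detection bounding box from the predicted heatmaps, might look like the following sketch. The patent does not specify how the joint, body-center, and segmentation cues are combined; taking the tightest box around all of them, as well as the function name and the 0.5 threshold, are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def bbox_from_cues(seg_heatmap: List[List[float]],
                   joints: List[Point],
                   center: Point,
                   threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Tightest box (x_min, y_min, x_max, y_max) around all available cues."""
    # Pixels the segmentation heatmap marks as belonging to the person
    points = [(x, y)
              for y, row in enumerate(seg_heatmap)
              for x, v in enumerate(row) if v >= threshold]
    # Decoded joint positions and the body center also constrain the box
    points.extend(joints)
    points.append(center)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Toy 5x6 segmentation heatmap with a confident blob in rows 1..3, columns 2..3.
seg = [[0.9 if 1 <= y <= 3 and 2 <= x <= 3 else 0.0 for x in range(6)] for y in range(5)]
print(bbox_from_cues(seg, joints=[(4, 2), (2, 4)], center=(3, 2)))
```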
The network is optimized by minimizing the multi-task mean squared error loss L between the heatmaps predicted by the network and the ideal heatmaps for pedestrian detection, part localization, and segmentation mask, defined as follows:
where α, β, and γ are hyperparameters that trade off the different loss terms.
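The loss formula itself is missing from this text. A weighted sum of three mean squared error terms is one structure consistent with the description, and the sketch below assumes it; which of α, β, and γ weights which task, the dictionary keys, and the function names are all hypothetical.

```python
from typing import Dict, List

Heatmap = List[List[float]]


def mse(pred: Heatmap, target: Heatmap) -> float:
    """Mean squared error over all pixels of two equally sized heatmaps."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def multitask_loss(pred: Dict[str, Heatmap], target: Dict[str, Heatmap],
                   alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Assumed form: L = alpha * L_detection + beta * L_parts + gamma * L_segmentation."""
    return (alpha * mse(pred["detection"], target["detection"])
            + beta * mse(pred["parts"], target["parts"])
            + gamma * mse(pred["segmentation"], target["segmentation"]))


pred = {"detection": [[1.0, 0.0]], "parts": [[0.5, 0.5]], "segmentation": [[2.0]]}
ideal = {"detection": [[0.0, 0.0]], "parts": [[0.5, 0.5]], "segmentation": [[0.0]]}
print(multitask_loss(pred, ideal, alpha=2.0, beta=3.0, gamma=0.5))
```

Raising one weight relative to the others makes errors in that task's heatmaps count more during training, which is the trade-off the hyperparameters control.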
Further, with the pose network, the visual compiler uses high-quality synthetic images of pedestrian appearance in the scene to learn a fully convolutional neural network that is specific to the scene and to the spatially varying regions within it, for simultaneous pedestrian detection, pose estimation, and segmentation. The network can be trained from scratch on synthetic data.
Brief description of the drawings
Fig. 1 is the system flow chart of the person and pose detection method based on a visual compiler of the present invention.
Fig. 2 shows the visual compiler of the person and pose detection method based on a visual compiler of the present invention.
Fig. 3 shows the definition of the network using basic blocks in the person and pose detection method based on a visual compiler of the present invention.
Fig. 4 shows joint localization with the pose network (Pose Net) in the person and pose detection method based on a visual compiler of the present invention.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments of this application and the features within them may be combined with one another. The invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is the system flow chart of the person and pose detection method based on a visual compiler of the present invention. The method mainly comprises data synthesis from a scene description, learning the network from synthetic data, defining the network using basic blocks, and joint localization with the pose network (Pose Net).
Data synthesis from the scene description: training pedestrian detection and pose estimation systems requires high-quality ground-truth annotations. Instead of relying on a laborious manual labeling process, the visual compiler uses the scene description to simulate pedestrian appearance suited to each region of the scene, so that the approach scales to a large number of scenes.
Given the scene description, the compiler first generates a planar 3D model of the scene, fitting a ground plane, planar walls, and cubes that enclose obstacles. Camera parameters are then used to account for lens characteristics (for example, perspective distortion in wide-angle cameras) and to render geometrically accurate people in the scene. Besides rendering the appearance of a person at every "valid pedestrian position" in the scene, the rendering pipeline can precisely control variations in human appearance such as gender, height, width, orientation, and pose. The virtual human database contains 139 different models covering gender, clothing color, and ethnicity. The compiler can render orientations anywhere from 0 to 360 degrees, or be guided by any available prior information;
To generate ground-truth labels for the rendered images, the following annotations are first associated with each 3D virtual human model: the segmentation mask, the 3D positions of 27 body parts, and the center of the person for detection. The 2D labels used for training are then extracted automatically from the 3D annotations and the camera projection parameters. This process yields consistent, noise-free labels, and the variations in appearance, orientation, pose, and position can be covered uniformly.
Learning the network from synthetic data: using the scene-specific data produced above, the visual compiler generates a visual program in the form of a deep neural network, which is trained with standard procedures according to the scene description;
The visual program generated by the visual compiler performs the following tasks jointly: localizing pedestrians, localizing the landmarks that define their pose, and segmenting the pixels that define them. To predict pedestrian location, pose, and segmentation mask, the network must model the holistic appearance of the pedestrian, the local appearance of the landmarks, and the prior over valid spatial configurations of these parts. To capture the appearance of whole pedestrians and of local landmarks, the network learns a mapping from the RGB input to heatmaps, formulated as a heatmap regression problem, for precise localization of pedestrians, local landmarks, and segmentation masks. The prior over spatial relationships between part locations is learned by the spatial belief (SB) module, which accounts for the correlations among the heatmaps of pedestrians, local landmarks, and segmentation masks. This particular instantiation of the visual program is called the pose network (Pose Net).
Human pose estimation systems generally treat detection and pose estimation as independent, sequential tasks, with detection followed by pose estimation. Such systems either expect ground-truth human detections or use an off-the-shelf detector for rough detection. However, detection and part localization are highly complementary processes: detection can greatly influence pose estimation, and accurate localization of parts strengthens the belief that a person is present at the corresponding position. The pose network model therefore couples these tasks, improving the efficiency of pedestrian detection and pose estimation.
Fig. 2 shows the visual compiler of the person and pose detection method based on a visual compiler of the present invention. The visual compiler is used to generate a scene-specific human detection and pose estimation system. Its known information comprises:
(1) the intrinsic and extrinsic parameters of the camera;
(2) the rough physical geometric layout of the scene, namely the regions where people may walk, sit, or stand, and the regions that may be occluded (obstacles) or that are physically inaccessible (walls);
(3) the poses and orientations of pedestrians in each region of the scene.
Together with a single image, the scene description serves as the input of the compiler, which synthesizes physically grounded and geometrically accurate people in the valid regions of the scene. The compiler learns a set of region-specific models for person detection, pose estimation, and segmentation. At inference time, each of these region-specific models is run simultaneously on its corresponding region.
Fig. 3 shows the definition of the network using basic blocks in the person and pose detection method based on a visual compiler of the present invention. The network is defined from two basic units, the residual module and the spatial belief module. The residual unit is introduced to solve the vanishing-gradient problem when training deep convolutional networks. It is used as a building block of the network, and the spatial belief (SB) module is built on top of it.
The spatial belief module maps the input features of the block to part localization beliefs (heatmaps), while also processing the input features and part localization beliefs from the previous block. The image features and part localization beliefs generated by the block are concatenated to form the input of the next block. Given an input x to the SB module, the output y is given by:
where the operator joining the terms denotes concatenation, r = f_rea(x) is the operation through the non-identity branch of the residual unit, and b = f_belief(x) denotes the mapping from the input x to the desired heatmaps (person detection, part detection, and segmentation mask) through a series of 1 × 1 convolutions. The SB unit lets the network take contextual information into account when computing detection beliefs. The part localization belief b_i from the i-th SB unit is propagated to the (i+1)-th SB block and processed by the non-identity path, which captures the correlations among the heatmaps of the different parts. Applying the SB unit equation recursively shows that:
Owing to the concatenation, the identity shortcut and f_rea(·) in each SB unit process the beliefs from all previous SB units. In addition, the detection belief maps generated in each SB unit take into account the part localization beliefs of all previous SB units, and each SB unit computes them with a different receptive field. The network therefore uses detection belief maps from multiple stages and multiple receptive field sizes.
Fig. 4 shows joint localization with the pose network (Pose Net) in the person and pose detection method based on a visual compiler of the present invention. Given an input image, the pose network jointly localizes pedestrians and their body parts in the form of heatmaps. The network consists entirely of convolutional layers, which preserves spatial context and improves computational efficiency. To localize pedestrians and estimate pose accurately, dense heatmap prediction is used throughout the network, preventing the information loss caused by subsampling (pooling);
The input image passes through a convolutional layer with 5 × 5 filters and a convolutional layer with 3 × 3 filters, following the design of residual networks for object recognition. These are followed by 3 SB units, each with convolution filters of large receptive field, which enlarge the receptive field of the network while still performing dense prediction. The SB units are followed by two 1 × 1 convolutional layers that map the image features to heatmaps. Finally, skip connections fuse information from multiple different contextual regions, combining features from receptive fields at various scales. The bounding box for detection is inferred from the heatmaps of the joints, the body center, and the segmentation;
The network is optimized by minimizing the multi-task mean squared error loss L between the heatmaps predicted by the network and the ideal heatmaps for pedestrian detection, part localization, and segmentation mask, defined as follows:
where α, β, and γ are hyperparameters that trade off the different loss terms.
With the pose network, the visual compiler uses high-quality synthetic images of pedestrian appearance in the scene to learn a fully convolutional neural network that is specific to the scene and to the spatially varying regions within it, for simultaneous pedestrian detection, pose estimation, and segmentation. The network can be trained from scratch on synthetic data.
For those skilled in the art, the present invention is not limited to the details of the above embodiments and may be realized in other specific forms without departing from its spirit and scope. Those skilled in the art may also make various changes and modifications to the invention without departing from its spirit and scope, and such improvements and modifications shall likewise be regarded as falling within the scope of protection of the invention. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.

Claims (10)

1. A person and pose detection method based on a visual compiler, characterized in that it mainly comprises: data synthesis from a scene description (I); learning the network from synthetic data (II); defining the network using basic blocks (III); and joint localization with the pose network (Pose Net) (IV).
2. The visual compiler according to claim 1, characterized in that it is used to generate a scene-specific human detection and pose estimation system; its known information comprises:
(1) the intrinsic and extrinsic parameters of the camera;
(2) the rough physical geometric layout of the scene, namely the regions where people may walk, sit, or stand, and the regions that may be occluded (obstacles) or that are physically inaccessible (walls);
(3) the poses and orientations of pedestrians in each region of the scene;
together with a single image, the scene description serves as the input of the compiler, which synthesizes physically grounded and geometrically accurate people in the valid regions of the scene; the compiler learns a set of region-specific models for person detection, pose estimation, and segmentation; at inference time, each of these region-specific models is run simultaneously on its corresponding region.
3. The data synthesis from a scene description (I) according to claim 1, characterized in that training the pedestrian detection and pose estimation systems requires high-quality ground-truth annotations; instead of relying on a laborious manual labeling process, the visual compiler uses the scene description to simulate pedestrian appearance suited to each region of the scene, so that the approach scales to a large number of scenes.
4. The scene description according to claim 3, characterized in that, given the scene description, the compiler first generates a planar 3D model of the scene, fitting a ground plane, planar walls, and cubes that enclose obstacles; camera parameters are then used to account for lens characteristics (for example, perspective distortion in wide-angle cameras) and to render geometrically accurate people in the scene; besides rendering the appearance of a person at every "valid pedestrian position" in the scene, the rendering pipeline can precisely control variations in human appearance such as gender, height, width, orientation, and pose; the virtual human database contains 139 different models covering gender, clothing color, and ethnicity; the compiler can render orientations anywhere from 0 to 360 degrees, or be guided by any available prior information;
to generate ground-truth labels for the rendered images, the following annotations are first associated with each 3D virtual human model: the segmentation mask, the 3D positions of 27 body parts, and the center of the person for detection; the 2D labels used for training are then extracted automatically from the 3D annotations and the camera projection parameters; this process yields consistent, noise-free labels, and the variations in appearance, orientation, pose, and position can be covered uniformly.
5. The learning of the network from synthetic data (II) according to claim 1, characterized in that, using the scene-specific data produced, the visual compiler generates a visual program in the form of a deep neural network, which is trained with standard procedures according to the scene description;
the visual program generated by the visual compiler performs the following tasks jointly: localizing pedestrians, localizing the landmarks that define their pose, and segmenting the pixels that define them; to predict pedestrian location, pose, and segmentation mask, the network must model the holistic appearance of the pedestrian, the local appearance of the landmarks, and the prior over valid spatial configurations of these parts; to capture the appearance of whole pedestrians and of local landmarks, the network learns a mapping from the RGB input to heatmaps, formulated as a heatmap regression problem, for precise localization of pedestrians, local landmarks, and segmentation masks; the prior over spatial relationships between part locations is learned by the spatial belief (SB) module, which accounts for the correlations among the heatmaps of pedestrians, local landmarks, and segmentation masks; this particular instantiation of the visual program is called the pose network (Pose Net).
6. The human pose estimation system according to claim 5, characterized in that detection and pose estimation are generally treated as independent, sequential tasks, with detection followed by pose estimation; such systems either expect ground-truth human detections or use an off-the-shelf detector for rough detection; however, detection and part localization are highly complementary processes; detection can greatly influence pose estimation, and accurate localization of parts strengthens the belief that a person is present at the corresponding position; the pose network model therefore couples these tasks, improving the efficiency of pedestrian detection and pose estimation.
7. network (three) is defined based on the use basic block described in claims 1, it is characterised in that use remaining module and sky Between confidence module the two base units define network;It is introduced into remaining unit and solves the gradient that disappears in training depth convolutional network Problem;It is network to use this elementary cell, and sets up it and carry out definition space confidence (SB) module.
8. based on the space confidence module described in claims 7, it is characterised in that the input feature vector of block is mapped into part fixed (thermal map) is believed in position, while treatment is from previous piece of input feature vector and part positioning confidence;The characteristics of image generated by the block With the input that part positioning confidence forms next piece by cascade;Given input x to SB modules, output y is given by:
where the operator in the expression denotes concatenation, r = f_res(x) is the operation through the non-identity branch of the residual unit, and b = f_belief(x) denotes the mapping from the input x to the desired heatmaps (person detection, part detection and segmentation mask) through a series of 1 × 1 convolutions; the SB unit lets the network take contextual information into account when computing its detection beliefs; the part localization belief b_i from the i-th SB unit propagates to the next, (i+1)-th, SB block and is processed through the non-identity path, capturing the correlations among the heatmaps of the various parts; applying the SB unit recursively shows that
owing to the concatenation operation, the identity shortcut and f_res(·) in each SB unit process the beliefs from all previous SB units; in addition, the detection belief maps generated in each SB unit also take into account the part localization beliefs of all previous SB units, each of which is computed with a different receptive field; the network therefore uses detection belief maps from multiple stages and with multiple receptive field sizes.
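The data flow through chained SB units can be sketched as follows. The text does not reproduce the expression for y, so the combination used here (concatenating the residual-branch features r with the beliefs b) is an assumption, as are the function names, the toy weights, and the use of a 1 × 1 convolution plus ReLU as a stand-in for the non-identity branch f_res. Feature maps are stored as a list of pixels, each pixel being a list of channel values, because a 1 × 1 convolution acts on every pixel independently.

```python
def conv1x1(features, weights):
    """1x1 convolution: apply the same channel-mixing matrix at every pixel.

    features: list of pixels, each a list of C_in channel values.
    weights:  C_out rows, each holding C_in coefficients.
    """
    return [[sum(w * v for w, v in zip(row, pixel)) for row in weights]
            for pixel in features]

def relu(features):
    """Element-wise rectifier."""
    return [[max(0.0, v) for v in pixel] for pixel in features]

def concat(*maps):
    """Channel-wise concatenation of feature maps defined over the same pixels."""
    return [[v for pixel in pixels for v in pixel] for pixels in zip(*maps)]

def sb_unit(x, w_res, w_belief):
    """One spatial belief unit (illustrative sketch, not the patented layer stack)."""
    r = relu(conv1x1(x, w_res))   # stand-in for r = f_res(x)
    b = conv1x1(x, w_belief)      # b = f_belief(x): 1x1 convolutions to heatmaps
    y = concat(r, b)              # ASSUMED combination of features and beliefs
    return y, b

# Two pixels with two input channels each (toy values).
x = [[1.0, 2.0], [0.5, -1.0]]
w_res = [[1.0, 0.0], [0.0, 1.0]]
w_belief = [[1.0, 1.0], [1.0, -1.0], [0.0, 1.0]]  # person, part, mask channels
y1, b1 = sb_unit(x, w_res, w_belief)

# The second unit receives features AND beliefs from the first unit.
w_res2 = [[0.1] * 5, [0.2] * 5]
w_belief2 = [[1.0, 0.0, 0.0, 0.0, 0.0],
             [0.0, 0.0, 1.0, 0.0, 0.0],
             [0.0, 0.0, 0.0, 0.0, 1.0]]
y2, b2 = sb_unit(y1, w_res2, w_belief2)
```

Because each unit's output carries the beliefs forward, the second unit's belief branch sees the first unit's heatmaps as part of its input, which is the propagation of b_i to the (i+1)-th block described above.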
9. The joint localization with the posture network (Pose Net) (4) according to claim 1, characterized in that, given an input image, the posture network jointly localizes the pedestrian and the body parts of the pedestrian in the form of heatmaps; the network consists of fully convolutional layers, preserving spatial context while improving computational efficiency; to achieve precise localization and pose estimation of the pedestrian, dense heatmap prediction is used throughout the network, preventing the loss of information caused by subsampling (pooling);
the input image passes through convolutional layers with 5 × 5 filters and 3 × 3 filters, following the design of residual networks for object recognition; these are followed by 3 SB units, each with convolution filters of large receptive field, which enlarge the receptive field of the network while still performing dense prediction; the SB units are followed by two 1 × 1 convolutional layers that map the image features to heatmaps; finally, skip connections are used to fuse information from multiple different context regions, combining features with receptive fields of various scales; from the detected heatmaps, bounding boxes are inferred around the localized joints, the body centre and the segmentation;
the network is optimized by minimizing the multi-task mean squared error loss L between the network's predictions and the ideal heatmaps for pedestrian detection, part localization and the segmentation mask, defined as follows,
where α, β and γ are hyperparameters that trade off the different loss terms.
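A weighted multi-task mean squared error of this kind can be sketched as below. The exact expression is not reproduced in the text, so the form used here (a weighted sum of one MSE term per task, with α on detection, β on part localization and γ on segmentation) is an assumption, as are the dictionary keys and function names.

```python
def mse(pred, target):
    """Mean squared error between two heatmaps given as nested lists."""
    diffs = [(p - t) ** 2
             for pred_row, target_row in zip(pred, target)
             for p, t in zip(pred_row, target_row)]
    return sum(diffs) / len(diffs)

def multitask_loss(pred, target, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of per-task MSE terms (ASSUMED form of the loss L)."""
    return (alpha * mse(pred["person"], target["person"])
            + beta * mse(pred["parts"], target["parts"])
            + gamma * mse(pred["mask"], target["mask"]))

# Toy 1x2 heatmaps for each task (hypothetical values).
pred = {"person": [[0.0, 1.0]], "parts": [[1.0, 1.0]], "mask": [[0.0, 0.0]]}
ideal = {"person": [[0.0, 0.0]], "parts": [[1.0, 1.0]], "mask": [[1.0, 1.0]]}
loss = multitask_loss(pred, ideal, alpha=1.0, beta=2.0, gamma=0.5)
```

Raising one weight relative to the others makes errors on that task's heatmaps count more during training, which is the trade-off the hyperparameters control.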
10. The posture network according to claim 9, characterized in that it is a fully convolutional neural network that uses high-quality synthesized images of pedestrian appearance in the scene, through which the visualization compiler learns the spatial variations specific to the scene and region; it performs simultaneous detection, pose estimation and segmentation of pedestrians; it can be trained from scratch on synthetic data.
CN201710103927.1A 2017-02-24 2017-02-24 A kind of personage and pose detection method based on visualization compiler Withdrawn CN106897697A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710103927.1A CN106897697A (en) 2017-02-24 2017-02-24 A kind of personage and pose detection method based on visualization compiler

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710103927.1A CN106897697A (en) 2017-02-24 2017-02-24 A kind of personage and pose detection method based on visualization compiler

Publications (1)

Publication Number Publication Date
CN106897697A (en) 2017-06-27

Family

ID=59184211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710103927.1A Withdrawn CN106897697A (en) 2017-02-24 2017-02-24 A kind of personage and pose detection method based on visualization compiler

Country Status (1)

Country Link
CN (1) CN106897697A (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107481263A (en) * 2017-08-10 2017-12-15 上海体育学院 Table tennis method for tracking target, device, storage medium and computer equipment
CN107767419A (en) * 2017-11-07 2018-03-06 广州深域信息科技有限公司 A kind of skeleton critical point detection method and device
CN108038465A (en) * 2017-12-25 2018-05-15 深圳市唯特视科技有限公司 A kind of three-dimensional more personage's Attitude estimations based on generated data collection
CN108549844A (en) * 2018-03-22 2018-09-18 华侨大学 A kind of more people's Attitude estimation methods based on multi-layer fractal network and joint relatives' pattern
CN108717531A (en) * 2018-05-21 2018-10-30 西安电子科技大学 Estimation method of human posture based on Faster R-CNN
CN108900788A (en) * 2018-07-12 2018-11-27 北京市商汤科技开发有限公司 Video generation method, video-generating device, electronic device and storage medium
CN109190537A (en) * 2018-08-23 2019-01-11 浙江工商大学 A kind of more personage's Attitude estimation methods based on mask perceived depth intensified learning
CN109215080A (en) * 2018-09-25 2019-01-15 清华大学 6D Attitude estimation network training method and device based on deep learning Iterative matching
CN109784296A (en) * 2019-01-27 2019-05-21 武汉星巡智能科技有限公司 Bus occupant quantity statistics method, device and computer readable storage medium
CN110008915A (en) * 2019-04-11 2019-07-12 电子科技大学 The system and method for dense human body attitude estimation is carried out based on mask-RCNN
CN110799991A (en) * 2017-06-28 2020-02-14 奇跃公司 Method and system for performing simultaneous localization and mapping using a convolutional image transform
CN111950321A (en) * 2019-05-14 2020-11-17 杭州海康威视数字技术股份有限公司 Gait recognition method and device, computer equipment and storage medium
CN112336342A (en) * 2020-10-29 2021-02-09 深圳市优必选科技股份有限公司 Hand key point detection method and device and terminal equipment
CN113255420A (en) * 2020-02-11 2021-08-13 辉达公司 3D body pose estimation using unlabeled multi-view data trained models
CN113408433A (en) * 2021-06-22 2021-09-17 华侨大学 Intelligent monitoring gesture recognition method, device, equipment and storage medium
CN113436058A (en) * 2021-06-24 2021-09-24 深圳市赛维网络科技有限公司 Character virtual clothes changing method, terminal equipment and storage medium
CN114322946A (en) * 2021-12-30 2022-04-12 杭州环木信息科技有限责任公司 Method for converting optical data into inertial data with high fidelity
DE102022119865A1 (en) 2022-08-08 2024-02-08 Audi Aktiengesellschaft Method for estimating positions of pivot points and control device for a motor vehicle

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110799991A (en) * 2017-06-28 2020-02-14 奇跃公司 Method and system for performing simultaneous localization and mapping using a convolutional image transform
CN110799991B (en) * 2017-06-28 2023-09-05 奇跃公司 Method and system for performing simultaneous localization and mapping using convolution image transformations
CN107481263A (en) * 2017-08-10 2017-12-15 上海体育学院 Table tennis method for tracking target, device, storage medium and computer equipment
CN107481263B (en) * 2017-08-10 2020-05-19 上海体育学院 Table tennis target tracking method, device, storage medium and computer equipment
CN107767419A (en) * 2017-11-07 2018-03-06 广州深域信息科技有限公司 A kind of skeleton critical point detection method and device
CN108038465A (en) * 2017-12-25 2018-05-15 深圳市唯特视科技有限公司 A kind of three-dimensional more personage's Attitude estimations based on generated data collection
CN108549844A (en) * 2018-03-22 2018-09-18 华侨大学 A kind of more people's Attitude estimation methods based on multi-layer fractal network and joint relatives' pattern
CN108549844B (en) * 2018-03-22 2021-10-26 华侨大学 Multi-person posture estimation method based on fractal network and joint relative mode
CN108717531B (en) * 2018-05-21 2021-06-08 西安电子科技大学 Human body posture estimation method based on Faster R-CNN
CN108717531A (en) * 2018-05-21 2018-10-30 西安电子科技大学 Estimation method of human posture based on Faster R-CNN
CN108900788A (en) * 2018-07-12 2018-11-27 北京市商汤科技开发有限公司 Video generation method, video-generating device, electronic device and storage medium
CN109190537B (en) * 2018-08-23 2020-09-29 浙江工商大学 Mask perception depth reinforcement learning-based multi-person attitude estimation method
CN109190537A (en) * 2018-08-23 2019-01-11 浙江工商大学 A kind of more personage's Attitude estimation methods based on mask perceived depth intensified learning
CN109215080A (en) * 2018-09-25 2019-01-15 清华大学 6D Attitude estimation network training method and device based on deep learning Iterative matching
US11200696B2 (en) 2018-09-25 2021-12-14 Tsinghua University Method and apparatus for training 6D pose estimation network based on deep learning iterative matching
CN109784296A (en) * 2019-01-27 2019-05-21 武汉星巡智能科技有限公司 Bus occupant quantity statistics method, device and computer readable storage medium
CN110008915A (en) * 2019-04-11 2019-07-12 电子科技大学 The system and method for dense human body attitude estimation is carried out based on mask-RCNN
CN110008915B (en) * 2019-04-11 2023-02-03 电子科技大学 System and method for estimating dense human body posture based on mask-RCNN
CN111950321A (en) * 2019-05-14 2020-11-17 杭州海康威视数字技术股份有限公司 Gait recognition method and device, computer equipment and storage medium
CN111950321B (en) * 2019-05-14 2023-12-05 杭州海康威视数字技术股份有限公司 Gait recognition method, device, computer equipment and storage medium
CN113255420A (en) * 2020-02-11 2021-08-13 辉达公司 3D body pose estimation using unlabeled multi-view data trained models
CN112336342A (en) * 2020-10-29 2021-02-09 深圳市优必选科技股份有限公司 Hand key point detection method and device and terminal equipment
CN112336342B (en) * 2020-10-29 2023-10-24 深圳市优必选科技股份有限公司 Hand key point detection method and device and terminal equipment
CN113408433A (en) * 2021-06-22 2021-09-17 华侨大学 Intelligent monitoring gesture recognition method, device, equipment and storage medium
CN113408433B (en) * 2021-06-22 2023-12-05 华侨大学 Intelligent monitoring gesture recognition method, device, equipment and storage medium
CN113436058A (en) * 2021-06-24 2021-09-24 深圳市赛维网络科技有限公司 Character virtual clothes changing method, terminal equipment and storage medium
CN113436058B (en) * 2021-06-24 2023-10-20 深圳市赛维网络科技有限公司 Character virtual clothes changing method, terminal equipment and storage medium
CN114322946A (en) * 2021-12-30 2022-04-12 杭州环木信息科技有限责任公司 Method for converting optical data into inertial data with high fidelity
CN114322946B (en) * 2021-12-30 2024-01-09 杭州环木信息科技有限责任公司 Method for converting optical data into inertial data with high fidelity
DE102022119865A1 (en) 2022-08-08 2024-02-08 Audi Aktiengesellschaft Method for estimating positions of pivot points and control device for a motor vehicle

Similar Documents

Publication Publication Date Title
CN106897697A (en) A kind of personage and pose detection method based on visualization compiler
US11816907B2 (en) Systems and methods for extracting information about objects from scene information
Häne et al. Dense semantic 3d reconstruction
KR20200040665A (en) Systems and methods for detecting a point of interest change using a convolutional neural network
CN109176512A (en) A kind of method, robot and the control device of motion sensing control robot
CN101243470A (en) Object tracking system
Miclea et al. Monocular depth estimation with improved long-range accuracy for UAV environment perception
CN109752855A (en) A kind of method of hot spot emitter and detection geometry hot spot
US10885708B2 (en) Automated costume augmentation using shape estimation
Wang et al. Robust AUV visual loop-closure detection based on variational autoencoder network
Zhong et al. WF-SLAM: A robust VSLAM for dynamic scenarios via weighted features
Zhao et al. Real-time visual-inertial localization using semantic segmentation towards dynamic environments
Zhu et al. Fusing panoptic segmentation and geometry information for robust visual slam in dynamic environments
Singh et al. Fast semantic-aware motion state detection for visual slam in dynamic environment
Kim et al. CT-Loc: Cross-domain visual localization with a channel-wise transformer
Lee et al. Visual compiler: synthesizing a scene-specific pedestrian detector and pose estimator
JP7304235B2 (en) Trained model, learning device, learning method, and learning program
Li et al. Dynamic objects recognizing and masking for RGB-D SLAM
Xu et al. Indoor localization using region-based convolutional neural network
Doğan et al. An augmented crowd simulation system using automatic determination of navigable areas
Rimboux et al. Smart IoT cameras for crowd analysis based on augmentation for automatic pedestrian detection, simulation and annotation
Guo et al. 3D object detection and tracking based on streaming data
CN113191462A (en) Information acquisition method, image processing method and device and electronic equipment
Zeng Deep Building Recognition: Interior Layouts and Exterior Planes
CN113642395B (en) Building scene structure extraction method for city augmented reality information labeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20170627