CN109190537A - A multi-person pose estimation method based on mask-aware deep reinforcement learning - Google Patents

A multi-person pose estimation method based on mask-aware deep reinforcement learning

Info

Publication number
CN109190537A
CN109190537A (application CN201810968949.9A)
Authority
CN
China
Prior art keywords
network
pose estimation
person
mask
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810968949.9A
Other languages
Chinese (zh)
Other versions
CN109190537B (en)
Inventor
田彦
王勋
吴佳辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yunqi Smart Vision Technology Co Ltd
Original Assignee
Zhejiang Gongshang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Gongshang University filed Critical Zhejiang Gongshang University
Priority to CN201810968949.9A priority Critical patent/CN109190537B/en
Publication of CN109190537A publication Critical patent/CN109190537A/en
Application granted granted Critical
Publication of CN109190537B publication Critical patent/CN109190537B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-person pose estimation method based on mask-aware deep reinforcement learning. The method first constructs a multi-person pose estimation model composed of three sub-networks: a detection network that produces detection boxes and masks, a deep reinforcement learning network that improves localization accuracy, and a single-person pose estimation network. The model is then trained with training samples. At test time, an image to be detected is fed into the trained model to obtain the pose of each person in all detection boxes of the image. The method introduces mask information into both the deep reinforcement learning network and the single-person pose estimation network, improving the effect of both stages, and employs residual structures to alleviate the vanishing-gradient and exploding-gradient problems. Compared with other state-of-the-art multi-person pose estimation methods, the proposed method is competitive.

Description

A multi-person pose estimation method based on mask-aware deep reinforcement learning
Technical field
The present invention relates to human pose estimation techniques, and in particular to a multi-person pose estimation method based on mask-aware deep reinforcement learning.
Background art
With the deployment of large numbers of multimedia sensors and the wide application of motion capture in fields such as fashion design, clinical analysis, human-computer interaction, activity recognition, and motor rehabilitation, human pose estimation has become a focus of attention in the multimedia industry.
Recently, single-person pose estimation has made remarkable progress by using deep-learning-based architectures. Multi-person pose estimation, however, which judges the poses of multiple people in an image, especially individuals in a crowd, remains an arduous task. Its main difficulties are as follows. First, the number of people in an image is unknown, and a person may appear at any position and at any scale. Second, people in an image interact in certain ways, such as occlusion, conversation, and contact, which makes the estimation harder. Third, the computational complexity grows with the number of people in the image, so designing an efficient algorithm is itself a challenge. These difficulties are illustrated in Fig. 1(a)-(d).
Top-down and bottom-up approaches are the two main ways to handle human pose estimation. Top-down methods apply a detector followed by a single-person pose estimator to each detected person. However, when people are very close to each other, the single-person detector may fail, and the computational cost grows with the number of people in the picture. Bottom-up methods, in contrast, first detect joints and then judge each person's pose using local context; because the final parsing step requires global information, their efficiency cannot be guaranteed.
Because the detections produced by both top-down and bottom-up methods can be inaccurately localized, the accuracy of multi-person pose estimation is further reduced. The relationship between detection results and pose estimation results is shown in Fig. 1(e). In object detection, a detection box generated by a deep-learning model is considered correct if its intersection over union (IoU) with the ground-truth box exceeds 0.5. Such redundant, loosely localized detections, however, are harmful to human pose estimation, so the original detection boxes need to be corrected. Deep reinforcement learning is an effective way to do this: it can select the best action according to environmental information to reach an optimal value.
Summary of the invention
In view of the above deficiencies of the prior art, it is an object of the present invention to provide a multi-person pose estimation method based on mask-aware deep reinforcement learning that effectively improves pose estimation accuracy.
The object of the present invention is achieved through the following technical solution: a multi-person pose estimation method based on mask-aware deep reinforcement learning, comprising the following steps:
(1) Construct a multi-person pose estimation model. The model consists of three sub-networks: a detection network that produces detection boxes and masks, a deep reinforcement learning network that improves localization accuracy, and a single-person pose estimation network.
(1.1) Detection network: the detection boxes of the original image and the binary human mask within each detection box are obtained through a multi-task learning network.
(1.2) Deep reinforcement learning network: unlike existing sampling-based calibration of localization results, the present invention formulates the refinement as a Markov decision process; the detection box is updated through a recursive reward-or-penalty learning process. The goal of this part is to learn an optimal policy function that maps a state S to an action A.
In computer vision, most deep reinforcement learning methods use feature maps as the state vector. However, cluttered background produces high activation values in the feature map, which interferes with the refinement result and in turn affects the pose estimation process. In the present invention, the environment state is defined as a tuple (h, i), where h is the history-decision vector from the decision network and i is the feature map with mask. A pre-trained convolutional neural network model f_1 extracts the original feature map from image x; the feature map is then passed to a multi-task network f_2, whose mask output serves as an attention map over the feature map. The expression for i is as follows:
i = f_2(f_1(x)) ⊙ f_1(x)
where ⊙ denotes the Hadamard (element-wise) product.
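For illustration only (not part of the claimed method), the masked feature map i can be sketched in NumPy, treating f_1(x) as a (C, H, W) feature map and the mask output of f_2 as an (H, W) array in [0, 1]:

```python
import numpy as np

def masked_feature_map(feat: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Hadamard (element-wise) product of a feature map with a foreground
    mask, suppressing activations that come from cluttered background.
    feat: (C, H, W) feature map f_1(x); mask: (H, W) output of f_2."""
    return mask[None, :, :] * feat  # broadcast the mask over all channels

# Toy example: activations where mask == 0 (background) are zeroed out.
feat = np.ones((2, 4, 4))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                # person occupies the centre 2x2 region
i = masked_feature_map(feat, mask)
```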
With an accurate foreground mask, the redundant information in the feature map is removed. The masked feature map provides both low-level information, such as shape and contour, and high-level information, such as the pose of the human body, which facilitates the refinement process.
The binary human mask produced by the detection network is resized to match the fully connected layer of the deep reinforcement learning network, multiplied with the detection-box image, and used as the input of the deep reinforcement learning network. The output of the network is the reward value of each of 11 box-adjustment actions.
The box-adjustment actions fall into four classes: scaling actions (shrink and enlarge), translation actions (in the four directions up, down, left, and right), a termination action (whether to stop the box refinement), and aspect-ratio adjustment actions (increase or decrease of the width, and increase or decrease of the height). To keep the detector relatively stable, each action moves or scales the current window by 0.1 times the window size.
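As an illustrative sketch (the action names and box encoding are hypothetical; boxes are (x, y, w, h) tuples and the step factor 0.1 is taken from the text), the 11-action set and its effect on a box can be written as:

```python
ACTIONS = ["shrink", "enlarge",                            # scaling
           "move_up", "move_down", "move_left", "move_right",  # translation
           "wider", "narrower", "taller", "shorter",       # aspect ratio
           "terminate"]                                    # stop refinement
DELTA = 0.1  # each action moves/scales by 0.1x the window size

def apply_action(box, action):
    """Apply one box-adjustment action to box = (x, y, w, h)."""
    x, y, w, h = box
    dx, dy = DELTA * w, DELTA * h
    if action == "shrink":     return (x + dx/2, y + dy/2, w - dx, h - dy)
    if action == "enlarge":    return (x - dx/2, y - dy/2, w + dx, h + dy)
    if action == "move_up":    return (x, y - dy, w, h)
    if action == "move_down":  return (x, y + dy, w, h)
    if action == "move_left":  return (x - dx, y, w, h)
    if action == "move_right": return (x + dx, y, w, h)
    if action == "wider":      return (x - dx/2, y, w + dx, h)
    if action == "narrower":   return (x + dx/2, y, w - dx, h)
    if action == "taller":     return (x, y - dy/2, w, h + dy)
    if action == "shorter":    return (x, y + dy/2, w, h - dy)
    return box  # "terminate": box unchanged, refinement stops
```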
The action with the highest reward value is selected to adjust the detection box, and the newly obtained detection-box image is fed back into the deep reinforcement learning network iteratively until the highest-reward action is the termination action, at which point the refined detection box is output.
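The refinement loop described above can be sketched as follows (a minimal sketch with a hypothetical interface: `q_values(box)` stands for a forward pass of a trained network, and the toy policy below is purely for demonstration):

```python
def refine_box(box, q_values, step, actions, max_steps=20):
    """Repeatedly apply the highest-reward action until the best
    action is 'terminate' (or a step limit is reached)."""
    for _ in range(max_steps):
        q = q_values(box)
        best = actions[max(range(len(actions)), key=lambda i: q[i])]
        if best == "terminate":
            break
        box = step(box, best)
    return box

# Toy policy: prefers "move_right" until x >= 3, then terminates.
actions = ["move_right", "terminate"]
step = lambda box, a: (box[0] + 1, *box[1:]) if a == "move_right" else box
policy = lambda box: [1.0, 0.0] if box[0] < 3 else [0.0, 1.0]
refined = refine_box((0, 0, 10, 10), policy, step, actions)
```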
(1.3) Single-person pose estimation network: the mask and the refined detection-box image are passed to a single-person pose estimation network to obtain the single-person pose.
(2) Train the multi-person pose estimation model with training samples. Feed an image to be detected into the trained multi-person pose estimation model to obtain the pose of each person in all detection boxes of the image.
Further, the detection network uses a two-stage pipeline. In the first stage, a deep residual network extracts the feature map of the original image and an RPN generates several candidate boxes. In the second stage, each candidate box is passed to three branches for multi-task learning, producing the classification confidence, the detection-box offsets, and the binary human mask within the detection box.
Further, the detection network uses the following joint loss function in each second-stage branch:
L = L_cls + α_1 L_box + α_2 L_mask
where L_cls is the classification loss, expressed as a cross entropy; L_box is the localization loss, measuring the difference between the detection box and the ground-truth box with an L1 norm; L_mask is the segmentation loss, expressed as an average binary cross entropy; and α_1 and α_2 are coefficients balancing the three losses.
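As a minimal numerical sketch of this joint loss (the component losses here are toy stand-ins for the branch outputs; α_1 = 4.0 and α_2 = 10.0 are the values given later in the training details):

```python
import numpy as np

def cls_loss(probs, label):
    """Cross-entropy classification loss for a predicted probability vector."""
    return float(-np.log(probs[label] + 1e-12))

def box_loss(pred_box, gt_box):
    """L1 localization loss between detection box and ground-truth box."""
    return float(np.abs(np.asarray(pred_box) - np.asarray(gt_box)).sum())

def mask_loss(pred_mask, gt_mask):
    """Average binary cross-entropy segmentation loss."""
    p = np.clip(pred_mask, 1e-12, 1 - 1e-12)
    return float(np.mean(-(gt_mask * np.log(p) + (1 - gt_mask) * np.log(1 - p))))

def joint_loss(l_cls, l_box, l_mask, alpha1=4.0, alpha2=10.0):
    """Joint loss L = L_cls + a1 * L_box + a2 * L_mask of one branch."""
    return l_cls + alpha1 * l_box + alpha2 * l_mask
```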
Further, the deep reinforcement learning network comprises a sequentially connected 8×8 convolutional layer, 4×4 convolutional layer, and 3×3 convolutional layer. The output of the 3×3 convolutional layer feeds two branches: one obtains the 11-dimensional advantage function A(a, s; θ, α) through an 11-dimensional fully connected layer, and the other obtains the state-value function V(s; θ, β) through a 1-dimensional fully connected layer, where θ denotes the shared convolutional parameters, α and β are the parameters of the respective fully connected branches, a is a box-adjustment action, and s is the input of the deep reinforcement learning network. The Q function is obtained by adding the advantage function to the state-value function, and the reward value of each action is computed from the Q function: Q(s, a; θ, α, β) = V(s; θ, β) + A(a, s; θ, α).
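The dueling head can be sketched numerically as follows. This is a toy stand-in: random linear layers replace the trained convolution stack, and Q(s, a) = V(s) + A(a, s) is combined exactly as stated in the text (without the mean-subtraction used in some dueling-DQN variants):

```python
import numpy as np

N_ACTIONS = 11

def dueling_q(features, w_adv, w_val):
    """Combine an 11-dim advantage branch and a 1-dim value branch
    into Q(s, a) = V(s) + A(a, s)."""
    advantage = features @ w_adv          # A(a, s): shape (11,)
    value = float(features @ w_val)       # V(s): scalar
    return value + advantage              # Q(s, a): shape (11,)

rng = np.random.default_rng(0)
feats = rng.standard_normal(64)           # stand-in for conv features
w_adv = rng.standard_normal((64, N_ACTIONS))
w_val = rng.standard_normal(64)
q = dueling_q(feats, w_adv, w_val)
best_action = int(np.argmax(q))           # a = argmax_a Q(s, a)
```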
Further, in the deep reinforcement learning network, the reward value r of the current iteration is expressed as:
r(s, a) = (IoU(w′_i, g_i) − IoU(w_i, g_i)) + λ b′/B′
where the first term IoU(w′_i, g_i) − IoU(w_i, g_i) is the conventional reward term, and the second term λ b′/B′ is a regularization term added to constrain the detection-box size. Here w_i and w′_i respectively denote the detection box of target i before and after action a, g_i denotes the ground-truth box, b′ denotes the intersection area of the detection box and the ground-truth box after action a, B′ denotes the area of the detection box after action a, IoU denotes intersection over union, and λ is a scale factor balancing the reward term and the regularization term (its value is determined during parameter tuning and is generally 1 to 10).
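The reward computation can be sketched directly from these definitions, assuming axis-aligned boxes given as (x1, y1, x2, y2):

```python
def inter_area(a, b):
    """Intersection area of two boxes (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def iou(a, b):
    """Intersection over union of two boxes."""
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    inter = inter_area(a, b)
    return inter / (area(a) + area(b) - inter)

def reward(box_before, box_after, gt, lam=1.0):
    """r = (IoU(w', g) - IoU(w, g)) + lam * b'/B'."""
    b_prime = inter_area(box_after, gt)   # intersection with gt after action
    B_prime = (box_after[2] - box_after[0]) * (box_after[3] - box_after[1])
    return (iou(box_after, gt) - iou(box_before, gt)) + lam * b_prime / B_prime
```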
The termination action is an additional action: it does not move the detection box, and only judges, within the reinforcement learning process, whether the optimal result has been found. Its reward value is defined as:
r_T = η if the IoU of the detection box and the ground-truth box is at least τ, and r_T = -η otherwise,
where τ is the IoU threshold that decides whether the reward is positive or negative, and η is the corresponding reward value.
An action a is selected according to the Q function, where Q(s, a) represents the current plus future cumulative reward:
a = argmax_a Q(s, a)
Q(s, a) = r + γ max_{a′} Q(s′, a′)
The loss function loss(θ) for training the Q function is expressed as:
loss(θ) = E[(r + γ max_{a′} Q(s′, a′, θ) − Q(s, a, θ))²]
where θ denotes the parameters of the deep reinforcement learning network; s and a are respectively the input and the box-adjustment action of the current iteration, and s′ and a′ those of the next iteration; Q(s, a, θ) is the cumulative reward starting from the current iteration and Q(s′, a′, θ) the cumulative reward starting from the next iteration; r is the reward value of the current iteration; γ is the discount factor; and E denotes the expectation of the loss over all iterations.
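A minimal sketch of the temporal-difference training step (scalar Q values for illustration; in the patent Q is computed by the deep Q network):

```python
import numpy as np

def td_loss(q_sa, reward, q_next_max, gamma=0.9):
    """Squared TD error: (r + gamma * max_a' Q(s', a') - Q(s, a))^2."""
    target = reward + gamma * q_next_max
    return (target - q_sa) ** 2

# Toy check: when Q already satisfies the Bellman equation, the loss is 0.
gamma = 0.9
q_next = np.array([0.2, 1.0, -0.3])          # Q(s', a') for three actions
q_sa = 0.5 + gamma * q_next.max()            # reward r = 0.5
loss = td_loss(q_sa, 0.5, q_next.max(), gamma)
```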
Further, in the deep reinforcement learning network, the following techniques are used to improve the learning efficiency of the parameters θ:
(a) To improve training stability, a target network separate from the online network is introduced; the online network is updated every iteration, while the target network is updated at fixed intervals.
(b) To avoid falling into local minima, an ε-greedy strategy is used as the action policy.
(c) To solve the data-dependence problem, experience replay is used: transitions (s, a, r, s′) are stored in a buffer, and during training a fixed number of samples is drawn at random from the buffer to reduce the correlation between data.
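Techniques (b) and (c) can be sketched together as follows (the buffer capacity of 10,000 and batch size of 32 are the values given later in the training details; the class and function names are illustrative):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s') transitions; uniform random
    sampling breaks the correlation between consecutive transitions."""
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)   # oldest entries are evicted
    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))
    def sample(self, batch_size=32):
        return random.sample(self.buf, batch_size)

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore a random action; otherwise
    exploit the action with the highest Q value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```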
Further, the deep reinforcement learning network adopts the dueling DQN structure, which can quickly identify the correct action during policy evaluation. Like a standard Q network, the dueling structure is trained with ordinary backpropagation and requires no extra supervision or algorithmic modification; it automatically estimates V(s; θ, β) and A(a, s; θ, α).
Further, the single-person pose estimation network combines the binary human mask produced by the detection network with a cascaded pyramid network (CPN) to perform human pose estimation. The loss function of the single-person pose estimation network is:
L = L_inf + k L_mask
where L_inf is the error term between the predicted single-person pose and the ground-truth pose, L_mask is a regularization term measuring the error between the predicted pose and the binary human mask, and k is a scale factor balancing the two (set from practical experience, generally 1 to 5); L_mask = Σ_p L_p, where p indexes the human joints, and
in L_p, the predicted value of joint p is taken at position l, the position of maximum activation in its activation map, and m_l is the binary human mask at position l, with 1 indicating the human region and 0 the background. If a joint lies outside the human region, the result is penalized; otherwise the loss function is unaffected.
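The exact per-joint formula L_p appears in the patent as an image and is not reproduced here; the sketch below assumes the form L_p = h_p(l) · (1 − m_l), which matches the verbal description (the peak activation is penalized exactly when it falls outside the mask), but this form is an assumption:

```python
import numpy as np

def mask_regularizer(heatmaps: np.ndarray, mask: np.ndarray) -> float:
    """L_mask = sum_p L_p under the ASSUMED L_p = h_p(l) * (1 - m_l):
    for each joint p, l is the argmax of its activation map, and the peak
    value is penalized iff the binary mask is 0 (background) at l."""
    total = 0.0
    for hm in heatmaps:                       # one activation map per joint
        l = np.unravel_index(np.argmax(hm), hm.shape)
        total += hm[l] * (1.0 - mask[l])      # zero when the peak is on the body
    return total

mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                          # person occupies the centre
inside = np.zeros((4, 4)); inside[1, 1] = 0.9    # joint peak on the body
outside = np.zeros((4, 4)); outside[0, 3] = 0.7  # joint peak in background
penalty = mask_regularizer(np.stack([inside, outside]), mask)
```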
Further, the training stage of the multi-person pose estimation model is computed on a GPU.
Preferably, the training details of the detection network are as follows. In the loss function, α_1 and α_2 are set to 4.0 and 10.0 respectively. The whole network is trained with stochastic gradient descent with momentum 0.9 and weight decay 0.0005; the learning rate is 0.01 for the first 60,000 iterations and 0.001 for the last 20,000. In each batch, 48 positive samples are taken from 4 training pictures and 48 negative samples from cluttered background. At the verification stage, the confidence threshold is set to 0.7 and the IoU threshold used for localization to 0.6.
In the refinement process based on deep reinforcement learning, the buffer holds 10,000 transitions and each batch contains 32 samples. λ in the loss function is set between 1 and 10. An ε-greedy strategy is used during the experiments; after 5,000 training steps, ε drops from 0.3 to 0.05. The discount factor γ is 0.9.
In the single-person pose estimation stage, k in the loss function is set to 0.4. The model is trained with stochastic gradient descent with an initial learning rate of 0.0005, halved after every 10 passes over the dataset. The weight decay is 0.00005, and batch normalization is used.
Compared with the prior art, the present invention has the following beneficial effects:
1. The proposed multi-person pose estimation method based on mask-aware deep reinforcement learning improves detection accuracy.
2. Mask information is used to eliminate the negative effects of cluttered background information, and the best action is selected according to a reward function.
3. A regularization term is added at the pose estimation stage to penalize joints outside the human contour.
4. Tested on the MPII test set, the multi-person pose estimation model improves mean Average Precision (mAP) by 1.1 over prior-art models; on the MS-COCO test-dev set it reaches an average precision of 73.0.
Brief description of the drawings
Fig. 1 illustrates the main difficulties of multi-person pose estimation according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the multi-person pose estimation framework based on mask-aware deep reinforcement learning;
Fig. 3 shows activation maps inside detection boxes at the detection stage;
Fig. 4 illustrates the different actions;
Fig. 5 is a schematic diagram of the deep Q network;
Fig. 6 is a schematic diagram of the mask-aware pose estimation framework;
Fig. 7 shows the accuracy curves of the state and reward functions;
Fig. 8 shows test results on the MPII dataset.
Specific embodiment
To describe the present invention more specifically, the technical solution of the invention is described in detail below with reference to the accompanying drawings and specific embodiments.
The multi-person pose estimation method provided in this embodiment obtains the positions and poses of a variable number of people in an image, and can be applied to multimedia industries such as clinical analysis, human-computer interaction, and activity recognition.
This embodiment obtains detection boxes and masks with a multi-task learning network, refines localization with a deep reinforcement learning network, and finally estimates the pose of the person in each detection box with a single-person pose estimation network. The embodiment is explained below with reference to the drawings.
Fig. 1 illustrates the difficulties of multi-person pose estimation: (a) shows that the number and positions of people in a picture are unknown; (b), (c) and (d) show occlusion, conversation and contact respectively, reflecting the interactions between people; (e) shows the relationship between box detection and human pose estimation.
Fig. 2 shows the framework of multi-person pose estimation based on mask-aware deep reinforcement learning: a multi-task network simultaneously produces detection boxes and person masks, and the localization result is refined with the deep reinforcement learning network; finally, the pose of each person is estimated with an hourglass network. Mask information is exploited in both the refinement and estimation stages.
Fig. 3 shows activation maps inside detection boxes at the detection stage: the original image (a) is passed through a convolutional neural network to obtain the activation map (b); it can be seen that cluttered background also produces high activation values in (b). Fig. 3(c) shows that with an accurate foreground mask the redundant information in the feature map is removed.
Fig. 4 illustrates the four classes of actions: scaling, translation, termination, and aspect-ratio adjustment.
Fig. 5 is a schematic diagram of the deep Q network, which comprises a sequentially connected 8×8 convolutional layer, 4×4 convolutional layer, and 3×3 convolutional layer. The output of the 3×3 convolutional layer feeds two branches: one obtains the 11-dimensional advantage function A(a, s; θ, α) through an 11-dimensional fully connected layer, and the other obtains the state-value function V(s; θ, β) through a 1-dimensional fully connected layer, where θ denotes the shared convolutional parameters, α and β are the parameters of the respective fully connected branches, a is a box-adjustment action, and s is the input of the deep reinforcement learning network. The Q function is obtained by adding the advantage function to the state-value function, and the reward value of each action is computed from the Q function.
Fig. 6 is a schematic diagram of the mask-combined pose estimation framework. The single-person pose estimation network combines the binary human mask produced by the detection network with a cascaded pyramid network (CPN) to perform human pose estimation. Its loss function is:
L = L_inf + k L_mask
where L_inf is the error term between the predicted single-person pose and the ground-truth pose, L_mask is a regularization term measuring the error between the predicted pose and the binary human mask, and k is a scale factor balancing the two (set from practical experience, generally 1 to 5); L_mask = Σ_p L_p, where p indexes the human joints. In L_p, the predicted value of joint p is taken at position l, the position of maximum activation in its activation map, and m_l is the binary human mask at position l, with 1 indicating the human region and 0 the background. If a joint lies outside the human region, the result is penalized; otherwise the loss function is unaffected.
Fig. 7 shows the accuracy curves of the state and reward functions: (a) the training accuracy of the state, (b) the test accuracy of the state, (c) the training accuracy of the reward function, and (d) the test accuracy of the reward function.
Multi-person pose estimation was performed on the MPII dataset using this embodiment; the experimental results are shown in Fig. 8, where (a) shows successful predictions and (b) failed predictions. From the failure cases we conclude: (1) although the detection method is improved, the top-down pipeline is still affected by early commitment; (2) our method is best suited to people who appear within the estimated region and have little interaction with others.
The specific embodiments described above explain the technical solution and beneficial effects of the present invention in detail. It should be understood that the above are only preferred embodiments of the invention and are not intended to limit it; any modification, supplement or equivalent replacement made within the scope of the principles of the invention shall be included in the protection scope of the invention.

Claims (10)

1. A multi-person pose estimation method based on mask-aware deep reinforcement learning, characterized in that the method comprises the following steps:
(1) constructing a multi-person pose estimation model, the model consisting of three sub-networks: a detection network that produces detection boxes and masks, a deep reinforcement learning network that improves localization accuracy, and a single-person pose estimation network;
the detection network being as follows: the detection boxes of the original image and the binary human mask within each detection box are obtained through a multi-task learning network;
the deep reinforcement learning network being as follows: the binary human mask produced by the detection network is resized to match the fully connected layer of the deep reinforcement learning network, multiplied with the detection-box image, and used as the input of the deep reinforcement learning network; the output of the deep reinforcement learning network is the reward value of each of 11 box-adjustment actions; the box-adjustment actions fall into four classes: scaling, translation, termination, and aspect-ratio adjustment; the action with the highest reward value is selected to adjust the detection box, and the newly obtained detection-box image is fed back into the deep reinforcement learning network iteratively until the highest-reward action is the termination action, whereupon the refined detection box is output;
the single-person pose estimation network being as follows: the mask and the refined detection-box image are passed to a single-person pose estimation network to obtain the single-person pose;
(2) training the multi-person pose estimation model with training samples; and feeding an image to be detected into the trained multi-person pose estimation model to obtain the pose of each person in all detection boxes of the image.
2. The multi-person pose estimation method based on mask-aware deep reinforcement learning according to claim 1, characterized in that the detection network uses a two-stage pipeline: in the first stage, a deep residual network extracts the feature map of the original image and an RPN generates several candidate boxes; in the second stage, each candidate box is passed to three branches for multi-task learning, producing the classification confidence, the detection-box offsets, and the binary human mask within the detection box.
3. The multi-person pose estimation method based on mask-aware deep reinforcement learning according to claim 2, characterized in that the detection network uses the following joint loss function in each second-stage branch:
L = L_cls + α_1 L_box + α_2 L_mask
where L_cls is the classification loss, expressed as a cross entropy; L_box is the localization loss, measuring the difference between the detection box and the ground-truth box with an L1 norm; L_mask is the segmentation loss, expressed as an average binary cross entropy; and α_1 and α_2 are coefficients balancing the three losses.
4. The multi-person pose estimation method based on mask-aware deep reinforcement learning according to claim 1, characterized in that the deep reinforcement learning network comprises a sequentially connected 8×8 convolutional layer, 4×4 convolutional layer, and 3×3 convolutional layer; the output of the 3×3 convolutional layer feeds two branches: one obtains the 11-dimensional advantage function A(a, s; θ, α) through an 11-dimensional fully connected layer, and the other obtains the state-value function V(s; θ, β) through a 1-dimensional fully connected layer, where θ denotes the shared convolutional parameters, α and β are the parameters of the respective fully connected branches, a is a box-adjustment action, and s is the input of the deep reinforcement learning network; the Q function is obtained by adding the advantage function to the state-value function, and the reward value of each action is computed from the Q function.
5. The multi-person pose estimation method based on mask-aware deep reinforcement learning according to claim 4, characterized in that in the deep reinforcement learning network the loss function loss(θ) of the Q function is expressed as:
loss(θ) = E[(r + γ max_{a′} Q(s′, a′, θ) − Q(s, a, θ))²]
where θ denotes the parameters of the deep reinforcement learning network; s and a are respectively the input and the box-adjustment action of the current iteration, and s′ and a′ those of the next iteration; Q(s, a, θ) is the cumulative reward starting from the current iteration and Q(s′, a′, θ) the cumulative reward starting from the next iteration; r is the reward value of the current iteration; γ is the discount factor; and E denotes the expectation of the loss over all iterations.
6. The multi-person pose estimation method based on mask-aware deep reinforcement learning according to claim 5, characterized in that in the deep reinforcement learning network the reward value r of the current iteration is expressed as:
r(s, a) = (IoU(w′_i, g_i) − IoU(w_i, g_i)) + λ b′/B′
where the first term IoU(w′_i, g_i) − IoU(w_i, g_i) is the conventional reward term, the second term λ b′/B′ is a regularization term added to constrain the detection-box size, w_i and w′_i respectively denote the detection box of target i before and after action a, g_i denotes the ground-truth box, b′ denotes the intersection area of the detection box and the ground-truth box after action a, B′ denotes the area of the detection box after action a, IoU denotes intersection over union, and λ is a scale factor balancing the reward term and the regularization term.
7. The multi-person pose estimation method based on mask-aware deep reinforcement learning according to claim 5, characterized in that, in the deep reinforcement learning network, the following measures are adopted to improve the learning efficiency of the parameters θ:
(a) to improve learning stability, a target network is introduced and kept separate from the online network; the online network is updated at every iteration, while the target network is updated only at fixed intervals;
(b) to avoid falling into local minima, an ε-greedy strategy is used as the action policy;
(c) to address the data-correlation problem, experience replay is used: each transition (s, a, r, s′) is stored in a buffer, and during training a fixed number of samples is drawn at random from the buffer to reduce the correlation between data.
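Measures (b) and (c) can be sketched as follows; this is a generic illustration of ε-greedy selection and an experience-replay buffer, not the patent's code, and the `capacity`/`batch_size` values are arbitrary:

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay per measure (c): store (s, a, r, s′) tuples and
    sample a fixed-size random minibatch to break temporal correlation."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # old transitions drop off

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def epsilon_greedy(q_values, epsilon, rng=random):
    """Measure (b): with probability ε take a random action (exploration),
    otherwise the greedy argmax action (exploitation)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

Measure (a), the target network, would be a second copy of the Q-network whose weights are overwritten from the online network only every N iterations.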
8. The multi-person pose estimation method based on mask-aware deep reinforcement learning according to claim 1, characterized in that the deep reinforcement learning network adopts a dueling DQN structure, which can quickly identify correct actions during policy evaluation.
9. The multi-person pose estimation method based on mask-aware deep reinforcement learning according to claim 1, characterized in that the single-person pose estimation network fuses the human-body binary mask obtained by the detection network with a cascaded pyramid network to perform human pose estimation; the loss function of the single-person pose estimation network is as follows:
L = L_inf + k·L_mask
where L_inf is the error term between the predicted single-person pose and the ground-truth pose, L_mask is a regularization term measuring the error between the predicted single-person pose and the human-body binary mask, and k is a scale factor balancing the two; L_mask = Σ_p L_p, where p indexes the human joint points;
where the predicted value of joint p is taken at position l, the location of the maximum activation value in its activation map; m_l is the value of the human-body binary mask at position l, with 1 indicating the human region and 0 the background region; if a joint is predicted outside the human region the result is penalized, otherwise the loss function is unaffected.
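The claim's explicit formula for L_p is not preserved here; one plausible reading consistent with the description is L_p = (1 − m_l)·h_p(l), where h_p(l) is joint p's peak activation: zero cost inside the mask, peak-sized cost in the background. A short NumPy sketch of that reading:

```python
import numpy as np

def mask_loss(heatmaps, mask):
    """L_mask = Σ_p L_p with the assumed L_p = (1 − m_l) · h_p(l), where
    l is the argmax of joint p's activation map: a joint contributes
    nothing when its peak lies inside the human mask (m_l = 1) and is
    penalized by its peak activation when it falls in background (m_l = 0)."""
    total = 0.0
    for hm in heatmaps:                               # one map per joint p
        l = np.unravel_index(np.argmax(hm), hm.shape)  # peak position l
        total += (1.0 - mask[l]) * hm[l]
    return total

def total_loss(l_inf, heatmaps, mask, k):
    """L = L_inf + k · L_mask: pose-error term plus mask regularizer."""
    return l_inf + k * mask_loss(heatmaps, mask)
```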
10. The multi-person pose estimation method based on mask-aware deep reinforcement learning according to claim 1, characterized in that the training stage of the multi-person pose estimation model is accelerated using a GPU.
CN201810968949.9A 2018-08-23 2018-08-23 Mask perception depth reinforcement learning-based multi-person attitude estimation method Active CN109190537B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810968949.9A CN109190537B (en) 2018-08-23 2018-08-23 Mask perception depth reinforcement learning-based multi-person attitude estimation method


Publications (2)

Publication Number Publication Date
CN109190537A true CN109190537A (en) 2019-01-11
CN109190537B CN109190537B (en) 2020-09-29

Family

ID=64919381

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810968949.9A Active CN109190537B (en) 2018-08-23 2018-08-23 Mask perception depth reinforcement learning-based multi-person attitude estimation method

Country Status (1)

Country Link
CN (1) CN109190537B (en)



Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150544A (en) * 2011-08-30 2013-06-12 Seiko Epson Corporation Method and apparatus for object pose estimation
US20180151090A1 (en) * 2015-04-22 2018-05-31 Jeffrey B. Matthews Visual and kinesthetic method and educational kit for solving algebraic linear equations involving an unknown variable
US20180096478A1 (en) * 2016-09-30 2018-04-05 Siemens Healthcare Gmbh Atlas-based contouring of organs at risk for radiation therapy
CN106780569A (en) * 2016-11-18 2017-05-31 深圳市唯特视科技有限公司 A kind of human body attitude estimates behavior analysis method
CN106780536A (en) * 2017-01-13 2017-05-31 深圳市唯特视科技有限公司 A kind of shape based on object mask network perceives example dividing method
CN106897697A (en) * 2017-02-24 2017-06-27 深圳市唯特视科技有限公司 A kind of personage and pose detection method based on visualization compiler
CN106951512A (en) * 2017-03-17 2017-07-14 深圳市唯特视科技有限公司 A kind of end-to-end session control method based on hybrid coding network
CN107392118A (en) * 2017-07-04 2017-11-24 竹间智能科技(上海)有限公司 The recognition methods of reinforcing face character and the system of generation network are resisted based on multitask
CN107944443A (en) * 2017-11-16 2018-04-20 深圳市唯特视科技有限公司 One kind carries out object consistency detection method based on end-to-end deep learning
CN108256489A (en) * 2018-01-24 2018-07-06 清华大学 Behavior prediction method and device based on deeply study
CN108304795A (en) * 2018-01-29 2018-07-20 清华大学 Human skeleton Activity recognition method and device based on deeply study

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
KOIZUMI, YUMA et al.: "DNN-Based Source Enhancement Self-Optimized by Reinforcement Learning Using Sound Quality Measurements", 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) *
YAN TIAN et al.: "Canonical Locality Preserving Latent Variable Model for Discriminative Pose Inference", Image and Vision Computing *
LU Huchuan et al.: "Survey of Object Tracking Algorithms" (in Chinese), Pattern Recognition and Artificial Intelligence *
SU Yanchao et al.: "Human Pose Estimation Based on Part Detectors in Images and Videos" (in Chinese), Journal of Electronics & Information Technology *

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766887B (en) * 2019-01-16 2022-11-11 中国科学院光电技术研究所 Multi-target detection method based on cascaded hourglass neural network
CN109766887A (en) * 2019-01-16 2019-05-17 中国科学院光电技术研究所 A kind of multi-target detection method based on cascade hourglass neural network
CN109784296A (en) * 2019-01-27 2019-05-21 武汉星巡智能科技有限公司 Bus occupant quantity statistics method, device and computer readable storage medium
CN110008915A (en) * 2019-04-11 2019-07-12 电子科技大学 The system and method for dense human body attitude estimation is carried out based on mask-RCNN
CN110008915B (en) * 2019-04-11 2023-02-03 电子科技大学 System and method for estimating dense human body posture based on mask-RCNN
CN110188219A (en) * 2019-05-16 2019-08-30 复旦大学 Deeply de-redundancy hash algorithm towards image retrieval
CN110188219B (en) * 2019-05-16 2023-01-06 复旦大学 Depth-enhanced redundancy-removing hash method for image retrieval
CN110222636A (en) * 2019-05-31 2019-09-10 中国民航大学 The pedestrian's attribute recognition approach inhibited based on background
CN110210402A (en) * 2019-06-03 2019-09-06 北京卡路里信息技术有限公司 Feature extracting method, device, terminal device and storage medium
CN110197163A (en) * 2019-06-04 2019-09-03 中国矿业大学 A kind of target tracking sample extending method based on pedestrian's search
CN110197163B (en) * 2019-06-04 2021-02-12 中国矿业大学 Target tracking sample expansion method based on pedestrian search
CN110415332A (en) * 2019-06-21 2019-11-05 上海工程技术大学 Complex textile surface three dimensional reconstruction system and method under a kind of non-single visual angle
CN112184802A (en) * 2019-07-05 2021-01-05 杭州海康威视数字技术股份有限公司 Calibration frame adjusting method and device and storage medium
CN112184802B (en) * 2019-07-05 2023-10-20 杭州海康威视数字技术股份有限公司 Calibration frame adjusting method, device and storage medium
CN112241976A (en) * 2019-07-19 2021-01-19 杭州海康威视数字技术股份有限公司 Method and device for training model
CN110569719B (en) * 2019-07-30 2022-05-17 中国科学技术大学 Animal head posture estimation method and system
CN110569719A (en) * 2019-07-30 2019-12-13 中国科学技术大学 animal head posture estimation method and system
CN110866872A (en) * 2019-10-10 2020-03-06 北京邮电大学 Pavement crack image preprocessing intelligent selection method and device and electronic equipment
CN110866872B (en) * 2019-10-10 2022-07-29 北京邮电大学 Pavement crack image preprocessing intelligent selection method and device and electronic equipment
CN111415389A (en) * 2020-03-18 2020-07-14 清华大学 Label-free six-dimensional object posture prediction method and device based on reinforcement learning
WO2021184530A1 (en) * 2020-03-18 2021-09-23 清华大学 Reinforcement learning-based label-free six-dimensional item attitude prediction method and device
CN111415389B (en) * 2020-03-18 2023-08-29 清华大学 Label-free six-dimensional object posture prediction method and device based on reinforcement learning
CN111738091A (en) * 2020-05-27 2020-10-02 复旦大学 Posture estimation and human body analysis system based on multi-task deep learning
CN111695457A (en) * 2020-05-28 2020-09-22 浙江工商大学 Human body posture estimation method based on weak supervision mechanism
CN111695457B (en) * 2020-05-28 2023-05-09 浙江工商大学 Human body posture estimation method based on weak supervision mechanism
CN112052886B (en) * 2020-08-21 2022-06-03 暨南大学 Intelligent human body action posture estimation method and device based on convolutional neural network
CN112052886A (en) * 2020-08-21 2020-12-08 暨南大学 Human body action attitude intelligent estimation method and device based on convolutional neural network
CN113012229A (en) * 2021-03-26 2021-06-22 北京华捷艾米科技有限公司 Method and device for positioning human body joint points
CN113361570A (en) * 2021-05-25 2021-09-07 东南大学 3D human body posture estimation method based on joint data enhancement and network training model
CN113361570B (en) * 2021-05-25 2022-11-01 东南大学 3D human body posture estimation method based on joint data enhancement and network training model
CN113436633A (en) * 2021-06-30 2021-09-24 平安科技(深圳)有限公司 Speaker recognition method, speaker recognition device, computer equipment and storage medium
CN113436633B (en) * 2021-06-30 2024-03-12 平安科技(深圳)有限公司 Speaker recognition method, speaker recognition device, computer equipment and storage medium
CN113537070A (en) * 2021-07-19 2021-10-22 中国第一汽车股份有限公司 Detection method, detection device, electronic equipment and storage medium
CN114143710B (en) * 2021-11-22 2022-10-04 武汉大学 Wireless positioning method and system based on reinforcement learning
CN114143710A (en) * 2021-11-22 2022-03-04 武汉大学 Wireless positioning method and system based on reinforcement learning
CN116721471A (en) * 2023-08-10 2023-09-08 中国科学院合肥物质科学研究院 Multi-person three-dimensional attitude estimation method based on multi-view angles

Also Published As

Publication number Publication date
CN109190537B (en) 2020-09-29

Similar Documents

Publication Publication Date Title
CN109190537A (en) Multi-person pose estimation method based on mask-aware deep reinforcement learning
CN110135459B (en) Zero sample classification method based on double-triple depth measurement learning network
CN108596327B (en) Seismic velocity spectrum artificial intelligence picking method based on deep learning
CN110033473B (en) Moving target tracking method based on template matching and depth classification network
CN109101938B (en) Multi-label age estimation method based on convolutional neural network
CN109101865A (en) A kind of recognition methods again of the pedestrian based on deep learning
CN106951825A (en) A kind of quality of human face image assessment system and implementation method
CN108961308B (en) Residual error depth characteristic target tracking method for drift detection
CN109151995B (en) Deep learning regression fusion positioning method based on signal intensity
CN106023257A (en) Target tracking method based on rotor UAV platform
CN110716792B (en) Target detector and construction method and application thereof
CN111259735B (en) Single-person attitude estimation method based on multi-stage prediction feature enhanced convolutional neural network
CN108492298A (en) Based on the multispectral image change detecting method for generating confrontation network
CN109598220A (en) A kind of demographic method based on the polynary multiple dimensioned convolution of input
CN110210380B (en) Analysis method for generating character based on expression recognition and psychological test
CN108717548B (en) Behavior recognition model updating method and system for dynamic increase of sensors
CN112633257A (en) Potato disease identification method based on improved convolutional neural network
CN107492114A (en) The heavy detecting method used when monocular is long during the tracking failure of visual tracking method
CN111144462B (en) Unknown individual identification method and device for radar signals
CN108182410A (en) A kind of joint objective zone location and the tumble recognizer of depth characteristic study
CN115346272A (en) Real-time tumble detection method based on depth image sequence
CN110516700B (en) Fine-grained image classification method based on metric learning
CN109583456B (en) Infrared surface target detection method based on feature fusion and dense connection
CN114154530A (en) Training method and device for atrial fibrillation detection model of electrocardio timing signals
CN115511012B (en) Class soft label identification training method with maximum entropy constraint

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210929

Address after: 310000 Room 401, building 2, No.16, Zhuantang science and technology economic block, Xihu District, Hangzhou City, Zhejiang Province

Patentee after: Hangzhou Yunqi Smart Vision Technology Co., Ltd.

Address before: 310018, No. 18 Jiao Tong Street, Xiasha Higher Education Park, Hangzhou, Zhejiang

Patentee before: ZHEJIANG GONGSHANG University