CN109190537A - Multi-person pose estimation method based on mask-aware deep reinforcement learning - Google Patents
Multi-person pose estimation method based on mask-aware deep reinforcement learning
- Publication number
- CN109190537A CN109190537A CN201810968949.9A CN201810968949A CN109190537A CN 109190537 A CN109190537 A CN 109190537A CN 201810968949 A CN201810968949 A CN 201810968949A CN 109190537 A CN109190537 A CN 109190537A
- Authority
- CN
- China
- Prior art keywords
- network
- pose estimation
- person
- mask
- action
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a multi-person pose estimation method based on mask-aware deep reinforcement learning. The method first constructs a multi-person pose estimation model consisting of three sub-networks: a detection network that produces detection boxes and masks, a deep reinforcement learning network that improves localization accuracy, and a single-person pose estimation network. The model is then trained on training samples; at test time, the image to be detected is fed into the trained model to obtain the pose of every person in its detection boxes. The method introduces mask information into both the deep reinforcement learning network and the single-person pose estimation network, improving both stages, and employs a residual structure to alleviate vanishing and exploding gradients. Compared with other state-of-the-art multi-person pose estimation methods, the proposed method is competitive.
Description
Technical field
The present invention relates to human pose estimation techniques, and in particular to a multi-person pose estimation method based on mask-aware deep reinforcement learning.
Background technique
With the deployment of large numbers of multimedia sensors and the wide use of motion-capture technology in fashion design, clinical analysis, human-computer interaction, activity recognition, and movement rehabilitation, human pose estimation has become a focus of attention in the multimedia industry.
Recently, single-person pose estimation has made significant progress through architectures based on deep learning. Multi-person pose estimation, however — inferring the poses of multiple people in an image, especially individuals in a crowd — remains an arduous task. Its main difficulties are as follows. First, the number of people in an image is unknown, and people may appear at any position and at any scale. Second, people in an image interact in ways such as occlusion, communication, and contact, which makes estimation harder. Third, computational complexity grows with the number of people, so designing efficient algorithms is itself a challenge. These difficulties are illustrated in Fig. 1(a)-(d).
Top-down and bottom-up approaches are the main ways of handling human pose estimation. Top-down methods use a detector and a single-person pose estimator to detect and evaluate each person. However, when people are very close to each other the single-person detector fails, and the computational cost grows with the number of people in the picture. Bottom-up methods, by contrast, first detect joints and then infer each person's pose from local context; because the final parsing step requires global information, their efficiency cannot be guaranteed.
Because both top-down and bottom-up methods localize detections inaccurately, the accuracy of multi-person pose estimation is further reduced. The relation between detection results and pose estimation results is shown in Fig. 1(e). In object detection, a detection box generated by a deep-learning model is considered correct when its intersection-over-union (IoU) with the ground-truth box exceeds 0.5. Redundant detection results, however, are harmful to pose estimation, so the detection boxes need to be corrected from the original detection results. Deep reinforcement learning is an effective tool for this: it can select the best action according to environmental information to reach an optimal value.
Summary of the invention
In view of the above deficiencies of the prior art, the object of the present invention is to provide a multi-person pose estimation method based on mask-aware deep reinforcement learning that effectively improves pose estimation accuracy.
The object of the invention is achieved through the following technical solution: a multi-person pose estimation method based on mask-aware deep reinforcement learning, comprising the following steps:
(1) Construct the multi-person pose estimation model. The model consists of three sub-networks: a detection network that produces detection boxes and masks, a deep reinforcement learning network that improves localization accuracy, and a single-person pose estimation network.
(1.1) Detection network: a multi-task learning network produces the detection boxes of the original image and the binary human mask within each detection box.
(1.2) Deep reinforcement learning network: unlike existing sampling-based approaches to calibrating localization results, the present invention formulates calibration as a Markov decision process, updating the detection box through a recursive reward-or-penalty learning process. The goal of this stage is to learn an optimal policy function that maps a state S to an action A.
In computer vision, most deep reinforcement learning methods use the feature map as the state vector. However, cluttered background produces high activation values in the feature map, which disturbs the calibration result and in turn the pose estimation process. In the present invention, the environment state is defined as a tuple (h, i), where h is the history-decision vector from the decision network and i is the masked feature map. A pre-trained convolutional neural network model f1 extracts the raw feature map from image x; the feature map is then passed to the multi-task network f2, whose mask output serves as an attention map over the feature map. The expression for i is as follows:

i = f2(f1(x)) ⊙ f1(x)

where ⊙ is the Hadamard product.
With an accurate foreground mask, the redundant information in the feature map is removed. The masked feature map supplies both low-level information such as shape and contour and high-level information such as human posture, which facilitates the calibration process.
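The masked feature map i = f2(f1(x)) ⊙ f1(x) is just an element-wise product that broadcasts the mask over every feature channel. The following minimal NumPy sketch illustrates this; the shapes and function name are illustrative, not taken from the patent:

```python
import numpy as np

def masked_feature_map(feature_map, mask):
    # i = mask ⊙ features: broadcast the (H, W) attention mask from the
    # multi-task network (f2) over all C channels of the CNN features (f1)
    return mask[None, :, :] * feature_map

# toy example: background activations are suppressed to zero
features = np.ones((2, 4, 4))      # (C, H, W) raw feature map
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0               # 2x2 foreground region
i = masked_feature_map(features, mask)
```

Only the activations inside the foreground region survive, which is exactly how the cluttered-background activations are removed before calibration.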
The binary human mask produced by the detection network is resized to match the fully connected layer of the deep reinforcement learning network and multiplied with the detection-box image; the product serves as the input to the deep reinforcement learning network. The output of the deep reinforcement learning network is the reward value of each of the 11 box-adjustment actions.
The box-adjustment actions fall into four classes: scaling (shrink and enlarge), translation (up, down, left, right), termination (whether to stop the box retrieval and adjustment), and aspect-ratio adjustment (increase or decrease of the width, increase or decrease of the height). To keep the detector's window movement relatively stable, each action is set to move or resize the current window by 0.1 times its size.
The action with the highest reward value is selected to adjust the detection box, and the newly obtained box image is fed back into the deep reinforcement learning network iteratively, until the highest-reward action is the termination action, at which point the calibrated detection box is output.
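The iterative calibration loop above can be sketched as follows. The box encoding `[x, y, w, h]`, the action indexing, and the helper names are our own illustrative choices; the 11 actions (2 scale, 4 translate, 4 aspect-ratio, 1 terminate) and the 0.1 step factor come from the description:

```python
import numpy as np

TERMINATE = 10          # index of the termination action
STEP = 0.1              # each move is 0.1x the current window size

def adjust_box(box, action):
    # Apply one of the 10 moving actions to box = [x, y, w, h] (a sketch;
    # the patent does not fix the box parameterization).
    x, y, w, h = box
    dx, dy = STEP * w, STEP * h
    moves = {
        0: (x, y, w * (1 + STEP), h * (1 + STEP)),   # enlarge
        1: (x, y, w * (1 - STEP), h * (1 - STEP)),   # shrink
        2: (x - dx, y, w, h),                        # move left
        3: (x + dx, y, w, h),                        # move right
        4: (x, y - dy, w, h),                        # move up
        5: (x, y + dy, w, h),                        # move down
        6: (x, y, w * (1 + STEP), h),                # widen
        7: (x, y, w * (1 - STEP), h),                # narrow
        8: (x, y, w, h * (1 + STEP)),                # taller
        9: (x, y, w, h * (1 - STEP)),                # shorter
    }
    return list(moves[action])

def calibrate(box, q_values, max_steps=50):
    # Greedy loop: take the highest-reward action until 'terminate' wins.
    for _ in range(max_steps):
        action = int(np.argmax(q_values(box)))
        if action == TERMINATE:
            break
        box = adjust_box(box, action)
    return box

# dummy policy for illustration: move right until x >= 2, then terminate
def policy(box):
    q = np.zeros(11)
    q[3 if box[0] < 2 else TERMINATE] = 1.0
    return q

demo = calibrate([0.0, 0.0, 10.0, 10.0], policy)
```

In the actual method, `q_values` would be the deep Q network evaluated on the masked detection-box image rather than a hand-written rule.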
(1.3) Single-person pose estimation network: the mask and the calibrated detection-box image are passed to a single-person pose estimation network, which outputs the single-person pose.
(2) Train the multi-person pose estimation model with training samples; feed the image to be detected into the trained multi-person pose estimation model to obtain the pose of every person in its detection boxes.
Further, the detection network uses a two-stage processing pipeline. In the first stage, a deep residual network extracts the feature map of the original image and an RPN generates a number of candidate boxes. In the second stage, each candidate box is passed to three branches for multi-task learning, yielding respectively the classification confidence, the detection-box offsets, and the binary human mask within the box.
Further, in the branches of the second stage, the detection network uses the following joint loss function:

L = Lcls + α1·Lbox + α2·Lmask

where Lcls is the classification loss, expressed with cross-entropy; Lbox is the localization loss, measuring the difference between the detection box and the ground-truth box with the L1 norm; Lmask is the segmentation loss, expressed with average binary cross-entropy; and α1 and α2 are coefficients balancing the three losses.
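A minimal sketch of this joint loss, with each term computed as the description states (cross-entropy, L1 norm, average binary cross-entropy). The default coefficients 4.0 and 10.0 are the values given later in the training details; the function names are illustrative:

```python
import numpy as np

def cross_entropy(probs, label):
    # Lcls: cross-entropy of the predicted class probabilities vs. the label
    return -np.log(probs[label])

def l1_box_loss(pred_box, gt_box):
    # Lbox: L1 norm between the detection box and the ground-truth box
    return np.abs(np.asarray(pred_box, float) - np.asarray(gt_box, float)).sum()

def binary_cross_entropy(pred_mask, gt_mask):
    # Lmask: average binary cross-entropy over all mask pixels
    p = np.clip(pred_mask, 1e-7, 1 - 1e-7)
    return -(gt_mask * np.log(p) + (1 - gt_mask) * np.log(1 - p)).mean()

def joint_loss(l_cls, l_box, l_mask, alpha1=4.0, alpha2=10.0):
    # L = Lcls + a1*Lbox + a2*Lmask
    return l_cls + alpha1 * l_box + alpha2 * l_mask
```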
Further, the deep reinforcement learning network consists of a sequence of an 8 × 8 convolutional layer, a 4 × 4 convolutional layer, and a 3 × 3 convolutional layer. The output of the 3 × 3 convolutional layer feeds two branches: one passes through an 11-dimensional fully connected layer to produce the 11-dimensional advantage function A(a, s; θ, α), and the other through a 1-dimensional fully connected layer to produce the state-value function V(s; θ, β), where θ are the shared convolutional-layer parameters, α and β are the fully-connected-layer parameters of the two branches, a is a box-adjustment action, and s is the input of the deep reinforcement learning network. The advantage function and the state-value function are summed to give the Q function, from which the reward value of each action is computed:

Q(s, a; θ, α, β) = V(s; θ, β) + A(a, s; θ, α).
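The dueling combination of the two branch outputs is a simple vector addition. The sketch below follows the patent's plain-sum form Q = V + A; many dueling-DQN implementations additionally subtract the mean advantage for identifiability, a variant the document does not state, shown only as a comment:

```python
import numpy as np

def dueling_q(value, advantages):
    # Q(s,a) = V(s) + A(s,a), summed per action as in the formula above.
    # Common variant (not stated in this document):
    #   value + advantages - advantages.mean()
    return value + np.asarray(advantages, float)

# toy example with 3 actions (the real network produces 11 advantages)
q = dueling_q(2.0, [0.5, -0.5, 0.0])
```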
Further, in the deep reinforcement learning network, the reward value r of the current iteration is expressed as:

R(s, a) = (IoU(w′i, gi) − IoU(wi, gi)) + λ·b′/B′

where the first term IoU(w′i, gi) − IoU(wi, gi) is the conventional reward term and the second term λ·b′/B′ is a regularization term added to constrain the detection-box size; wi and w′i respectively denote the detection box of target i before and after action a, gi denotes the ground-truth box, b′ denotes the intersection area of the detection box and the ground-truth box after action a, B′ denotes the area of the detection box after action a, IoU denotes intersection-over-union, and λ is a scale factor controlling the balance between the reward term and the regularization term (its value is determined during parameter tuning, typically between 1 and 10).
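A sketch of this step reward in NumPy, using axis-aligned boxes in `[x1, y1, x2, y2]` form (our own choice of parameterization; the helper names are illustrative):

```python
import numpy as np

def inter_area(a, b):
    # intersection area of boxes [x1, y1, x2, y2]
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)

def area(a):
    return (a[2] - a[0]) * (a[3] - a[1])

def iou(a, b):
    i = inter_area(a, b)
    return i / (area(a) + area(b) - i)

def reward(w_before, w_after, g, lam=1.0):
    # R(s,a) = (IoU(w', g) - IoU(w, g)) + lam * b'/B', where b' is the
    # intersection of the adjusted box with the ground truth and B' is
    # the area of the adjusted box
    return (iou(w_after, g) - iou(w_before, g)) \
        + lam * inter_area(w_after, g) / area(w_after)
```

An action that moves the box toward the ground truth raises both the IoU term and the b′/B′ term, so it earns a positive reward.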
The termination action is a special action: it does not move the detection box, but only judges whether the optimal result has been found in the reinforcement learning process. Its reward value is defined as:

R_terminate = +η if IoU(w, g) ≥ τ, and −η otherwise

where τ is the IoU threshold that decides whether the reward is positive or negative, and η is the corresponding reward value.
An action a is selected according to the Q function, where Q(s, a) represents the current plus expected future cumulative reward:

a = argmaxa Q(s, a)

Q(s, a) = r + γ·maxa′ Q(s′, a′)

The loss function loss(θ) used to train the Q function is expressed as:

loss(θ) = E[(r + γ·maxa′ Q(s′, a′, θ) − Q(s, a, θ))²]

where θ is the parameter set of the deep reinforcement learning network; s and a are respectively the input and the box-adjustment action of the current iteration of the deep reinforcement learning network, and s′ and a′ those of the next iteration; Q(s, a, θ) is the sum of all reward values from the current iteration onward and Q(s′, a′, θ) the sum from the next iteration onward; r is the reward value of the current iteration; γ is the discount factor; and E denotes the expectation over the loss values of all iterations.
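A minimal per-transition sketch of this squared temporal-difference loss. In practice the expectation E is taken over a sampled mini-batch; the `terminal` flag (our addition, standard in Q-learning but not spelled out here) drops the bootstrap term when the episode ends:

```python
import numpy as np

def td_loss(q_sa, r, q_next, gamma=0.9, terminal=False):
    # (r + gamma * max_a' Q(s',a') - Q(s,a))^2 for one transition.
    # gamma=0.9 is the discount factor given in the training details.
    target = r if terminal else r + gamma * float(np.max(q_next))
    return (target - q_sa) ** 2
```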
Further, in the deep reinforcement learning network, the following measures are taken to improve the learning efficiency of the parameters θ:
(a) To promote learning stability, a target network is introduced and kept separate from the online network; the online network is updated at every iteration, while the target network is updated at fixed intervals;
(b) To avoid falling into local minima, an ε-greedy strategy is used as the action policy;
(c) To solve the data-dependence problem, experience replay is used: each transition (s, a, r, s′) is stored in a buffer, and during training a fixed number of samples are drawn at random from the buffer to reduce the correlation between data.
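The experience-replay mechanism of measure (c) can be sketched in a few lines. The default capacity of 10,000 and batch size of 32 follow the training details given later in the description; the class and method names are our own:

```python
import random
from collections import deque

class ReplayBuffer:
    # Fixed-size store of (s, a, r, s') transitions; uniform random
    # sampling breaks the correlation between consecutive transitions.
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # old entries drop off automatically

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size=32):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```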
Further, the deep reinforcement learning network uses a dueling DQN structure, which can quickly identify the correct action during decision evaluation. Like a standard Q network, the dueling structure is trained with plain backpropagation; no supervised learning or algorithmic modification is needed for it to estimate V(s; θ, β) and A(a, s; θ, α) automatically.
Further, the single-person pose estimation network combines the binary human mask obtained by the detection network with a cascaded pyramid network (CPN) to perform human pose estimation. The loss function of the single-person pose estimation network is as follows:

L = Linf + k·Lmask

where Linf is the error term between the predicted single-person pose and the ground-truth pose, Lmask is a regularization term expressing the error between the predicted single-person pose and the binary human mask, and k is a scale factor balancing the two (set from practical experience, typically between 1 and 5); Lmask = Σp Lp, where p indexes the human joints and

Lp = φlp · (1 − ml)

where φlp denotes the predicted value of joint p at position l in the activation map, l being the position of maximum activation in the activation map; ml is the binary human mask at position l, equal to 1 inside the human region and 0 in the background. Thus a joint whose peak falls outside the human region incurs a loss, while the loss function is otherwise unaffected.
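The per-joint penalty Lp can be sketched directly from its definition: take each joint's peak activation position and charge the peak value whenever it lies outside the mask. The exact per-joint form below is our reconstruction from the verbal description; heatmap shapes and names are illustrative:

```python
import numpy as np

def mask_penalty(heatmaps, mask):
    # Lmask = sum_p phi_l^p * (1 - m_l): for each joint p, l is the argmax
    # position of its activation map; the peak value is penalized only
    # when it falls outside the human mask (m_l = 0).
    total = 0.0
    for hm in heatmaps:                                  # hm: (H, W) map for joint p
        l = np.unravel_index(int(np.argmax(hm)), hm.shape)
        total += hm[l] * (1.0 - mask[l])
    return total
```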
Further, the training stage of the multi-person pose estimation model is computed on GPU.
Preferably, the training details of the detection network are as follows: α1 and α2 in the loss function are set to 4.0 and 10.0 respectively. The whole network uses stochastic gradient descent with momentum 0.9 and weight decay 0.0005. The first 60,000 iterations use a learning rate of 0.01 and the following 20,000 iterations a learning rate of 0.001. In each batch of data, 48 positive samples are taken from 4 training pictures and 48 negative samples from cluttered background. In the verification phase, the confidence threshold is set to 0.7 and the IoU used for localization to 0.6.
In the calibration process based on deep reinforcement learning, 10,000 transitions form one buffer and each batch of data contains 32 of them. λ in the loss function is set between 1 and 10. In the experimental stage, an ε-greedy strategy is used; during training, ε drops from 0.3 to 0.05 after 5,000 training iterations. The discount factor γ is 0.9.
In the single-person pose estimation stage, k in the loss function is set to 0.4. The model uses stochastic gradient descent with an initial learning rate of 0.0005, halved after every 10 traversals of the data set. The weight decay is 0.00005, and batch normalization is used.
Compared with the prior art, the present invention has the following beneficial effects:
1. The proposed multi-person pose estimation method based on mask-aware deep reinforcement learning increases detection accuracy.
2. Mask information is used to eliminate the negative effect of cluttered background information, and the best action is selected according to the reward function.
3. A regularization term is added in the pose estimation stage to penalize joints outside the human contour.
4. Tested on the MPII test set, the multi-person pose estimation model improves mean Average Precision (mAP) by 1.1 over prior-art models; tested on the MS-COCO test-dev data set, it reaches a mean average precision of 73.0.
Brief description of the drawings
Fig. 1 is a schematic diagram of the difficulties of multi-person pose estimation according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the multi-person pose estimation framework based on mask-aware deep reinforcement learning according to an embodiment of the present invention;
Fig. 3 shows the activation map within a detection box in the detection stage according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the different actions;
Fig. 5 is a schematic diagram of the deep Q network according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the mask-aware pose estimation framework according to an embodiment of the present invention;
Fig. 7 shows the accuracy curves of the state and the reward function according to an embodiment of the present invention;
Fig. 8 shows detection results on the MPII data set according to an embodiment of the present invention.
Specific embodiment
To describe the present invention more specifically, the technical solution of the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
The multi-person pose estimation method provided in this embodiment obtains the positions and pose information of a non-fixed number of people in an image, and can be applied to multimedia industries such as clinical analysis, human-computer interaction, and activity recognition.
This embodiment obtains localization boxes and masks with a multi-task learning network, calibrates the localization with a deep reinforcement learning network, and finally performs human pose estimation on the person in each detection box with a single-person pose estimation network. This specific embodiment of the invention is explained below with reference to the drawings.
Fig. 1 is a schematic diagram of the difficulties of multi-person pose estimation provided in this embodiment: (a) shows that the number and positions of people in a picture are unknown; (b), (c), and (d) respectively show occlusion, communication, and contact, embodying the interaction between people; (e) embodies the relation between box detection and human pose estimation.
Fig. 2 is a schematic diagram of the multi-person pose estimation framework based on mask-aware deep reinforcement learning provided in this embodiment: the multi-task network obtains detection boxes and person masks simultaneously, and the deep reinforcement learning network calibrates the localization results. Finally, an hourglass network evaluates the pose of each person. Mask information is used in both the calibration and estimation stages.
Fig. 3 shows the activation map within a detection box in the detection stage: the original image (a) is passed through a convolutional neural network to obtain the activation map (b), in which it can be seen that cluttered background information also produces high activation values. Fig. 3(c) shows that with an accurate foreground mask the redundant information in the feature map is removed.
Fig. 4 is a schematic diagram of the different actions, illustrating the four classes: scaling, translation, termination, and aspect-ratio adjustment.
Fig. 5 is a schematic diagram of the deep Q network provided in this embodiment, consisting of a sequence of an 8 × 8 convolutional layer, a 4 × 4 convolutional layer, and a 3 × 3 convolutional layer. The output of the 3 × 3 convolutional layer feeds two branches: one passes through an 11-dimensional fully connected layer to produce the 11-dimensional advantage function A(a, s; θ, α), the other through a 1-dimensional fully connected layer to produce the state-value function V(s; θ, β), where θ are the shared convolutional-layer parameters, α and β are the fully-connected-layer parameters of the two branches, a is a box-adjustment action, and s is the input of the deep reinforcement learning network. The advantage function and the state-value function are summed to give the Q function, from which the reward value of each action is computed.
Fig. 6 is a schematic diagram of the pose estimation framework combining the mask, provided in this embodiment. The single-person pose estimation network combines the binary human mask obtained by the detection network with a cascaded pyramid network (CPN) to perform human pose estimation; the loss function of the single-person pose estimation network is as follows:

L = Linf + k·Lmask

where Linf is the error term between the predicted single-person pose and the ground-truth pose, Lmask is a regularization term expressing the error between the predicted single-person pose and the binary human mask, and k is a scale factor balancing the two (set from practical experience, typically between 1 and 5); Lmask = Σp Lp, where p indexes the human joints and

Lp = φlp · (1 − ml)

where φlp denotes the predicted value of joint p at position l in the activation map, l being the position of maximum activation in the activation map; ml is the binary human mask at position l, equal to 1 inside the human region and 0 in the background. A joint whose peak falls outside the human region incurs a loss; otherwise the loss function is unaffected.
Fig. 7 shows the accuracy curves of the state and the reward function provided in this embodiment: (a) is the training accuracy curve of the state, (b) the test accuracy curve of the state, (c) the training accuracy curve of the reward function, and (d) the test accuracy curve of the reward function.
Multi-person pose estimation is performed on images using this embodiment; the experimental results on the MPII data set are shown in Fig. 8, where (a) shows successful predictions and (b) shows failed predictions. From the failure cases we can conclude: (1) although the detection method is improved, the top-down approach is still affected by early commitment; (2) our method is suited to cases where people appear within the estimated range and interact little.
The above specific embodiment describes the technical solution and beneficial effects of the present invention in detail. It should be understood that the above is only the preferred embodiment of the present invention and is not intended to limit it; any modification, supplement, or equivalent replacement made within the scope of the principles of the present invention shall be included in the protection scope of the present invention.
Claims (10)
1. A multi-person pose estimation method based on mask-aware deep reinforcement learning, characterized in that the method comprises the following steps:
(1) constructing a multi-person pose estimation model consisting of three sub-networks: a detection network that produces detection boxes and masks, a deep reinforcement learning network that improves localization accuracy, and a single-person pose estimation network;
the detection network being as follows: a multi-task learning network produces the detection boxes of the original image and the binary human mask within each detection box;
the deep reinforcement learning network being as follows: the binary human mask produced by the detection network is resized to match the fully connected layer of the deep reinforcement learning network and multiplied with the detection-box image, the product serving as the input to the deep reinforcement learning network; the output of the deep reinforcement learning network is the reward value of each of 11 box-adjustment actions; the box-adjustment actions fall into four classes: scaling, translation, termination, and aspect-ratio adjustment; the action with the highest reward value is selected to adjust the detection box, and the newly obtained box image is fed back into the deep reinforcement learning network iteratively until the highest-reward action is the termination action, at which point the calibrated detection box is output;
the single-person pose estimation network being as follows: the mask and the calibrated detection-box image are passed to a single-person pose estimation network to obtain the single-person pose;
(2) training the multi-person pose estimation model with training samples; feeding the image to be detected into the trained multi-person pose estimation model to obtain the pose of every person in its detection boxes.
2. The multi-person pose estimation method based on mask-aware deep reinforcement learning according to claim 1, characterized in that the detection network uses a two-stage processing pipeline: in the first stage, a deep residual network extracts the feature map of the original image and an RPN generates a number of candidate boxes; in the second stage, each candidate box is passed to three branches for multi-task learning, yielding respectively the classification confidence, the detection-box offsets, and the binary human mask within the box.
3. The multi-person pose estimation method based on mask-aware deep reinforcement learning according to claim 2, characterized in that, in the branches of the second stage, the detection network uses the following joint loss function:

L = Lcls + α1·Lbox + α2·Lmask

where Lcls is the classification loss, expressed with cross-entropy; Lbox is the localization loss, measuring the difference between the detection box and the ground-truth box with the L1 norm; Lmask is the segmentation loss, expressed with average binary cross-entropy; and α1 and α2 are coefficients balancing the three losses.
4. The multi-person pose estimation method based on mask-aware deep reinforcement learning according to claim 1, characterized in that the deep reinforcement learning network consists of a sequence of an 8 × 8 convolutional layer, a 4 × 4 convolutional layer, and a 3 × 3 convolutional layer; the output of the 3 × 3 convolutional layer feeds two branches: one passes through an 11-dimensional fully connected layer to produce the 11-dimensional advantage function A(a, s; θ, α), the other through a 1-dimensional fully connected layer to produce the state-value function V(s; θ, β), where θ are the shared convolutional-layer parameters, α and β are the fully-connected-layer parameters of the two branches, a is a box-adjustment action, and s is the input of the deep reinforcement learning network; the advantage function and the state-value function are summed to give the Q function, from which the reward value of each action is computed.
5. The multi-person pose estimation method based on mask-aware deep reinforcement learning according to claim 4, characterized in that, in the deep reinforcement learning network, the loss function loss(θ) of the Q function is expressed as:

loss(θ) = E[(r + γ·maxa′ Q(s′, a′, θ) − Q(s, a, θ))²]

where θ is the parameter set of the deep reinforcement learning network; s and a are respectively the input and the box-adjustment action of the current iteration of the deep reinforcement learning network, and s′ and a′ those of the next iteration; Q(s, a, θ) is the sum of all reward values from the current iteration onward and Q(s′, a′, θ) the sum from the next iteration onward; r is the reward value of the current iteration; γ is the discount factor; and E denotes the expectation over the loss values of all iterations.
6. The multi-person pose estimation method based on mask-aware deep reinforcement learning according to claim 5, characterized in that, in the deep reinforcement learning network, the reward value r of the current iteration is expressed as:

R(s, a) = (IoU(w′i, gi) − IoU(wi, gi)) + λ·b′/B′

where the first term IoU(w′i, gi) − IoU(wi, gi) is the conventional reward term and the second term λ·b′/B′ is a regularization term added to constrain the detection-box size; wi and w′i respectively denote the detection box of target i before and after action a, gi denotes the ground-truth box, b′ denotes the intersection area of the detection box and the ground-truth box after action a, B′ denotes the area of the detection box after action a, IoU denotes intersection-over-union, and λ is a scale factor controlling the balance between the reward term and the regularization term.
7. The multi-person pose estimation method based on mask-aware deep reinforcement learning according to claim 5, characterized in that, in the deep reinforcement learning network, the following measures are taken to improve the learning efficiency of the parameters θ:
(a) to promote learning stability, a target network is introduced and kept separate from the online network; the online network is updated at every iteration, while the target network is updated at fixed intervals;
(b) to avoid falling into local minima, an ε-greedy strategy is used as the action policy;
(c) to solve the data-dependence problem, experience replay is used: each transition (s, a, r, s′) is stored in a buffer, and during training a fixed number of samples are drawn at random from the buffer to reduce the correlation between data.
8. The multi-person pose estimation method based on mask-aware deep reinforcement learning according to claim 1, wherein the deep reinforcement learning network adopts a dueling DQN structure, which can quickly identify the correct action during decision evaluation.
9. The multi-person pose estimation method based on mask-aware deep reinforcement learning according to claim 1, wherein the single-person pose estimation network fuses the human-body binary mask obtained by the detection network with a cascaded pyramid network to perform human pose estimation; the loss function of the single-person pose estimation network is as follows:
L = L_inf + k·L_mask
where L_inf is the error term between the predicted single-person pose and the ground-truth pose, L_mask is a regularization term measuring the error between the predicted single-person pose and the human-body binary mask, and k is a scale factor balancing the two; L_mask = Σ_p L_p, where p indexes the human joint points.
Each L_p depends on the predicted value of joint p at position l, where l is the position with the maximum activation value in the activation map, and on m_l, the human-body binary mask at position l (1 indicates the human region, 0 indicates the background). If a joint falls outside the human region the loss is penalized; otherwise the loss function is unaffected.
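The exact formula for L_p was lost in extraction; a sketch under the stated behavior, assuming each per-joint term weights the peak activation by (1 − m_l) so it is zero inside the mask and a penalty outside (this assumed form and the value k = 0.5 are illustrative, not from the patent):

```python
import numpy as np

def mask_loss(heatmaps, mask):
    """L_mask = sum_p L_p (assumed form: peak activation weighted by 1 - mask).

    heatmaps: one 2-D activation map per joint p.
    mask:     human-body binary mask, 1 inside the human region, 0 outside.
    """
    total = 0.0
    for hm in heatmaps:
        l = np.unravel_index(np.argmax(hm), hm.shape)  # peak position of joint p
        total += hm[l] * (1.0 - mask[l])               # zero when peak is on the body
    return total

def total_loss(l_inf, heatmaps, mask, k=0.5):
    """L = L_inf + k * L_mask."""
    return l_inf + k * mask_loss(heatmaps, mask)
```

A peak that lands inside the mask contributes nothing, while a peak in the background contributes its full activation value, matching the claim's "penalized outside the human region, unaffected otherwise" behavior.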
10. The multi-person pose estimation method based on mask-aware deep reinforcement learning according to claim 1, wherein the training stage of the multi-person pose estimation model is computed using a GPU.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810968949.9A CN109190537B (en) | 2018-08-23 | 2018-08-23 | Mask perception depth reinforcement learning-based multi-person attitude estimation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109190537A true CN109190537A (en) | 2019-01-11 |
CN109190537B CN109190537B (en) | 2020-09-29 |
Family
ID=64919381
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810968949.9A Active CN109190537B (en) | 2018-08-23 | 2018-08-23 | Mask perception depth reinforcement learning-based multi-person attitude estimation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109190537B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103150544A (en) * | 2011-08-30 | 2013-06-12 | 精工爱普生株式会社 | Method and apparatus for object pose estimation |
CN106780536A (en) * | 2017-01-13 | 2017-05-31 | 深圳市唯特视科技有限公司 | A kind of shape based on object mask network perceives example dividing method |
CN106780569A (en) * | 2016-11-18 | 2017-05-31 | 深圳市唯特视科技有限公司 | A kind of human body attitude estimates behavior analysis method |
CN106897697A (en) * | 2017-02-24 | 2017-06-27 | 深圳市唯特视科技有限公司 | A kind of personage and pose detection method based on visualization compiler |
CN106951512A (en) * | 2017-03-17 | 2017-07-14 | 深圳市唯特视科技有限公司 | A kind of end-to-end session control method based on hybrid coding network |
CN107392118A (en) * | 2017-07-04 | 2017-11-24 | 竹间智能科技(上海)有限公司 | The recognition methods of reinforcing face character and the system of generation network are resisted based on multitask |
US20180096478A1 (en) * | 2016-09-30 | 2018-04-05 | Siemens Healthcare Gmbh | Atlas-based contouring of organs at risk for radiation therapy |
CN107944443A (en) * | 2017-11-16 | 2018-04-20 | 深圳市唯特视科技有限公司 | One kind carries out object consistency detection method based on end-to-end deep learning |
US20180151090A1 (en) * | 2015-04-22 | 2018-05-31 | Jeffrey B. Matthews | Visual and kinesthetic method and educational kit for solving algebraic linear equations involving an unknown variable |
CN108256489A (en) * | 2018-01-24 | 2018-07-06 | 清华大学 | Behavior prediction method and device based on deeply study |
CN108304795A (en) * | 2018-01-29 | 2018-07-20 | 清华大学 | Human skeleton Activity recognition method and device based on deeply study |
Non-Patent Citations (4)
Title |
---|
KOIZUMI, YUMA 等: "DNN-BASED SOURCE ENHANCEMENT SELF-OPTIMIZED BY REINFORCEMENT LEARNING USING SOUND QUALITY MEASUREMENTS", 《2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)》 * |
YAN TIAN 等: "Canonical Locality Preserving Latent Variable Model for Discriminative Pose Inference", 《IMAGE AND VISION COMPUTING》 * |
卢湖川 等: "目标跟踪算法综述", 《模式识别与人工智能》 * |
苏延超 等: "图像和视频中基于部件检测器的人体姿态估计", 《电子与信息学报》 * |
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109766887B (en) * | 2019-01-16 | 2022-11-11 | 中国科学院光电技术研究所 | Multi-target detection method based on cascaded hourglass neural network |
CN109766887A (en) * | 2019-01-16 | 2019-05-17 | 中国科学院光电技术研究所 | A kind of multi-target detection method based on cascade hourglass neural network |
CN109784296A (en) * | 2019-01-27 | 2019-05-21 | 武汉星巡智能科技有限公司 | Bus occupant quantity statistics method, device and computer readable storage medium |
CN110008915A (en) * | 2019-04-11 | 2019-07-12 | 电子科技大学 | The system and method for dense human body attitude estimation is carried out based on mask-RCNN |
CN110008915B (en) * | 2019-04-11 | 2023-02-03 | 电子科技大学 | System and method for estimating dense human body posture based on mask-RCNN |
CN110188219A (en) * | 2019-05-16 | 2019-08-30 | 复旦大学 | Deeply de-redundancy hash algorithm towards image retrieval |
CN110188219B (en) * | 2019-05-16 | 2023-01-06 | 复旦大学 | Depth-enhanced redundancy-removing hash method for image retrieval |
CN110222636A (en) * | 2019-05-31 | 2019-09-10 | 中国民航大学 | The pedestrian's attribute recognition approach inhibited based on background |
CN110210402A (en) * | 2019-06-03 | 2019-09-06 | 北京卡路里信息技术有限公司 | Feature extracting method, device, terminal device and storage medium |
CN110197163A (en) * | 2019-06-04 | 2019-09-03 | 中国矿业大学 | A kind of target tracking sample extending method based on pedestrian's search |
CN110197163B (en) * | 2019-06-04 | 2021-02-12 | 中国矿业大学 | Target tracking sample expansion method based on pedestrian search |
CN110415332A (en) * | 2019-06-21 | 2019-11-05 | 上海工程技术大学 | Complex textile surface three dimensional reconstruction system and method under a kind of non-single visual angle |
CN112184802A (en) * | 2019-07-05 | 2021-01-05 | 杭州海康威视数字技术股份有限公司 | Calibration frame adjusting method and device and storage medium |
CN112184802B (en) * | 2019-07-05 | 2023-10-20 | 杭州海康威视数字技术股份有限公司 | Calibration frame adjusting method, device and storage medium |
CN112241976A (en) * | 2019-07-19 | 2021-01-19 | 杭州海康威视数字技术股份有限公司 | Method and device for training model |
CN110569719B (en) * | 2019-07-30 | 2022-05-17 | 中国科学技术大学 | Animal head posture estimation method and system |
CN110569719A (en) * | 2019-07-30 | 2019-12-13 | 中国科学技术大学 | animal head posture estimation method and system |
CN110866872A (en) * | 2019-10-10 | 2020-03-06 | 北京邮电大学 | Pavement crack image preprocessing intelligent selection method and device and electronic equipment |
CN110866872B (en) * | 2019-10-10 | 2022-07-29 | 北京邮电大学 | Pavement crack image preprocessing intelligent selection method and device and electronic equipment |
CN111415389A (en) * | 2020-03-18 | 2020-07-14 | 清华大学 | Label-free six-dimensional object posture prediction method and device based on reinforcement learning |
WO2021184530A1 (en) * | 2020-03-18 | 2021-09-23 | 清华大学 | Reinforcement learning-based label-free six-dimensional item attitude prediction method and device |
CN111415389B (en) * | 2020-03-18 | 2023-08-29 | 清华大学 | Label-free six-dimensional object posture prediction method and device based on reinforcement learning |
CN111738091A (en) * | 2020-05-27 | 2020-10-02 | 复旦大学 | Posture estimation and human body analysis system based on multi-task deep learning |
CN111695457A (en) * | 2020-05-28 | 2020-09-22 | 浙江工商大学 | Human body posture estimation method based on weak supervision mechanism |
CN111695457B (en) * | 2020-05-28 | 2023-05-09 | 浙江工商大学 | Human body posture estimation method based on weak supervision mechanism |
CN112052886B (en) * | 2020-08-21 | 2022-06-03 | 暨南大学 | Intelligent human body action posture estimation method and device based on convolutional neural network |
CN112052886A (en) * | 2020-08-21 | 2020-12-08 | 暨南大学 | Human body action attitude intelligent estimation method and device based on convolutional neural network |
CN113012229A (en) * | 2021-03-26 | 2021-06-22 | 北京华捷艾米科技有限公司 | Method and device for positioning human body joint points |
CN113361570A (en) * | 2021-05-25 | 2021-09-07 | 东南大学 | 3D human body posture estimation method based on joint data enhancement and network training model |
CN113361570B (en) * | 2021-05-25 | 2022-11-01 | 东南大学 | 3D human body posture estimation method based on joint data enhancement and network training model |
CN113436633A (en) * | 2021-06-30 | 2021-09-24 | 平安科技(深圳)有限公司 | Speaker recognition method, speaker recognition device, computer equipment and storage medium |
CN113436633B (en) * | 2021-06-30 | 2024-03-12 | 平安科技(深圳)有限公司 | Speaker recognition method, speaker recognition device, computer equipment and storage medium |
CN113537070A (en) * | 2021-07-19 | 2021-10-22 | 中国第一汽车股份有限公司 | Detection method, detection device, electronic equipment and storage medium |
CN114143710B (en) * | 2021-11-22 | 2022-10-04 | 武汉大学 | Wireless positioning method and system based on reinforcement learning |
CN114143710A (en) * | 2021-11-22 | 2022-03-04 | 武汉大学 | Wireless positioning method and system based on reinforcement learning |
CN116721471A (en) * | 2023-08-10 | 2023-09-08 | 中国科学院合肥物质科学研究院 | Multi-person three-dimensional attitude estimation method based on multi-view angles |
Also Published As
Publication number | Publication date |
---|---|
CN109190537B (en) | 2020-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109190537A (en) | A kind of more personage's Attitude estimation methods based on mask perceived depth intensified learning | |
CN110135459B (en) | Zero sample classification method based on double-triple depth measurement learning network | |
CN108596327B (en) | Seismic velocity spectrum artificial intelligence picking method based on deep learning | |
CN110033473B (en) | Moving target tracking method based on template matching and depth classification network | |
CN109101938B (en) | Multi-label age estimation method based on convolutional neural network | |
CN109101865A (en) | A kind of recognition methods again of the pedestrian based on deep learning | |
CN106951825A (en) | A kind of quality of human face image assessment system and implementation method | |
CN108961308B (en) | Residual error depth characteristic target tracking method for drift detection | |
CN109151995B (en) | Deep learning regression fusion positioning method based on signal intensity | |
CN106023257A (en) | Target tracking method based on rotor UAV platform | |
CN110716792B (en) | Target detector and construction method and application thereof | |
CN111259735B (en) | Single-person attitude estimation method based on multi-stage prediction feature enhanced convolutional neural network | |
CN108492298A (en) | Based on the multispectral image change detecting method for generating confrontation network | |
CN109598220A (en) | A kind of demographic method based on the polynary multiple dimensioned convolution of input | |
CN110210380B (en) | Analysis method for generating character based on expression recognition and psychological test | |
CN108717548B (en) | Behavior recognition model updating method and system for dynamic increase of sensors | |
CN112633257A (en) | Potato disease identification method based on improved convolutional neural network | |
CN107492114A (en) | The heavy detecting method used when monocular is long during the tracking failure of visual tracking method | |
CN111144462B (en) | Unknown individual identification method and device for radar signals | |
CN108182410A (en) | A kind of joint objective zone location and the tumble recognizer of depth characteristic study | |
CN115346272A (en) | Real-time tumble detection method based on depth image sequence | |
CN110516700B (en) | Fine-grained image classification method based on metric learning | |
CN109583456B (en) | Infrared surface target detection method based on feature fusion and dense connection | |
CN114154530A (en) | Training method and device for atrial fibrillation detection model of electrocardio timing signals | |
CN115511012B (en) | Class soft label identification training method with maximum entropy constraint |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| TR01 | Transfer of patent right | |
Effective date of registration: 2021-09-29
Address after: 310000 Room 401, building 2, No. 16, Zhuantang science and technology economic block, Xihu District, Hangzhou City, Zhejiang Province
Patentee after: Hangzhou yunqi smart Vision Technology Co., Ltd
Address before: 310018, No. 18 Jiao Tong Street, Xiasha Higher Education Park, Hangzhou, Zhejiang
Patentee before: ZHEJIANG GONGSHANG University